Alan Ridel

We know that there were a large number (25,000 books published during the victorian era) of books, we have a lot of information about gender and year level stats.

no corpus that exists reflects the population of published novels during this period effectively.

The Chadwyck-Healey corpus is particularly bad, 50% of the data comes from male authors published before 1876 even though this was only 15% of the population.

Random sampling of the population is not really possible because we don't actually have a complete database of all novels published during the victorian era.

instead we do quota sampling.

We divide up the population into categories based on year and gender and manually encode a randomly selected chapter.

Not a representative sample
- overrepresents authors who wrote more than one novel
- over represents novels published in multiple volumes

Maybe there's a bias in which things were published or which types of genres tend to do multi volume things

The solution is to use post-stratification as a way to do analysis of granular distinctions after the fact:

e.g. novels published by women in 1940
novels involving trains

Detecting Sarcasm is Extremely Easy ;) (Parde & Nielson 2018)

Harnessing Context Incongruity for Sarcasm Detection (Joshi et al 2015)

Sarcasm as Contrast between a Positive Sentiment and Negative Sentiment

Catastrophic Interference in Neural Embedding Models (Dachapally & Jones)

Querying word embeddings for word similarity and relatdness

Multi-Task Deep Neural Networks for Natural Language Understanding

Riordan et al., 2019

Horbach et al., 2019

Riordan et al. 2020

How do you determine the worth of a language?

November 6th 2019: Hai, Peng

Alan Ridel

Hai Hu 02-19-2020

Zeeshan 02-19-2020

Overview of the SPMRL 2013 Shared Task:Cross-Framework Evaluation of Parsing Morphologically Rich Languages

Dependency Parsing

Characterizing the Errors of Data-Driven Dependency Parsing Models

January 17th - Job search

Job talk Monica Nesbit

BLiMP: A Benchmark of Linguistic Minimal Pairs for English

Swahili Syntax (Anthony Vitale, 1981)

Developing Universal Dependencies for Wolof

Towards a dependency-annotated treebank for Bambara (Aplonova & Tyers 2018)

A Universal Part-of-Speech Tagset (Petrov, Das, McDonald)

Universal Depedencies v1: A Multilingual Treebank Collection

Reusing Grammatical Resources for New Languages

Estonian Dependency Treebank: from Constraint Grammar Tagset to Universal Dependencies

Learning Morphosyntactic analyzers from the bible via iterative annotation projection across 26 languages

Alan Ridel

No Comments