Reading Notes

Sarcasm Detection


Detecting Sarcasm is Extremely Easy ;) (Parde & Nielsen 2018)

Gist

Intro

Background

Sarcasm detection methods

Data source

Features

Classification Algorithm

Naive Bayes using Daumé III (2007)'s method for domain adaptation to generate source, target, and general feature mappings.
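A minimal sketch of the Daumé-style "frustratingly easy" feature augmentation (assuming simple dict-based bag-of-words features; the names here are illustrative, not from the paper):

    def augment(features, domain):
        """Daume III (2007)-style augmentation: every feature gets a general
        copy plus a domain-specific copy, so the learner can keep shared and
        domain-only weights separate."""
        out = {}
        for name, value in features.items():
            out["general::" + name] = value      # shared across domains
            out[domain + "::" + name] = value    # source- or target-specific
        return out

    # e.g. a unigram feature vector from a tweet (target domain)
    print(augment({"love": 1, "mondays": 1}, domain="twitter"))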

Results

.59 F-score on Twitter data, 1% over previous literature (not really meaningful). Recall of the system is much higher (.68 vs .62) at the cost of some precision (.53 vs .55). .78 F-score on Amazon reviews, much higher than previous results (Buschmeier et al. 2014): .78 vs .74. Once again, much higher recall (.82 vs .69) at the cost of precision (.75 vs .85).

Error analysis

Sarcasm Detection

Harnessing Context Incongruity for Sarcasm Detection (Joshi et al 2015)

Gist

Dataset

Primarily focused on tweets.

Discussion board datasets

ML System

Detecting incongruity

Features

Explicit Incongruity features

"""

Analysis

Sarcasm Detection

Sarcasm as Contrast between a Positive Sentiment and Negative Situation

Ellen Riloff, Ashequl Qadir, Prafulla Surve, Lalindra De Silva, Nathan Gilbert, Ruihong Huang

Novel bootstrapping algorithm that learns lists of positive sentiment phrases and negative situation phrases.

"Bootstrapping algorithm that automatically learns phrases corresponding to negative sentiments and phrases corresponding to negative situations" p. 705

Bootstrapped learning of positive sentiments and negative situations

"Our goal is to create a sarcasm classifier for tweets taht explicitly recognizes contexts that contain a positive sentiment contrasted with a negative situation" p. 706

They're learning phrases that have positive or negative connotations using a single seed word "love" and a collection of sarcastic tweets.

"Operates on the assmption that many sarcastic tweets contain both a positive sentiment and a negative situation in close proximity, which is the source of the sarcasm" p. 706.

They focus on positive verb phrases and negative complements to that verb phrase.

They don't parse because, well, parsing tweets is messy and hard. Instead they use just part of speech tags and proximity as a proxy for syntactic structure.

"We harvest the n-grams that follow the word 'love' as negative situation candidates. WE select the best candidates using a scoring metric and add them to a lsit of negative situation phrases. p.706

"Next we exploit the structural assumption in the opposite direction. Given a sarcastic tweet that contains a negative situation phrase, we infer that the negative situation phrase is preceded by a positive sentiment. We harvest the n-grams that precede the negative situation phrases as positive sentiment candidates, score and select the best candidates, and add them to the list of positive sentiment phrases" (p. 706)

Using only 175,000 tweets... Quite small for such distantly supervised stuff to work.

They use #sarcasm as indicative of the sarcastic class.

They use part of speech patterns to identify verb phrases and noun phrases.

They're scoring each candidate based upon how well it corresponds with sarcasm. E.g. "we score each candidate sentiment verb phrase by estimating the probability that a tweet is sarcastic given that it contains the candidate phrase preceding a negative verb phrase" p. 708

and "we score each remaining candidate by estimating the probability that a tweet is sarcastic given that it contaisn the predicative expression near (within 5 words) of a negative situation phrase"

We found that the diversity of positive sentiment verb phrases and predicative expressions is much lower than the diversity of negative situation phrases

Makes good sense that they found this ^ However, they seem to have more stringent filtering for the positive expressions...

Neural Networks


Catastrophic Interference in Neural Embedding Models (Dachapally & Jones)

Catastrophic forgetting is the tendency of neural models to have a strong recency bias, i.e. more recent training examples are more likely to be predicted.

DSM

Distributional Semantic Models encompass geometric models like latent Dirichlet allocation and SVD as well as neural embedding models. Neural embedding models are

Experiment 1

Create artificial data

using the following sentence generation patterns

The idea is to capture the two homophonous meanings of 'bass' and place them in embedding contexts identical to that of a synonym.

Ordering of data

Balancing distribution of homophones

1/3 of one meaning

Evaluation

Looked at cosine similarity between word embedding vectors learned
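(For reference, the evaluation measure is just the cosine between the two learned vectors:)

    import numpy as np

    def cosine(u, v):
        # cosine similarity between two embedding vectors
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))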

Experiment 2

Conducted using real data (the TASA corpus).

Neural Networks

Querying word embeddings for word similarity and relatedness

Word relatedness is sometimes asymmetrical e.g. stork may elicit associations with baby but baby may not generate associations with stork.

Similarity is symmetrical.

Neural Networks

Multi-Task Deep Neural Networks for Natural Language Understanding

Answer Scoring


Riordan et al., 2019

How to account for misspellings:

Quantifying the benefit of character representations in neural content scoring models


Takehome:

"Models with character representations outperformed their word-only counterparts...lower MSE and higher QWK" p. 121


Datasets


Methods

Word only model

Pretrained word embeddings feed into a bidirectional GRU. The hidden states of the GRU are either pooled or go through an MLP attention mechanism. The output of the encoder goes through a fully connected layer with a sigmoid, which produces a score.

Character + word models

Each word is represented with a sequence of 25-dimensional character embeddings. "Character embeddings are concatenated with the word embeddings prior to the word-level encoder" (p. 119)
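A minimal PyTorch sketch of how I read the word + character setup (the per-word character BiGRU summary, mean pooling, and all dimensions other than the 25-dim character embeddings are my placeholders, not the paper's exact configuration):

    import torch
    import torch.nn as nn

    class WordCharScorer(nn.Module):
        """Word embeddings concatenated with a char-level summary per word,
        fed to a BiGRU encoder, mean-pooled, then a sigmoid score in [0, 1]."""
        def __init__(self, vocab_size, char_vocab_size, word_dim=100, char_dim=25):
            super().__init__()
            self.word_emb = nn.Embedding(vocab_size, word_dim)
            self.char_emb = nn.Embedding(char_vocab_size, char_dim)
            # char BiGRU over each word's characters; final states give a
            # fixed-size representation concatenated with the word embedding
            self.char_gru = nn.GRU(char_dim, char_dim, bidirectional=True, batch_first=True)
            self.word_gru = nn.GRU(word_dim + 2 * char_dim, 100,
                                   bidirectional=True, batch_first=True)
            self.out = nn.Linear(200, 1)

        def forward(self, word_ids, char_ids):
            # word_ids: (batch, num_words); char_ids: (batch, num_words, num_chars)
            b, w, c = char_ids.shape
            _, h = self.char_gru(self.char_emb(char_ids.view(b * w, c)))
            char_repr = torch.cat([h[0], h[1]], dim=-1).view(b, w, -1)
            tokens = torch.cat([self.word_emb(word_ids), char_repr], dim=-1)
            states, _ = self.word_gru(tokens)
            pooled = states.mean(dim=1)             # simple pooling in place of attention
            return torch.sigmoid(self.out(pooled))  # score in [0, 1]

The word-only model is the same thing minus the character branch.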


Results

ASAP-SAS

While adding character representations performed better than spelling correction alone, the effect of adding character representations was not statistically significant in the GLMM, and neither was using spelling correction.

No evidence for interaction between character representations and spelling correction in the GLMM.

Formative K12-SAS

Same general trend as ASAP-SAS

Statistical significance between the different representations and the different methods of spelling correction, but no interaction observed between misspelling bins and the representation used.

"The difference between feature sets and between mispellings bins was significant even when controlling for score and number of words" (p. 123)

Large majority of responses had no spelling errors. 3 spelling bins used (0, 1, 2+)


Q: Is spelling not what the character representations are able to capture? Is it instead morphological variation?

Q: I thought that the addition of character representations was helpful for two of the datasets but not the last one. The conclusion reached was that character representations were not as helpful as spelling correction but I think this was only significant for the 2nd dataset.

Q: Are the character representations alone enough? (what if you dropped words)

Answer Scoring

Horbach et al., 2019

The influence of variance in learner answers on automatic content scoring

Andrea Horbach and Torsten Zesch

Variance

Sources of variance

Answer Scoring

Riordan et al. 2020

An empirical investigation of neural methods for content scoring of science explanations

NGSS science standards dimensions

KI rubric:

Data

Constructed response (CR) items are evaluated. The ones chosen are cases where SEPs need to be used while showing understanding of CCCs and DCIs.

CR Items:

Two separate rubrics in parallel:

The thermodynamics challenge item was particularly challenging.

Sometimes there were less annotated data available for the NGSS dimension models compared to the KI models.

Models

Models for each item and score type were trained independently: 10-fold cross-validation with train/val/test splits, evaluating on concatenated predictions across folds.
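A sketch of that evaluation setup as I understand it (scikit-learn K-fold, SVR as a stand-in model, and omitting the per-fold validation split used for tuning):

    import numpy as np
    from sklearn.model_selection import KFold
    from sklearn.svm import SVR

    def cross_validated_predictions(X, y, n_splits=10, seed=0):
        """Train one model per fold, stitch the held-out predictions back
        together, and compute metrics (e.g. QWK) once over all items."""
        preds = np.zeros(len(y), dtype=float)
        for train_idx, test_idx in KFold(n_splits, shuffle=True, random_state=seed).split(X):
            model = SVR().fit(X[train_idx], y[train_idx])  # stand-in for SVR/RNN/transformer
            preds[test_idx] = model.predict(X[test_idx])
        return preds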

SVR

RNN

Pretrained transformer

Results

KI models

The pretrained transformer models are more robust; they're always ahead of the RNN on all metrics (sometimes not by much, though).

The items that were highly skewed showed lower levels of human-machine agreement (lower than the 0.7 threshold for QWK in real-world scoring applications). Where does that threshold come from??

CLINGDINGS


How do you determine the worth of a language?


Arle Lommel, October 30 2019

2019 is the UN year of indigenous languages.

Every language has an intrinsic value

However, in a world of limited resources, not all 7,000 of the world's languages can be invested in.

less than 1% of content is translated into another language

To cover 100% of content this would take about 20 million translators. This is only for one additional language though.

to cover all 135 economically important languages we would need 2 billion translators.

Let's look at other views of value

the number of speakers does not determine the value of a language for a business

Maybe we can look at GDP?

Internet adoption rate

Pre-2019, CSA selected 50(?) languages with online relevance (e.g. usage by communities online).

Calculated number of speakers for each country/territory.

Used a zero-sum approach (no accounting for multi-lingualism).

Assigned languages to four tiers based on cumulative market research.

This measure is called eGDP, or electronic GDP; it is not a measure of e-commerce.

after 2019,

CLINGDINGS

November 6th 2019: Hai, Peng

Product harm report evaluation

Product harm crises are when products cause incidents and lead to issues; the public response then produces negative publicity for the company and the government body in charge of regulation.

Two issues for issuing a recall

Legal wiggle room

Research questions

Do recall communication examples differ across industries?

Hypothesis

the idea is that these shape the way that the company frames their response.

argument structure is crucial for previous research, in addition to subjectivity measures

difference emerges in number of content words (nouns, verbs, adjectives, and adverbs)

word-level (lexical) complexity

Structural complexity (length of T-unit + dependency length) (what is a T-unit?)

Reading ease score takes into consideration number of syllables per word and number of words per sentence.

However, the number of syllables per word is hard to reliably calculate.
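For reference, the standard Flesch reading-ease formula, with a naive vowel-group syllable counter standing in for the hard-to-do-reliably syllable count (my sketch, not the talk's implementation):

    import re

    def flesch_reading_ease(text):
        sentences = max(1, len(re.findall(r"[.!?]+", text)))
        words = re.findall(r"[A-Za-z']+", text)
        # crude syllable estimate: count vowel groups in each word
        syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words)
        return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))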

CLINGDINGS

Alan Ridel

We know that a large number of books were published during the Victorian era (about 25,000), and we have a lot of information about gender and year-level stats.

no corpus that exists reflects the population of published novels during this period effectively.

The Chadwyck-Healey corpus is particularly bad: 50% of the data comes from male authors published before 1876 even though this was only 15% of the population.

Random sampling of the population is not really possible because we don't actually have a complete database of all novels published during the Victorian era.

instead we do quota sampling.

We divide up the population into categories based on year and gender and manually encode a randomly selected chapter.

Maybe there's a bias in which things were published or which types of genres tend to do multi volume things

The solution is to use post-stratification as a way to do analysis of granular distinctions after the fact:
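As a toy illustration of post-stratification weighting (my own sketch, not from the talk; each stratum gets weight population share / sample share):

    def poststratification_weights(sample_counts, population_counts):
        """Weight each stratum so the quota sample can stand in for the
        population when estimating quantities after the fact."""
        n_sample = sum(sample_counts.values())
        n_pop = sum(population_counts.values())
        return {s: (population_counts[s] / n_pop) / (sample_counts[s] / n_sample)
                for s in sample_counts}

    # e.g. male authors pre-1876: 15% of the population but 50% of a corpus
    print(poststratification_weights({"male_pre1876": 50, "other": 50},
                                     {"male_pre1876": 15, "other": 85}))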

CLINGDINGS

Hai Hu 02-19-2020

Building a natural language inference dataset in Chinese

What is NLI?

When you have to determine whether a hypothesis is entailed by, contradicts, or is neutral towards a premise.

Issues with SNLI

Turkers do not want contradiction to go both ways.

Bias in hypotheses

If you train on SNLI on just the hypotheses, you get better than majority baseline.

There's bias in the hypotheses. One thing is that 'sleeps' contradicts almost any other action. Additional heuristics, probably introduced by the Turkers, likely exist in the dataset. By creating synthetic data that goes against the heuristics, the result is very poor performance (19% accuracy for BERT was the best).
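A sketch of the hypothesis-only baseline idea (bag-of-words + logistic regression as a stand-in classifier; the point is just that the premise is never seen):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    def hypothesis_only_baseline(train_hypotheses, train_labels):
        """If this beats the majority-class baseline, the hypotheses alone
        leak label information (annotation artifacts)."""
        clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)),
                            LogisticRegression(max_iter=1000))
        return clf.fit(train_hypotheses, train_labels)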

XNLI:

Our chinese NLI

Todo

CLINGDINGS

Zeeshan 02-19-2020

Internship at Amazon and forthcoming thesis

What is transfer learning?

Multi-task learning

Hard vs soft parameter sharing: hard parameter sharing literally shares some of the initial layers and then has task-specific layers towards the end.

Soft parameter sharing uses some method of regularization to force the corresponding layers for the two tasks to be close to each other.
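A minimal PyTorch sketch of hard parameter sharing (my illustration; soft sharing would instead keep separate trunks and add a penalty on the distance between their weights):

    import torch.nn as nn

    class HardSharingModel(nn.Module):
        """Shared initial layers plus task-specific output heads."""
        def __init__(self, in_dim, hidden, task_out_dims):
            super().__init__()
            self.shared = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                        nn.Linear(hidden, hidden), nn.ReLU())
            self.heads = nn.ModuleDict({name: nn.Linear(hidden, dim)
                                        for name, dim in task_out_dims.items()})

        def forward(self, x, task):
            return self.heads[task](self.shared(x))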

Parsing

Parsing

Overview of the SPMRL 2013 Shared Task: Cross-Framework Evaluation of Parsing Morphologically Rich Languages

Central topic

Methodology

Datasets

Findings

Previous research

Notable quotes

While progress on parsing English -- the main language of focus for the ACL community -- has inspired some advances on other languages, it has not, by itself, yielded high-quality parsing for other languages and domains. This holds in particular for morphologically rich languages... where important information concerning the predicate-argument structure of sentences is expressed through word formation, rather than constituent-order patterns as is the case in English and other configurational languages. p. 146

recently, advances in PCFG-LA parsing (Petrov et al. 2006) and language-agnostic data-driven dependency parsing (McDonald et al. 2005; Nivre et al. 2007b) have made it possible to reach high accuracy with classical feature engineering techniques in addition to, or instead of, language specific knowledge. p. 147

Follow up readings

Parsing

Dependency Parsing

Parsing

Characterizing the Errors of Data-Driven Dependency Parsing Models

McDonald & Nivre 2007

Background

Two basic approaches to dependency parsing are all pairs and stepwise.

"All pairs" approaches make decisions globally, use exact inference but have relatively impoverished features

"stepwise" approaches make greedy decisions, but have a rich feature representation including past decisions.

Both achieve similar performance, but the kinds of errors they make are different. Sagae and Lavie (2006) show that combining the predictions of both types of models yields "significantly improved accuracy". This paper is going to talk about the strengths and weaknesses of the two approaches.

Two models for dependency parsing

Preliminaries

Describes what a dependency tree is, plus graph-based and transition-based dependency parsing. Overall, Kübler et al. (2009) has a more thorough discussion of the different approaches.

Global, Graph-Based Parsing

Labels are portrayed as part of the scoring function in this work. I believe how scoring labels works varies between different approaches but I have to look further into this

The primary disadvantage of these models is that the feature representation is restricted to a limited number of graph arcs. This restriction is required so that both inference and learning are tractable.
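A minimal sketch of what the arc-factored restriction means (my illustration): the score of a candidate tree decomposes into independent per-arc scores, so features can only look at one arc (plus its local context) at a time.

    def tree_score(arcs, arc_score):
        """Graph-based (MST-style) scoring: score(tree) = sum of arc scores.
        `arcs` is an iterable of (head, dependent, label) triples and
        `arc_score` is any learned function over a single arc."""
        return sum(arc_score(head, dep, label) for head, dep, label in arcs)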

MSTparser is the implementation used.

Local, Greedy, Transition-Based Parsing

The primary advantage of these models is that features are not restricted to a limited number of graph arcs but can take into account the entire dependency graph built so far. The main disadvantage is that the greedy parsing strategy may lead to error propagation.

CONLL-X shared task

13 languages, 19 systems; labeled attachment score was the official metric (percentage of tokens, excluding punctuation, that are assigned both the correct head and the correct dependency label).
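A sketch of that metric (assuming per-token tuples and a crude all-non-alphanumeric test for punctuation):

    def labeled_attachment_score(tokens):
        """LAS: % of non-punctuation tokens with both the correct head and label.
        Each token is (form, gold_head, gold_label, pred_head, pred_label)."""
        scored = [gh == ph and gl == pl
                  for form, gh, gl, ph, pl in tokens
                  if any(c.isalnum() for c in form)]   # skip punctuation-only tokens
        return sum(scored) / len(scored)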

Error analysis

Graph factors

All current parsers have more trouble on longer sentences. MaltParser performs better on shorter sentences and worse as sentences get longer. Attributed to the likelihood of error propagation being higher for longer sentences and the richer feature representation being beneficial for short sentences.

MSTParser far more precise than MaltParser for longer dependency arcs (where the length is the length of the predicted arc). MaltParser does better for shorter dependency arcs. Overall MSTParser is not affected by dependency length.

MSTParser is far more precise close to the root and is less precise than Malt further from the root.

Dependency arcs further from the root are (usually) created first in transition-based systems. Thus this is further evidence that error propagation is partly to blame for the difference between the two approaches.

MSTParser over-predicts arcs near the bottom of the graph. Whereas MaltParser pushes difficult parsing decisions higher in the graph, MSTParser appears to push these decisions lower.

Linguistic Factors

Findings with regard to part of speech associations are tied to previous findings of position in graph.

Adpositions are a bit strange because they have high average root distance and low average dependency length but MSTParser does okay on them.

Reading Template

Central topic

Methodology

Findings

Follow up readings

Professionalization workshop


January 17th - Job search

Why this is important:

some facts about IU linguistics

should new hires reflect the traditional strengths of the department or cutting edge research?

The talk on the 26th is a sociolinguistics talk; all the others are syntax talks.

if you have thoughts then you should contact Nils or Samson

Syntax job search

Quality of candidates is important

Fit in the department is equally important

Be careful when searching for a job

After job talks the whole remainder of the afternoon is for grad students

What should we do to prep?

We may have a debriefing session after them?

Professionalization workshop

Job talk: Monica Nesbitt

Lansing speech corpus:

Had to exclude people

none of the younger speakers show the hallmarks of the Northern Cities Shift.

is ash [+tense]?

People generally don't want to have syllables without codas for lax vowels. What do people from Lansing, Michigan do?

Younger northern inland speakers pattern more closely to the Canadian model.

Widespread perception that inland northern speech is 'correct'

Lansing: auto town

The low back merger shift is a western chain shift. Does the chain shift happening in California match the Northern Cities Shift? Turns out no, not really? Loss of local dialects is paralleled in New England as well (research at Dartmouth).

Language Modelling


BLiMP: A Benchmark of Linguistic Minimal Pairs for English

The paper and dataset can be found here.

Dataset

Island phenomena are the hardest for language models to deal with; scores on these minimal pairs are near chance (which is funny because these are fairly robust restrictions in human grammaticality judgements).

What if you trained a language model with negative examples using the minimal pairs provided? Frame it as a classification problem and see how they compare then? This kind of avoids the issue at heart, since the paper is looking at how well existing language models address these phenomena, but it would be interesting to see if these architectures can model this information.

The CoLA dataset features 10,000 judgements, and BERT and company do well at that task. (This is mentioned later.)

Related work

Language modelling

Recent shifts to transformer models have resulted in reduced perplexity; however, "this doesn't give insight into these models' linguistic knowledge."

Evaluation on downstream task benchmarks (Wang et al. 2018, 2019a) is more informative, but might not present a broad enough challenge or represent grammatical distinctions at a sufficiently fine-grained level.

Evaluation of linguistic knowledge

There have been previous works that examined using minimal pairs to infer whether language models learn about specific linguistic phenomena.

However, most of these works have been limited in what they investigated:

There are corpora that contain grammaticality judgements for sentences. The most recent and largest is CoLA (Warstadt et al. 2019b). CoLA is included in the GLUE benchmark.

Current transformer models can be trained to give excellent results on this data.

looks like my previous idea has already been done

When Warstadt and Bowman (2019) investigated the performance of pretrained language models including an LSTM, GPT, and BERT, they found that the models did well on "sentences with marked argument structure" and did worse on sentences with long-distance dependencies (though transformer models did better there).

evaluating supervised classifiers prevents making strong conclusions about the models themselves, since biases in the training data may affect the results

Performance could be due to the occurrence of similar examples in the training dataset.

When language models are evaluated on minimal pairs, this evades the problem.

The authors say the probability of a sentence (and thus the inverse perplexity) can be used as a proxy of acceptability.
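A sketch of the comparison as I read it, using HuggingFace GPT-2 to compare total log-probabilities of the two sentences in a pair (the specific model, example pair, and length handling are my simplifications, not the paper's exact setup):

    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

    def sentence_logprob(sentence):
        ids = tokenizer(sentence, return_tensors="pt").input_ids
        with torch.no_grad():
            loss = model(ids, labels=ids).loss   # mean per-token negative log-likelihood
        return -loss.item() * (ids.shape[1] - 1) # back to a total log-probability

    good, bad = "The cats sleep.", "The cats sleeps."
    # the model 'gets' the minimal pair if it prefers the acceptable sentence
    print(sentence_logprob(good) > sentence_logprob(bad))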

Dataset

The data is automatically generated using expert crafted grammars.

Data generation procedure:

Sometimes implausible sentences can be generated but the authors view this as a non-issue.

The authors consider frequent inclusion of a phenomenon in a syntax/semantics textbook as an informal proxy for what counts as a core linguistic phenomenon of English. (Not especially useful when examining non-English languages, as few are taught from the perspective of a different language, e.g. a minimalist syntax textbook that only discusses French.)

These are the phenomena covered

Comparison to related resources

With 3,000 words, this has the widest vocabulary of any related generated dataset; 11 different verb subcategorization frames.

Other works, like Linzen et al. (2016), use a larger lexicon but rely on data-creation methods that are limited in control or scope.

Validation

Used Mechanical Turk; 20 annotators rated 5 pairs from each of the 67 paradigms. Aggregate human agreement is estimated at 96.4%.

Only paradigms where the annotators agreed with BLiMP on 4/5 examples were included (the 67 included paradigms passed; 2 additional ones were rejected on these grounds).

Individual human agreement approximated at 88.6%

Evaluation of language models

GPT-2 achieves highest scores, n-gram the lowest, LSTM and Transformer-XL tied.

the results seem to indicate that access to training data is the main driver of performance on Blimp for the neural models we evaluate

They point to the fact that the LSTM and Transformer-XL models performed about the same despite wildly different architectures, and that GPT-2 had 100x the training data but a similar architecture to Transformer-XL.

Phenomenon specific results

Models perform best, closest to human level, on morphological phenomena (anaphor agreement, determiner-noun agreement, and subject-verb agreement). Possibly because English doesn't have that much of this.

GPT-2 is the only model that performs above chance on islands, but it is still 20 points behind humans. They are very hard in general.

Wilcox et al. (2018) concluded that LSTMs have knowledge of some island conditions, which contradicts the findings here. However, Wilcox et al. compare four related sentences with or without gaps, obtaining a wh-licensing interaction as a metric of how strongly the language model identifies a filler-gap dependency in a single spot; the LM has learned the constraint if the probability is close to 0. This is difficult to parse; I think I need to read the original paper.

This paper finds that neural models can identify long-distance dependencies but not the domains where these dependencies are blocked.

weak performance on argument structure is somewhat strange because previous work has suggested that argument structure is a solid domain for neural models. However, these works (Warstadt and Bowman (2019)) trained the model on CoLA and didn't do direct language modelling.

Contribution

I am unfamiliar with the creation of minimal pair datasets for evaluation of neural language models. It seems that this paper's main contribution, though, is the creation of their new dataset that approaches minimal pairs with more breadth: including examples of many more types of English linguistic phenomena. They have the widest vocabulary of any generated dataset like this, including a large number of verb subcategorization frames.

Swahili Syntax


Swahili Syntax (Anthony Vitale, 1981)

Grammatical Sketch

Very brief grammatical sketch, strong focus on syntax which is nice to see since most grammar sketches avoid syntax as much as possible :)

SVO structure

Swahili is a positional language rather than a case language. That is, it is at least partly the position of constituents in a phrase-marker which determines grammatical relations such as ''subject'', ''object'' and so on. (p. 18)

Variations in word order

Word order may differ from the normal SVO sequence due to such factors as emphasis, definiteness, and type of information (i.e. "old" vs "new"). (p 19).

Permutations are typically unambiguous due to the very clear verbal morphology indicating the noun class + person/number of the subject and object.

If both NP's contain the same feature specifications for class and person, a late movement rule such as this one is typically blocked (p. 19).

These are typically interpreted as having an SVO order unless intonational changes accompany the sentence (see Maw 1969).

Simplex sentences

Complex sentences

The syntax of voice

Theoretical implications

Going to skip this part because I'm working in a completely different framework (and, in fact, most generative syntacticians are as well).

Syntax for "Exotic" languages


Developing Universal Dependencies for Wolof

The paper can be found here.

Wolof is a Niger-Congo language (however, it is Senegambian, whereas Bantu languages are Benue-Congo (citation needed)).

Computational grammar of Wolof in the LFG framework (page 1). This was used to create the first treebank of Wolof (see the ParGramBank paper Sulger et al. 2013).

The dependency treebank created is not the result of automatic conversion of the LFG treebank, though the LFG treebank did serve as a basis for annotation. This is because they see significant mapping issues between LFG and UD (though they plan to do this automatic conversion at a later time).

13 noun classes (8 singular, 2 plural, 2 locative and 1 manner). Locus of noun class marking is on the nominal modifiers not the noun.

Determiners encode proximal and distal relations for both the speaker and addressee.

Noun classes in Wolof lack semantic coherence (citing McLaughlin, 1997).

Wolof nouns are typically not inflected except for the genitive and possessive case.

No adjective category, stative verbs used instead (similar to Swahili though there are still a small set of Adjectives in Swahili).

Syntax for "Exotic" languages

Towards a dependency-annotated treebank for Bambara (Aplonova & Tyers 2018)

To Read

Universal Dependencies


A Universal Part-of-Speech Tagset (Petrov, Das, McDonald)

Abstract

Introduction

"Underlying these studies is the idea that a set of (coarse) syntactic POS categories exist in similar forms across languages."

Tagset

The tags

Experiments

POS tagging accuracy comparison

Grammar induction

Universal Dependencies

Universal Dependencies v1: A Multilingual Treebank Collection

(Nivre, Marneffe, Ginter, Goldberg, Hajic, Manning, McDonald, Petrov, Pyysalo, Silveira, Tsarfaty, Zeman)

Introduction

History

UD today is dependent upon prior research

What other UD-like projects existed?

Annotation guideline principles

Word segmentation

Morphology

Lemma

No guidelines provided for what the lemmas should look like. E.g. should lemmas include derivational morphemes, what should you do for suppletives etc.

Part of speech tag

Morphological features

Syntax

mwe and name are both left-headed with a flat structure (i.e. all tokens are connected to the left-most part of the name or mwe). This is carried over to fixed and flat in UD v2, which means I need to fix some of the names that I've annotated.

Relations between content words

The UD view is that we need to recognize both lexical and functional heads, but in order to maximize parallelism across languages, only lexical heads are inferable from the topology of our tree structures

Language-specific relations

CG to Dependency Parse


Reusing Grammatical Resources for New Languages

Lene Antonsen, Trond Trosterud, Linda Wiechetek

Takeaway

Machine-readable grammars can be more easily applied to new languages if they are working at higher levels of analysis. At the level of morphophonology, grammatical differences between languages preclude the reuse of analyses.

We argue that portability here takes the form of reusing smaller modules of the grammar

Hopefully the paper expands on that because that statement doesn't make any sense

Languages

Technical background

Reusing grammar

The bottom of the analysis

Disambiguation

Mapping of syntactic tags

The top of the analysis

This is the part that's relevant to me

Still the analyzer retains very good accuracy for the dependency analysis: 0.99

Bootstrapping

CG to Dependency Parse

Estonian Dependency Treebank: from Constraint Grammar Tagset to Universal Dependencies

Kadri Muischnek, Kaili Müürisep, Tiina Puolakainen

Central topic

Dependency treebank in the Universal Dependencies formalism adapted from an existing dependency treebank for Estonian. This adaptation was done semi-automatically using a Constraint Grammar transfer rule system.

Methodology

Structure of annotations

The Estonian Dependency Treebank (DT) is annotated in Constraint Grammar style. There are three layers:

This is an example word tag in a larger sentence (for more information see Figure 1 in the paper).

"<lamnast>"
   "lamnas" Lt S com sg part @<Q #6->5 

The used set of syntactic relations derives from Constraint Grammar, but the definitions of syntactic relations...are based on an academic description of Estonian grammar

Differences between UD and EDT annotation

Both EDT and UD adopt dependency grammar-based annotation guidelines. However, different syntactic relations are used and some phenomena are analyzed differently.

POS tags

No DET tag in the Estonian UDT; a similar decision was made for the Finnish UDT. PART is not used because these things are currently tagged as adverbs or pronouns, and it would require manual effort to retag them.

No discussion of annotation of morphological features.

Ditransitives are not used, as there are no grammatical descriptions of Estonian that describe ditransitives.

EDT distinguishes between finite and non-finite (subordinate) clauses, with finite clauses not indicating the syntactic relation between the head of the finite clause and the main clause. What are they doing here then? This is very unclear in this paper; maybe I need to read the paper for the EDT in order to make sense of this.

EDT annotated modals and other auxiliaries as multi-word predicates. Many of these are set up as complement clauses with ccomp and xcomp in UD instead.

Primacy of content words in UD causes a large number of changes. EDT had a lot of relations headed by functional words. For example, the noun in a prepositional phrase was a dependent of the preposition, while the preposition was a dependent of the larger context. In UD, this has to be changed because dependency relations need to hold between content words.

Conversion procedure

Findings

Estimation of conversion quality:

UD's emphasis on dependencies between content words results in projectivity (often). Where EDT was non-projective, the UD version is projective.

Follow up readings

Bantu NLP


Learning Morphosyntactic Analyzers from the Bible via Iterative Annotation Projection across 26 Languages

Garrett Nicolai and David Yarowsky

Central topic

Morphological analysis and lemmatization using English taggers, cross-linguistic projection, and then an iterative discovery, constraint, and training process.

Methodology

Findings

Follow up readings