Reusing Grammatical Resources for New Languages

Lene Antonsen, Trond Trosterud, Linda Wiechetek

Takeaway

Machine-readable grammars can be applied to new languages more easily when they work at higher levels of analysis. At the level of morphophonology, the grammatical differences between languages preclude the reuse of analyses.

We argue that portability here takes the form of reusing smaller modules of the grammar.

Hopefully the paper expands on that, because the statement doesn't make sense on its own.

Languages

North, Lule, and South Sami
- Uralic languages
- Not very agglutinative
Faroese
- Germanic language
- Four-case system
Greenlandic
- Eskimo-Aleut language
- Polysynthetic

Technical background

Uses existing resources developed at the University of Tromsø:
- Morphological analyzers
- Constraint Grammar parsers

Reusing grammar

Bick (2006) argues for using bootstrapping techniques to reuse grammars instead of turning to statistical systems. This fell by the wayside; everyone uses statistical methods now.

The bottom of the analysis

The level of analysis that is close to the language substance cannot be directly used
Even though different languages do not have the exact same morphological processes, they may have the same process types
Rules are written in a modular fashion so they can easily be adapted to new languages
- For example, consonant gradation processes are very common. The particulars of the rule may need to change, but the modular design helps guide the changes that need to be made.
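
A minimal Python sketch of that modular idea, assuming a gradation module can be reduced to a table of strong-to-weak alternations. The class name, both tables, and the word forms are invented for illustration; none of them come from the paper or from a real lexicon.

```python
from dataclasses import dataclass


@dataclass
class GradationModule:
    """Shared process type: weaken the consonant cluster before the final vowel."""

    # Language-specific part: strong grade -> weak grade (toy values only).
    alternations: dict

    def weaken(self, stem: str) -> str:
        body, final = stem[:-1], stem[-1:]
        # Try longer clusters first so a two-letter cluster wins over a one-letter one.
        for strong in sorted(self.alternations, key=len, reverse=True):
            if body.endswith(strong):
                return body[: -len(strong)] + self.alternations[strong] + final
        return stem  # no alternation applies


# Two hypothetical languages reuse the same module with different tables.
lang_a = GradationModule({"kk": "k", "pp": "p"})
lang_b = GradationModule({"kk": "hk", "ll": "l"})

print(lang_a.weaken("takka"))
print(lang_b.weaken("takka"))
```

Porting to a new language then means supplying a new table, while the rule logic stays untouched.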

Disambiguation

Mapping of syntactic tags

Large number of tags needed due to the free word order of Sami languages
- For example, four different subject tags are needed, specifying whether the verb is finite, whether ellipsis of the verb has occurred, whether the finite verb is to the left or to the right, etc.
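
A rough Python sketch of why the tag count grows: each extra distinction multiplies the number of tags. The arrow pointing toward the head follows common Constraint Grammar practice, but the tag strings themselves (in particular the `-F` marker) are assumptions made for this example, not the paper's actual inventory, and ellipsis is left out to keep it short.

```python
from itertools import product


def subject_tag(finite: bool, head_right: bool) -> str:
    """Build a subject tag that encodes verb finiteness and head direction."""
    core = "SUBJ" if finite else "-FSUBJ"  # hypothetical marker for non-finite verbs
    # The arrow points toward the verb the subject depends on.
    return f"@{core}>" if head_right else f"@<{core}"


# Two binary features already give four distinct subject tags.
tags = [subject_tag(f, r) for f, r in product([True, False], repeat=2)]
print(tags)
```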

The top of the analysis

This is the part that's relevant to me

Using a constraint grammar module
Syntactic tags for verbs are substituted by other tags (according to clause-type) in order to make it easier to annotate dependency across clauses
Describes difficulties finding the "head" of the sentence (I think they mean the root) when dealing with ellipses. This is definitely an issue in UD as well
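
To make the tag substitution concrete for myself, here is a small Python simulation: first relabel a verb according to its clause type, then attach it to a head in another clause. This is only a sketch of the idea, not Constraint Grammar syntax. The tag names (`@VERB`, `@VERB-REL`, `@REL`, `@SUBJ`), the helper functions, and the English example are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    idx: int                      # 1-based position in the sentence
    form: str
    tag: str
    parent: Optional[int] = None  # None = not attached yet


def substitute(tokens: List[Token], old: str, new: str, marker: str) -> None:
    """Relabel the first `old`-tagged token after each `marker`-tagged token."""
    pending = False
    for tok in tokens:
        if tok.tag == marker:
            pending = True
        elif pending and tok.tag == old:
            tok.tag = new
            pending = False


def set_parent(tokens: List[Token], child_tag: str, head_tag: str) -> None:
    """Attach each `child_tag` token to the nearest `head_tag` token on its left."""
    for i, tok in enumerate(tokens):
        if tok.tag != child_tag:
            continue
        for cand in reversed(tokens[:i]):
            if cand.tag == head_tag:
                tok.parent = cand.idx
                break


sentence = [
    Token(1, "dog", "@SUBJ"),
    Token(2, "that", "@REL"),
    Token(3, "barks", "@VERB"),
    Token(4, "sleeps", "@VERB"),
]
substitute(sentence, old="@VERB", new="@VERB-REL", marker="@REL")
set_parent(sentence, child_tag="@VERB-REL", head_tag="@SUBJ")
for tok in sentence:
    print(tok.idx, tok.form, tok.tag, tok.parent)
```

Once the relative-clause verb carries its own tag, the attachment step can target it without also touching the main verb.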

Still the analyzer retains very good accuracy for the dependency analysis: 0.99

This is for Sami
Table 5 says this is actually an F-score?
How is this scored? Are they scoring the flat descriptors in the VISL format (e.g. #5->0)?
Use pairs of substitution and setparent rules
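
One way the scoring question could be answered, sketched in Python: pull the `#n->m` markers out of gold and predicted analyses and compute an F-score over (dependent, head) pairs. This is my guess at a plausible scoring scheme, not the paper's documented method; the function names and the three-token example are made up.

```python
import re

# Matches VISL-style dependency markers such as "#5->0".
ARC = re.compile(r"#(\d+)->(\d+)")


def read_arcs(analysis: str) -> set:
    """Collect (dependent, head) pairs from every '#n->m' marker in the text."""
    return {(int(dep), int(head)) for dep, head in ARC.findall(analysis)}


def arc_f_score(gold: str, predicted: str) -> float:
    """F-score over dependency arcs; an arc counts only if both ends match."""
    gold_arcs, pred_arcs = read_arcs(gold), read_arcs(predicted)
    correct = len(gold_arcs & pred_arcs)
    precision = correct / len(pred_arcs) if pred_arcs else 0.0
    recall = correct / len(gold_arcs) if gold_arcs else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = "#1->2\n#2->0\n#3->2"
pred = "#1->2\n#2->0\n#3->1"  # third token attached to the wrong head
print(arc_f_score(gold, pred))
```

If every token gets exactly one head in both analyses, precision and recall are equal, so the F-score coincides with plain attachment accuracy. That may be why the same figure could be described as both an accuracy and an F-score.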

Bootstrapping

Go through small modifications to the rules to account for Faroese-specific phenomena.
Show the specific increase in performance with each new difference that is considered (e.g. when substituting the relative pronouns that begin subordinate clauses in Sami with the CS that begins relative clauses in Faroese, the accuracy goes up to 96)
