Overview of the SPMRL 2013 Shared Task: Cross-Framework Evaluation of Parsing Morphologically Rich Languages
Central topic
- Provide standard datasets for morphologically rich languages in different representations and parsing scenarios.
- Standardize the evaluation protocol on morphologically ambiguous input.
- Raise community awareness of the difficulty of parsing morphologically rich languages.
Methodology
Datasets
- Include data in both constituency and dependency annotation.
- Two training setups: a full-data setup and a small setup (5,000 sentences).
- Three parsing scenarios:
  - Gold: gold segmentation, POS tags, and morphological features are provided.
  - Predicted: segmentation, POS tags, and morphological features are predicted automatically.
  - Lattice: a lattice of multiple possible morphological analyses is provided, and the morphological analysis and the syntactic structure are disambiguated jointly.
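The lattice scenario may be easier to picture with a small sketch. The Python below is not the shared task's data format or tooling. It is a minimal illustration that assumes a lattice can be modeled as an acyclic set of edges between numbered states, each edge carrying one candidate morpheme and its POS tag. The names `Edge`, `enumerate_paths`, and `toy_lattice`, as well as the toy token and its tags, are hypothetical.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    start: int  # lattice state where the candidate morpheme begins
    end: int    # lattice state where it ends
    form: str   # surface form of the candidate morpheme
    tag: str    # POS tag proposed for this analysis


def enumerate_paths(edges, start, final):
    """Return every edge sequence from `start` to `final` (assumes an acyclic lattice)."""
    outgoing = {}
    for edge in edges:
        outgoing.setdefault(edge.start, []).append(edge)

    def walk(state):
        if state == final:
            return [[]]  # one way to finish: take no further edges
        paths = []
        for edge in outgoing.get(state, []):
            for rest in walk(edge.end):
                paths.append([edge] + rest)
        return paths

    return walk(start)


# Hypothetical toy lattice: the token "abc" is either one noun,
# or a preposition "a" followed by a noun "bc".
toy_lattice = [
    Edge(0, 2, "abc", "NN"),
    Edge(0, 1, "a", "PREP"),
    Edge(1, 2, "bc", "NN"),
]

paths = enumerate_paths(toy_lattice, 0, 2)
for path in paths:
    print(" ".join(f"{e.form}/{e.tag}" for e in path))
```

Each path through the lattice is one complete segmentation and tagging of the input. In the gold and predicted scenarios the parser receives a single such path. In the lattice scenario it must pick a path and a syntactic structure together, so a joint model would score each candidate path along with the tree built over it rather than committing to one path first.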
Findings
Previous research
- The first statistical parsing models were generative and based on treebank grammars.
- Applying phrase-based treebank grammar techniques is sensitive to language and annotation properties, and these models are not easily portable across languages and schemes.
Notable quotes
While progress on parsing English -- the main language of focus for the ACL community -- has inspired some advances on other languages, it has not, by itself, yielded high-quality parsing for other languages and domains. This holds in particular for morphologically rich languages... where important information concerning the predicate-argument structure of sentences is expressed through word formation, rather than constituent-order patterns as is the case in English and other configurational languages. p. 146
recently, advances in PCFG-LA parsing (Petrov et al. 2006) and language-agnostic data-driven dependency parsing (McDonald et al. 2005; Nivre et al. 2007b) have made it possible to reach high accuracy with classical feature engineering techniques in addition to, or instead of, language specific knowledge. p. 147