Universal Dependencies

A Universal Part-of-Speech Tagset (Petrov, Das, McDonald)

Abstract

Introduction

"Underlying these studies is the idea that a set of (coarse) syntactic POS categories exist in similar forms across languages."

Tagset

The tags
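The paper's tagset has 12 coarse tags (NOUN, VERB, ADJ, ADV, PRON, DET, ADP, NUM, CONJ, PRT, "." for punctuation, X for everything else), and fine-grained treebank tags are collapsed into them through per-treebank mapping tables. The sketch below shows that collapsing step for a handful of Penn Treebank tags. The dictionary is a partial excerpt written from memory of the English PTB mapping, so check it against the released mapping files before relying on it. The names `PTB_TO_UNIVERSAL` and `to_universal` are hypothetical, and falling back to X for unlisted tags is my own choice, not something the paper specifies.

```python
# The 12 coarse tags of the universal POS tagset (Petrov, Das, McDonald).
UNIVERSAL_TAGS = {
    "NOUN", "VERB", "ADJ", "ADV", "PRON", "DET",
    "ADP", "NUM", "CONJ", "PRT", ".", "X",
}

# Partial, illustrative Penn Treebank -> universal mapping (assumed excerpt,
# not the full released table).
PTB_TO_UNIVERSAL = {
    "NN": "NOUN", "NNS": "NOUN", "NNP": "NOUN", "NNPS": "NOUN",
    "VB": "VERB", "VBD": "VERB", "VBZ": "VERB", "MD": "VERB",
    "JJ": "ADJ", "JJR": "ADJ",
    "RB": "ADV",
    "PRP": "PRON",
    "DT": "DET",
    "IN": "ADP",
    "CD": "NUM",
    "CC": "CONJ",
    "RP": "PRT", "TO": "PRT",
    ".": ".", ",": ".",
    "FW": "X",
}


def to_universal(tagged):
    """Collapse (word, fine_tag) pairs to (word, universal_tag).

    Tags missing from the mapping fall back to X (a hypothetical default).
    """
    return [(word, PTB_TO_UNIVERSAL.get(tag, "X")) for word, tag in tagged]


print(to_universal([("The", "DT"), ("cat", "NN"), ("sat", "VBD"), (".", ".")]))
# [('The', 'DET'), ('cat', 'NOUN'), ('sat', 'VERB'), ('.', '.')]
```

Because the mapping is a plain many-to-one lookup, the fine-grained distinctions (e.g. NN vs NNS) are discarded, which is exactly what makes the coarse tags comparable across languages.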

Experiments

POS tagging accuracy comparison

Grammar induction

Universal Dependencies v1: A Multilingual Treebank Collection

(Nivre, de Marneffe, Ginter, Goldberg, Hajič, Manning, McDonald, Petrov, Pyysalo, Silveira, Tsarfaty, Zeman)

Introduction

History

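UD builds directly on prior work: an evolution of the (universal) Stanford dependencies, the Google universal POS tags (Petrov et al., above), and the Interset interlingua for morphosyntactic tagsets.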

What other UD-like projects existed?

Annotation guideline principles

Word segmentation

Morphology

Lemma

No guidelines are provided for what lemmas should look like. For example, should lemmas include derivational morphemes? What should be done with suppletive forms?

Part of speech tag

Morphological features
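In the CoNLL-U files that UD treebanks are distributed in, morphological features sit in the FEATS column as Feature=Value pairs separated by "|", with "_" meaning no features and a comma separating multiple values of one feature (e.g. PronType=Int,Rel). A minimal parser, assuming well-formed input; the function name `parse_feats` is hypothetical and this is not any library's API:

```python
def parse_feats(feats):
    """Parse a CoNLL-U FEATS string into {feature: [values]}.

    '_' means the token has no features. Values are returned as lists because
    one feature may carry several comma-separated values. Assumes well-formed
    input; no validation against the UD feature inventory is done.
    """
    if feats == "_":
        return {}
    result = {}
    for pair in feats.split("|"):
        # partition on the first '=' so layered names like Number[psor] survive
        name, _, values = pair.partition("=")
        result[name] = values.split(",")
    return result


print(parse_feats("Case=Nom|Number=Sing"))
# {'Case': ['Nom'], 'Number': ['Sing']}
```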

Syntax

mwe and name are both left-headed with a flat structure (i.e. every token attaches to the left-most word of the name or mwe). This convention carries over to fixed and flat in UD v2, which means I need to fix some of the names I've annotated.
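A sketch of the left-headed convention, plus a check for annotations that break it (the names I need to fix). Arcs are stored as a hypothetical {token_id: (head_id, deprel)} dict; `attach_flat` and `right_headed_flat_arcs` are my own helper names. The `label` parameter lets the same code produce v1 `name`/`mwe` or v2 `flat`/`fixed` arcs, and subtypes such as `flat:name` are reduced to their base label before checking.

```python
# Relations that must be left-headed (v1 and v2 names).
FLAT_LABELS = {"name", "mwe", "flat", "fixed"}


def attach_flat(span_ids, external_head, external_rel, label="flat"):
    """Annotate a multiword name/expression with a flat, left-headed structure.

    Only the left-most token attaches outside the span; every other token
    attaches to the left-most token with `label`.
    """
    first, *rest = span_ids
    arcs = {first: (external_head, external_rel)}
    for tid in rest:
        arcs[tid] = (first, label)
    return arcs


def right_headed_flat_arcs(arcs):
    """Return ids of tokens whose name/mwe/flat/fixed arc points to a head on
    their right, i.e. arcs that violate the left-headed convention."""
    return sorted(
        tid for tid, (head, rel) in arcs.items()
        if rel.split(":")[0] in FLAT_LABELS and head > tid
    )


# "Hillary Rodham Clinton spoke": tokens 1-3 form the name, token 4 is the verb.
good = attach_flat([1, 2, 3], external_head=4, external_rel="nsubj")
print(good)  # {1: (4, 'nsubj'), 2: (1, 'flat'), 3: (1, 'flat')}

# A right-headed (wrong) annotation of the same name:
bad = {1: (3, "flat:name"), 2: (3, "flat:name"), 3: (4, "nsubj")}
print(right_headed_flat_arcs(bad))  # [1, 2]
```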

Relations between content words

"The UD view is that we need to recognize both lexical and functional heads, but in order to maximize parallelism across languages, only lexical heads are inferable from the topology of our tree structures."
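To make this concrete: content words head the tree, and function words hang off them as dependents, so a "functional head" like the auxiliary is recovered from the relation label rather than from the tree shape. The toy sentence below is my own hand annotation with UD v2 labels (v1 would use nmod instead of obl), and the helper names and the FUNCTION_RELS set are hypothetical. The leaf check reflects only the common case; UD allows some exceptions (e.g. function words in a fixed expression or in coordination can take dependents).

```python
from collections import defaultdict

# "She has eaten in the kitchen" as (id, form, head, deprel); own annotation.
SENTENCE = [
    (1, "She", 3, "nsubj"),
    (2, "has", 3, "aux"),
    (3, "eaten", 0, "root"),
    (4, "in", 6, "case"),
    (5, "the", 6, "det"),
    (6, "kitchen", 3, "obl"),
]

# Relations that attach function words to a content word (assumed subset).
FUNCTION_RELS = {"aux", "cop", "case", "det", "mark"}


def children(sentence):
    """Map each head id to its (id, form, deprel) dependents."""
    kids = defaultdict(list)
    for tid, form, head, rel in sentence:
        kids[head].append((tid, form, rel))
    return kids


def functional_dependents(sentence, head_id):
    """Function words attached to a content word: the functional 'head'
    is read off the labels, not the topology."""
    return [form for _, form, rel in children(sentence)[head_id] if rel in FUNCTION_RELS]


def function_words_are_leaves(sentence):
    """True if no function word has dependents of its own."""
    kids = children(sentence)
    return all(not kids[tid] for tid, _, _, rel in sentence if rel in FUNCTION_RELS)


print(functional_dependents(SENTENCE, 3))  # ['has']
print(functional_dependents(SENTENCE, 6))  # ['in', 'the']
print(function_words_are_leaves(SENTENCE))  # True
```

Note that "kitchen", not "in", attaches to "eaten": the path between the two content words is direct, which is what keeps trees parallel across languages that mark the same relation with a preposition, a case suffix, or nothing at all.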

Language-specific relations