Swahili Dependency Treebank Creation
This page describes my thesis research, and related research, on creating a Universal Dependencies treebank of Swahili.
Swahili in UD
Corpora not encumbered by copyright
SketchEngine: Sandra is contacting the owner of the SketchEngine Swahili data to see if we can ge...
Research questions for thesis
How do Bantu languages, and Swahili in particular, integrate into the Universal Dependencies fra...
Annotation Issues
relative clauses without overt modifiers
Consider this noun phrase: Idadi ya waliofariki ('the number of those who died') from sentence #6799. The issue is that waliofa...
case with no noun?
What is going on with mpaka hivi sasa ('until now')? Why do I have an adposition modifying an adverb?
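To find every place this pattern shows up, a small script can scan for `case` dependents whose head is not nominal. This is a minimal sketch, assuming sentences are already loaded as token lists; the `Token` tuple, the `NOMINAL` set and the toy annotation of mpaka hivi sasa are my own illustrative assumptions, not the corpus annotation.

```python
from collections import namedtuple

# Minimal stand-in for a CoNLL-U token; only the fields this check reads.
Token = namedtuple("Token", "id form upos head deprel")

# Assumed set of heads that a `case` dependent would normally attach to.
NOMINAL = {"NOUN", "PROPN", "PRON", "NUM"}

def case_without_nominal_head(sentence):
    """Return (dependent, head) form pairs where `case` hangs off a non-nominal head."""
    by_id = {tok.id: tok for tok in sentence}
    hits = []
    for tok in sentence:
        if tok.deprel == "case" and tok.head in by_id:
            head = by_id[tok.head]
            if head.upos not in NOMINAL:
                hits.append((tok.form, head.form))
    return hits

# Hypothetical annotation of "mpaka hivi sasa" ('until now'), for illustration only.
fragment = [
    Token(1, "mpaka", "ADP", 3, "case"),
    Token(2, "hivi", "ADV", 3, "advmod"),
    Token(3, "sasa", "ADV", 0, "root"),
]
print(case_without_nominal_head(fragment))  # [('mpaka', 'sasa')]
```

Running this over the whole corpus gives a list of every adposition-on-adverb case to review together, rather than meeting them one at a time.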
Things to go back and fix in manually annotated corpora
Check that iobj is used correctly, e.g. check that verbs which could have iobj but weren't marke...
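A rough first pass for this review is to list verbs from a seed set of recipient-taking lemmas that have no iobj dependent at all. A sketch under stated assumptions: `RECIPIENT_LEMMAS` is a hypothetical hand-picked list, the lemmas pa ('give') and ambia ('tell') assume the corpus lemmatizes verbs to bare stems, and the example sentence is invented.

```python
from collections import namedtuple

# Minimal stand-in for a CoNLL-U token; only the fields this check reads.
Token = namedtuple("Token", "id form lemma upos head deprel")

# Hypothetical seed list of lemmas that usually take a recipient ('give', 'tell').
RECIPIENT_LEMMAS = {"pa", "ambia"}

def missing_iobj(sentence, lemmas=RECIPIENT_LEMMAS):
    """List candidate verbs that have no iobj dependent, for manual review."""
    flagged = []
    for tok in sentence:
        if tok.upos == "VERB" and tok.lemma in lemmas:
            child_deprels = {d.deprel for d in sentence if d.head == tok.id}
            if "iobj" not in child_deprels:
                flagged.append(tok.form)
    return flagged

# "Alimpa mtoto kitabu" ('s/he gave the child a book'), recipient mis-marked as obj.
sentence = [
    Token(1, "Alimpa", "pa", "VERB", 0, "root"),
    Token(2, "mtoto", "mtoto", "NOUN", 1, "obj"),
    Token(3, "kitabu", "kitabu", "NOUN", 1, "obj"),
]
print(missing_iobj(sentence))  # ['Alimpa']
```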
SWH in UD questions
Should a derived noun like waliokusanyika be a Vnoun? Should statives be what is kuna/hakuna? i...
Annotation decisions
Dealing with non-sentences
This page records the decisions we make during the annotation process. Non-sentence construc...
Possessive pronouns
The UD annotation guidelines state: In some of the datasets, a possessive...
Uninflected "modals"
Lazima and other modal adverbs will be marked as adverbs and connected with an advmod relation to...
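This decision is easy to enforce mechanically. The sketch below assumes a hypothetical `MODAL_ADVERBS` seed set containing only lazima (the other modal adverbs would need to be added by hand), and it checks only the tag and the relation label, not the head.

```python
from collections import namedtuple

# Minimal stand-in for a CoNLL-U token; only the fields this check reads.
Token = namedtuple("Token", "id form upos head deprel")

# Assumed seed set; extend with the other modal adverbs as they are identified.
MODAL_ADVERBS = {"lazima"}

def check_modal_adverbs(sentence):
    """Return forms of modal adverbs that are not tagged ADV with an advmod relation."""
    return [
        tok.form
        for tok in sentence
        if tok.form.lower() in MODAL_ADVERBS
        and (tok.upos != "ADV" or tok.deprel != "advmod")
    ]

# "Lazima niende" ('I must go'): one conforming and one non-conforming (invented) annotation.
good = [Token(1, "Lazima", "ADV", 2, "advmod"), Token(2, "niende", "VERB", 0, "root")]
bad = [Token(1, "Lazima", "NOUN", 0, "root"), Token(2, "niende", "VERB", 1, "ccomp")]
print(check_modal_adverbs(good), check_modal_adverbs(bad))  # [] ['Lazima']
```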
Are infinitival verbs, verbs or nouns?
Infinitive verbs should always be treated as verbs, even when they have nominal modifiers. The UD...
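Assuming infinitives carry the UD feature VerbForm=Inf in our data (an assumption about our feature inventory, not something stated above), a short check can catch infinitives that were tagged as nouns. The example annotation is invented.

```python
from collections import namedtuple

# Minimal stand-in for a CoNLL-U token; only the fields this check reads.
Token = namedtuple("Token", "id form upos feats head deprel")

def parse_feats(feats):
    """Turn a CoNLL-U FEATS string like 'Mood=Ind|VerbForm=Fin' into a dict."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))

def non_verbal_infinitives(sentence):
    """Return forms marked VerbForm=Inf whose UPOS is not VERB."""
    return [
        tok.form
        for tok in sentence
        if parse_feats(tok.feats).get("VerbForm") == "Inf" and tok.upos != "VERB"
    ]

# Hypothetical annotation of "Kusoma vitabu ni muhimu" ('Reading books is important'),
# with the infinitive mis-tagged as NOUN.
sentence = [
    Token(1, "Kusoma", "NOUN", "VerbForm=Inf", 4, "nsubj"),
    Token(2, "vitabu", "NOUN", "_", 1, "obj"),
    Token(3, "ni", "AUX", "_", 4, "cop"),
    Token(4, "muhimu", "ADJ", "_", 0, "root"),
]
print(non_verbal_infinitives(sentence))  # ['Kusoma']
```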
Copulas?
Inflected copulas will be marked AUX and given a cop arc. Copulas will be marked as AUX and have...
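A matching consistency check is that anything attached with cop should be tagged AUX. This is a minimal sketch; the `Token` tuple is illustrative and the example, in which the inflected copula alikuwa has been mis-tagged as VERB, is invented.

```python
from collections import namedtuple

# Minimal stand-in for a CoNLL-U token; only the fields this check reads.
Token = namedtuple("Token", "id form upos head deprel")

def bad_copulas(sentence):
    """Return forms attached with `cop` that are not tagged AUX."""
    return [tok.form for tok in sentence if tok.deprel == "cop" and tok.upos != "AUX"]

# Hypothetical annotation of "Alikuwa mwalimu" ('s/he was a teacher').
sentence = [
    Token(1, "Alikuwa", "VERB", 2, "cop"),
    Token(2, "mwalimu", "NOUN", 0, "root"),
]
print(bad_copulas(sentence))  # ['Alikuwa']
```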
Relative pronouns
In cases where relative pronouns like ambayo are used, they should be assigned...
Multiple agreement?
In cases of multiple agreement, the first verb is connected to the second with a ccomp relation. ...
CCOMP vs XCOMP
The UD documentation states: Clausal complements (objects), divided into those with obligatory ...
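In practice, the UD distinction is that an xcomp has no overt subject of its own, because its subject is controlled by an argument of the higher clause. The sketch below flags xcomp dependents that nonetheless have their own subject, for manual review; the `Token` tuple and both example annotations are hypothetical.

```python
from collections import namedtuple

# Minimal stand-in for a CoNLL-U token; only the fields this check reads.
Token = namedtuple("Token", "id form upos head deprel")

def xcomp_with_own_subject(sentence):
    """Return xcomp forms that have an nsubj/csubj (or subtype) dependent of their own."""
    flagged = []
    for tok in sentence:
        if tok.deprel == "xcomp":
            if any(d.head == tok.id and d.deprel.split(":")[0] in {"nsubj", "csubj"}
                   for d in sentence):
                flagged.append(tok.form)
    return flagged

# "Anataka kusoma" ('s/he wants to read'): controlled complement, nothing flagged.
controlled = [Token(1, "Anataka", "VERB", 0, "root"), Token(2, "kusoma", "VERB", 1, "xcomp")]
# "Anataka Juma asome" ('s/he wants Juma to read'): complement with its own subject.
own_subject = [
    Token(1, "Anataka", "VERB", 0, "root"),
    Token(2, "Juma", "PROPN", 3, "nsubj"),
    Token(3, "asome", "VERB", 1, "xcomp"),
]
print(xcomp_with_own_subject(controlled), xcomp_with_own_subject(own_subject))  # [] ['asome']
```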
Juu can be used as a noun?
Madaktari Wasiokuwa na Mipaka waliitisha mkutano wa waandishi wa habari baada ya habari hiyo, lak...
-enye
ukurasa wenye zaidi ya wafuasi 4,5000 (sentence 8521): wenye was labeled as SCONJ, but I think...
Reduced relative clauses
Vitale identifies three types of relative clauses: "full relatives", which use amba-, "reduced rel...
kuwa na
"have" is expressed using a copula "kuwa" with a preposition "na" (with). In these cases, the ob...
List of PARTicles
the question particle je
List of fixed expressions
Compound prepositions (Mohammed, 2001):
baada ya: after
kati ya: between
ndani ya: inside of
...
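UD attaches every non-first word of a fixed expression to the first word with the fixed relation. A minimal sketch that applies this to the three compound prepositions listed above; it only rewrites the second word's head and relation, leaving the first word's attachment alone, and the example annotation of baada ya habari hiyo ('after that news') is invented.

```python
from collections import namedtuple

# Minimal stand-in for a CoNLL-U token; only the fields this step touches.
Token = namedtuple("Token", "id form upos head deprel")

# Only the pairs listed above; the full list is longer.
COMPOUND_PREPOSITIONS = {("baada", "ya"), ("kati", "ya"), ("ndani", "ya")}

def mark_fixed(sentence):
    """Return a copy where the second word of each compound preposition
    attaches to the first word with `fixed`."""
    out = list(sentence)
    for i in range(len(out) - 1):
        first, second = out[i], out[i + 1]
        if (first.form.lower(), second.form.lower()) in COMPOUND_PREPOSITIONS:
            out[i + 1] = second._replace(head=first.id, deprel="fixed")
    return out

# Hypothetical pre-correction annotation of "baada ya habari hiyo".
sentence = [
    Token(1, "baada", "ADP", 3, "case"),
    Token(2, "ya", "ADP", 3, "case"),
    Token(3, "habari", "NOUN", 0, "root"),
    Token(4, "hiyo", "DET", 3, "det"),
]
for tok in mark_fixed(sentence):
    print(tok.id, tok.form, tok.head, tok.deprel)
```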
tu
"tu" meaning "only" or "just" should be the dependent of the thing it is modifying. In cases lik...
Auxiliaries
kuwa is used to refer to the protracted nature of an action, or to an action at a definite moment in the...
Verbal nouns with auxiliaries
If we have a construction like "kuhusu kilichokuwa kikitokea" (sentence 5202 in the Global Voices...
Tense?
-ki-: M.A. Mohammed reports that ki has several features that it could be assigned as a tense mar...
Verbal interrogatives
Verbs like unawezaje, where the -je suffix indicates that the verb is interrogative, are expressed by...
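Whatever analysis we apply, the first step is finding the candidates. A heuristic sketch: list VERB tokens ending in -je and split off the suffix. It will also match verbs that end in -je for other reasons, so every hit needs manual review; the `Token` tuple and the example are illustrative.

```python
from collections import namedtuple

# Minimal stand-in for a CoNLL-U token; only the fields this check reads.
Token = namedtuple("Token", "id form upos head deprel")

def je_interrogative_candidates(sentence):
    """Return (form, stem) pairs for VERB tokens ending in -je, e.g. unawezaje -> unaweza."""
    return [
        (tok.form, tok.form[:-2])
        for tok in sentence
        if tok.upos == "VERB" and tok.form.lower().endswith("je") and len(tok.form) > 2
    ]

# "Unawezaje kusaidia?" ('How can you help?')
sentence = [
    Token(1, "Unawezaje", "VERB", 0, "root"),
    Token(2, "kusaidia", "VERB", 1, "xcomp"),
    Token(3, "?", "PUNCT", 1, "punct"),
]
print(je_interrogative_candidates(sentence))  # [('Unawezaje', 'Unaweza')]
```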
Errors made by neural models
Interrogative adjectives
It may seem strange, but gani is treated as an interrogative adjective in the Helsinki corpus of S...
Hashtags need to be rejoined
The tokenizer we used split the pound sign from the rest of the hashtag: #NairobiBlast -> # Nairob...
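Rejoining is simple on the raw token sequence, before annotation. A sketch assuming tokens are plain strings; on an already annotated CoNLL-U file, the IDs and HEAD values after each merge would also need renumbering, which this function does not do.

```python
def rejoin_hashtags(tokens):
    """Merge each bare '#' token with the token that follows it."""
    merged = []
    i = 0
    while i < len(tokens):
        if tokens[i] == "#" and i + 1 < len(tokens):
            merged.append("#" + tokens[i + 1])
            i += 2  # skip the token that was just absorbed
        else:
            merged.append(tokens[i])
            i += 1
    return merged

print(rejoin_hashtags(["Habari", "za", "#", "NairobiBlast"]))
# ['Habari', 'za', '#NairobiBlast']
```

A trailing "#" with nothing after it is left as-is.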
-ote
-ote, meaning 'all', is nearly always labeled ADV when it should be DET.
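A bulk correction could relabel these tokens. This sketch assumes a hand-written and probably incomplete `OTE_FORMS` list of class-agreeing -ote forms, and it changes only the UPOS; if the relation is also wrong (advmod instead of det), that needs a similar pass.

```python
from collections import namedtuple

# Minimal stand-in for a CoNLL-U token; only the fields this step touches.
Token = namedtuple("Token", "id form upos head deprel")

# Assumed, possibly incomplete list of -ote 'all' forms across noun classes.
OTE_FORMS = {"wote", "yote", "zote", "lote", "chote", "vyote", "kote", "pote", "mote"}

def fix_ote_upos(sentence):
    """Return a copy in which -ote forms tagged ADV are retagged DET."""
    return [
        tok._replace(upos="DET") if tok.form.lower() in OTE_FORMS and tok.upos == "ADV" else tok
        for tok in sentence
    ]

# "watu wote" ('all (the) people'), with wote mis-tagged as ADV.
sentence = [
    Token(1, "watu", "NOUN", 0, "root"),
    Token(2, "wote", "ADV", 1, "advmod"),
]
print([tok.upos for tok in fix_ote_upos(sentence)])  # ['NOUN', 'DET']
```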
Reduplicated words
Any["hivyohivyo", "yuleyule", "shamrashamra", "kwelikweli", "hiyohiyo", "hatihati", "kimyakimya",...
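Many of these are exact full reduplications (base + base), which a simple string check can detect and split. A minimal sketch; the function name is my own, and it will not catch partial reduplication.

```python
def split_reduplication(word):
    """Return the base if `word` is an exact full reduplication (base + base), else None."""
    half, remainder = divmod(len(word), 2)
    if remainder == 0 and half >= 2 and word[:half] == word[half:]:
        return word[:half]
    return None

examples = ["hivyohivyo", "yuleyule", "shamrashamra", "kwelikweli",
            "hiyohiyo", "hatihati", "kimyakimya"]
for w in examples:
    print(w, "->", split_reduplication(w))
print(split_reduplication("habari"))  # None
```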
Neural models
za3.txt is test.txt, and na2.txt is val.txt.
Publications to cite
Swahili transformer model bake-off
Pretrained Swahili transformer, lr=0.0001, POS tagging: 2022-06-11 17:40:51,753 - INFO - allennlp...
Meeting with Sandra 1/6/2022
Explain things more thoroughly; you have as much space as you need. Write the thesis for an intel...