Error annotation
Error tags have been added to 24% of the texts in the KOST corpus; they mark linguistic errors made by learners writing in Slovene. The annotation is done manually: each error is assigned a corrected form and one of 23 error categories.
Error taxonomy
The basic error categories are listed below, each with an illustrative example from KOST (in Slovene) given as the learner's form > the corrected form.
Orthographical errors
Punctuation (Z-LOC): v zimskem času, večina ljudi uporablja avtomobile > v zimskem času večina ljudi uporablja avtomobile
Spelling (Z-CRK): bolše > boljše
Joined or divided words (Z-SN): ni sem > nisem
Capital letters (Z-MV): najpomembnejši praznik v moji državi je Božič > najpomembnejši praznik v moji državi je božič
Abbreviations (Z-KR): in dr. > idr.
Lexical errors
Noun (B-SAM): ni mi všeč kadiranje > ni mi všeč kajenje
Verb (B-GLAG): sem se zelo težko naučila na mir > sem se zelo težko navadila na mir
Adjective (B-PRID): sem družbena oseba > sem družabna oseba
Pronoun (B-ZAIM): onidva > onadva
Adverb (B-PRISL): ko pride doma > ko pride domov
Preposition (B-PRED): sa prijateljico > s prijateljico
Conjunction (B-VEZ): kdaj sem obiskal Turčijo > ko sem obiskal Turčijo
Other (B-OST): petindvajest > petindvajset
Morphological errors
Noun (O-SAM): nagajajo pticami > nagajajo pticam
Verb (O-GLAG): iškem > iščem
Adjective (O-PRID): najglavnejša > najbolj glavna
Pronoun (O-ZAIM): v nama > v nas
Adverb (O-PRISL): pomaga boljši poznati materni jezik > pomaga bolje poznati materni jezik
Other (O-OST): štirje predavanja > štiri predavanja
Syntactical errors
Structure (S-STR): rada bi da živim sama > rada bi živela sama
Word order (S-BR): zdi mi se > zdi se mi
Omission (S-IZP): ki sem jedla > ki sem ga jedla
Insertion (S-ODV): ne bova bila > ne bova
Related corrections
These are corrections that become necessary only because something else in the context has been corrected: z mojim fantom > s svojim fantom
Attention!
Error annotation is to some extent subjective, which must be taken into account when analysing the results.
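The tag codes in the taxonomy follow a regular pattern: a one-letter prefix for the level (Z, B, O, S, matching the four headings above) and a suffix for the subtype. The Python sketch below shows one way such tags could be grouped by level when tallying errors. It assumes a plain-text line format that mirrors the listing above ("Label (TAG): original > corrected"). This format and the names `ErrorExample`, `parse_example`, `level_of` and `count_by_level` are illustrative assumptions, not the export format of KOST or Svala.

```python
import re
from collections import Counter
from typing import Iterable, NamedTuple

# Tag prefixes as they appear in the taxonomy; the level names are the
# English section headings under which the tags are listed.
LEVELS = {
    "Z": "orthographical",
    "B": "lexical",
    "O": "morphological",
    "S": "syntactical",
}


class ErrorExample(NamedTuple):
    """One annotated error (hypothetical record layout)."""
    label: str      # e.g. "Spelling"
    tag: str        # e.g. "Z-CRK"
    original: str   # form written by the learner
    corrected: str  # form supplied by the annotator


# Assumed line format: "Label (TAG): original > corrected"
LINE_RE = re.compile(
    r"^(?P<label>[^()]+?)\s*\((?P<tag>[A-Z]-[A-Z]+)\):\s*"
    r"(?P<orig>.+?)\s*>\s*(?P<corr>.+)$"
)


def parse_example(line: str) -> ErrorExample:
    """Split one taxonomy line into label, tag, original and correction."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a taxonomy example line: {line!r}")
    return ErrorExample(match["label"], match["tag"], match["orig"], match["corr"])


def level_of(tag: str) -> str:
    """Return the level name for a tag, based on its one-letter prefix."""
    prefix = tag.split("-", 1)[0]
    return LEVELS[prefix]


def count_by_level(tags: Iterable[str]) -> Counter:
    """Tally tags per level, e.g. to compare error types across texts."""
    return Counter(level_of(tag) for tag in tags)


if __name__ == "__main__":
    example = parse_example("Joined or divided words (Z-SN): ni sem > nisem")
    print(example.tag, level_of(example.tag), example.original, example.corrected)
    print(count_by_level(["Z-CRK", "B-SAM", "B-GLAG", "O-OST"]))
```

Because the annotation is partly subjective, counts produced this way are best read as indicative rather than exact.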
Error annotation manual
For greater accuracy in error annotation, all annotators follow an Error tagging manual (Slovene version only). The manual is updated as necessary; it was last updated in October 2023.
Error annotation app
The errors are annotated in Svala, a specially developed application.