Šolar corpus

Šolar, the corpus of school written products, is a collection of texts produced independently by Slovenian primary and secondary school pupils. Šolar is modelled on language acquisition corpora, but differs from most of them in two ways: (a) the texts were not written for the project, but represent the pupils' actual school production, and (b) the language corrections marked in the corpus are real and were made by teachers, not researchers. The use of teachers’ corrections to mark students’ language errors makes the corpus unique not only in Slovenia, but also in Europe and the world.

The current version, Šolar 3.0, contains 5,485 texts written by Slovenian secondary school students (15-19 years old) and primary school students in grades 7-9, with a small percentage from grade 6. For each text, information is given on the school type (primary or secondary), the subject, the level (grade or year), the type of text, the region and the year of production. Most of the corpus consists of essays, but it also includes other texts produced in the classroom, such as summaries, descriptions and examples of formal applications.
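
The per-text information listed above can be pictured as a simple record. The following Python sketch is only an illustration: the class name `TextMetadata`, the field names, the value types and the sample values are all assumptions made here, not the actual format or identifiers used in the corpus files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextMetadata:
    """Hypothetical record mirroring the metadata categories named above."""
    school: str     # "primary" or "secondary"
    subject: str    # school subject the text was written for
    level: int      # grade (primary) or year (secondary)
    text_type: str  # e.g. "essay", "summary", "description"
    region: str     # region of the school
    year: int       # year of production (assumed to be a calendar year)


def filter_texts(records, **criteria):
    """Return the records whose fields equal every given criterion."""
    return [
        r for r in records
        if all(getattr(r, field) == value for field, value in criteria.items())
    ]


# Made-up sample records, for illustration only (not taken from the corpus).
sample = [
    TextMetadata("secondary", "Slovenian", 2, "essay", "region A", 2009),
    TextMetadata("primary", "Slovenian", 8, "summary", "region B", 2010),
    TextMetadata("secondary", "History", 3, "essay", "region A", 2010),
]

secondary_essays = filter_texts(sample, school="secondary", text_type="essay")
print(len(secondary_essays))
```

A flat record of this kind is enough to select subsets such as "secondary school essays from one region"; the released corpus files may organise the same information differently.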

Part of the corpus (2,094 texts) is annotated with teachers’ corrections according to the annotation system described in the annotation guidelines. Teachers’ corrections were part of the original files and reflect real classroom situations in the evaluation of the essays.

The different versions of the corpus are available in the CLARIN.SI repository; links can be found under the Databases section of our website.

The latest development related to the Šolar corpus is a brand new concordancer specifically designed for corpora with language corrections. You can try it out here.

LINKS AND CONTACTS

Center for language resources and technologies, University of Ljubljana, Večna pot 113, SI-1000 Ljubljana