SLOLEKS LEXICON ACCENTUATION

In this project, the Slovene morphological lexicon Sloleks was updated with automatically assigned (and partly manually annotated) accents. In addition, a new interface that enables crowdsourcing of accent data was developed. The project focused on lemmas with fixed accents. The first step was to assign accents to the entire database automatically. Using existing dictionary resources, 55% of the assigned accents could be confirmed with an accuracy of 75%. Together, automatic and manual changes corrected 21.7% of the database. Further work remains on proper names and on common nouns with a movable accent or accent variants.
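The two percentages reported for the dictionary check can be read as two separate measures: the share of automatically accented lemmas for which a dictionary entry exists (coverage), and the share of those checked lemmas where the automatic accent matches the dictionary (accuracy). The sketch below illustrates that reading only; it is not the project's actual evaluation code. The function name, the data layout, and the toy word forms are all assumptions invented for illustration.

```python
import unicodedata


def coverage_and_accuracy(predicted, reference):
    """Compare automatically assigned accents with a reference dictionary.

    predicted: dict mapping lemma -> automatically accented form (hypothetical layout)
    reference: dict mapping lemma -> accented form from a dictionary;
               lemmas missing from the dictionary simply cannot be checked
    Returns (coverage, accuracy) as fractions between 0 and 1.
    """
    if not predicted:
        return 0.0, 0.0

    def nfc(text):
        # Normalise so precomposed and combining accent marks compare equal.
        return unicodedata.normalize("NFC", text)

    # Lemmas that can be checked at all, because the dictionary has them.
    checked = [lemma for lemma in predicted if lemma in reference]
    coverage = len(checked) / len(predicted)

    # Among the checked lemmas, count exact matches of the accented form.
    correct = sum(nfc(predicted[l]) == nfc(reference[l]) for l in checked)
    accuracy = correct / len(checked) if checked else 0.0
    return coverage, accuracy


# Toy data, invented for illustration (not taken from Sloleks).
predicted = {"mati": "máti", "voda": "vóda", "miza": "míza", "okno": "ókno"}
reference = {"mati": "máti", "voda": "vôda"}

coverage, accuracy = coverage_and_accuracy(predicted, reference)
print(f"coverage={coverage:.2f}, accuracy={accuracy:.2f}")
```

Under this reading, the remaining lemmas (those without a dictionary entry) are the ones that most need manual annotation or crowdsourced votes.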

Upgrading the user interface of the lexicon was also part of the project. First, the visual identity of CJVT UL was applied to the resource; second, elements enabling crowdsourcing were added to the interface: users can upvote or downvote accents, phonetic transcriptions, and generated pronunciations.

Sloleks is continuously being developed and upgraded within several other projects as well. For example, one of the next steps is to enable users to add their own pronunciations.

The Sloleks 2.0 database is available under the Creative Commons Attribution-ShareAlike 4.0 International licence (CC BY-SA 4.0) at:

Dobrovoljc, Kaja; et al., 2019, Morphological lexicon Sloleks 2.0, Slovenian language resource repository CLARIN.SI, http://hdl.handle.net/11356/1230.

LINKS AND CONTACT

Dr. Špela Arhar Holdt
Centre for Language Resources and Technologies
Faculty of Computer and Information Science
Večna pot 113, SI-1000 Ljubljana