Synonyms and Collocations 2.0 – SoKol

Upgrading fundamental dictionary resources and databases of CJVT UL

The project addresses two fundamental building blocks of the Slovene digital language infrastructure: the Thesaurus of Modern Slovene and the Collocations Dictionary of Modern Slovene. Both dictionaries are currently available in version 1.0. The project aims to upgrade them to version 2.0 and to publish them both in the dictionary interface and in the CLARIN.SI digital language repository, the latter in the form of an open-access dictionary database. The results will be available in the last months of 2022.

The Resolution on the National Program for Language Policy 2021–2025 lists the Thesaurus of Modern Slovene and the Collocations Dictionary of Modern Slovene among the major achievements of the 2014–2018 period, a time when Slovenia recognized that the challenges in the field of language infrastructure required swift and effective action. Both dictionaries introduce the concept of a “responsive dictionary”, which provides fast access to open data on modern language use: a responsive dictionary is first generated automatically by algorithms, and this automatically generated version is immediately accessible to everyone. It is then followed by a gradual linguistic review and editing process, in which the wider community can participate.

Given that responsive dictionaries are based on regular updating, we assessed the community's priorities for the first updates of the Thesaurus and the Collocations Dictionary through user surveys, interviews and evaluations.

The Thesaurus is made with users in mind and lets them enter their own synonym suggestions, so we will make the editorial protocols faster and more transparent. We will also add a section dedicated to antonyms. For the Collocations Dictionary, we will develop a system for advanced search of multi-word dictionary information. We aim to add qualifiers to both dictionaries, giving priority to hateful and derogatory vocabulary. For these dictionary entries, we will also review the usage examples, which are imported automatically from a corpus. Lastly, we will equip at least 2,000 dictionary entries with information on meaning and classify their synonyms and collocations under the appropriate semantic indicators.

The project is financed by the Ministry of Culture of the Republic of Slovenia as part of the Public Tender for (co)financing projects intended for the construction and modernization of the infrastructure for the Slovenian language in the digital environment 2021–2022.