With the Thesaurus of Modern Slovene, we are introducing a new type of dictionary: the responsive dictionary. The initial database of a responsive dictionary is constructed using advanced computational methods, which immediately provides the language community with a large amount of relevant, albeit still somewhat noisy, language information. A responsive dictionary has two further key traits: first, its database is openly accessible, and second, it offers the language community a number of ways to improve the database and clean up noisy elements. The construction of a responsive dictionary is therefore never truly concluded, as its data evolves continuously in step with changes in the modern language. All changes can be tracked through timestamps in individual entries, and the different versions of the database are stored in a dedicated archive. The responsive dictionary takes its name from this approach to construction, which allows the data to respond continuously both to the opinions of the contributing language community and to changes in the language as reflected in the texts that community produces. Essentially, it is “a dictionary made by the community for the community”.
The Thesaurus of Modern Slovene is based on the data contained in two principal language resources: the Oxford®-DZS Comprehensive English-Slovenian Dictionary and the Gigafida reference corpus of written Slovene. Both resources contain language material created after 1991 and as such offer a description of modern Slovene. The links identified between synonyms were additionally confirmed using the older Dictionary of Standard Slovenian Language (SSKJ). The extraction and structuring of the data for the Thesaurus were based on how frequently, and in what manner, words co-occur in the translation strings of the Oxford-DZS Dictionary. This information is the basis for discriminating between ‘core’ and ‘near’ synonyms, with ‘core’ synonyms exhibiting a greater degree of connection to the keyword. In the following step, an approach combining balanced co-occurrence graphs and the Personalized PageRank algorithm automatically divides the synonyms into subgroups and ranks them according to their degree of semantic relatedness to the keyword, as well as their frequency in language use. For a more detailed description of this methodology, see Krek et al. (2017).
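The ranking step described above can be illustrated with a minimal sketch. It is not the implementation used for the Thesaurus. It assumes a plain, unbalanced co-occurrence graph, leaves out the division into subgroups and the corpus frequency component, and runs a textbook Personalized PageRank in which the random walk always restarts at the keyword. The function names, the damping value and the toy translation strings are all invented for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(translation_strings):
    """Weight each word pair by the number of translation strings containing both.

    `translation_strings` is a hypothetical input format: one list of words
    per translation string.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for words in translation_strings:
        for a, b in combinations(sorted(set(words)), 2):
            graph[a][b] += 1.0
            graph[b][a] += 1.0
    return graph


def personalized_pagerank(graph, keyword, damping=0.85, iterations=50):
    """Power iteration in which the teleport step always returns to `keyword`."""
    if keyword not in graph:
        raise KeyError(f"{keyword!r} does not occur in the graph")
    nodes = list(graph)
    rank = {node: 1.0 if node == keyword else 0.0 for node in nodes}
    for _ in range(iterations):
        new_rank = {node: 0.0 for node in nodes}
        for node in nodes:
            total = sum(graph[node].values())
            # Spread this node's score over its neighbours by edge weight.
            for neighbour, weight in graph[node].items():
                new_rank[neighbour] += damping * rank[node] * weight / total
        # All restart probability goes to the keyword (the "personalized" part).
        new_rank[keyword] += 1.0 - damping
        rank = new_rank
    return rank


def rank_synonyms(graph, keyword):
    """Return (candidate, score) pairs ordered by closeness to the keyword."""
    scores = personalized_pagerank(graph, keyword)
    candidates = [(word, score) for word, score in scores.items() if word != keyword]
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Invented toy data, not taken from the Oxford-DZS Dictionary.
    toy_strings = [
        ["hiter", "nagel"],
        ["hiter", "nagel", "uren"],
        ["hiter", "uren"],
        ["hiter", "nagel"],
        ["nagel", "nenaden"],
    ]
    toy_graph = build_cooccurrence_graph(toy_strings)
    for word, score in rank_synonyms(toy_graph, "hiter"):
        print(f"{word}\t{score:.3f}")
```

In this sketch, candidates that share many translation strings with the keyword receive higher scores than candidates reachable only through an intermediate word. A cut-off on these scores is one conceivable way to separate ‘core’ from ‘near’ synonyms, although the criterion actually used is the one described in Krek et al. (2017).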