For the last thirty years, lexicographers have attempted to use various automated procedures to analyze language data and generate lexicographic descriptions (Atkins and Rundell 2008; Gantar et al. 2016). The latest developments in generative AI have prompted attempts to use the new tools for lexicographic purposes (Lew 2023; Rees et al. 2023; Jakubíček and Rundell 2023). These first attempts, however, revealed a significant gap between the ability of LLMs to produce quality lexicographic content for English and their ability to do so for other languages, particularly less-resourced languages and those under-represented in LLMs (de Schryver 2024).
The Digital Dictionary Database for Slovene (DDDS) will be improved on various levels of linguistic description using the models produced in T1.1. We will generate morphological and semantic data, focusing on: 1) morphological paradigm generation; 2) word-sense discrimination; 3) generation of various types of definitions (semantic indicators, simplified, terminological, etc.); 4) improvement of collocations and examples of use; 5) attribution of labels (stylistic, normative, domain, genre, etc.); and 6) description of idiomatic, figurative and metaphorical language. The result will be a significantly improved DDDS, which will in turn be used to improve the models in T1.1. All versions of DDDS will be released as public datasets and through an open-access API.