Publications
This list includes all publications produced within the LLM4DH project.
Book chapters
Journal articles
- Kuzman, T. and Ljubešić, N. (2025). LLM Teacher-Student Framework for Text Classification With No Manually Annotated Data: A Case Study in IPTC News Topic Classification. IEEE Access, 13, 35621-35633. https://doi.org/10.1109/ACCESS.2025.3544814
- Mochtak, M., Rupnik, P., Kuzman, T. and Ljubešić, N. (2025). Parlasent: mapping sentiment in political discourse with large language models. Political Research Exchange, 7(1). https://doi.org/10.1080/2474736X.2025.2508377
Conference papers
- Krek, S. (2025, March 24-29). GRAVITACIJA – Veliki jezikovni modeli za digitalno humanistiko [Conference presentation]. Jožef Stefan days, Ljubljana, Slovenia. https://dnevi.ijs.si/#DOV
- Robnik Šikonja, M. (2025, April 16-17). Veliki jezikovni modeli za slovenščino in prevajanje [Conference presentation]. Proofreading and Translation Conference 2025: The Impact of Digital Transformation on Translation, Ljubljana, Slovenia. https://lektornica.si/delavnice/jezikovne/translation-conference-2025-the-impact-of-digital-transformation-in-translation/
- Arhar Holdt, Š. (2025, April 16-17). Lektoriranje v času umetne inteligence: Kdo bo postavljal piko na UI? [Conference presentation]. Proofreading and Translation Conference 2025: The Impact of Digital Transformation on Translation, Ljubljana, Slovenia. https://lektornica.si/delavnice/jezikovne/translation-conference-2025-the-impact-of-digital-transformation-in-translation/
- Kuzman, T. (2025, April 16-17). Prednosti in tveganja uporabe ChatGPTja za prevajalce [Conference presentation]. Proofreading and Translation Conference 2025: The Impact of Digital Transformation on Translation, Ljubljana, Slovenia. https://lektornica.si/delavnice/jezikovne/translation-conference-2025-the-impact-of-digital-transformation-in-translation/
- Arčon, T. (2025, June 4). Lost in instructions? Developing a systematic approach to instruction tuning datasets for LLMs [Conference presentation]. Picacsa 2025. Ljubljana, Slovenia.
- Jelovčan, G. (2025, June 4). Grammar Error Correction Dataset for Slovene Language [Conference presentation]. Picacsa 2025. Ljubljana, Slovenia.
- Vreš, D. (2025, June 4). GaMS-9B: Pushing the Boundaries of Slovenian Large Language Models [Conference presentation]. Picacsa 2025. Ljubljana, Slovenia.
- Robnik Šikonja, M. (2025, June 10). Projekt PoVeJMo, Gravitacija in ERA Chair projekt AI4DH [Conference presentation]. 4. Nacionalna konferenca Umetna inteligenca – nove smeri razvoja in izzivi za Slovenijo. Mengeš, Slovenia. https://dogodki.vlada.si/umetna-inteligenca-digitalna-preobrazba-prijava
- Robnik Šikonja, M. (2025, June 13). Large Language Models for Analysis of Complex Phenomena [Conference presentation]. AI Methods for Research of Folkloristic Narratives, Ljubljana, Slovenia. https://cjvt.si/llm4dh/en/blog/workshop-ai-methods-for-research-of-folkloristic-narratives/
- Arčon, T., Robnik Šikonja, M. and Tratnik, P. (2025, June 13). Motif Detection Using Large Language Models: The Cinderella Case Study [Conference presentation]. AI Methods for Research of Folkloristic Narratives, Ljubljana, Slovenia. https://cjvt.si/llm4dh/en/blog/workshop-ai-methods-for-research-of-folkloristic-narratives/
- Horvat, M., Koražija, J. and Tratnik, P. (2025, June 13). Modeling Deliberative Values in Narrative Culture Using LLMs [Conference presentation]. AI Methods for Research of Folkloristic Narratives, Ljubljana, Slovenia. https://cjvt.si/llm4dh/en/blog/workshop-ai-methods-for-research-of-folkloristic-narratives/
- Babnik, J. and Tratnik, P. (2025, June 13). The Dragon-Slayer’s Narrative: Structural Kinship and Discursive Divergence [Conference presentation]. AI Methods for Research of Folkloristic Narratives, Ljubljana, Slovenia. https://cjvt.si/llm4dh/en/blog/workshop-ai-methods-for-research-of-folkloristic-narratives/
- Babnik, J. and Martinc, M. (2025, June 13). Considering Modes: Semiotics and Multimodal AI [Conference presentation]. AI Methods for Research of Folkloristic Narratives, Ljubljana, Slovenia. https://cjvt.si/llm4dh/en/blog/workshop-ai-methods-for-research-of-folkloristic-narratives/
- Robnik Šikonja, M. (2025, June 17). The importance of language data for the development of LT solutions – future steps [Conference presentation]. EU LDS Country Workshop. Ljubljana, Slovenia. https://language-data-space.ec.europa.eu/events/lds-country-workshop-slovenia-2025-06-17_en
- Kosem, I. (2025, July 2-5). Implementing AI in lexicographic workflow: challenges and opportunities [Conference presentation]. 29th International Conference of the African Association for Lexicography. https://www.afrilex.co.za/conferences
Datasets
- Žitnik, S. and Knez, T. (2025). Lexical LLM Pretraining Corpus [Data set]. D1.1.1 – pretraining corpus
- Verdonik, D. (2025). Manually transcribed conversation data for the spoken learning corpus (5 hours) [Data set]. ROG-dialog-trs-v01.rar
Other
- Klemen, M. (2025). Advanced grammatical analysis of multilingual corpora. Zenodo. https://doi.org/10.5281/zenodo.15646857
- Žitnik, S. and Knez, T. (2025). Improving Linguistic Data with LLMs. Zenodo. https://doi.org/10.5281/zenodo.15878672
- Arhar Holdt, Š. and Jelovčan, G. (2025). Behind the scenes of developing spell and grammar correction LLM for Slovenian: combining authentic and synthetic data. Zenodo. https://doi.org/10.5281/zenodo.15282208