PUBLICATIONS
This list includes all publications produced within the LLM4DH project.
Book chapters
Journal articles
- Petrič, T., Arhar Holdt, Š., and Robnik-Šikonja, M. (2024). Pomembnost realistične evalvacije: Primer popravkov sklona in števila v slovenščini z velikim jezikovnim modelom. Slovenščina 2.0: Empirične, Aplikativne in Interdisciplinarne Raziskave, 12(1), 106-130. https://doi.org/10.4312/slo2.0.2024.1.106-130
- Kuzman, T. and Ljubešić, N. (2025). LLM Teacher-Student Framework for Text Classification With No Manually Annotated Data: A Case Study in IPTC News Topic Classification. IEEE Access, 13, 35621-35633. https://doi.org/10.1109/ACCESS.2025.3544814
- Mochtak, M., Rupnik, P., Kuzman, T. and Ljubešić, N. (2025). Parlasent: mapping sentiment in political discourse with large language models. Political Research Exchange, 7(1). https://doi.org/10.1080/2474736X.2025.2508377
- Yadav, A., Garg, T., Klemen, M., Ulčar, M., Agarwal, B., and Robnik Šikonja, M. (2025). From translation to generative LLMs: classification of code-mixed affective tasks. IEEE Transactions on Affective Computing. ISSN 1949-3045. https://doi.org/10.1109/TAFFC.2025.3553399
- Hostnik, M. and Robnik Šikonja, M. (2025). Retrieval-augmented code completion for local projects using large language models. Expert Systems with Applications, 292.
- Ulčar, M., Žagar, A., Armendariz, C.S., Repar, A., Pollak, S., Purver, M., and Robnik Šikonja, M. (2026). Mono- and cross-lingual evaluation of representation language models on less-resourced languages. Computer Speech & Language, 95, 101852. https://doi.org/10.1016/j.csl.2025.101852
Conference presentations
- Krek, S. (2025, March 24-29). GRAVITACIJA – Veliki jezikovni modeli za digitalno humanistiko [Conference presentation]. Jožef Stefan days, Ljubljana, Slovenia. https://dnevi.ijs.si/#DOV
- Robnik Šikonja, M. (2025, April 16-17). Veliki jezikovni modeli za slovenščino in prevajanje [Conference presentation]. Proofreading and Translation Conference 2025: The Impact of Digital Transformation on Translation, Ljubljana, Slovenia. https://lektornica.si/delavnice/jezikovne/translation-conference-2025-the-impact-of-digital-transformation-in-translation/
- Arhar Holdt, Š. (2025, April 16-17). Lektoriranje v času umetne inteligence: Kdo bo postavljal piko na UI? [Conference presentation]. Proofreading and Translation Conference 2025: The Impact of Digital Transformation on Translation, Ljubljana, Slovenia. https://lektornica.si/delavnice/jezikovne/translation-conference-2025-the-impact-of-digital-transformation-in-translation/
- Kuzman, T. (2025, April 16-17). Prednosti in tveganja uporabe ChatGPTja za prevajalce [Conference presentation]. Proofreading and Translation Conference 2025: The Impact of Digital Transformation on Translation, Ljubljana, Slovenia. https://lektornica.si/delavnice/jezikovne/translation-conference-2025-the-impact-of-digital-transformation-in-translation/
- Arčon, T. (2025, June 4). Lost in instructions? Developing a systematic approach to instruction tuning datasets for LLMs [Conference presentation]. Picacsa 2025. Ljubljana, Slovenia.
- Jelovčan, G. (2025, June 4). Grammar Error Correction Dataset for Slovene Language [Conference presentation]. Picacsa 2025. Ljubljana, Slovenia.
- Vreš, D. (2025, June 4). GaMS-9B: Pushing the Boundaries of Slovenian Large Language Models [Conference presentation]. Picacsa 2025. Ljubljana, Slovenia.
- Robnik Šikonja, M. (2025, June 10). Projekt PoVeJMo, Gravitacija in ERA Chair projekt AI4DH [Conference presentation]. 4. Nacionalna konferenca Umetna inteligenca – nove smeri razvoja in izzivi za Slovenijo. Mengeš, Slovenia. https://dogodki.vlada.si/umetna-inteligenca-digitalna-preobrazba-prijava
- Robnik Šikonja, M. (2025, June 13). Large Language Models for Analysis of Complex Phenomena [Conference presentation]. AI Methods for Research of Folkloristic Narratives, Ljubljana, Slovenia. https://cjvt.si/llm4dh/en/blog/workshop-ai-methods-for-research-of-folkloristic-narratives/
- Arčon, T., Robnik Šikonja, M. and Tratnik, P. (2025, June 13). Motif Detection Using Large Language Models: The Cinderella Case Study [Conference presentation]. AI Methods for Research of Folkloristic Narratives, Ljubljana, Slovenia. https://cjvt.si/llm4dh/en/blog/workshop-ai-methods-for-research-of-folkloristic-narratives/
- Horvat, M., Koražija, J. and Tratnik, P. (2025, June 13). Modeling Deliberative Values in Narrative Culture Using LLMs [Conference presentation]. AI Methods for Research of Folkloristic Narratives, Ljubljana, Slovenia. https://cjvt.si/llm4dh/en/blog/workshop-ai-methods-for-research-of-folkloristic-narratives/
- Babnik, J. and Tratnik, P. (2025, June 13). The Dragon-Slayer’s Narrative: Structural Kinship and Discursive Divergence [Conference presentation]. AI Methods for Research of Folkloristic Narratives, Ljubljana, Slovenia. https://cjvt.si/llm4dh/en/blog/workshop-ai-methods-for-research-of-folkloristic-narratives/
- Babnik, J. and Martinc, M. (2025, June 13). Considering Modes: Semiotics and Multimodal AI [Conference presentation]. AI Methods for Research of Folkloristic Narratives, Ljubljana, Slovenia. https://cjvt.si/llm4dh/en/blog/workshop-ai-methods-for-research-of-folkloristic-narratives/
- Robnik Šikonja, M. (2025, June 17). The importance of language data for the development of LT solutions – future steps [Conference presentation]. EU LDS Country Workshop. Ljubljana, Slovenia. https://language-data-space.ec.europa.eu/events/lds-country-workshop-slovenia-2025-06-17_en
- Kosem, I. (2025, July 2-5). Implementing AI in lexicographic workflow: challenges and opportunities [Conference presentation]. 29th International Conference of the African Association for Lexicography. https://www.afrilex.co.za/conferences
- Hüll, N. and Dobrovoljc, K. (2025). Word Order Variation in Spoken and Written Corpora: A Cross-Linguistic Study of SVO and Alternative Orders. Proceedings of the Eighth International Conference on Dependency Linguistics (Depling, SyntaxFest 2025). Ljubljana, Slovenia.
- Terčon, L. and Dobrovoljc, K. (2025). ComparaTree: A Multi-Level Comparative Treebank Analysis Tool. Proceedings of the 23rd International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2025). Ljubljana, Slovenia.
- Krsnik, L. and Dobrovoljc, K. (2025). STARK: A Toolkit for Dependency (Sub)Tree Extraction and Analysis. Proceedings of the 23rd International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2025). Ljubljana, Slovenia.
- Munda, T. and Arhar Holdt, Š. (2025). First Insights into the Syntax of Slovene Student Writing: A Statistical Analysis of Šolar 3.0 vs. Učbeniki 1.0. Proceedings of the Third Workshop on Quantitative Syntax (QUASY, SyntaxFest 2025). Ljubljana, Slovenia.
- Vintar, Š. and Javoršek, J. J. (2025). The truth is no diaper: Human and AI-generated associations to emotional words. 16th International Conference on Computational Creativity, ICCC’25.
- Gorjanc, V., Pretnar Žagar, A., Dobranić, F., and Fišer, D. (2025). Accessing Historical Periodicals: Newspaper Discourse on Slovene Language. ADHO Digital Humanities Conference 2025, DH2025. https://doi.org/10.5281/zenodo.16087978
Datasets
- Žitnik, S. and Knez, T. (2025). Lexical LLM Pretraining Corpus [Data set]. D1.1.1 – pretraining corpus
- Verdonik, D. (2025). Manually transcribed conversation data for the spoken learning corpus (5 hours) [Data set]. ROG-dialog-trs-v01.rar
- Škvorc, T. and Robnik Šikonja, M. (2025). Word-sense disambiguation corpus SloDicWSD 1.0, Slovenian language resource repository CLARIN.SI. http://hdl.handle.net/11356/2008.
- Žagar, A., Dobrovoljc, K., Munda, T., Brglez, M., and Robnik Šikonja, M. (2024). Knowledge-Enhanced Winograd Schema Challenge KE-WSC 1.0, Slovenian language resource repository CLARIN.SI. http://hdl.handle.net/11356/1988.
- Vreš, D., Arčon, T., Čibej, J., Robnik Šikonja, M., Krek, S., Gabrovšek, D., Ježovnik, J., Kastelic, M., Kevina, D., Ledinek, N., Michelizza, M., Perdih, A., Petric, Š., and Trojar, M. Slovene instruction-following dataset for large language models GaMS-Instruct-GEN 1.0. Slovenian language resource repository CLARIN.SI. http://hdl.handle.net/11356/1971
Other
- Klemen, M. (2025). Advanced grammatical analysis of multilingual corpora. Zenodo. https://doi.org/10.5281/zenodo.15646857
- Žitnik, S. and Knez, T. (2025). Improving Linguistic Data with LLMs. Zenodo. https://doi.org/10.5281/zenodo.15878672
- Arhar Holdt, Š., and Jelovčan, G. (2025). Behind the scenes of developing spell and grammar correction LLM for Slovenian: combining authentic and synthetic data. Zenodo. https://doi.org/10.5281/zenodo.15282208

