Publications

This list includes all publications produced within the LLM4DH project.

Book chapters

Journal articles

  • Petrič, T., Arhar Holdt, Š., and Robnik-Šikonja, M. (2024). Pomembnost realistične evalvacije: Primer popravkov sklona in števila v slovenščini z velikim jezikovnim modelom [The importance of realistic evaluation: The case of correcting grammatical case and number in Slovene with a large language model]. Slovenščina 2.0: Empirične, Aplikativne in Interdisciplinarne Raziskave, 12(1), 106-130. https://doi.org/10.4312/slo2.0.2024.1.106-130
  • Kuzman, T. and Ljubešić, N. (2025). LLM Teacher-Student Framework for Text Classification With No Manually Annotated Data: A Case Study in IPTC News Topic Classification. IEEE Access, 13, 35621-35633. https://doi.org/10.1109/ACCESS.2025.3544814
  • Mochtak, M., Rupnik, P., Kuzman, T. and Ljubešić, N. (2025). Parlasent: mapping sentiment in political discourse with large language models. Political Research Exchange, 7(1). https://doi.org/10.1080/2474736X.2025.2508377
  • Yadav, A., Garg, T., Klemen, M., Ulčar, M., Agarwal, B., and Robnik Šikonja, M. (2025). From translation to generative LLMs: classification of code-mixed affective tasks. IEEE Transactions on Affective Computing (ISSN 1949-3045). https://doi.org/10.1109/TAFFC.2025.3553399
  • Hostnik, M. and Robnik Šikonja, M. (2025). Retrieval-augmented code completion for local projects using large language models. Expert Systems with Applications, 292, 128596. https://doi.org/10.1016/j.eswa.2025.128596
  • Ulčar, M., Žagar, A., Armendariz, C.S., Repar, A., Pollak, S., Purver, M., and Robnik Šikonja, M. (2026). Mono- and cross-lingual evaluation of representation language models on less-resourced languages. Computer Speech & Language, 95, 101852. https://doi.org/10.1016/j.csl.2025.101852

Conference papers

Datasets

  • Žitnik, S. and Knez, T. (2025). Lexical LLM Pretraining Corpus [Data set]. D1.1.1 – pretraining corpus.
  • Verdonik, D. (2025). Manually transcribed conversation data for the spoken learning corpus (5 hours) [Data set]. ROG-dialog-trs-v01.rar
  • Škvorc, T. and Robnik Šikonja, M. (2025). Word-sense disambiguation corpus SloDicWSD 1.0. Slovenian language resource repository CLARIN.SI. http://hdl.handle.net/11356/2008
  • Žagar, A., Dobrovoljc, K., Munda, T., Brglez, M., and Robnik Šikonja, M. (2024). Knowledge-Enhanced Winograd Schema Challenge KE-WSC 1.0. Slovenian language resource repository CLARIN.SI. http://hdl.handle.net/11356/1988
  • Vreš, D., Arčon, T., Čibej, J., Robnik Šikonja, M., Krek, S., Gabrovšek, D., Ježovnik, J., Kastelic, M., Kevina, D., Ledinek, N., Michelizza, M., Perdih, A., Petric, Š., and Trojar, M. Slovene instruction-following dataset for large language models GaMS-Instruct-GEN 1.0. Slovenian language resource repository CLARIN.SI. http://hdl.handle.net/11356/1971

Other