Improving Linguistic Data with LLMs

We have developed a novel methodology for extracting knowledge graphs from digital linguistic databases that is tailored to morphologically complex languages.

Advanced grammatical analysis of multilingual corpora

As part of the LLM4DH project, we will develop a novel approach to the grammatical analysis of multilingual corpora by augmenting state-of-the-art LLMs with Universal Dependencies (UD) data.

Workshop: AI Methods for Research of Folkloristic Narratives

On June 13, 2025, we are organising a workshop at UL FRI on AI Methods for Research of Folkloristic Narratives.

CLASSLA-Express Workshops in 2025

CLASSLA-Express workshops aim to show participants how to use the CLASSLA web corpora in language research. The workshops consist of hands-on exercises on building queries over the corpora for Bulgarian, Croatian, Macedonian, Serbian and Slovene.

Behind the scenes of developing a spelling and grammar correction LLM for Slovenian: combining authentic and synthetic data

This article briefly explains the methods and prompts used to develop spelling and grammar correction LLMs for the Slovenian language.