Improving Linguistic Data with LLMs
We have developed a novel methodology for extracting knowledge graphs from digital linguistic databases that is tailored to morphologically complex languages.
As part of the LLM4DH project, we will develop a novel approach to the grammatical analysis of multilingual corpora by augmenting state-of-the-art LLMs with Universal Dependencies (UD) data.
On June 13, 2025, we are organising a workshop at UL FRI. The topic of the workshop is AI Methods for Research of Folkloristic Narratives.
CLASSLA-Express workshops aim to show participants how to use the CLASSLA web corpora in language research. The workshops comprise hands-on exercises showing how to create queries in corpora for Bulgarian, Croatian, Macedonian, Serbian and Slovene.
This article briefly explains the methods and prompts used to develop spelling and grammar correction LLMs for the Slovenian language.
Contact:
marko.robniksikonja@fri.uni-lj.si
Duration of the project:
September 2024 – September 2027
Faculty of Computer and Information Science
Večna pot 113, SI-1000 Ljubljana, Slovenia
Room: R2.06 (2nd floor)