Results

List of Results by Projects and Months

No. Name Organization Type TRL Links

August 2024

 
D1.1 Accessible Slovenian training dataset for dialogues and command requests. FRI data 3 Access: CLARIN.SI
D1.2 Large language corpus for conversational language and addressed terminological areas – first version. FRI data 3 Access: CLARIN.SI
D1.3 Validation corpus for large language models. FRI data 3 Access: HuggingFace
D2.1 Open-access large generative language model tailored for dialogues and commands with a size of one billion parameters. FRI software 3 Access: HuggingFace
D3.1 Training set with at least 10,000 examples. Semantika data 5 Access: CLARIN.SI
D4.1 Training set with specific dialogues and commands from the field of medical applications, consisting of at least 10,000 examples. Better data (description) 5 Access: CLARIN.SI
D5.1 Analysis of the possibilities of using speech and language technologies to improve the efficiency of human-machine communication in industrial environments. Špica documentation (description) 5 Access: PDF file (in Slovene)
D5.2 Report on the suitability of technical equipment and possible methods of integrating language and speech technologies in manufacturing environments. Špica documentation (description) 5 Access: PDF file (in Slovene)
D6.1 Development of the initial plan and approaches for the effective use of large language models in IaC (Infrastructure as Code): Use and selection of training data, description of possible approaches. XLAB documentation (description) 5 Access: PDF file (in Slovene)
D6.2 First version of software enabling the use of language technologies in IaC, taking into account performance, robustness, and security limitations. XLAB software (description) 5 Access: PDF file (in Slovene)

February 2025

 
D1.4 Tools for preparing lexical databases for model training and components for the HuggingFace pipeline for integrating open lexical forms. FRI software 3
D1.5 Dedicated tokenizers for the Slovenian language – first version. FRI software 3
D2.2 Open-access large generative language model tailored for dialogues and commands with a size of 10 billion parameters. FRI software 3
D3.2 Calibrated SloLLaMai models for the humanities and instruction following. Semantika software 5
D4.2 Large generative language model adapted for the field of medicine. Better software 5

August 2025

 
D1.6 Large language corpus for conversational language and addressed terminological areas – second version. FRI data 4
D2.3 Open-access computationally lightweight generative language model tailored for dialogues and commands. FRI software 4
D3.3 Demonstration application for OCR. Semantika software 6
D3.4 Demonstration application for semantic search. Semantika software 6
D4.3 Precise recognizer of Slovenian speech specialized for the field of medicine. Better software 5
D5.3 Online service for acoustic preprocessing of audio signals and noise reduction. Špica software 5
D5.4 Accurate and robust multilingual speech recognition model for South Slavic languages. Špica software 5
D6.3 Refinement of the plan and corrections to approaches for the effective use of large language models in IaC (Infrastructure as Code): Use and selection of training data, description of approaches under development. XLAB documentation 5
D6.4 Second version of software enabling the use of language technologies in IaC, taking into account performance, robustness, and security limitations – achieved TRL5. XLAB software 5

February 2026

 
D1.7 Dedicated tokenizers for the Slovenian language – final version. FRI software 4
D2.4 Open-access large language model with embedded additional knowledge. FRI software 3
D3.5 Demonstration application for the automatic generation of collection descriptions. Semantika software 6
D3.6 Demonstration application for a summarizer. Semantika software 6
D3.7 Demonstration application for a translator. Semantika software 6
D3.8 Demonstration application for translation between instructions in natural language and command language. Semantika software 6
D3.9 Development of an application for machine entity extraction and document anonymization, along with a demonstration on databases acquired by Semantika. Semantika software 5
D4.4 Medical application using a speech recognizer and a large generative language model. Better software 6

June 2026

 
D1.8 Large language corpus for conversational language and addressed terminological areas – final version. FRI data 4
D1.9 Knowledge base created based on the Digital Lexical Database. FRI data 3
D3.10 Demonstration of an upgraded digital guide. Semantika software 6
D5.5 Prototype integration of a speech communication system with a selected business process management solution in manufacturing. Špica software 6
D6.5 Final version of software enabling the use of language technologies in IaC, taking into account performance, robustness, and security limitations – achieved TRL6. XLAB software 6