Results

List of Results by Project and Month

No. Name Organization Type TRL Links
August 2024  
D1.1 Accessible Slovenian training dataset for dialogues and command requests. FRI data 3 Access: GaMS-Instruct-GEN 1.0 | CLARIN.SI
D1.2 Large language corpus for conversational language and addressed terminological areas – first version. FRI data 3 Access: PoVeJMo-VeMo-Med 1.0 | CLARIN.SI
D1.3 Validation corpus for large language models. FRI data 3 Access: slovenian-llm-eval | HuggingFace
D2.1 Open-access large generative language model tailored for dialogues and commands with a size of one billion parameters. FRI software 3 Access: GaMS-2B-Instruct | HuggingFace
D3.1 Training set with at least 10,000 examples. Semantika data 5 Access: GaMS-Instruct-DH 1.0 | CLARIN.SI
D4.1 Training set with specific dialogues and commands from the field of medical applications, consisting of at least 10,000 examples. Better data 5 Access: GaMS-Instruct-MED 1.0 | CLARIN.SI
D5.1 Analysis of the possibilities of using speech and language technologies to improve the efficiency of human-machine communication in industrial environments. Špica documentation 5 Access: Analiza možnosti uporabe govornih in jezikovnih tehnologij | PDF (in Slovene)
D5.2 Report on the suitability of technical equipment and possible methods of integrating language and speech technologies in manufacturing environments. Špica documentation 5 Access: Poročilo o primernosti tehnične opreme in možnih načinih integracije jezikovnih in govornih tehnologij v proizvodnih okoljih | PDF (in Slovene)
D6.1 Development of the initial plan and approaches for the effective use of large language models in IaC (Infrastructure as Code): Use and selection of training data, description of possible approaches. XLAB documentation 5 Access: Dokumentacija in načrt rabe podatkov in pristopov | PDF (in Slovene)
D6.2 First version of software enabling the use of language technologies in IaC, taking into account performance, robustness, and security limitations. XLAB software 5 Access: VeMo-IaC – v1 (description) | PDF
February 2025  
D1.4 Tools for preparing lexical databases for model training and components for the HuggingFace pipeline for integrating open lexical forms. FRI software 3 Description of API routes for the Digital Dictionary Database of Slovene: Digital Dictionary Database of Slovene – API Routes | CJVT Wiki
API route tests: Redoc | Swagger

D1.5 Dedicated tokenizers for the Slovenian language – first version. FRI software 3 Access: GaMS-1B | HuggingFace (The tokenizer is included in the GaMS-1B model, but it can also be downloaded independently; see tokenizer.json, tokenizer_config.json and special_tokens_map.json.)
D2.2 Open-access large generative language model tailored for dialogues and commands with a size of 10 billion parameters. FRI software 3 Access: GaMS-27B-Instruct | HuggingFace
D3.2 Calibrated SloLLaMai models for the humanities and instruction following. Semantika software 5 Access: Micka-gen3 | HuggingFace
D4.2 Large generative language model adapted for the field of medicine. Better software 5 Access: GaMS collection | HuggingFace (GaMS-2B, GaMS-9B, GaMS-27B: models pretrained on several corpora, including the medical text corpus PoVeJMo-VeMo-Med 1.0)
August 2025  
D1.6 Large language corpus for conversational language and addressed terminological areas – second version. FRI data 4 Access: GaMS-Instruct-MED 2.0 | CLARIN.SI

D2.3 Open-access computationally lightweight generative language model tailored for dialogues and commands. FRI software 4 Access: GaMS-2B-Instruct | HuggingFace (a GaMS model version that can run on a basic graphics card in a standard computer)
Access: GaMS-2B-WebGPU | HuggingFace (a GaMS model version optimized for running in a web browser)

D3.3 Demonstration application for OCR. Semantika software 6 Access: GalisOnline | Description (PDF in Slovene)
D3.4 Demonstration application for semantic search. Semantika software 6 Access: Demonstracijska aplikacija za OCR | Description (PDF in Slovene)
D4.3 Precise recognizer of Slovenian speech specialized for the field of medicine. Better software 5  
D5.3 Online service for acoustic preprocessing of audio signals and noise reduction. Špica software 5 Description: Spletni servis za akustično predprocesiranje zvočnega signala in odpravo šuma | Description (PDF in Slovene); ASR Performance Evaluation Post Speech Enhancement: WER Analysis and Tuning | Attachment 1 (PDF); Evaluation of Custom Masking-Based Speech Enhancement and Pretrained NVIDIA BNR Model for Slovenian Language Applications | Attachment 2 (PDF)
Access to the online service: Online services
API route tests: Swagger | Redoc

D5.4 Accurate and robust multilingual speech recognition model for South Slavic languages. Špica software 5 Description: Natančen in robusten večjezični model razpoznave govora za južnoslovanske jezike | Description (PDF in Slovene). For access, contact info@vitasis.si.

D6.3 Refinement of the plan and corrections to approaches for the effective use of large language models in IaC (Infrastructure as Code): use and selection of training data, description of approaches under development. XLAB documentation 5 Description: Dokumentacija in načrt rabe podatkov in pristopov | Description (PDF in Slovene)
D6.4 Second version of software enabling the use of language technologies in IaC, taking into account performance, robustness, and security limitations – achieved TRL5. XLAB software 5 Description: Druga različica programske opreme | Description (PDF in Slovene)
February 2026  
D1.7 Dedicated tokenizers for the Slovenian language – final version. FRI software 4  
D2.4 Open-access large language model with embedded additional knowledge. FRI software 3  
D3.5 Demonstration application for the automatic generation of collection descriptions. Semantika software 6  
D3.6 Demonstration application for a summarizer. Semantika software 6
D3.7 Demonstration application for a translator. Semantika software 6  
D3.8 Demonstration application for translation between instructions in natural language and command language. Semantika software 6  
D3.9 Development of an application for automatic entity extraction and document anonymization, along with a demonstration on databases acquired by Semantika. Semantika software 5
D4.4 Medical application using a speech recognizer and a large generative language model. Better software 6  
June 2026  
D1.8 Large language corpus for conversational language and addressed terminological areas – final version. FRI data 4  
D1.9 Knowledge base built from the Digital Lexical Database. FRI data 3
D3.10 Demonstration of an upgraded digital guide. Semantika software 6  
D5.5 Prototype integration of a speech communication system with a selected business process management solution in manufacturing. Špica software 6  
D6.5 Final version of software enabling the use of language technologies in IaC, taking into account performance, robustness, and security limitations – achieved TRL6. XLAB software 6