Results
List of Results by Projects and Months
| *No. | Name | Organization | Type | TRL | Links |
| August 2024 | |||||
| D1.1 | Accessible Slovenian training dataset for dialogues and command requests. | FRI | data | 3 | Access: GaMS-Instruct-GEN 1.0 | CLARIN.SI |
| D1.2 | Large language corpus for conversational language and addressed terminological areas – first version. | FRI | data | 3 | Access: PoVeJMo-VeMo-Med 1.0 | CLARIN.SI |
| D1.3 | Validation corpus for large language models. | FRI | data | 3 | Access: slovenian-llm-eval | HuggingFace |
| D2.1 | Open-access large generative language model tailored for dialogues and commands with a size of one billion parameters. | FRI | software | 3 | Access: GaMS-2B-Instruct | HuggingFace |
| D3.1 | Training set with at least 10,000 examples. | Semantika | data | 5 | Access: GaMS-Instruct-DH 1.0 | CLARIN.SI |
| D4.1 | Training set with specific dialogues and commands from the field of medical applications, consisting of at least 10,000 examples. | Better | data | 5 | Access: GaMS-Instruct-MED 1.0 | CLARIN.SI |
| D5.1 | Analysis of the possibilities of using speech and language technologies to improve the efficiency of human-machine communication in industrial environments. | Špica | documentation | 5 | Access: Analiza možnosti uporabe govornih in jezikovnih tehnologij | PDF (in Slovene) |
| D5.2 | Report on the suitability of technical equipment and possible methods of integrating language and speech technologies in manufacturing environments. | Špica | documentation | 5 | Access: Poročilo o primernosti tehnične opreme in možnih načinih integracije jezikovnih in govornih tehnologij v proizvodnih okoljih | PDF (in Slovene) |
| D6.1 | Development of the initial plan and approaches for the effective use of large language models in IaC (Infrastructure as Code): Use and selection of training data, description of possible approaches. | XLAB | documentation | 5 | Access: Dokumentacija in načrt rabe podatkov in pristopov | PDF (in Slovene) |
| D6.2 | First version of software enabling the use of language technologies in IaC, taking into account performance, robustness, and security limitations. | XLAB | software | 5 | Access: VeMo-IaC – v1 (opis) | PDF |
| February 2025 | |||||
| D1.4 | Tools for preparing lexical databases for model training and components for the HuggingFace pipeline for integrating open lexical forms. | FRI | software | 3 | Description of API routes for the Digital Dictionary Database of Slovene: Digital Dictionary Database of Slovene – API Routes | CJVT Wiki; API-routes test: Redoc |
| D1.5 | Dedicated tokenizers for the Slovenian language – first version. | FRI | software | 3 | Access: GaMS-1B | HuggingFace (The tokenizer is included in the GaMS-1B model but can also be downloaded independently; see tokenizer.json, tokenizer_config.json and special_tokens_map.json) |
| D2.2 | Open-access large generative language model tailored for dialogues and commands with a size of 10 billion parameters. | FRI | software | 3 | Access: GaMS-27B-Instruct | HuggingFace and GaMS-27B-Instruct | HuggingFace |
| D3.2 | Calibrated SloLLaMai models for the humanities and instruction following. | Semantika | software | 5 | Access: Micka-gen3 | HuggingFace |
| D4.2 | Large generative language model adapted for the field of medicine. | Better | software | 5 | Access: GaMS collection | HuggingFace (GaMS-2B, GaMS-9B, GaMS-27B: models pretrained on, among other corpora, the medical text corpus PoVeJMo-VeMo-Med 1.0) |
| August 2025 | |||||
| D1.6 | Large language corpus for conversational language and addressed terminological areas – second version. | FRI | data | 4 | Access: |
| D2.3 | Open-access computationally lightweight generative language model tailored for dialogues and commands. | FRI | software | 4 | Access: GaMS-2B-Instruct | HuggingFace (GaMS model version that can run on a basic graphics card in a standard computer); Access: GaMS-2B-WebGPU | HuggingFace (GaMS model version optimized for running online) |
| D3.3 | Demonstration application for OCR. | Semantika | software | 6 | Access: GalisOnline | Description (PDF in Slovene) |
| D3.4 | Demonstration application for semantic search. | Semantika | software | 6 | Access: Demonstracijska aplikacija za OCR | Description (PDF in Slovene) |
| D4.3 | Precise recognizer of Slovenian speech specialized for the field of medicine. | Better | software | 5 | |
| D5.3 | Online service for acoustic preprocessing of audio signals and noise reduction. | Špica | software | 5 | Description: Spletni servis za akustično predprocesiranje zvočnega signala in odpravo šuma | Description (PDF in Slovene), ASR Performance Evaluation Post Speech Enhancement: WER Analysis and Tuning | Attachment 1 (PDF), Evaluation of Custom Masking-Based Speech Enhancement and Pretrained NVIDIA BNR Model for Slovenian Language Applications | Attachment 2 (PDF). Access to the online service: API-routes test: |
| D5.4 | Accurate and robust multilingual speech recognition model for South Slavic languages. | Špica | software | 5 | Description: Natančen in robusten večjezični model razpoznave govora za južnoslovanske jezike | Description (PDF in Slovene). For access, contact info@vitasis.si. |
| D6.3 | Refinement of the plan and corrections to approaches for the effective use of large language models in IaC (Infrastructure as Code): Use and selection of training data, description of approaches under development. | XLAB | documentation | 5 | Description: Dokumentacija in načrt rabe podatkov in pristopov | Description (PDF in Slovene) |
| D6.4 | Second version of software enabling the use of language technologies in IaC, taking into account performance, robustness, and security limitations – achieved TRL5. | XLAB | software | 5 | Description: Druga različica programske opreme | Description (PDF in Slovene) |
| February 2026 | |||||
| D1.7 | Dedicated tokenizers for the Slovenian language – final version. | FRI | software | 4 | |
| D2.4 | Open-access large language model with embedded additional knowledge. | FRI | software | 3 | |
| D3.5 | Demonstration application for the automatic generation of collection descriptions. | Semantika | software | 6 | |
| D3.6 | Demonstration application for a summarizer. | Semantika | software | 6 | |
| D3.7 | Demonstration application for a translator. | Semantika | software | 6 | |
| D3.8 | Demonstration application for translation between instructions in natural language and command language. | Semantika | software | 6 | |
| D3.9 | Development of an application for machine entity extraction and document anonymization, along with a demonstration on databases acquired by Semantika. | Semantika | software | 5 | |
| D4.4 | Medical application using a speech recognizer and a large generative language model. | Better | software | 6 | |
| June 2026 | |||||
| D1.8 | Large language corpus for conversational language and addressed terminological areas – final version. | FRI | data | 4 | |
| D1.9 | Knowledge base created based on the Digital Lexical Database. | FRI | data | 3 | |
| D3.10 | Demonstration of an upgraded digital guide. | Semantika | software | 6 | |
| D5.5 | Prototype integration of a speech communication system with a selected business process management solution in manufacturing. | Špica | software | 6 | |
| D6.5 | Final version of software enabling the use of language technologies in IaC, taking into account performance, robustness, and security limitations – achieved TRL6. | XLAB | software | 6 | |