Results

List of Results by Projects and Months

No.	Name	Organization	Type	TRL	Links
August 2024
D1.1	Accessible Slovenian training dataset for dialogues and command requests.	FRI	data	3	Access: GaMS-Instruct-GEN 1.0 \| CLARIN.SI
D1.2	Large language corpus for conversational language and addressed terminological areas – first version.	FRI	data	3	Access: PoVeJMo-VeMo-Med 1.0 \| CLARIN.SI
D1.3	Validation corpus for large language models.	FRI	data	3	Access: slovenian-llm-eval \| HuggingFace
D2.1	Open-access large generative language model tailored for dialogues and commands with a size of one billion parameters.	FRI	software	3	Access: GaMS-2B-Instruct \| HuggingFace
D3.1	Training set with at least 10,000 examples.	Semantika	data	5	Access: GaMS-Instruct-DH 1.0 \| CLARIN.SI
D4.1	Training set with specific dialogues and commands from the field of medical applications, consisting of at least 10,000 examples.	Better	data	5	Access: GaMS-Instruct-MED 1.0 \| CLARIN.SI
D5.1	Analysis of the possibilities of using speech and language technologies to improve the efficiency of human-machine communication in industrial environments.	Špica	documentation	5	Access: Analiza možnosti uporabe govornih in jezikovnih tehnologij \| PDF (in Slovene)
D5.2	Report on the suitability of technical equipment and possible methods of integrating language and speech technologies in manufacturing environments.	Špica	documentation	5	Access: Poročilo o primernosti tehnične opreme in možnih načinih integracije jezikovnih in govornih tehnologij v proizvodnih okoljih \| PDF (in Slovene)
D6.1	Development of the initial plan and approaches for the effective use of large language models in IaC (Infrastructure as Code): Use and selection of training data, description of possible approaches.	XLAB	documentation	5	Access: Dokumentacija in načrt rabe podatkov in pristopov \| PDF (in Slovene)
D6.2	First version of software enabling the use of language technologies in IaC, taking into account performance, robustness, and security limitations.	XLAB	software	5	Access: VeMo-IaC – v1 (opis) \| PDF
February 2025
D1.4	Tools for preparing lexical databases for model training and components for the HuggingFace pipeline for integrating open lexical forms.	FRI	software	3	Description of API routes for the Digital Dictionary Database of Slovene: Digital Dictionary Database of Slovene – API Routes \| CJVT Wiki; API-routes test: Redoc, Swagger
D1.5	Dedicated tokenizers for the Slovenian language – first version.	FRI	software	3	Access: GaMS-1B \| HuggingFace (the tokenizer is included in the GaMS-1B model but can also be downloaded independently; see tokenizer.json, tokenizer_config.json and special_tokens_map.json)
D2.2	Open-access large generative language model tailored for dialogues and commands with a size of 10 billion parameters.	FRI	software	3	Access: GaMS-27B-Instruct \| HuggingFace and GaMS-27B-Instruct \| HuggingFace
D3.2	Calibrated SloLLaMai models for the humanities and instruction following.	Semantika	software	5	Access: Micka-gen3 \| HuggingFace
D4.2	Large generative language model adapted for the field of medicine.	Better	software	5	Access: GaMS collection \| HuggingFace (GaMS-2B, GaMS-9B, GaMS-27B: models pretrained on several corpora, including the medical text corpus PoVeJMo-VeMo-Med 1.0)
August 2025
D1.6	Large language corpus for conversational language and addressed terminological areas – second version.	FRI	data	4	Access: GaMS-Instruct-MED 2.0 \| CLARIN.SI
D2.3	Open-access computationally lightweight generative language model tailored for dialogues and commands.	FRI	software	4	Access: GaMS-2B-Instruct \| HuggingFace (GaMS model version that can run on a basic graphics card in a standard computer); Access: GaMS-2B-WebGPU \| HuggingFace (GaMS model version optimized for running online)
D3.3	Demonstration application for OCR.	Semantika	software	6	Access: GalisOnline \| Description (PDF in Slovene)
D3.4	Demonstration application for semantic search.	Semantika	software	6	Access: Demonstracijska aplikacija za OCR \| Description (PDF in Slovene)
D4.3	Precise recognizer of Slovenian speech specialized for the field of medicine.	Better	software	5
D5.3	Online service for acoustic preprocessing of audio signals and noise reduction.	Špica	software	5	Description: Spletni servis za akustično predprocesiranje zvočnega signala in odpravo šuma \| Description (PDF in Slovene); ASR Performance Evaluation Post Speech Enhancement: WER Analysis and Tuning \| Attachment 1 (PDF); Evaluation of Custom Masking-Based Speech Enhancement and Pretrained NVIDIA BNR Model for Slovenian Language Applications \| Attachment 2 (PDF). Access to the online service: Online services. API-routes test: Swagger, Redoc
D5.4	Accurate and robust multilingual speech recognition model for South Slavic languages.	Špica	software	5	Description: Natančen in robusten večjezični model razpoznave govora za južnoslovanske jezike \| Description (PDF in Slovene). For access, contact info@vitasis.si.
D6.3	Refinement of the plan and corrections to approaches for the effective use of large language models in IaC (Infrastructure as Code): Use and selection of training data, description of approaches in construction.	XLAB	documentation	5	Description: Dokumentacija in načrt rabe podatkov in pristopov \| Description (PDF in Slovene)
D6.4	Second version of software enabling the use of language technologies in IaC, taking into account performance, robustness, and security limitations – achieved TRL5.	XLAB	software	5	Description: Druga različica programske opreme \| Description (PDF in Slovene)
February 2026
D1.7	Dedicated tokenizers for the Slovenian language – final version.	FRI	software	4
D2.4	Open-access large language model with embedded additional knowledge.	FRI	software	3
D3.5	Demonstration application for the automatic generation of collection descriptions.	Semantika	software	6
D3.6	Demonstration application for a summarizer.	Semantika	software	6
D3.7	Demonstration application for a translator.	Semantika	software	6
D3.8	Demonstration application for translation between instructions in natural language and command language.	Semantika	software	6
D3.9	Development of an application for machine entity extraction and document anonymization, along with a demonstration on databases acquired by Semantika.	Semantika	software	5
D4.4	Medical application using a speech recognizer and a large generative language model.	Better	software	6
June 2026
D1.8	Large language corpus for conversational language and addressed terminological areas – final version.	FRI	data	4
D1.9	Knowledge base created based on the Digital Lexical Database.	FRI	data	3
D3.10	Demonstration of an upgraded digital guide.	Semantika	software	6
D5.5	Prototype integration of a speech communication system with a selected business process management solution in manufacturing.	Špica	software	6
D6.5	Final version of software enabling the use of language technologies in IaC, taking into account performance, robustness, and security limitations – achieved TRL6.	XLAB	software	6
