Deliverables

List of deliverables by work package and month

Task number Deliverable Type Deliverable link
January 2025
3.1.1 Online interface for collecting conversational speech data (M4). Application Access: Interface
7.3.1 Dissemination and Communication Plan (M4). Report
March 2025
1.1.1 DDDS and OSWN datasets ready for training (M6). Dataset
7.2.1 Data Management Plan (M6). Report Access: PDF-file
April 2025
3.1.2 Manually transcribed conversation data for the spoken learning corpus (5 hours) (M7). Dataset
7.2.2 Code of ethics, risk monitoring activities (M1-M36). Report
July 2025
3.2.1 Expanded spoken learning corpus with dialogue act and sentiment annotations (min. 5 hours of conversational speech) (M10). Dataset
September 2025
1.1.2 Initial improved LLM (M12). Model
1.3.1 Slovene datasets for training VLM (M12). Dataset
2.2.1 Synthetic language error datasets (M12). Dataset
2.3.1 LLM with improved grammatical knowledge (M12). Model
4.1.1 Interaction graphs of historical named entities (M12). Other
6.1.1 Metaphor, irony, and sarcasm benchmark in Slovene (M12). Dataset
7.1.1 Annual reports (M12). Report
March 2026
1.2.1 KGs and raw texts datasets (M18). Dataset
2.2.2 Grammar checking LLMs (M18). Model
2.3.2 Dataset for evaluating grammatical knowledge of LLMs (M18). Dataset
4.4.1 A new RAG system for Slovenian capable of detecting contradictions in documents (M18). Application
5.1.1 Novel methodological approaches to historical and ideological analysis using LLMs (M18). Report
5.2.1 Novel methodology for digital folkloristics (M18). Report
5.3.1 Database of Slovene legal texts (M18). Database
September 2026
1.1.3 Final improved LLM (M24). Model
1.2.2 Initial improved LLMs (M24). Model
1.3.2 Slovene VLM model (M24). Model
2.1.1 DDDS with generated lexicographic data – first version (M24). Dataset
2.2.3 Authentic grammar checking evaluation datasets (M24). Dataset
4.1.2 Visualization of extracted named entity graphs (M24). Report
4.2.1 Novel methodology for diachronic analysis using LLMs (M24). Report
4.3.1 Dataset of images from Slovene historical periodicals (M24). Dataset
6.1.2 Pragmatic and associative behavior explanation benchmark (M24). Dataset
6.3.1 Bias detection datasets for Slovene (M24). Dataset
7.1.2 Annual reports (M24). Report
October 2026
3.1.3 Manually multi-reference-transcribed data for the spoken benchmark corpus (1 hour new data + 3 hours of existing ASR data) (M25). Dataset
March 2027
1.2.3 Final improved LLMs (M30). Model
3.2.2 Models for dialogue act and sentiment identification in Slovenian speech (M30). Model
4.3.2 VLM adapted for selected DH tasks (M30). Model
6.2.1 Speech dataset (4 hours), annotated with dialogue act and sentiment annotations (M30). Dataset
6.3.2 Debiasing approach for LLMs (M30). Report
September 2027
2.1.2 DDDS with generated lexicographic data – final version (M36). Other
2.3.3 Multilingual and cross-lingual grammatical analyses (M36). Report
3.1.4 Audio speech database of conversation data (M36). Database
3.3.1 Slovenian audio speech database from publicly available resources (min. 300 hours) (M36). Database
3.4.1 A new ASR-LLM integration method for domain-specific ASR for low-resource languages (M36). Program
5.1.2 Novel analyses of ideological concepts through history (M36). Report
5.2.2 Novel analyses of conflict resolution rituals (M36). Report
5.3.2 A RAG-based system for Slovene legal support (M36). Other
6.2.2 Multi-reference ASR task, dialogue processing task, and sentiment in speech tasks (M36). Report
6.3.3 Spoken language bias detection analysis (M36). Report
6.4.1 A novel knowledge-based explanation methodology for LLM explanation (M36). Report
7.1.3 Annual reports (M36). Report
7.3.2 At least 40 conference/submitted journal publications (M36). Report
7.3.3 The above-mentioned activities (M36). Other