Deliverables - LLM4DH

LIST OF DELIVERABLES BY WORK PACKAGES AND MONTHS

Task number	Deliverable	Type	Deliverable link
January 2025
3.1.1	Online interface for collecting conversational speech data (M4).	Application	Access: Interface
7.3.1	Dissemination and Communication Plan (M4).	Report	Access: PDF-file
March 2025
1.1.1	DDDS and OSWN datasets ready for training (M6)	Dataset	Access: Link
7.2.1	Data Management Plan (M6).	Report	Access: PDF-file
April 2025
3.1.2	Manually transcribed conversation data for the spoken learning corpus (5 hours) (M7).	Dataset	Access: Link
7.2.2	Code of ethics, risk monitoring activities (M1-M36)	Report	Access: PDF-file
July 2025
3.2.1	Expanded spoken learning corpus with dialogue act and sentiment annotations (min. 5 hours of conversational speech) (M10)	Dataset	Access: Link
September 2025
1.1.2	Initial improved LLM (M12).	Model	Access: Link
1.3.1	Slovene datasets for training VLM (M12).	Dataset	Access: Link
2.2.1	Synthetic language error datasets (M12).	Dataset	Access: Link
2.3.1	LLM with improved grammatical knowledge (M12)	Model	Access: Link
4.1.1	Interaction graphs of historical named entities (M12).	Other	Access: Link
6.1.1	Metaphor, irony, and sarcasm benchmark in Slovene (M12).	Dataset	SloPragEval SloPragMega
7.1.1	Annual reports (M12)	Report	Access: Link (in Slovene)
March 2026
1.2.1	KGs and raw texts datasets (M18).	Dataset
2.2.2	Grammar checking LLMs (M18).	Model
2.3.2	Dataset for evaluating grammatical knowledge of LLMs (M18).	Dataset
4.4.1	A new RAG system for Slovenian capable of detecting contradictions in documents (M18)	Application
5.1.1	Novel methodological approaches to historical and ideological analysis using LLMs (M18).	Report
5.2.1	Novel methodology for digital folkloristics (M18).	Report
5.3.1	Database of Slovene legal texts (M18).	Database
August 2026
	Updated Digital dictionary database.	Database
September 2026
1.1.3	Final improved LLM (M24).	Model
1.2.2	Initial improved LLMs (M24).	Model
1.3.2	Slovene VLM model (M24).	Model
2.1.1	DDDS with generated lexicographic data – first version (M24)	Dataset
2.2.3	Authentic grammar checking evaluation datasets (M24).	Dataset
4.1.2	Visualization of extracted named entity graphs (M24).	Report
4.2.1	Novel methodology for diachronic analysis using LLMs (M24).	Report
4.3.1	Dataset of images from Slovene historical periodicals (M24).	Dataset
6.1.2	Pragmatic and associative behavior explanation benchmark (M24).	Dataset
6.3.1	Bias detection datasets for Slovene (M24).	Dataset
7.1.2	Annual reports (M24)	Report
October 2026
3.1.3	Manually multi-reference-transcribed data for the spoken benchmark corpus (1 hour new data + 3 hours of existing ASR data) (M25).	Dataset
March 2027
1.2.3	Final improved LLMs (M30)	Model
3.2.2	Models for dialogue act and sentiment identification in Slovenian speech (M30)	Model
4.3.2	VLM adapted for selected DH tasks (M30).	Model
6.2.1	Speech dataset (4 hours), annotated with dialogue act and sentiment annotations (M30).	Dataset
6.3.2	Debiasing approach for LLMs (M30).	Report
September 2027
2.1.2	DDDS with generated lexicographic data – final version (M36)	Other
2.3.3	Multilingual and cross-lingual grammatical analyses (M36)	Report
3.1.4	Audio speech database of conversation data (M36).	Database
3.3.1	Slovenian audio speech database from publicly available resources (min. 300 hours) (M36)	Database
3.4.1	A new ASR-LLM integration method for domain-specific ASR for low-resource languages (M36)	Program
5.1.2	Novel analyses of ideological concepts through history (M36).	Report
5.2.2	Novel analyses of conflict resolution rituals (M36).	Report
5.3.2	A RAG-based system for Slovene legal support (M36)	Other
6.2.2	Multi-reference ASR task, dialogue processing task, and sentiment in speech tasks (M36).	Report
6.3.3	Spoken language bias detection analysis (M36).	Report
6.4.1	A novel knowledge-based explanation methodology for LLM explanation (M36).	Report
7.1.3	Annual reports (M36)	Report
7.3.2	At least 40 conference/submitted journal publications (M36).	Report
7.3.3	The above-mentioned activities (M36).	Other
