Infrastructure Support

CJVT’s infrastructure offers researchers various services: language resources and tools for Slovene, online platforms (crowdsourcing, gamification), and website hosting for research projects and programmes in linguistics.

AI4DH project

The Centre of Excellence in Artificial Intelligence for Digital Humanities (CoE AI4DH) is a community of experts in AI, social sciences, and digital humanities who pursue excellent science in an interdisciplinary and collaborative way. By leveraging AI expertise and a top-tier computational infrastructure, the CoE fosters interdisciplinary collaborations. If you would like to integrate AI into your social science or humanities research, we are here to support you.

LLM4DH project

LLM4DH is a project that combines the latest artificial intelligence technologies to explore the digital humanities. Modern approaches are enabling new insights in linguistics, history, law and folkloristics. The project contributes to the development of interdisciplinary research and offers new insights for understanding society.

Data completion and gamification of dictionary resources (PODVIG)

The PODVIG project is working on upgrading the data in the Thesaurus of Modern Slovene, the Collocations Dictionary and the Dictionary for Speakers of Slovene as a Second and Foreign Language. In addition, the project will develop an online language game with data from these dictionaries.

PoVeJMo – Adaptive Natural Language Processing with Large Language Models

The key objective of the Adaptive Natural Language Processing with Large Language Models (PoVeJMo) programme is the development of large language models, which have an impact on almost the entire field of artificial intelligence and machine learning.

MEZZANINE – The Basic Research for the Development of Spoken Language Resources and Speech Technologies for the Slovenian Language

The Basic Research for the Development of Spoken Language Resources and Speech Technologies for the Slovenian Language (teMeljnE raZiskave Za rAzvoj govorNih vIrov in tehNologij za slovEnščino – MEZZANINE) is a large basic research project financed by the Slovenian Research and Innovation Agency.

University licence for Sketch Engine

The Centre for Language Resources and Technologies provides access to the corpus tool Sketch Engine for all members of the University of Ljubljana.

PROP – Digitally supported development of writing skills

The aim of the project Empirical foundations for digitally-supported development of writing skills is to support teachers who correct and grade student writing. The solution lies in the digital support of teachers’ work and a digitally supported model of providing feedback to teachers and students.

SPOT – Treebank-Driven Approach to the Study of Spoken Slovenian

The SPOT project focuses on describing the syntactic features of Slovene speech using a new method based on syntactically parsed collections of written and spoken texts (so-called treebanks).

ONLINE NOTES

The project Tolmač (Eng. Interpreter) is focused on developing a system for automatically translating lectures from Slovene into other languages. Automatic subtitles will help people with hearing loss, and lecture excerpts and recordings will be accessible on a dedicated website.

SLOKIT – Corpus informer and text analyzer

The main purpose of the project is to upgrade the research infrastructure portal CLARIN.SI with services that will bring the content of the portal, especially the corpora, closer to a wider range of users.

Čas za slovenščino

Based on the Čas za slovenščino 1 textbook set, the project has developed comprehensive interactive learning material for beginners in Slovene, which provides learners with a high-quality and enjoyable learning experience.

KOST corpus

The Corpus of Slovene as a Foreign Language KOST is a digital collection of texts written by adult speakers for whom Slovene is not their first language.

Collocation dataset

The project produced a training set of 713,310 collocation candidates from the Gigafida 2.0 reference corpus, which were labelled according to their collocational relevance.

DSDE – Development of Slovene in a Digital Environment

The project goal is to meet the needs for computational tools and services in the field of language technologies for Slovene.

SoKol – Synonyms and Collocations 2.0

Upgrading the fundamental dictionary resources and databases of CJVT UL: the Slovene Thesaurus and the Slovene Collocations Dictionary will be upgraded to version 2.0.

KOLOS – Collocations in Slovene

Basic research into semantic and temporal aspects of collocations, as well as statistics for measuring them.

The Jezikovna Slovenija Site

A website about the language policy of Slovenia, where one can find advice on questions related to language policy.

KaUč – High-quality Slovene Textbooks

The project is developing quality indicators for textbooks. These will be of practical use in the textbook validation and evaluation process. Tools are also being developed to assist in the definition of the indicators.

NEW GRAMMAR

New grammar of contemporary standard Slovene: sources and methods.

KOMASS

Hungarian-Slovene dictionary concept: from language resource to users.

EMBEDDIA – Cross-Lingual Embeddings for Less-Represented Languages in European News Media

The EMBEDDIA project aims to bring high-end language technologies to under-represented languages in the EU by using cross-lingual embeddings coupled with deep neural networks.

European survey on dictionary use

Identifying users' expectations of monolingual dictionaries, their experience in using them, and their suggestions for improvement.

Corpora Gigafida, Kres, ccGigafida and ccKres upgrade

New corpus versions: balanced texts, improved accuracy, new user interface …

Thesaurus of Modern Slovene promotion

Informing the public about the new thesaurus, synonymy, and the possibilities of public involvement.

FRENK

Resources, methods, and tools for the understanding, identification, and classification of various forms of socially unacceptable discourse in the information society.

STARK – Statistical analysis of dependency-parsed corpora

STARK is a tool for statistical analysis of dependency-parsed corpora. It returns frequency lists of the dependency trees found in such corpora.

Historic versions of the Gigafida Slovene reference corpus

We enabled access to the previous versions of the Gigafida corpus through the online concordancers noSketch Engine and KonText. More specifically, the corpora FidaPLUS, Gigafida 1.0 and Gigafida 1.1 are accessible.

Sloleks lexicon accentuation

The Slovene morphological lexicon Sloleks was updated with automatically assigned (and partially manually annotated) accents. Additionally, a new interface that enables crowdsourcing of accent data was developed.

Keywords and n-grams from a textbook corpus

In this project, a corpus of textbooks for primary and secondary schools was built. Lists of words, n-grams and keywords were extracted from it.

LIST – efficient Slovene corpus analysis tool

A clear and understandable user interface for the corpusStatistics tool (renamed to LIST) was developed. It enables its users to easily access language statistics in Slovene and other corpora.

TermFrame

TermFrame is a three-year research project (2018-2021) that addresses terminology and knowledge at the intersection of languages, cognition and computer science. The project is funded by the Slovenian Public Agency for Research.

Trojina Institute

The Centre for Language Resources and Technologies took over the infrastructure maintenance of the Trojina Institute.