CJVT’s infrastructure offers researchers various services: language resources and tools for Slovene, online platforms (crowdsourcing, gamification), and website hosting for research projects and programmes in linguistics.
DSDE – Development of Slovene in a Digital Environment
The project aims to meet the need for computational tools and services in language technologies for Slovene among research institutions, companies, and the wider public.
KOLOS – Collocations in Slovene
Basic research into the semantic and temporal aspects of collocations, and into the statistical measures used to quantify them.
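As an illustration of what a collocation statistic computes, the sketch below implements logDice, one widely used association score for word pairs. It is only an example of this class of measures. The source does not say which statistics KOLOS studies. The function name and the counts in the usage line are hypothetical.

```python
import math


def log_dice(f_xy: int, f_x: int, f_y: int) -> float:
    """logDice association score for a word pair (x, y).

    f_xy: how often x and y co-occur (e.g. in a window or a syntactic relation)
    f_x, f_y: individual corpus frequencies of x and y
    The maximum value is 14, reached when x and y only ever occur together.
    """
    if f_xy <= 0:
        raise ValueError("co-occurrence frequency must be positive")
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


# Hypothetical counts, not taken from any Slovene corpus
print(log_dice(50, 100, 100))  # 13.0
```

Because the score depends only on relative frequencies and not on corpus size, it can be compared across corpora of different sizes. Measures such as MI or t-score weight rare and frequent pairs differently.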
New grammar of contemporary standard Slovene: sources and methods.
Hungarian-Slovene dictionary concept: from language resource to users.
European survey on dictionary use
Identifying users' expectations of monolingual dictionaries, their experience of using them, and their suggestions for improvement.
Corpora Gigafida, Kres, ccGigafida and ccKres upgrade
New corpus versions: balanced texts, improved accuracy, a new user interface …
Thesaurus of Modern Slovene promotion
Informing the public about the new thesaurus, synonymy, and the possibilities of public involvement.
Resources, methods, and tools for the understanding, identification, and classification of various forms of socially unacceptable discourse in the information society.
STARK – Statistical analysis of dependency-parsed corpora
STARK is a tool for the statistical analysis of dependency-parsed corpora. It returns frequency lists of the dependency trees found in such corpora.
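To show what a frequency list of dependency trees contains, here is a minimal sketch. It assumes input in the CoNLL-U format and counts only the simplest trees: a head with a single dependent, keyed by (head UPOS, relation, dependent UPOS). It is not STARK's implementation, which handles larger subtrees and more options. The function name `two_node_trees` and the sample sentence are hypothetical.

```python
from collections import Counter


def two_node_trees(conllu: str) -> Counter:
    """Count (head UPOS, relation, dependent UPOS) triples in a CoNLL-U string."""
    counts = Counter()
    for block in conllu.strip().split("\n\n"):  # sentences are separated by blank lines
        tokens = {}
        for line in block.splitlines():
            if not line or line.startswith("#"):
                continue  # skip comment lines such as "# text = ..."
            cols = line.split("\t")
            if not cols[0].isdigit():
                continue  # skip multiword ranges (1-2) and empty nodes (1.1)
            tokens[cols[0]] = cols
        for cols in tokens.values():
            head_id, deprel = cols[6], cols[7]
            if head_id == "0":
                continue  # the root token has no governing head
            counts[(tokens[head_id][3], deprel, cols[3])] += 1
    return counts


# Hypothetical one-sentence corpus: "Janez bere knjigo." (Janez reads a book.)
rows = [
    ("1", "Janez", "Janez", "PROPN", "_", "_", "2", "nsubj", "_", "_"),
    ("2", "bere", "brati", "VERB", "_", "_", "0", "root", "_", "_"),
    ("3", "knjigo", "knjiga", "NOUN", "_", "_", "2", "obj", "_", "_"),
    ("4", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"),
]
sample = "# text = Janez bere knjigo.\n" + "\n".join("\t".join(r) for r in rows) + "\n"

for tree, freq in two_node_trees(sample).most_common():
    print(tree, freq)
```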
Historic versions of the Gigafida Slovene reference corpus
We have made previous versions of the Gigafida corpus available through the online concordancers noSketch Engine and KonText. Specifically, the corpora FidaPLUS, Gigafida 1.0 and Gigafida 1.1 are now accessible.
Sloleks lexicon accentuation
The Slovene morphological lexicon Sloleks was updated with automatically assigned (and partly manually annotated) accents. In addition, a new interface was developed for crowdsourcing accent data.
Keywords and n-grams from a textbook corpus
In this project, a corpus of primary and secondary school textbooks was built, and lists of words, n-grams and keywords were extracted from it.
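A minimal sketch of the two extraction steps follows, under stated assumptions. The n-gram counting is standard. For keywords, it uses one common keyness score: the ratio of per-million frequencies in a focus corpus and a reference corpus, each smoothed by a constant (here 1). The source does not say which keyword method the project used. The function names and the toy token lists are hypothetical.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count contiguous n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def keyness(focus: Counter, reference: Counter, smoothing: float = 1.0) -> dict:
    """Score each focus item by its smoothed ratio of per-million frequencies."""
    focus_total = sum(focus.values())
    ref_total = sum(reference.values())
    scores = {}
    for item, freq in focus.items():
        fpm_focus = freq / focus_total * 1_000_000
        fpm_ref = reference.get(item, 0) / ref_total * 1_000_000
        scores[item] = (fpm_focus + smoothing) / (fpm_ref + smoothing)
    return scores


# Hypothetical toy data standing in for a textbook corpus and a general reference corpus
textbook = "celica je osnovna enota celica ima jedro".split()
general = "danes je lep dan in jutri je tudi lep dan".split()

print(ngrams(textbook, 2).most_common(3))
top = sorted(keyness(Counter(textbook), Counter(general)).items(), key=lambda kv: -kv[1])
print(top[:3])
```

A larger smoothing constant favours more frequent keywords, and a smaller one favours rarer, more specialised vocabulary. Choosing the constant is a design decision, not a fixed rule.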
LIST – efficient Slovene corpus analysis tool
A clear, user-friendly interface was developed for the corpusStatistics tool (now renamed LIST). It gives users easy access to language statistics from Slovene and other corpora.
The TOLMAČ tool
The Tolmač project (Eng. Interpreter) focuses on developing a system for automatically translating lectures from Slovene into other languages. Automatic subtitles will help people with hearing loss, and lecture excerpts and recordings will be available on a dedicated website.