TOOLS AND RESOURCES
Language technologies include software, online services and databases.
Obeliks
Statistical tagger for Slovene
A tagger is a computer program that segments a text into units and assigns specific information to individual words, such as the part of speech and grammatical properties (gender, case, number, etc.), or the base form of a word that has several inflected forms. The tagger can be tested online.
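To make the three tasks concrete (segmentation, assigning grammatical properties, assigning the base form), here is a minimal sketch in Python. It is not how Obeliks works: a statistical tagger chooses tags from context using a trained model, whereas this toy version only looks words up in a small hand-written lexicon. The lexicon entries, label names and function names are all assumptions made for illustration.

```python
import re

# Hypothetical mini-lexicon: word form -> (base form, part of speech, properties).
# Entries and label names are illustrative only, not actual tagger output.
LEXICON = {
    "videl": ("videti", "verb", {"gender": "masculine", "number": "singular"}),
    "sem": ("biti", "verb", {"person": "first", "number": "singular"}),
    "hišo": ("hiša", "noun", {"gender": "feminine", "number": "singular",
                              "case": "accusative"}),
}


def tokenize(text):
    """Segment text into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def tag(text):
    """Attach a base form, part of speech and properties to every token."""
    tagged = []
    for token in tokenize(text):
        if not token[0].isalnum():
            lemma, pos, props = token, "punctuation", {}
        else:
            # Unknown words keep their own form as the base form.
            lemma, pos, props = LEXICON.get(token.lower(), (token, "unknown", {}))
        tagged.append({"form": token, "lemma": lemma, "pos": pos, "props": props})
    return tagged


if __name__ == "__main__":
    for entry in tag("Videl sem hišo."):
        print(entry)
```

The lookup step is the part a statistical tagger replaces with a model learned from annotated data, which lets it handle ambiguous and previously unseen word forms.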
Statistical syntactic parser for Slovene
The MSTParser is a computer program that automatically determines the grammatical structure of a sentence. This makes it possible to identify predicates, subjects, objects, etc. Syntactic parsing is also one of the basic natural language processing procedures that support more complex language technologies such as machine translation, information extraction, speech technologies, automatic summarization and question answering.
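One common way to represent the grammatical structure of a sentence is a dependency tree, in which every word points to the word that governs it. The sketch below assumes that representation and shows how the predicate, subject and object can be read off a finished analysis. The analysis itself is written by hand, the relation labels are hypothetical, and no actual parser output is shown.

```python
from dataclasses import dataclass


@dataclass
class Token:
    index: int     # 1-based position in the sentence
    form: str      # the word as written
    head: int      # index of the governing token, 0 for the root
    relation: str  # hypothetical label for the link to the head


# Hand-written analysis of "The dog chased a cat" (illustrative only).
SENTENCE = [
    Token(1, "The", 2, "determiner"),
    Token(2, "dog", 3, "subject"),
    Token(3, "chased", 0, "predicate"),
    Token(4, "a", 5, "determiner"),
    Token(5, "cat", 3, "object"),
]


def root(tokens):
    """Return the token that has no governor, here the predicate."""
    return next(t for t in tokens if t.head == 0)


def dependents(tokens, head_index, relation):
    """Return the forms attached to a given head with a given relation."""
    return [t.form for t in tokens
            if t.head == head_index and t.relation == relation]


if __name__ == "__main__":
    predicate = root(SENTENCE)
    print("predicate:", predicate.form)
    print("subject:", dependents(SENTENCE, predicate.index, "subject"))
    print("object:", dependents(SENTENCE, predicate.index, "object"))
```

Downstream applications such as information extraction typically start from queries of this kind over the parsed structure.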
ssj500k
A manually annotated training corpus
The ssj500k is a training corpus containing manually annotated grammatical information. The data is used to train automatic text analysis programs that build a statistical model, and to evaluate rule-based analysis programs.
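The two uses of annotated data described above, training a statistical model and evaluating an analysis program, can be illustrated with a deliberately simple baseline. The sketch below assumes the corpus is available as (word form, tag) pairs; the tiny data sets, the tag names and the fallback tag are made up for illustration and are not taken from ssj500k.

```python
from collections import Counter, defaultdict

# Hypothetical annotated data: (word form, tag) pairs.
TRAIN = [
    ("the", "DET"), ("dog", "NOUN"), ("barks", "VERB"),
    ("the", "DET"), ("bark", "NOUN"),
    ("dogs", "NOUN"), ("bark", "VERB"),
    ("they", "PRON"), ("bark", "VERB"),
]
HELD_OUT = [("the", "DET"), ("dog", "NOUN"), ("bark", "NOUN"), ("cat", "NOUN")]


def train(pairs):
    """Build a model that maps each word form to its most frequent tag."""
    counts = defaultdict(Counter)
    for form, tag in pairs:
        counts[form][tag] += 1
    return {form: c.most_common(1)[0][0] for form, c in counts.items()}


def accuracy(model, pairs, default="NOUN"):
    """Share of tokens whose predicted tag matches the manual annotation."""
    correct = sum(model.get(form, default) == tag for form, tag in pairs)
    return correct / len(pairs)


if __name__ == "__main__":
    model = train(TRAIN)
    print("model:", model)
    print("accuracy on held-out data:", accuracy(model, HELD_OUT))
```

The same `accuracy` function could score the output of any tagger, statistical or rule-based, against the manual annotation, which is why a manually annotated corpus serves both purposes.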
ccGigafida and ccKres
Open-access corpora
ccGigafida and ccKres are sampled subcorpora of the Gigafida corpus and of its balanced version, the Kres corpus, respectively. The ccGigafida corpus contains approximately 9% of the Gigafida corpus, or 100 million words. The ccKres corpus contains approximately 9% of the Kres corpus, or 10 million words. The structure of each sample corpus is the same as that of its parent corpus. The ccGigafida and ccKres corpora enable in-depth linguistic and computational (language technology) analyses of the Slovene language without any restrictions.