Upgrade of the Gigafida, Kres, ccGigafida and ccKres Corpora

In June 2019, the latest version of the Gigafida corpus was published: Gigafida 2.0, the Corpus of Written Standard Slovene. It is the result of the project Upgrade of Gigafida, Kres, ccGigafida and ccKres, which was financed by the Slovenian Ministry of Culture from 2015 to 2018 and was carried out by the Centre for Language Resources and Technologies (contract no. 33400-15-141007).

The main novelty of the new corpora is the inclusion of texts that were not well represented in the initial versions. These are mainly school materials (textbooks, required reading) and literary texts that are often read in class. Furthermore, texts from various online sources (news portals, daily newspapers, etc.) were added to make the corpora more up to date. Technical improvements include developing processes for removing duplicate texts, improving the accuracy of linguistic annotation, and separating texts in the standard language from those that deviate from linguistic standards.

Additionally, we developed a new interface for the updated version of Gigafida. Its design draws on previous user surveys and follows the official graphic design of the Centre for Language Resources and Technologies and its resources.

Gigafida 2.0, the Corpus of Written Standard Slovene, is a basis for linguistic research and for the development of contemporary language resources and technologies for Slovene. In the future, new versions of the Thesaurus of Modern Slovene and the Collocations Dictionary of Modern Slovene will be based on it. Furthermore, a digital lexicographic database for the Dictionary of Modern Slovene will be developed from it.

LINKS AND CONTACT

The Centre for Language Resources and Technologies at the University of Ljubljana
Večna pot 113, SI-1000 Ljubljana