The Gigafida Corpus

The Gigafida corpus is a reference corpus of written Slovene: an extensive, carefully designed collection of texts that is a key resource for linguistic research, for describing the language (dictionaries, grammars), for preparing learning materials, and for developing language resources and automated procedures for processing Slovene. Gigafida contains newspapers, magazines, selected online texts, literary texts, factual texts, textbooks and other material.

CJVT provides access to the corpus and maintains its quality: the corpus must always be up to date, reliably linguistically annotated, clearly documented and available to the community for different purposes. Gigafida is also the main language resource for all CJVT products, such as responsive dictionaries, databases, and portals. Several resources have been compiled with direct links to the corpus in mind, so that usage examples in a broad context are only a single click away.

The latest version of the corpus is Gigafida 2.0, which was prepared as part of the project Upgrade of Gigafida, Kres, ccGigafida and ccKres. Previous versions of the corpus were called FIDA, FidaPLUS and Gigafida 1.0. The current version can be explored through several programs. For general use, we developed our own user-friendly CJVT concordancer. For specialised linguistic research, the corpus is also available in the concordance tools NoSke, Kontext and SketchEngine. More information on the development and contents of the corpus can be found here.


The Centre for Language Resources and Technologies at the University of Ljubljana
Večna pot 113, SI-1000 Ljubljana