ACCESS TO HISTORIC VERSIONS OF THE SLOVENE GIGAFIDA CORPUS

In the online concordancers noSketch Engine and KonText on CLARIN.SI, only the latest version of the corpus, Gigafida 2.0, was available. Although this edition contains texts from the older versions, it differs from them in a few respects: duplicate texts and texts in non-standard Slovene were removed, and the linguistic tags were updated.

From time to time, the need arises to access older versions of the corpus, e.g. to analyse texts containing non-standard language features. This is especially important for research on Slovene as spoken in neighbouring countries: such language is present in the bulletin Novi Matajur, which was removed from Gigafida 2.0. Furthermore, access to older versions makes it possible to repeat and reproduce previous research.

In this project, access to the previous versions of the Gigafida corpus was provided through the online concordancers noSketch Engine and KonText. More specifically, the corpora FidaPLUS, Gigafida 1.0 and Gigafida 1.1 are now accessible. It was planned that the very first version of Gigafida (the corpus FIDA) would also be made available, and agreements with the owners of the corpus, the companies Amebis, d.o.o. and DZS, d.d., have been signed. Unfortunately, all project funds were used up by the copyright transfer from DZS, d.d. to the University of Ljubljana. Thus, no funding was left to actually transfer the corpus from physical CDs to a digital format suitable for the concordancers.

The following versions of the Gigafida corpus are available through noSketch Engine and KonText:

LINKS AND CONTACT

Andraž Repar

Centre for Language Resources and Technologies, University of Ljubljana
Večna pot 113
SI-1000 Ljubljana, Slovenia