Upgrade of Gigafida, Kres, ccGigafida and ccKres

The Gigafida, Kres, ccGigafida and ccKres corpora form the basis for the development of modern language handbooks and language technologies for Slovene. Gigafida and Kres have user-friendly interfaces and are often used by linguists, translators, editors, proofreaders, teachers and similar user groups. These corpora are essential for language research and development, but they can only serve their purpose if they are continually updated and upgraded.

The project Upgrade of Gigafida, Kres, ccGigafida and ccKres is financed by the Ministry of Culture under contract no. 33400-15-141007 between the Ministry and the University of Ljubljana for the period 2015–2018. It is run by the Centre for Language Resources and Technologies and has three objectives: targeted acquisition of new materials, machine processing of new and existing materials, and public availability and dissemination of the upgraded corpora.

We will focus on text types that are currently underrepresented in Gigafida and Kres, mainly school reading materials and other popular literature. In addition, we will add texts from selected news websites, which will keep the corpus data up to date. The new materials will enlarge the existing corpora by about a quarter; Gigafida, for example, will grow from 1.2 billion to around 1.5 billion words. The technical side will be updated as well: we will develop tools for removing duplicate copies of texts, improve the accuracy of linguistic annotation, and separate standard-language texts from texts that deviate from the linguistic standard, placing them in distinct subcorpora.


