{"id":4817,"date":"2024-03-20T13:14:30","date_gmt":"2024-03-20T12:14:30","guid":{"rendered":"https:\/\/www.cjvt.si\/?page_id=4817"},"modified":"2025-05-26T14:49:08","modified_gmt":"2025-05-26T12:49:08","slug":"solar-corpus","status":"publish","type":"page","link":"https:\/\/www.cjvt.si\/en\/research\/cjvt-projects\/corpus-solar\/","title":{"rendered":"\u0160olar Corpus"},"content":{"rendered":"<div class='flex_column_table av-equal-height-column-flextable -flextable' style='margin-top:0px; margin-bottom:0px; '><div class=\"flex_column av_two_third  flex_column_table_cell av-equal-height-column av-align-middle av-zero-column-padding first  avia-builder-el-0  el_before_av_one_third  avia-builder-el-first  \" style='border-radius:0px; '><section class=\"av_textblock_section \"  itemscope=\"itemscope\" itemtype=\"https:\/\/schema.org\/CreativeWork\" ><div class='avia_textblock  '   itemprop=\"text\" ><h2>\u0160olar corpus<\/h2>\n<\/div><\/section><br \/>\n<section class=\"av_textblock_section \"  itemscope=\"itemscope\" itemtype=\"https:\/\/schema.org\/CreativeWork\" ><div class='avia_textblock  '   itemprop=\"text\" ><h5>The \u0160olar developmental corpus contains texts that Slovenian primary and secondary school pupils have independently produced in class. 
The texts also contain teachers&#8217; corrections.<\/h5>\n<\/div><\/section><\/p><\/div>\n<div class='av-flex-placeholder'><\/div><div class=\"flex_column av_one_third  flex_column_table_cell av-equal-height-column av-align-middle av-zero-column-padding   avia-builder-el-3  el_after_av_two_third  el_before_av_two_third  \" style='border-radius:0px; '><p><div  class='avia-button-wrap avia-button-left  avia-builder-el-4  el_before_av_button  avia-builder-el-first  gumb-sodelavci-levo' title=\"Veliki slovensko-mad\u017earski slovar\"><a href='https:\/\/www.cjvt.si\/en\/research\/cjvt-projects\/slovene-hungarian-dictionary\/'  class='avia-button  av-button-notext   avia-icon_select-yes-left-icon avia-color-theme-color avia-size-small avia-position-left '   ><span class='avia_button_icon avia_button_icon_left ' aria-hidden='true' data-av_icon='\ue87c' data-av_iconfont='entypo-fontello'><\/span><span class='avia_iconbox_title' ><\/span><\/a><\/div><br \/>\n<div  class='avia-button-wrap avia-button-left  avia-builder-el-5  el_after_av_button  avia-builder-el-last  gumb-sodelavci-desno' title=\"Slovar sopomenk sodobne sloven\u0161\u010dine\"><a href='https:\/\/www.cjvt.si\/en\/research\/cjvt-projects\/the-dictionary-of-modern-slovene\/'  class='avia-button  av-button-notext   avia-icon_select-yes-right-icon avia-color-theme-color avia-size-small avia-position-left '   ><span class='avia_iconbox_title' ><\/span><span class='avia_button_icon avia_button_icon_right' aria-hidden='true' data-av_icon='\ue87d' data-av_iconfont='entypo-fontello'><\/span><\/a><\/div><\/p><\/div><\/div><!--close column table wrapper. 
Autoclose: 1 --><div class='flex_column_table av-equal-height-column-flextable -flextable' style='margin-top:0px; margin-bottom:0px; '><div class=\"flex_column av_two_third  flex_column_table_cell av-equal-height-column av-align-middle av-zero-column-padding first  avia-builder-el-6  el_after_av_one_third  el_before_av_one_third  column-top-margin\" style='border-radius:0px; '><p><section class=\"av_textblock_section \"  itemscope=\"itemscope\" itemtype=\"https:\/\/schema.org\/CreativeWork\" ><div class='avia_textblock  '   itemprop=\"text\" ><p>The \u0160olar developmental corpus contains texts that have been produced independently by pupils in various Slovenian primary and secondary schools. A large share of the texts also contain teachers&#8217; corrections (linguistic and contextual).<\/p>\n<\/div><\/section><br \/>\n<section class=\"av_textblock_section \"  itemscope=\"itemscope\" itemtype=\"https:\/\/schema.org\/CreativeWork\" ><div class='avia_textblock  '   itemprop=\"text\" ><p><span style=\"font-weight: 400;\">The \u0160olar corpus is modelled on language acquisition corpora, but differs in that (a) the texts are not project-initiated but represent actual school production by students, and (b) the linguistic corrections highlighted in the corpus are real and were made by teachers, not researchers. 
These features make \u0160olar a valuable and unique resource not only in Slovenia but also internationally.<\/span><\/p>\n<\/div><\/section><\/p><\/div><\/p>\n<div class='av-flex-placeholder'><\/div><div class=\"flex_column av_one_third  flex_column_table_cell av-equal-height-column av-align-middle   avia-builder-el-9  el_after_av_two_third  el_before_av_one_full  column-top-margin\" style='background: #f0f0f0; padding:30px; background-color:#f0f0f0; border-radius:0px; '><p><section class=\"av_textblock_section \"  itemscope=\"itemscope\" itemtype=\"https:\/\/schema.org\/CreativeWork\" ><div class='avia_textblock  '   itemprop=\"text\" ><p><strong>LINKS AND CONTACTS<\/strong><\/p>\n<\/div><\/section><br \/>\n<div  style='height:20px' class='hr hr-invisible   avia-builder-el-11  el_after_av_textblock  el_before_av_textblock '><span class='hr-inner ' ><span class='hr-inner-style'><\/span><\/span><\/div><br \/>\n<section class=\"av_textblock_section \"  itemscope=\"itemscope\" itemtype=\"https:\/\/schema.org\/CreativeWork\" ><div class='avia_textblock  '   itemprop=\"text\" ><p>dr. \u0160pela Arhar Holdt<\/p>\n<p>Faculty of Computer and Information Science UL<\/p>\n<p><span style=\"font-weight: 400;\">1000 Ljubljana<\/span><\/p>\n<ul>\n<li><span style=\"font-weight: 400;\">E-mail: &#115;&#x70;&#101;&#x6c;&#97;&#x2e;&#97;&#x72;&#104;&#x61;&#114;&#x68;&#111;&#x6c;&#100;&#x74;&#64;&#x66;&#102;&#x2d;&#117;&#x6e;&#105;&#x2d;&#108;&#x6a;&#46;&#x73;i<\/span><\/li>\n<\/ul>\n<\/div><\/section><\/p><\/div><\/div><!--close column table wrapper. 
Autoclose: 1 -->\n<div class=\"flex_column av_one_full  flex_column_div av-zero-column-padding first  avia-builder-el-13  el_after_av_one_third  el_before_av_one_full  column-top-margin\" style='border-radius:0px; '><section class=\"av_textblock_section \"  itemscope=\"itemscope\" itemtype=\"https:\/\/schema.org\/CreativeWork\" ><div class='avia_textblock  '   itemprop=\"text\" ><p>The current version, \u0160olar 3.0, contains 5,485 texts written by Slovenian secondary school students (15-19 years old) and primary school students in grades 7-9, with a small percentage from grade 6. For each text, information is given on the school (primary or secondary), the subject, the level (grade or year), the type of text, the region and the year of production. The majority of the corpus is made up of essays, but there are also other texts produced in the classroom, such as summaries or descriptions of texts, examples of formal applications, etc. More information about the \u0160olar 3.0 corpus can be found in <a href=\"https:\/\/link.springer.com\/article\/10.1007\/s10579-024-09758-4\" target=\"_blank\" rel=\"noopener\">this scientific paper<\/a>.<\/p>\n<p>Part of the corpus (2,094 texts) contains teacher corrections, which are also classified according to the content classification system described in the <a href=\"https:\/\/wiki.cjvt.si\/books\/11-jezikovni-popravki-solar\/page\/oznacevalne-smernice\" target=\"_blank\" rel=\"noopener\">annotation guidelines<\/a> (the guidelines are in Slovene). The annotation of the more than 35,000 corrections in the corpus is more detailed than in other similar projects, which makes it useful for preparing teaching materials, tools for machine correction of Slovene texts, etc. 
The corrections, as authentic examples of giving feedback for the development of writing skills, are valuable for the training of future teachers, linguistic-didactic research, etc.<\/p>\n<\/div><\/section><\/div>\n<div class=\"flex_column av_one_full  flex_column_div av-zero-column-padding first  avia-builder-el-15  el_after_av_one_full  el_before_av_two_third  column-top-margin\" style='border-radius:0px; '><div  class='avia-image-container  av-styling-    avia-builder-el-16  avia-builder-el-no-sibling  avia-align-center '  itemprop=\"image\" itemscope=\"itemscope\" itemtype=\"https:\/\/schema.org\/ImageObject\"  ><div class='avia-image-container-inner'><div class='avia-image-overlay-wrap'><img class='avia_image' src='https:\/\/www.cjvt.si\/wp-content\/uploads\/2025\/03\/Solar.png' alt='' title='\u0160olar' height=\"623\" width=\"1254\"  itemprop=\"thumbnailUrl\"  \/><\/div><\/div><\/div><\/div>\n<div class=\"flex_column av_two_third  flex_column_div av-zero-column-padding first  avia-builder-el-17  el_after_av_one_full  el_before_av_one_third  column-top-margin\" style='margin-top:36px; margin-bottom:0px; border-radius:0px; '><section class=\"av_textblock_section \"  itemscope=\"itemscope\" itemtype=\"https:\/\/schema.org\/CreativeWork\" ><div class='avia_textblock  '   itemprop=\"text\" ><p>The different versions of the corpus are available in the CLARIN.SI repository under an open licence, which means that they can be used for different research and development purposes. Links can be found in the <a href=\"https:\/\/www.cjvt.si\/en\/tools-and-resources\/databases\/\" target=\"_blank\" rel=\"noopener\">Databases<\/a> section of our website. 
In addition, ready-made exports of corpus data, such as a <a href=\"https:\/\/www.clarin.si\/repository\/xmlui\/handle\/11356\/1716\" target=\"_blank\" rel=\"noopener\">frequency list of linguistic corrections<\/a>, a <a href=\"https:\/\/www.clarin.si\/repository\/xmlui\/handle\/11356\/2011\" target=\"_blank\" rel=\"noopener\">list of collocations<\/a> and a <a href=\"https:\/\/www.clarin.si\/repository\/xmlui\/handle\/11356\/2009\" target=\"_blank\" rel=\"noopener\">list of syntactic structures<\/a>, are also available on CLARIN.SI.<\/p>\n<p>As the preparation of corpora with tagged corrections is extremely time-consuming, it is crucial to make sure that the data is easily accessible for different types of use. The Centre for Language Resources and Technologies has therefore developed a new format, as well as a completely new corpus concordancer for such corpora. Unlike previous similar tools, this one allows powerful searching and transparent use of the results, especially when it comes to rich corpus metadata and language corrections.<\/p>\n<\/div><\/section><\/div>\n<div class=\"flex_column av_one_third  flex_column_div   avia-builder-el-19  el_after_av_two_third  avia-builder-el-last  column-top-margin\" style='margin-top:36px; margin-bottom:0px; background: #f0f0f0; padding:30px; background-color:#f0f0f0; border-radius:0px; '><p><section class=\"av_textblock_section \"  itemscope=\"itemscope\" itemtype=\"https:\/\/schema.org\/CreativeWork\" ><div class='avia_textblock  '   itemprop=\"text\" ><p>\u0160olar 3.0 is available in the CJVT concordancer below.<\/p>\n<\/div><\/section><br \/>\n<div  class='avia-button-wrap avia-button-center  avia-builder-el-21  el_after_av_textblock  avia-builder-el-last ' ><a href='https:\/\/viri.cjvt.si\/solar\/en\/'  class='avia-button   avia-icon_select-yes-left-icon avia-color-theme-color avia-size-small avia-position-center '   ><span class='avia_button_icon avia_button_icon_left ' aria-hidden='true' data-av_icon='\ue87d' 
data-av_iconfont='entypo-fontello'><\/span><span class='avia_iconbox_title' >\u0160OLAR 3.0<\/span><\/a><\/div><\/p><\/div>\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":19,"featured_media":0,"parent":2183,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_acf_changed":false,"_relevanssi_hide_post":"","_relevanssi_hide_content":"","_relevanssi_pin_for_all":"","_relevanssi_pin_keywords":"","_relevanssi_unpin_keywords":"","_relevanssi_related_keywords":"","_relevanssi_related_include_ids":"","_relevanssi_related_exclude_ids":"","_relevanssi_related_no_append":"","_relevanssi_related_not_related":"","_relevanssi_related_posts":"","_relevanssi_noindex_reason":"","inline_featured_image":false,"episode_type":"","audio_file":"","podmotor_file_id":"","podmotor_episode_id":"","cover_image":"","cover_image_id":"","duration":"","filesize":"","filesize_raw":"","date_recorded":"","explicit":"","block":"","itunes_episode_number":"","itunes_title":"","itunes_season_number":"","itunes_episode_type":"","footnotes":""},"class_list":["post-4817","page","type-page","status-publish","hentry"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.3 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>\u0160olar Corpus - CJVT<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.cjvt.si\/en\/research\/cjvt-projects\/corpus-solar\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"\u0160olar Corpus - CJVT\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.cjvt.si\/en\/research\/cjvt-projects\/corpus-solar\/\" \/>\n<meta property=\"og:site_name\" content=\"CJVT\" \/>\n<meta property=\"article:publisher\" 
content=\"https:\/\/www.facebook.com\/centerzajezikovnevireintehnologije\" \/>\n<meta property=\"article:modified_time\" content=\"2025-05-26T12:49:08+00:00\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"10 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.cjvt.si\\\/en\\\/research\\\/cjvt-projects\\\/corpus-solar\\\/\",\"url\":\"https:\\\/\\\/www.cjvt.si\\\/en\\\/research\\\/cjvt-projects\\\/corpus-solar\\\/\",\"name\":\"\u0160olar Corpus - CJVT\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.cjvt.si\\\/en\\\/#website\"},\"datePublished\":\"2024-03-20T12:14:30+00:00\",\"dateModified\":\"2025-05-26T12:49:08+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.cjvt.si\\\/en\\\/research\\\/cjvt-projects\\\/corpus-solar\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.cjvt.si\\\/en\\\/research\\\/cjvt-projects\\\/corpus-solar\\\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.cjvt.si\\\/en\\\/research\\\/cjvt-projects\\\/corpus-solar\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/www.cjvt.si\\\/en\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Research\",\"item\":\"https:\\\/\\\/www.cjvt.si\\\/en\\\/research\\\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"CJVT projects\",\"item\":\"https:\\\/\\\/www.cjvt.si\\\/en\\\/research\\\/cjvt-projects\\\/\"},{\"@type\":\"ListItem\",\"position\":4,\"name\":\"Gigafida Corpus\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.cjvt.si\\\/en\\\/#website\",\"url\":\"https:\\\/\\\/www.cjvt.si\\\/en\\\/\",\"name\":\"CJVT\",\"description\":\"Center za jezikovne vire in 
tehnologije\",\"publisher\":{\"@id\":\"https:\\\/\\\/www.cjvt.si\\\/en\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.cjvt.si\\\/en\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/www.cjvt.si\\\/en\\\/#organization\",\"name\":\"CJVT\",\"url\":\"https:\\\/\\\/www.cjvt.si\\\/en\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.cjvt.si\\\/en\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/www.cjvt.si\\\/wp-content\\\/uploads\\\/2020\\\/06\\\/CJVT-logo-red.jpg\",\"contentUrl\":\"https:\\\/\\\/www.cjvt.si\\\/wp-content\\\/uploads\\\/2020\\\/06\\\/CJVT-logo-red.jpg\",\"width\":1300,\"height\":683,\"caption\":\"CJVT\"},\"image\":{\"@id\":\"https:\\\/\\\/www.cjvt.si\\\/en\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/centerzajezikovnevireintehnologije\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"\u0160olar Corpus - CJVT","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.cjvt.si\/en\/research\/cjvt-projects\/corpus-solar\/","og_locale":"en_US","og_type":"article","og_title":"\u0160olar Corpus - CJVT","og_url":"https:\/\/www.cjvt.si\/en\/research\/cjvt-projects\/corpus-solar\/","og_site_name":"CJVT","article_publisher":"https:\/\/www.facebook.com\/centerzajezikovnevireintehnologije","article_modified_time":"2025-05-26T12:49:08+00:00","twitter_card":"summary_large_image","twitter_misc":{"Est. 
reading time":"10 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.cjvt.si\/en\/research\/cjvt-projects\/corpus-solar\/","url":"https:\/\/www.cjvt.si\/en\/research\/cjvt-projects\/corpus-solar\/","name":"\u0160olar Corpus - CJVT","isPartOf":{"@id":"https:\/\/www.cjvt.si\/en\/#website"},"datePublished":"2024-03-20T12:14:30+00:00","dateModified":"2025-05-26T12:49:08+00:00","breadcrumb":{"@id":"https:\/\/www.cjvt.si\/en\/research\/cjvt-projects\/corpus-solar\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.cjvt.si\/en\/research\/cjvt-projects\/corpus-solar\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.cjvt.si\/en\/research\/cjvt-projects\/corpus-solar\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.cjvt.si\/en\/"},{"@type":"ListItem","position":2,"name":"Research","item":"https:\/\/www.cjvt.si\/en\/research\/"},{"@type":"ListItem","position":3,"name":"CJVT projects","item":"https:\/\/www.cjvt.si\/en\/research\/cjvt-projects\/"},{"@type":"ListItem","position":4,"name":"Gigafida Corpus"}]},{"@type":"WebSite","@id":"https:\/\/www.cjvt.si\/en\/#website","url":"https:\/\/www.cjvt.si\/en\/","name":"CJVT","description":"Center za jezikovne vire in 
tehnologije","publisher":{"@id":"https:\/\/www.cjvt.si\/en\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.cjvt.si\/en\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.cjvt.si\/en\/#organization","name":"CJVT","url":"https:\/\/www.cjvt.si\/en\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.cjvt.si\/en\/#\/schema\/logo\/image\/","url":"https:\/\/www.cjvt.si\/wp-content\/uploads\/2020\/06\/CJVT-logo-red.jpg","contentUrl":"https:\/\/www.cjvt.si\/wp-content\/uploads\/2020\/06\/CJVT-logo-red.jpg","width":1300,"height":683,"caption":"CJVT"},"image":{"@id":"https:\/\/www.cjvt.si\/en\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/centerzajezikovnevireintehnologije"]}]}},"_links":{"self":[{"href":"https:\/\/www.cjvt.si\/en\/wp-json\/wp\/v2\/pages\/4817","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.cjvt.si\/en\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.cjvt.si\/en\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.cjvt.si\/en\/wp-json\/wp\/v2\/users\/19"}],"replies":[{"embeddable":true,"href":"https:\/\/www.cjvt.si\/en\/wp-json\/wp\/v2\/comments?post=4817"}],"version-history":[{"count":13,"href":"https:\/\/www.cjvt.si\/en\/wp-json\/wp\/v2\/pages\/4817\/revisions"}],"predecessor-version":[{"id":7190,"href":"https:\/\/www.cjvt.si\/en\/wp-json\/wp\/v2\/pages\/4817\/revisions\/7190"}],"up":[{"embeddable":true,"href":"https:\/\/www.cjvt.si\/en\/wp-json\/wp\/v2\/pages\/2183"}],"wp:attachment":[{"href":"https:\/\/www.cjvt.si\/en\/wp-json\/wp\/v2\/media?parent=4817"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}