{"id":956,"date":"2020-03-31T22:48:06","date_gmt":"2020-03-31T20:48:06","guid":{"rendered":"https:\/\/www.cjvt.starkmat.si\/template-projekt\/work-packages\/work-package-1\/"},"modified":"2024-01-25T11:23:52","modified_gmt":"2024-01-25T10:23:52","slug":"work-package-1","status":"publish","type":"page","link":"https:\/\/www.cjvt.si\/povejmo\/en\/work-packages\/work-package-1\/","title":{"rendered":"Work Package 1: SloSBZ"},"content":{"rendered":"<div class=\"flex_column av_one_full  no_margin flex_column_div av-zero-column-padding first  avia-builder-el-0  el_before_av_one_full  avia-builder-el-first  \" style='margin-top:0px; margin-bottom:30px; border-radius:0px; '><section class=\"av_textblock_section \"  itemscope=\"itemscope\" itemtype=\"https:\/\/schema.org\/CreativeWork\" ><div class='avia_textblock  '   itemprop=\"text\" ><h2>General Knowledge Base for Slovenian<\/h2>\n<\/div><\/section><br \/>\n<section class=\"av_textblock_section \"  itemscope=\"itemscope\" itemtype=\"https:\/\/schema.org\/CreativeWork\" ><div class='avia_textblock  '   itemprop=\"text\" ><h3>SloSBZ<\/h3>\n<\/div><\/section><\/p><\/div>\n<div class=\"flex_column av_one_full  no_margin flex_column_div av-zero-column-padding first  avia-builder-el-3  el_after_av_one_full  el_before_av_one_full  column-top-margin\" style='margin-top:0px; margin-bottom:30px; border-radius:0px; '><section class=\"av_textblock_section \"  itemscope=\"itemscope\" itemtype=\"https:\/\/schema.org\/CreativeWork\" ><div class='avia_textblock  '   itemprop=\"text\" ><p>The project aims to develop Slovenian language resources for building large generative language models, together with supporting tools for preparing these models and, where necessary, for using them during training.
Preparing the models in the SloLLaMai project (RRP2), adapting them, and building demonstrations within the other four industrial projects (RRP3-6) all require large amounts of high-quality dialogues and command requests with ranked responses, as well as large foundational language corpora covering conversational language and specialized terminological domains.<\/p>\n<p>The project is one of two foundational projects in the program and provides the basic language infrastructure needed for model training. Its work will therefore consist mainly of preparing tools and databases, which will be used first to prepare the base SloLLaMai model and later in the experimental development of technology demonstrations within the industrial projects. Results from the other projects will feed back into the SloLLaMai project through its requirements and will guide the further development of the corpora. The project will also release three versions of a digital dictionary base.<\/p>\n<\/div><\/section><\/div>\n<div class=\"flex_column av_one_full  no_margin flex_column_div av-zero-column-padding first  avia-builder-el-5  el_after_av_one_full  el_before_av_one_full  column-top-margin\" style='margin-top:0px; margin-bottom:30px; border-radius:0px; '><section class=\"av_textblock_section \"  itemscope=\"itemscope\" itemtype=\"https:\/\/schema.org\/CreativeWork\" ><div class='avia_textblock  '   itemprop=\"text\" ><h3>Specific objectives:<\/h3>\n<ol>\n<li>Provide corpora suitable for training large models, for use both within the program and beyond.<\/li>\n<li>Provide tools to support the preparation of these corpora.<\/li>\n<li>Ensure language support for Slovenian by adapting existing corpora and creating new ones according to the needs of industrial research and experimental development during the project.<\/li>\n<li>Upgrade the pipelines of widely used libraries for training 
and using large language models.<\/li>\n<li>Provide validation and test sets for evaluating the resulting models.<\/li>\n<\/ol>\n<\/div><\/section><\/div>\n<div class=\"flex_column av_one_full  no_margin flex_column_div av-zero-column-padding first  avia-builder-el-7  el_after_av_one_full  el_before_av_one_full  column-top-margin\" style='margin-top:0px; margin-bottom:30px; border-radius:0px; '><section class=\"av_textblock_section \"  itemscope=\"itemscope\" itemtype=\"https:\/\/schema.org\/CreativeWork\" ><div class='avia_textblock  '   itemprop=\"text\" ><h3>Expected results:<\/h3>\n<ul>\n<li><strong>D1.1:<\/strong> Open-access Slovenian training set of dialogues and command requests (August 2024).<\/li>\n<li><strong>D1.2:<\/strong> Large language corpus covering conversational language and the targeted terminological domains &#8211; first version (August 2024).<\/li>\n<li><strong>D1.3:<\/strong> Validation corpus for large language models (September 2024).<\/li>\n<li><strong>D1.4:<\/strong> Tools for preparing dictionary bases for model training and components for integrating open dictionary forms (February 2025).<\/li>\n<li><strong>D1.5:<\/strong> Dedicated tokenizers for the Slovenian language &#8211; first version (February 2025).<\/li>\n<li><strong>D1.6:<\/strong> Large language corpus covering conversational language and the targeted terminological domains &#8211; second version (August 2025).<\/li>\n<li><strong>D1.7:<\/strong> Dedicated tokenizers for the Slovenian language &#8211; final version (March 2026).<\/li>\n<li><strong>D1.8:<\/strong> Large language corpus covering conversational language and the targeted terminological domains &#8211; final version (June 2026).<\/li>\n<li><strong>D1.9:<\/strong> Knowledge base built from the Digital Dictionary Base (July 2026).<\/li>\n<\/ul>\n<\/div><\/section><\/div>\n<div class=\"flex_column av_one_full  no_margin flex_column_div av-zero-column-padding first  avia-builder-el-9  el_after_av_one_full  el_before_av_one_fourth  
column-top-margin\" style='margin-top:0px; margin-bottom:30px; border-radius:0px; '><section class=\"av_textblock_section \"  itemscope=\"itemscope\" itemtype=\"https:\/\/schema.org\/CreativeWork\" ><div class='avia_textblock  '   itemprop=\"text\" ><h3>Project partners:<\/h3>\n<\/div><\/section><\/div>\n<div class=\"flex_column av_one_fourth  flex_column_div av-zero-column-padding first  avia-builder-el-11  el_after_av_one_full  el_before_av_one_fourth  column-top-margin\" style='border-radius:0px; '><section class=\"av_textblock_section \"  itemscope=\"itemscope\" itemtype=\"https:\/\/schema.org\/CreativeWork\" ><div class='avia_textblock  '   itemprop=\"text\" ><h5>Project leader:<\/h5>\n<\/div><\/section><\/div>\n<div class=\"flex_column av_one_fourth  flex_column_div av-zero-column-padding   avia-builder-el-13  el_after_av_one_fourth  el_before_av_one_fourth  column-top-margin\" style='border-radius:0px; '><div  class='avia-button-wrap avia-button-center  avia-builder-el-14  avia-builder-el-no-sibling ' ><a href='https:\/\/fri.uni-lj.si\/en' class='avia-button avia-button-fullwidth   avia-icon_select-no avia-color-theme-color '  style='color:#ffffff; ' ><span class='avia_iconbox_title' >Faculty of Computer and Information Science UL<\/span><span class='avia_button_background avia-button avia-button-fullwidth avia-color-theme-color-highlight' ><\/span><\/a><\/div><\/div><div class=\"flex_column av_one_fourth  flex_column_div av-zero-column-padding   avia-builder-el-15  el_after_av_one_fourth  el_before_av_one_fourth  column-top-margin\" style='border-radius:0px; '><\/div><\/p>\n<div class=\"flex_column av_one_fourth  flex_column_div av-zero-column-padding   avia-builder-el-16  el_after_av_one_fourth  el_before_av_one_fourth  column-top-margin\" style='border-radius:0px; '><\/div>\n<div class=\"flex_column av_one_fourth  flex_column_div av-zero-column-padding first  avia-builder-el-17  el_after_av_one_fourth  el_before_av_one_fourth  column-top-margin\" 
style='border-radius:0px; '><section class=\"av_textblock_section \"  itemscope=\"itemscope\" itemtype=\"https:\/\/schema.org\/CreativeWork\" ><div class='avia_textblock  '   itemprop=\"text\" ><h5>Partners:<\/h5>\n<\/div><\/section><\/div>\n<div class=\"flex_column av_one_fourth  flex_column_div av-zero-column-padding   avia-builder-el-19  el_after_av_one_fourth  el_before_av_one_fourth  column-top-margin\" style='border-radius:0px; '><div  class='avia-button-wrap avia-button-center  avia-builder-el-20  avia-builder-el-no-sibling ' ><a href='https:\/\/isjfr.zrc-sazu.si\/en\/homepage' class='avia-button avia-button-fullwidth   avia-icon_select-no avia-color-theme-color '  style='color:#ffffff; ' ><span class='avia_iconbox_title' >Fran Ramov\u0161 Institute of the Slovenian Language<\/span><span class='avia_button_background avia-button avia-button-fullwidth avia-color-theme-color-highlight' ><\/span><\/a><\/div><\/div>\n<div class=\"flex_column av_one_fourth  flex_column_div av-zero-column-padding   avia-builder-el-21  el_after_av_one_fourth  el_before_av_one_fourth  column-top-margin\" style='border-radius:0px; '><div  class='avia-button-wrap avia-button-center  avia-builder-el-22  avia-builder-el-no-sibling ' ><a href='https:\/\/www.inz.si\/en\/' class='avia-button avia-button-fullwidth   avia-icon_select-no avia-color-theme-color '  style='color:#ffffff; ' ><span class='avia_iconbox_title' >Institute of Contemporary History<\/span><span class='avia_button_background avia-button avia-button-fullwidth avia-color-theme-color-highlight' ><\/span><\/a><\/div><\/div>\n<div class=\"flex_column av_one_fourth  flex_column_div av-zero-column-padding   avia-builder-el-23  el_after_av_one_fourth  avia-builder-el-last  column-top-margin\" style='border-radius:0px; '><div  class='avia-button-wrap avia-button-center  avia-builder-el-24  avia-builder-el-no-sibling ' ><a href='https:\/\/semantika.eu\/en-us\/' class='avia-button avia-button-fullwidth   avia-icon_select-no 
avia-color-theme-color '  style='color:#ffffff; ' ><span class='avia_iconbox_title' >Semantika d.o.o.<\/span><span class='avia_button_background avia-button avia-button-fullwidth avia-color-theme-color-highlight' ><\/span><\/a><\/div><\/div>\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":1,"featured_media":0,"parent":953,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_acf_changed":false,"_relevanssi_hide_post":"","_relevanssi_hide_content":"","_relevanssi_pin_for_all":"","_relevanssi_pin_keywords":"","_relevanssi_unpin_keywords":"","_relevanssi_related_keywords":"","_relevanssi_related_include_ids":"","_relevanssi_related_exclude_ids":"","_relevanssi_related_no_append":"","_relevanssi_related_not_related":"","_relevanssi_related_posts":"","_relevanssi_noindex_reason":"","inline_featured_image":false,"episode_type":"","audio_file":"","cover_image":"","cover_image_id":"","duration":"","filesize":"","filesize_raw":"","date_recorded":"","explicit":"","block":"","itunes_episode_number":"","itunes_title":"","itunes_season_number":"","itunes_episode_type":"","footnotes":""},"class_list":["post-956","page","type-page","status-publish","hentry"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.9 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Work Package 1: SloSBZ - PoVeJMo<\/title>\n<meta name=\"robots\" content=\"noindex, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Work Package 1: SloSBZ - PoVeJMo\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.cjvt.si\/povejmo\/en\/work-packages\/work-package-1\/\" \/>\n<meta property=\"og:site_name\" content=\"PoVeJMo\" \/>\n<meta property=\"article:modified_time\" content=\"2024-01-25T10:23:52+00:00\" \/>\n<meta name=\"twitter:card\" 
content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"13 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.cjvt.si\/povejmo\/en\/work-packages\/work-package-1\/\",\"url\":\"https:\/\/www.cjvt.si\/povejmo\/en\/work-packages\/work-package-1\/\",\"name\":\"Work Package 1: SloSBZ - PoVeJMo\",\"isPartOf\":{\"@id\":\"https:\/\/www.cjvt.si\/povejmo\/en\/#website\"},\"datePublished\":\"2020-03-31T20:48:06+00:00\",\"dateModified\":\"2024-01-25T10:23:52+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/www.cjvt.si\/povejmo\/en\/work-packages\/work-package-1\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.cjvt.si\/povejmo\/en\/work-packages\/work-package-1\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.cjvt.si\/povejmo\/en\/work-packages\/work-package-1\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.cjvt.si\/povejmo\/o-programu\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Work Packages\",\"item\":\"https:\/\/www.cjvt.si\/povejmo\/en\/work-packages\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Work Package 1\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.cjvt.si\/povejmo\/en\/#website\",\"url\":\"https:\/\/www.cjvt.si\/povejmo\/en\/\",\"name\":\"PoVeJMo\",\"description\":\"Work site\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.cjvt.si\/povejmo\/en\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Work Package 1: SloSBZ - PoVeJMo","robots":{"index":"noindex","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"og_locale":"en_US","og_type":"article","og_title":"Work Package 1: SloSBZ - PoVeJMo","og_url":"https:\/\/www.cjvt.si\/povejmo\/en\/work-packages\/work-package-1\/","og_site_name":"PoVeJMo","article_modified_time":"2024-01-25T10:23:52+00:00","twitter_card":"summary_large_image","twitter_misc":{"Est. reading time":"13 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.cjvt.si\/povejmo\/en\/work-packages\/work-package-1\/","url":"https:\/\/www.cjvt.si\/povejmo\/en\/work-packages\/work-package-1\/","name":"Work Package 1: SloSBZ - PoVeJMo","isPartOf":{"@id":"https:\/\/www.cjvt.si\/povejmo\/en\/#website"},"datePublished":"2020-03-31T20:48:06+00:00","dateModified":"2024-01-25T10:23:52+00:00","breadcrumb":{"@id":"https:\/\/www.cjvt.si\/povejmo\/en\/work-packages\/work-package-1\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.cjvt.si\/povejmo\/en\/work-packages\/work-package-1\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.cjvt.si\/povejmo\/en\/work-packages\/work-package-1\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.cjvt.si\/povejmo\/o-programu\/"},{"@type":"ListItem","position":2,"name":"Work Packages","item":"https:\/\/www.cjvt.si\/povejmo\/en\/work-packages\/"},{"@type":"ListItem","position":3,"name":"Work Package 1"}]},{"@type":"WebSite","@id":"https:\/\/www.cjvt.si\/povejmo\/en\/#website","url":"https:\/\/www.cjvt.si\/povejmo\/en\/","name":"PoVeJMo","description":"Work 
site","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.cjvt.si\/povejmo\/en\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"}]}},"_links":{"self":[{"href":"https:\/\/www.cjvt.si\/povejmo\/en\/wp-json\/wp\/v2\/pages\/956","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.cjvt.si\/povejmo\/en\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.cjvt.si\/povejmo\/en\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.cjvt.si\/povejmo\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.cjvt.si\/povejmo\/en\/wp-json\/wp\/v2\/comments?post=956"}],"version-history":[{"count":5,"href":"https:\/\/www.cjvt.si\/povejmo\/en\/wp-json\/wp\/v2\/pages\/956\/revisions"}],"predecessor-version":[{"id":1642,"href":"https:\/\/www.cjvt.si\/povejmo\/en\/wp-json\/wp\/v2\/pages\/956\/revisions\/1642"}],"up":[{"embeddable":true,"href":"https:\/\/www.cjvt.si\/povejmo\/en\/wp-json\/wp\/v2\/pages\/953"}],"wp:attachment":[{"href":"https:\/\/www.cjvt.si\/povejmo\/en\/wp-json\/wp\/v2\/media?parent=956"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}