{"id":2375,"date":"2020-06-01T15:17:12","date_gmt":"2020-06-01T13:17:12","guid":{"rendered":"https:\/\/www.cjvt.starkmat.si\/?page_id=2375"},"modified":"2020-11-05T22:33:22","modified_gmt":"2020-11-05T21:33:22","slug":"the-list-corpus-extraction-tool","status":"publish","type":"page","link":"https:\/\/www.cjvt.si\/en\/infrastructure-support\/the-list-corpus-extraction-tool\/","title":{"rendered":"The LIST corpus extraction tool"},"content":{"rendered":"<div class='flex_column_table av-equal-height-column-flextable -flextable' style='margin-top:0px; margin-bottom:0px; '><div class=\"flex_column av_two_third  flex_column_table_cell av-equal-height-column av-align-middle av-zero-column-padding first  avia-builder-el-0  el_before_av_one_third  avia-builder-el-first  \" style='border-radius:0px; '><section class=\"av_textblock_section \"  itemscope=\"itemscope\" itemtype=\"https:\/\/schema.org\/CreativeWork\" ><div class='avia_textblock  '   itemprop=\"text\" ><h2>LIST<\/h2>\n<h3>EFFICIENT SLOVENE CORPUS ANALYSIS TOOL<\/h3>\n<\/div><\/section><\/div>\n<div class='av-flex-placeholder'><\/div><div class=\"flex_column av_one_third  flex_column_table_cell av-equal-height-column av-align-middle av-zero-column-padding   avia-builder-el-2  el_after_av_two_third  el_before_av_two_third  \" style='border-radius:0px; '><p><div  class='avia-button-wrap avia-button-left  avia-builder-el-3  el_before_av_button  avia-builder-el-first  gumb-sodelavci-levo' title=\"Keywords and n-grams from a textbook corpus\"><a href='https:\/\/www.cjvt.si\/en\/infrastructure-support\/lists-textbooks\/'  class='avia-button  av-button-notext   avia-icon_select-yes-left-icon avia-color-theme-color avia-size-small avia-position-left '   ><span class='avia_button_icon avia_button_icon_left ' aria-hidden='true' data-av_icon='\ue87c' data-av_iconfont='entypo-fontello'><\/span><span class='avia_iconbox_title' ><\/span><\/a><\/div><br \/>\n<div  class='avia-button-wrap avia-button-left  avia-builder-el-4  
el_after_av_button  avia-builder-el-last  gumb-sodelavci-desno' title=\"The TOLMA\u010c tool\"><a href='https:\/\/www.cjvt.si\/en\/infrastructure-support\/tolmac\/'  class='avia-button  av-button-notext   avia-icon_select-yes-right-icon avia-color-theme-color avia-size-small avia-position-left '   ><span class='avia_iconbox_title' ><\/span><span class='avia_button_icon avia_button_icon_right' aria-hidden='true' data-av_icon='\ue87d' data-av_iconfont='entypo-fontello'><\/span><\/a><\/div><\/p><\/div><\/div><!--close column table wrapper. Autoclose: 1 --><div class=\"flex_column av_two_third  flex_column_div av-zero-column-padding first  avia-builder-el-5  el_after_av_one_third  el_before_av_one_third  column-top-margin\" style='margin-top:36px; margin-bottom:0px; border-radius:0px; '><section class=\"av_textblock_section \"  itemscope=\"itemscope\" itemtype=\"https:\/\/schema.org\/CreativeWork\" ><div class='avia_textblock  '   itemprop=\"text\" ><p><!--Prijavitelja: Marko Robnik \u0160ikonja, \u0160pela Arhar Holdt, UL FRI--><\/p>\n<p>In this project, a clear and understandable user interface was developed for the corpusStatistics tool (renamed LIST). It gives users easy access to language statistics in Slovene and other corpora. The tool was adapted to several formats and tested on large Slovene and other corpora.<\/p>\n<p>Metadata was added to the output, which makes the results repeatable. The interface elements now have short explanations that appear when the user hovers over them.<\/p>\n<p>An option was added to show different collocation measures (e.g. Dice, t-score, MI, MI3) for the extracted word sets. A calculation of the processing time was also added, along with warnings for options that affect processing time.<\/p>\n<p>It is also possible to switch between the Slovene and English versions. 
In addition, non-Latin alphabets can be processed.<\/p>\n<p>The program was upgraded to support the TEI P5 format, which is used for new corpora in the CLARIN.SI repository, as well as the Vert format used in Sketch Engine.<\/p>\n<p>The LIST tool is available under the open Apache 2.0 license at:<\/p>\n<p>Krsnik, Luka; et al., 2019, Corpus extraction tool LIST 1.0, Slovenian language resource repository CLARIN.SI, <a href=\"http:\/\/hdl.handle.net\/11356\/1227\" target=\"_blank\" rel=\"noopener noreferrer\">http:\/\/hdl.handle.net\/11356\/1227<\/a>.<\/p>\n<\/div><\/section><\/div><\/p>\n<div class=\"flex_column av_one_third  flex_column_div   avia-builder-el-7  el_after_av_two_third  avia-builder-el-last  column-top-margin\" style='margin-top:36px; margin-bottom:0px; background: #f0f0f0; padding:30px; background-color:#f0f0f0; border-radius:0px; '><section class=\"av_textblock_section \"  itemscope=\"itemscope\" itemtype=\"https:\/\/schema.org\/CreativeWork\" ><div class='avia_textblock  '   itemprop=\"text\" ><h3 class=\"zn_text_box-title zn_text_box-title--style1 text-custom\">LINKS AND CONTACT<\/h3>\n<\/div><\/section><br \/>\n<div  style='height:20px' class='hr hr-invisible   avia-builder-el-9  el_after_av_textblock  el_before_av_textblock '><span class='hr-inner ' ><span class='hr-inner-style'><\/span><\/span><\/div><br \/>\n<section class=\"av_textblock_section \"  itemscope=\"itemscope\" itemtype=\"https:\/\/schema.org\/CreativeWork\" ><div class='avia_textblock  '   itemprop=\"text\" ><p>Jaka \u010cibej<br \/>\nCentre for Language Resources and Technologies at the University of Ljubljana<br \/>\nFaculty of Computer and Information Science UL<br \/>\nVe\u010dna pot 113, SI-1000 Ljubljana<\/p>\n<ul>\n<li>e-mail: <a 
href=\"mail&#116;&#111;&#58;&#106;&#97;&#107;&#97;&#x2e;&#x63;&#x69;&#x62;&#x65;&#x6a;&#x40;&#x63;&#x6a;vt&#46;s&#105;\">&#106;&#x61;k&#97;&#x2e;&#99;&#x69;b&#101;&#x6a;&#64;&#x63;&#x6a;&#118;&#x74;&#46;&#115;&#x69;<\/a><\/li>\n<\/ul>\n<\/div><\/section><br \/>\n<div  class='avia-button-wrap avia-button-right  avia-builder-el-11  el_after_av_textblock  avia-builder-el-last ' ><a href='http:\/\/hdl.handle.net\/11356\/1227'  class='avia-button   avia-icon_select-no avia-color-theme-color avia-size-medium avia-position-right '   ><span class='avia_iconbox_title' >LIST<\/span><\/a><\/div><\/p><\/div>\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":3,"featured_media":0,"parent":985,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_acf_changed":false,"_relevanssi_hide_post":"","_relevanssi_hide_content":"","_relevanssi_pin_for_all":"","_relevanssi_pin_keywords":"","_relevanssi_unpin_keywords":"","_relevanssi_related_keywords":"","_relevanssi_related_include_ids":"","_relevanssi_related_exclude_ids":"","_relevanssi_related_no_append":"","_relevanssi_related_not_related":"","_relevanssi_related_posts":"","_relevanssi_noindex_reason":"","inline_featured_image":false,"episode_type":"","audio_file":"","podmotor_file_id":"","podmotor_episode_id":"","cover_image":"","cover_image_id":"","duration":"","filesize":"","filesize_raw":"","date_recorded":"","explicit":"","block":"","itunes_episode_number":"","itunes_title":"","itunes_season_number":"","itunes_episode_type":"","footnotes":""},"class_list":["post-2375","page","type-page","status-publish","hentry"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.3 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>The LIST corpus extraction tool - CJVT<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" 
href=\"https:\/\/www.cjvt.si\/en\/infrastructure-support\/the-list-corpus-extraction-tool\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"The LIST corpus extraction tool - CJVT\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.cjvt.si\/en\/infrastructure-support\/the-list-corpus-extraction-tool\/\" \/>\n<meta property=\"og:site_name\" content=\"CJVT\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/centerzajezikovnevireintehnologije\" \/>\n<meta property=\"article:modified_time\" content=\"2020-11-05T21:33:22+00:00\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"5 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.cjvt.si\\\/en\\\/infrastructure-support\\\/the-list-corpus-extraction-tool\\\/\",\"url\":\"https:\\\/\\\/www.cjvt.si\\\/en\\\/infrastructure-support\\\/the-list-corpus-extraction-tool\\\/\",\"name\":\"The LIST corpus extraction tool - 
CJVT\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.cjvt.si\\\/en\\\/#website\"},\"datePublished\":\"2020-06-01T13:17:12+00:00\",\"dateModified\":\"2020-11-05T21:33:22+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.cjvt.si\\\/en\\\/infrastructure-support\\\/the-list-corpus-extraction-tool\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.cjvt.si\\\/en\\\/infrastructure-support\\\/the-list-corpus-extraction-tool\\\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.cjvt.si\\\/en\\\/infrastructure-support\\\/the-list-corpus-extraction-tool\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/www.cjvt.si\\\/en\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Infrastructure Support\",\"item\":\"https:\\\/\\\/www.cjvt.si\\\/en\\\/infrastructure-support\\\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"The LIST corpus extraction tool\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.cjvt.si\\\/en\\\/#website\",\"url\":\"https:\\\/\\\/www.cjvt.si\\\/en\\\/\",\"name\":\"CJVT\",\"description\":\"Center za jezikovne vire in 
tehnologije\",\"publisher\":{\"@id\":\"https:\\\/\\\/www.cjvt.si\\\/en\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.cjvt.si\\\/en\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/www.cjvt.si\\\/en\\\/#organization\",\"name\":\"CJVT\",\"url\":\"https:\\\/\\\/www.cjvt.si\\\/en\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.cjvt.si\\\/en\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/www.cjvt.si\\\/wp-content\\\/uploads\\\/2020\\\/06\\\/CJVT-logo-red.jpg\",\"contentUrl\":\"https:\\\/\\\/www.cjvt.si\\\/wp-content\\\/uploads\\\/2020\\\/06\\\/CJVT-logo-red.jpg\",\"width\":1300,\"height\":683,\"caption\":\"CJVT\"},\"image\":{\"@id\":\"https:\\\/\\\/www.cjvt.si\\\/en\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/centerzajezikovnevireintehnologije\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"The LIST corpus extraction tool - CJVT","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.cjvt.si\/en\/infrastructure-support\/the-list-corpus-extraction-tool\/","og_locale":"en_US","og_type":"article","og_title":"The LIST corpus extraction tool - CJVT","og_url":"https:\/\/www.cjvt.si\/en\/infrastructure-support\/the-list-corpus-extraction-tool\/","og_site_name":"CJVT","article_publisher":"https:\/\/www.facebook.com\/centerzajezikovnevireintehnologije","article_modified_time":"2020-11-05T21:33:22+00:00","twitter_card":"summary_large_image","twitter_misc":{"Est. 
reading time":"5 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.cjvt.si\/en\/infrastructure-support\/the-list-corpus-extraction-tool\/","url":"https:\/\/www.cjvt.si\/en\/infrastructure-support\/the-list-corpus-extraction-tool\/","name":"The LIST corpus extraction tool - CJVT","isPartOf":{"@id":"https:\/\/www.cjvt.si\/en\/#website"},"datePublished":"2020-06-01T13:17:12+00:00","dateModified":"2020-11-05T21:33:22+00:00","breadcrumb":{"@id":"https:\/\/www.cjvt.si\/en\/infrastructure-support\/the-list-corpus-extraction-tool\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.cjvt.si\/en\/infrastructure-support\/the-list-corpus-extraction-tool\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.cjvt.si\/en\/infrastructure-support\/the-list-corpus-extraction-tool\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.cjvt.si\/en\/"},{"@type":"ListItem","position":2,"name":"Infrastructure Support","item":"https:\/\/www.cjvt.si\/en\/infrastructure-support\/"},{"@type":"ListItem","position":3,"name":"The LIST corpus extraction tool"}]},{"@type":"WebSite","@id":"https:\/\/www.cjvt.si\/en\/#website","url":"https:\/\/www.cjvt.si\/en\/","name":"CJVT","description":"Center za jezikovne vire in 
tehnologije","publisher":{"@id":"https:\/\/www.cjvt.si\/en\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.cjvt.si\/en\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.cjvt.si\/en\/#organization","name":"CJVT","url":"https:\/\/www.cjvt.si\/en\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.cjvt.si\/en\/#\/schema\/logo\/image\/","url":"https:\/\/www.cjvt.si\/wp-content\/uploads\/2020\/06\/CJVT-logo-red.jpg","contentUrl":"https:\/\/www.cjvt.si\/wp-content\/uploads\/2020\/06\/CJVT-logo-red.jpg","width":1300,"height":683,"caption":"CJVT"},"image":{"@id":"https:\/\/www.cjvt.si\/en\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/centerzajezikovnevireintehnologije"]}]}},"_links":{"self":[{"href":"https:\/\/www.cjvt.si\/en\/wp-json\/wp\/v2\/pages\/2375","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.cjvt.si\/en\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.cjvt.si\/en\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.cjvt.si\/en\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/www.cjvt.si\/en\/wp-json\/wp\/v2\/comments?post=2375"}],"version-history":[{"count":6,"href":"https:\/\/www.cjvt.si\/en\/wp-json\/wp\/v2\/pages\/2375\/revisions"}],"predecessor-version":[{"id":3665,"href":"https:\/\/www.cjvt.si\/en\/wp-json\/wp\/v2\/pages\/2375\/revisions\/3665"}],"up":[{"embeddable":true,"href":"https:\/\/www.cjvt.si\/en\/wp-json\/wp\/v2\/pages\/985"}],"wp:attachment":[{"href":"https:\/\/www.cjvt.si\/en\/wp-json\/wp\/v2\/media?parent=2375"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}