{"id":1700,"date":"2025-07-16T09:51:11","date_gmt":"2025-07-16T07:51:11","guid":{"rendered":"https:\/\/www.cjvt.si\/llm4dh\/?p=1700"},"modified":"2025-07-16T15:01:14","modified_gmt":"2025-07-16T13:01:14","slug":"1700","status":"publish","type":"post","link":"https:\/\/www.cjvt.si\/llm4dh\/en\/mproving-linguistic-data-with-llms\/","title":{"rendered":"Improving Linguistic Data with LLMs"},"content":{"rendered":"<div class=\"flex_column av_one_full  flex_column_div av-zero-column-padding first  avia-builder-el-0  el_before_av_one_full  avia-builder-el-first  \" style='border-radius:0px; '><section class=\"av_textblock_section \"  itemscope=\"itemscope\" itemtype=\"https:\/\/schema.org\/BlogPosting\" itemprop=\"blogPost\" ><div class='avia_textblock  '   itemprop=\"text\" ><h1>Improving Linguistic Data with LLMs<\/h1>\n<\/div><\/section><\/div>\n<div class=\"flex_column av_one_full  flex_column_div av-zero-column-padding first  avia-builder-el-2  el_after_av_one_full  el_before_av_one_full  column-top-margin\" style='border-radius:0px; '><section class=\"av_textblock_section \"  itemscope=\"itemscope\" itemtype=\"https:\/\/schema.org\/BlogPosting\" itemprop=\"blogPost\" ><div class='avia_textblock  '   itemprop=\"text\" ><h3>By dr. Slavko \u017ditnik and Timotej Knez<\/h3>\n<\/div><\/section><br \/>\n<section class=\"av_textblock_section \"  itemscope=\"itemscope\" itemtype=\"https:\/\/schema.org\/BlogPosting\" itemprop=\"blogPost\" ><div class='avia_textblock  '   itemprop=\"text\" ><p><span style=\"font-weight: 400;\">Large language models (LLMs) are revolutionising the way we access information, communicate and work. In addition to everyday applications, LLMs are also reshaping scientific fields such as language studies, humanities and social sciences. 
However, although their capabilities are diverse, LLMs still have their limitations: they can provide inconsistent or incorrect answers, require significant computational resources, perform poorly on less-resourced languages and struggle with tasks involving social understanding, ethics and human needs.<\/span><\/p>\n<\/div><\/section><\/p><\/div>\n<div class=\"flex_column av_one_full  flex_column_div av-zero-column-padding first  avia-builder-el-5  el_after_av_one_full  el_before_av_one_full  column-top-margin\" style='border-radius:0px; '><section class=\"av_textblock_section \"  itemscope=\"itemscope\" itemtype=\"https:\/\/schema.org\/BlogPosting\" itemprop=\"blogPost\" ><div class='avia_textblock  '   itemprop=\"text\" ><h3>Addressing the Research Challenge<\/h3>\n<\/div><\/section><br \/>\n<section class=\"av_textblock_section \"  itemscope=\"itemscope\" itemtype=\"https:\/\/schema.org\/BlogPosting\" itemprop=\"blogPost\" ><div class='avia_textblock  '   itemprop=\"text\" ><p><span style=\"font-weight: 400;\">A promising way to improve LLM performance is to use high-quality lexicographic data. Such data can support LLM pre-training by providing both raw text and structured information, including synonymy, antonymy, hyponymy, hypernymy, meronymy, holonymy, sense distributions, idiomatic expressions and cross-linguistic distributions. Despite its potential, this rich linguistic knowledge is not yet fully utilised in existing LLMs.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">By integrating this type of data into the development of LLMs, we can reduce hallucinations, improve language proficiency in complex contexts, and strengthen fine-tuning for tasks such as commonsense reasoning and natural language inference. 
Our project focuses on applying this approach to Slovenian \u2014 a less-resourced, morphologically rich language that lacks the digital, educational, and institutional support that global languages such as English enjoy.<\/span><\/p>\n<\/div><\/section><\/p><\/div>\n<div class=\"flex_column av_one_full  flex_column_div av-zero-column-padding first  avia-builder-el-8  el_after_av_one_full  el_before_av_one_full  column-top-margin\" style='border-radius:0px; '><section class=\"av_textblock_section \"  itemscope=\"itemscope\" itemtype=\"https:\/\/schema.org\/BlogPosting\" itemprop=\"blogPost\" ><div class='avia_textblock  '   itemprop=\"text\" ><h3><b>Our Approach: Extracting Knowledge Graphs from Lexical Resources<\/b><\/h3>\n<\/div><\/section><br \/>\n<section class=\"av_textblock_section \"  itemscope=\"itemscope\" itemtype=\"https:\/\/schema.org\/BlogPosting\" itemprop=\"blogPost\" ><div class='avia_textblock  '   itemprop=\"text\" ><p><span style=\"font-weight: 400;\">We have developed a novel methodology for extracting knowledge graphs from digital linguistic databases that is tailored to morphologically complex languages. Specifically, we applied this methodology to the Digital Dictionary Database for Slovene (DDD), the largest freely accessible lexical-lexicographical resource for Slovene, and to several other structured Slovene lexicographic resources.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The resulting corpus comprises 356,294 words and was built from single-lexeme entries sourced from the DDD. Only individual words (no multi-word expressions) were included to ensure a clear lexical focus. For each word, all morphological forms were listed using data from the DDD. Definitions of word senses were collected from multiple sources\u2014SSKJ, sloWnet, and the Bridge Dictionary\u2014while semantic indicators from the DDD were used when full definitions weren\u2019t available. Usage examples were included where present (from SSKJ and the DDD). 
Common collocations were added based on DDD data. Synonyms were sourced from a dedicated synonyms dictionary, often grouped by word sense and labelled with semantic indicators where possible (see Figure 1).<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The corpus is saved as a structured Markdown file designed for both human readability and machine parsing, and it is freely available for everyone to use.<\/span><\/p>\n<\/div><\/section><\/p><\/div>\n<div class=\"flex_column av_one_full  flex_column_div av-zero-column-padding first  avia-builder-el-11  el_after_av_one_full  el_before_av_one_full  column-top-margin\" style='border-radius:0px; '><div  class='avia-image-container  av-styling-    avia-builder-el-12  avia-builder-el-no-sibling  avia-align-center '  itemprop=\"image\" itemscope=\"itemscope\" itemtype=\"https:\/\/schema.org\/ImageObject\"  ><div class='avia-image-container-inner'><div class='avia-image-overlay-wrap'><img class='avia_image' src='https:\/\/www.cjvt.si\/llm4dh\/wp-content\/uploads\/sites\/32\/2025\/07\/LLM4DH-novica_slika-1-845x684.png' alt='' title='LLM4DH novica_slika (1)' height=\"684\" width=\"845\"  itemprop=\"thumbnailUrl\"  \/><\/div><\/div><\/div><\/div>\n<div class=\"flex_column av_one_full  flex_column_div av-zero-column-padding first  avia-builder-el-13  el_after_av_one_full  el_before_av_one_full  column-top-margin\" style='border-radius:0px; '><section class=\"av_textblock_section \"  itemscope=\"itemscope\" itemtype=\"https:\/\/schema.org\/BlogPosting\" itemprop=\"blogPost\" ><div class='avia_textblock  '   itemprop=\"text\" ><p style=\"text-align: left;\"><em><span style=\"font-weight: 400;\">Figure 1: Snapshot of data extracted from structured resources. 
Example entry for the word jagoda (strawberry).<\/span><\/em><\/p>\n<\/div><\/section><\/div>\n<div class=\"flex_column av_one_full  flex_column_div av-zero-column-padding first  avia-builder-el-15  el_after_av_one_full  el_before_av_one_fourth  column-top-margin\" style='border-radius:0px; '><section class=\"av_textblock_section \"  itemscope=\"itemscope\" itemtype=\"https:\/\/schema.org\/BlogPosting\" itemprop=\"blogPost\" ><div class='avia_textblock  '   itemprop=\"text\" ><p><span style=\"font-weight: 400;\">This dataset supports additional pretraining of large language models, as well as lexical analysis and semantic modelling. With this resource complete, the next steps of our work are already underway. An initial improved LLM will be developed by month 12, followed by a final, refined model by month 24.<\/span><\/p>\n<\/div><\/section><\/div>\n<div class=\"flex_column av_one_fourth  flex_column_div av-zero-column-padding first  avia-builder-el-17  el_after_av_one_full  el_before_av_one_half  column-top-margin\" style='border-radius:0px; '><\/div>\n<div class=\"flex_column av_one_half  flex_column_div av-zero-column-padding   avia-builder-el-18  el_after_av_one_fourth  el_before_av_one_fourth  column-top-margin\" style='border-radius:0px; '><section class=\"av_textblock_section \"  itemscope=\"itemscope\" itemtype=\"https:\/\/schema.org\/BlogPosting\" itemprop=\"blogPost\" ><div class='avia_textblock  '   itemprop=\"text\" ><h3><b>Corpus Metadata<\/b><\/h3>\n<\/div><\/section><br \/>\n<section class=\"av_textblock_section \"  itemscope=\"itemscope\" itemtype=\"https:\/\/schema.org\/BlogPosting\" itemprop=\"blogPost\" ><div class='avia_textblock  '   itemprop=\"text\" ><ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>Corpus name:<\/b><span style=\"font-weight: 400;\"> Lexical LLM Pretraining Corpus<\/span>&nbsp;<\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>File:<\/b> <a 
href=\"https:\/\/unilj-my.sharepoint.com\/:f:\/g\/personal\/slavkozitnik_fri1_uni-lj_si\/EgM_Fv_cAM5FntmhrLZ0wCcBqZM4l2IaOsuaFTQB1Hy5fg?e=PJXAMB\"><span style=\"font-weight: 400;\">D1.1.1 &#8211; pretraining corpus<\/span><\/a>&nbsp;<\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>Total entries:<\/b><span style=\"font-weight: 400;\"> 356,294 words<\/span>&nbsp;<\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>Language:<\/b><span style=\"font-weight: 400;\"> Slovene<\/span>&nbsp;<\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>Sources:<\/b><span style=\"font-weight: 400;\"> DDD, SSKJ, sloWnet, Bridge Dictionary, Synonyms Dictionary<\/span>&nbsp;<\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>Content:<\/b><span style=\"font-weight: 400;\"> Lemmas, forms, senses, definitions or indicators, examples, collocations, synonyms<\/span>&nbsp;<\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>Format:<\/b><span style=\"font-weight: 400;\"> Markdown, structured with separators<\/span>&nbsp;<\/li>\n<\/ul>\n<p><b>Use cases:<\/b><span style=\"font-weight: 400;\"> LLM pretraining, lexical analysis, semantic modelling<\/span><span style=\"font-weight: 400;\"><br \/>\n<\/span><\/p>\n<\/div><\/section><\/p><\/div>\n<div class=\"flex_column av_one_fourth  flex_column_div av-zero-column-padding   avia-builder-el-21  el_after_av_one_half  el_before_av_one_full  column-top-margin\" style='border-radius:0px; '><\/div>\n<div class=\"flex_column av_one_full  flex_column_div av-zero-column-padding first  avia-builder-el-22  el_after_av_one_fourth  avia-builder-el-last  column-top-margin\" style='border-radius:0px; '><section class=\"av_textblock_section \"  itemscope=\"itemscope\" itemtype=\"https:\/\/schema.org\/BlogPosting\" itemprop=\"blogPost\" ><div class='avia_textblock  '   itemprop=\"text\" ><h3><b>Citation:\u00a0<\/b><\/h3>\n<\/div><\/section><br \/>\n<section class=\"av_textblock_section \"  itemscope=\"itemscope\" 
itemtype=\"https:\/\/schema.org\/BlogPosting\" itemprop=\"blogPost\" ><div class='avia_textblock  '   itemprop=\"text\" ><p><span style=\"font-weight: 400;\">\u017ditnik, S. and Knez, T. (2025). Improving Linguistic Data with LLMs. Zenodo. <\/span><a href=\"https:\/\/doi.org\/10.5281\/zenodo.15878672\"><span style=\"font-weight: 400;\">https:\/\/doi.org\/10.5281\/zenodo.15878672<\/span><\/a><\/p>\n<\/div><\/section><\/p><\/div>\n","protected":false},"excerpt":{"rendered":"<p>We have developed a novel methodology for extracting knowledge graphs from digital linguistic databases that is tailored to morphologically complex languages.<\/p>\n","protected":false},"author":19,"featured_media":1701,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_relevanssi_hide_post":"","_relevanssi_hide_content":"","_relevanssi_pin_for_all":"","_relevanssi_pin_keywords":"","_relevanssi_unpin_keywords":"","_relevanssi_related_keywords":"","_relevanssi_related_include_ids":"","_relevanssi_related_exclude_ids":"","_relevanssi_related_no_append":"","_relevanssi_related_not_related":"","_relevanssi_related_posts":"","_relevanssi_noindex_reason":"","inline_featured_image":false,"episode_type":"","audio_file":"","podmotor_file_id":"","podmotor_episode_id":"","cover_image":"","cover_image_id":"","duration":"","filesize":"","filesize_raw":"","date_recorded":"","explicit":"","block":"","itunes_episode_number":"","itunes_title":"","itunes_season_number":"","itunes_episode_type":"","footnotes":""},"categories":[84],"tags":[],"class_list":["post-1700","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-blog-posts"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.3 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Improving Linguistic Data with LLMs - LLM4DH<\/title>\n<meta name=\"description\" content=\"We have developed a novel 
methodology for extracting knowledge graphs from digital linguistic databases that is tailored to morphologically complex languages.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.cjvt.si\/llm4dh\/en\/mproving-linguistic-data-with-llms\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Improving Linguistic Data with LLMs - LLM4DH\" \/>\n<meta property=\"og:description\" content=\"We have developed a novel methodology for extracting knowledge graphs from digital linguistic databases that is tailored to morphologically complex languages.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.cjvt.si\/llm4dh\/en\/mproving-linguistic-data-with-llms\/\" \/>\n<meta property=\"og:site_name\" content=\"LLM4DH\" \/>\n<meta property=\"article:published_time\" content=\"2025-07-16T07:51:11+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-07-16T13:01:14+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.cjvt.si\/llm4dh\/wp-content\/uploads\/sites\/32\/2025\/07\/LLM4DH-novica_slika-1.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1129\" \/>\n\t<meta property=\"og:image:height\" content=\"868\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"saras\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"saras\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"7 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.cjvt.si\\\/llm4dh\\\/en\\\/mproving-linguistic-data-with-llms\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.cjvt.si\\\/llm4dh\\\/en\\\/mproving-linguistic-data-with-llms\\\/\"},\"author\":{\"name\":\"saras\",\"@id\":\"https:\\\/\\\/www.cjvt.si\\\/llm4dh\\\/en\\\/#\\\/schema\\\/person\\\/4d451cdaaa7aa1b00f756029e4b54aa7\"},\"headline\":\"Improving Linguistic Data with LLMs\",\"datePublished\":\"2025-07-16T07:51:11+00:00\",\"dateModified\":\"2025-07-16T13:01:14+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.cjvt.si\\\/llm4dh\\\/en\\\/mproving-linguistic-data-with-llms\\\/\"},\"wordCount\":1813,\"image\":{\"@id\":\"https:\\\/\\\/www.cjvt.si\\\/llm4dh\\\/en\\\/mproving-linguistic-data-with-llms\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.cjvt.si\\\/llm4dh\\\/wp-content\\\/uploads\\\/sites\\\/32\\\/2025\\\/07\\\/LLM4DH-novica_slika-1.png\",\"articleSection\":[\"Blog Posts\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.cjvt.si\\\/llm4dh\\\/en\\\/mproving-linguistic-data-with-llms\\\/\",\"url\":\"https:\\\/\\\/www.cjvt.si\\\/llm4dh\\\/en\\\/mproving-linguistic-data-with-llms\\\/\",\"name\":\"Improving Linguistic Data with LLMs - 
LLM4DH\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.cjvt.si\\\/llm4dh\\\/en\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/www.cjvt.si\\\/llm4dh\\\/en\\\/mproving-linguistic-data-with-llms\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/www.cjvt.si\\\/llm4dh\\\/en\\\/mproving-linguistic-data-with-llms\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.cjvt.si\\\/llm4dh\\\/wp-content\\\/uploads\\\/sites\\\/32\\\/2025\\\/07\\\/LLM4DH-novica_slika-1.png\",\"datePublished\":\"2025-07-16T07:51:11+00:00\",\"dateModified\":\"2025-07-16T13:01:14+00:00\",\"author\":{\"@id\":\"https:\\\/\\\/www.cjvt.si\\\/llm4dh\\\/en\\\/#\\\/schema\\\/person\\\/4d451cdaaa7aa1b00f756029e4b54aa7\"},\"description\":\"We have developed a novel methodology for extracting knowledge graphs from digital linguistic databases that is tailored to morphologically complex languages.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.cjvt.si\\\/llm4dh\\\/en\\\/mproving-linguistic-data-with-llms\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.cjvt.si\\\/llm4dh\\\/en\\\/mproving-linguistic-data-with-llms\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.cjvt.si\\\/llm4dh\\\/en\\\/mproving-linguistic-data-with-llms\\\/#primaryimage\",\"url\":\"https:\\\/\\\/www.cjvt.si\\\/llm4dh\\\/wp-content\\\/uploads\\\/sites\\\/32\\\/2025\\\/07\\\/LLM4DH-novica_slika-1.png\",\"contentUrl\":\"https:\\\/\\\/www.cjvt.si\\\/llm4dh\\\/wp-content\\\/uploads\\\/sites\\\/32\\\/2025\\\/07\\\/LLM4DH-novica_slika-1.png\",\"width\":1129,\"height\":868},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.cjvt.si\\\/llm4dh\\\/en\\\/mproving-linguistic-data-with-llms\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/www.cjvt.si\\\/llm4dh\\\/en\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Improving Linguistic Data 
with LLMs\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.cjvt.si\\\/llm4dh\\\/en\\\/#website\",\"url\":\"https:\\\/\\\/www.cjvt.si\\\/llm4dh\\\/en\\\/\",\"name\":\"LLM4DH\",\"description\":\"Work site\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.cjvt.si\\\/llm4dh\\\/en\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.cjvt.si\\\/llm4dh\\\/en\\\/#\\\/schema\\\/person\\\/4d451cdaaa7aa1b00f756029e4b54aa7\",\"name\":\"saras\",\"url\":\"https:\\\/\\\/www.cjvt.si\\\/llm4dh\\\/en\\\/blog\\\/author\\\/saras\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Improving Linguistic Data with LLMs - LLM4DH","description":"We have developed a novel methodology for extracting knowledge graphs from digital linguistic databases that is tailored to morphologically complex languages.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.cjvt.si\/llm4dh\/en\/mproving-linguistic-data-with-llms\/","og_locale":"en_US","og_type":"article","og_title":"Improving Linguistic Data with LLMs - LLM4DH","og_description":"We have developed a novel methodology for extracting knowledge graphs from digital linguistic databases that is tailored to morphologically complex 
languages.","og_url":"https:\/\/www.cjvt.si\/llm4dh\/en\/mproving-linguistic-data-with-llms\/","og_site_name":"LLM4DH","article_published_time":"2025-07-16T07:51:11+00:00","article_modified_time":"2025-07-16T13:01:14+00:00","og_image":[{"width":1129,"height":868,"url":"https:\/\/www.cjvt.si\/llm4dh\/wp-content\/uploads\/sites\/32\/2025\/07\/LLM4DH-novica_slika-1.png","type":"image\/png"}],"author":"saras","twitter_card":"summary_large_image","twitter_misc":{"Written by":"saras","Est. reading time":"7 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.cjvt.si\/llm4dh\/en\/mproving-linguistic-data-with-llms\/#article","isPartOf":{"@id":"https:\/\/www.cjvt.si\/llm4dh\/en\/mproving-linguistic-data-with-llms\/"},"author":{"name":"saras","@id":"https:\/\/www.cjvt.si\/llm4dh\/en\/#\/schema\/person\/4d451cdaaa7aa1b00f756029e4b54aa7"},"headline":"Improving Linguistic Data with LLMs","datePublished":"2025-07-16T07:51:11+00:00","dateModified":"2025-07-16T13:01:14+00:00","mainEntityOfPage":{"@id":"https:\/\/www.cjvt.si\/llm4dh\/en\/mproving-linguistic-data-with-llms\/"},"wordCount":1813,"image":{"@id":"https:\/\/www.cjvt.si\/llm4dh\/en\/mproving-linguistic-data-with-llms\/#primaryimage"},"thumbnailUrl":"https:\/\/www.cjvt.si\/llm4dh\/wp-content\/uploads\/sites\/32\/2025\/07\/LLM4DH-novica_slika-1.png","articleSection":["Blog Posts"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.cjvt.si\/llm4dh\/en\/mproving-linguistic-data-with-llms\/","url":"https:\/\/www.cjvt.si\/llm4dh\/en\/mproving-linguistic-data-with-llms\/","name":"Improving Linguistic Data with LLMs - 
LLM4DH","isPartOf":{"@id":"https:\/\/www.cjvt.si\/llm4dh\/en\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.cjvt.si\/llm4dh\/en\/mproving-linguistic-data-with-llms\/#primaryimage"},"image":{"@id":"https:\/\/www.cjvt.si\/llm4dh\/en\/mproving-linguistic-data-with-llms\/#primaryimage"},"thumbnailUrl":"https:\/\/www.cjvt.si\/llm4dh\/wp-content\/uploads\/sites\/32\/2025\/07\/LLM4DH-novica_slika-1.png","datePublished":"2025-07-16T07:51:11+00:00","dateModified":"2025-07-16T13:01:14+00:00","author":{"@id":"https:\/\/www.cjvt.si\/llm4dh\/en\/#\/schema\/person\/4d451cdaaa7aa1b00f756029e4b54aa7"},"description":"We have developed a novel methodology for extracting knowledge graphs from digital linguistic databases that is tailored to morphologically complex languages.","breadcrumb":{"@id":"https:\/\/www.cjvt.si\/llm4dh\/en\/mproving-linguistic-data-with-llms\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.cjvt.si\/llm4dh\/en\/mproving-linguistic-data-with-llms\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.cjvt.si\/llm4dh\/en\/mproving-linguistic-data-with-llms\/#primaryimage","url":"https:\/\/www.cjvt.si\/llm4dh\/wp-content\/uploads\/sites\/32\/2025\/07\/LLM4DH-novica_slika-1.png","contentUrl":"https:\/\/www.cjvt.si\/llm4dh\/wp-content\/uploads\/sites\/32\/2025\/07\/LLM4DH-novica_slika-1.png","width":1129,"height":868},{"@type":"BreadcrumbList","@id":"https:\/\/www.cjvt.si\/llm4dh\/en\/mproving-linguistic-data-with-llms\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.cjvt.si\/llm4dh\/en\/"},{"@type":"ListItem","position":2,"name":"Improving Linguistic Data with LLMs"}]},{"@type":"WebSite","@id":"https:\/\/www.cjvt.si\/llm4dh\/en\/#website","url":"https:\/\/www.cjvt.si\/llm4dh\/en\/","name":"LLM4DH","description":"Work 
site","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.cjvt.si\/llm4dh\/en\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/www.cjvt.si\/llm4dh\/en\/#\/schema\/person\/4d451cdaaa7aa1b00f756029e4b54aa7","name":"saras","url":"https:\/\/www.cjvt.si\/llm4dh\/en\/blog\/author\/saras\/"}]}},"_links":{"self":[{"href":"https:\/\/www.cjvt.si\/llm4dh\/en\/wp-json\/wp\/v2\/posts\/1700","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.cjvt.si\/llm4dh\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.cjvt.si\/llm4dh\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.cjvt.si\/llm4dh\/en\/wp-json\/wp\/v2\/users\/19"}],"replies":[{"embeddable":true,"href":"https:\/\/www.cjvt.si\/llm4dh\/en\/wp-json\/wp\/v2\/comments?post=1700"}],"version-history":[{"count":12,"href":"https:\/\/www.cjvt.si\/llm4dh\/en\/wp-json\/wp\/v2\/posts\/1700\/revisions"}],"predecessor-version":[{"id":1710,"href":"https:\/\/www.cjvt.si\/llm4dh\/en\/wp-json\/wp\/v2\/posts\/1700\/revisions\/1710"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.cjvt.si\/llm4dh\/en\/wp-json\/wp\/v2\/media\/1701"}],"wp:attachment":[{"href":"https:\/\/www.cjvt.si\/llm4dh\/en\/wp-json\/wp\/v2\/media?parent=1700"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.cjvt.si\/llm4dh\/en\/wp-json\/wp\/v2\/categories?post=1700"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.cjvt.si\/llm4dh\/en\/wp-json\/wp\/v2\/tags?post=1700"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}