{"id":1689,"date":"2025-06-12T11:50:30","date_gmt":"2025-06-12T09:50:30","guid":{"rendered":"https:\/\/www.cjvt.si\/llm4dh\/?p=1689"},"modified":"2025-06-12T11:51:35","modified_gmt":"2025-06-12T09:51:35","slug":"1689","status":"publish","type":"post","link":"https:\/\/www.cjvt.si\/llm4dh\/en\/advanced-grammatical-analysis-of-multilingual-corpora\/","title":{"rendered":"Advanced grammatical analysis of multilingual corpora"},"content":{"rendered":"<div class=\"flex_column av_one_full  flex_column_div av-zero-column-padding first  avia-builder-el-0  el_before_av_one_full  avia-builder-el-first  \" style='border-radius:0px; '><section class=\"av_textblock_section \"  itemscope=\"itemscope\" itemtype=\"https:\/\/schema.org\/BlogPosting\" itemprop=\"blogPost\" ><div class='avia_textblock  '   itemprop=\"text\" ><h1><b>Advanced grammatical analysis of multilingual corpora<\/b><\/h1>\n<\/div><\/section><\/div>\n<div class=\"flex_column av_one_full  flex_column_div av-zero-column-padding first  avia-builder-el-2  el_after_av_one_full  el_before_av_one_full  column-top-margin\" style='border-radius:0px; '><section class=\"av_textblock_section \"  itemscope=\"itemscope\" itemtype=\"https:\/\/schema.org\/BlogPosting\" itemprop=\"blogPost\" ><div class='avia_textblock  '   itemprop=\"text\" ><h3><span style=\"font-weight: 400;\">By Matej Klemen<\/span><\/h3>\n<\/div><\/section><\/div>\n<div class=\"flex_column av_one_full  flex_column_div av-zero-column-padding first  avia-builder-el-4  el_after_av_one_full  el_before_av_one_full  column-top-margin\" style='border-radius:0px; '><section class=\"av_textblock_section \"  itemscope=\"itemscope\" itemtype=\"https:\/\/schema.org\/BlogPosting\" itemprop=\"blogPost\" ><div class='avia_textblock  '   itemprop=\"text\" ><p><span style=\"font-weight: 400;\">In recent times, linguistics has seen a transition from intuition-based research to data-driven approaches, fueled by the advent of large-scale corpora and advanced computational 
tools. This shift has led to significant new discoveries about language structure and use, particularly in descriptive and comparative grammar analysis.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">However, traditional corpus-based methods remain labour-intensive and often require a specialised tool or a custom programming script, which adds complexity and demands non-linguistic expertise from the end user. The emergence of LLMs with reasoning capabilities offers an opportunity both to simplify the workflow and to enhance linguistic analysis, potentially uncovering previously unidentified patterns of language use.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">As part of the LLM4DH project, we will develop a novel approach to grammatical analysis of multilingual corpora by augmenting state-of-the-art LLMs with Universal Dependencies (UD) data. The UD initiative (de Marneffe et al. 2021) has created the largest grammatically annotated dataset to date, encompassing treebanks for more than 160 languages worldwide, and has been instrumental in advancing research in linguistic typology (e.g., Levshina 2022) and other grammar-related disciplines. UD focuses on syntactic structure: how words in a sentence are connected and what grammatical roles they play, such as subject, object, or modifier. The goal is to enable analysis and comparison of sentence structure across many different languages using the same system of labels and rules.\u00a0<\/span><span style=\"font-weight: 400;\">To illustrate the idea, we will use the question of the dominant word order in a language as an example. 
The goal is to determine the most frequently occurring word order in terms of the <\/span><b>S<\/b><span style=\"font-weight: 400;\">ubject (e.g., <\/span><i><span style=\"font-weight: 400;\">I<\/span><\/i><span style=\"font-weight: 400;\">), <\/span><b>V<\/b><span style=\"font-weight: 400;\">erb (e.g., <\/span><i><span style=\"font-weight: 400;\">ate<\/span><\/i><span style=\"font-weight: 400;\">), and <\/span><b>O<\/b><span style=\"font-weight: 400;\">bject (<\/span><i><span style=\"font-weight: 400;\">a pie<\/span><\/i><span style=\"font-weight: 400;\">). Figure 1 illustrates the contrast between a traditional approach to answering the question and our proposed approach.<\/span><\/p>\n<\/div><\/section><\/div><div class=\"flex_column av_one_full  flex_column_div av-zero-column-padding first  avia-builder-el-6  el_after_av_one_full  el_before_av_one_full  column-top-margin\" style='border-radius:0px; '><div  class='avia-image-container  av-styling-    avia-builder-el-7  avia-builder-el-no-sibling  avia-align-center '  itemprop=\"image\" itemscope=\"itemscope\" itemtype=\"https:\/\/schema.org\/ImageObject\"  ><div class='avia-image-container-inner'><div class='avia-image-overlay-wrap'><img class='avia_image' src='https:\/\/www.cjvt.si\/llm4dh\/wp-content\/uploads\/sites\/32\/2025\/06\/Figure1.png' alt='' title='Figure1' height=\"416\" width=\"768\"  itemprop=\"thumbnailUrl\"  \/><\/div><\/div><\/div><\/div><\/p>\n<div class=\"flex_column av_one_full  flex_column_div av-zero-column-padding first  avia-builder-el-8  el_after_av_one_full  el_before_av_one_full  column-top-margin\" style='border-radius:0px; '><section class=\"av_textblock_section \"  itemscope=\"itemscope\" itemtype=\"https:\/\/schema.org\/BlogPosting\" itemprop=\"blogPost\" ><div class='avia_textblock  '   itemprop=\"text\" ><p style=\"text-align: right;\"><em><span style=\"font-weight: 400;\">Figure 1: A schema illustrating the contrast in answering the dominant word order query using a traditional 
approach (top) and our proposed approach (bottom). In our system, data selection and intermediate analysis are done \u201cunder the hood\u201d, saving the user the time otherwise spent setting up a workflow. The intermediate steps nevertheless remain available to the user for inspection and, where needed, correction.<\/span><\/em><\/p>\n<\/div><\/section><\/div>\n<div class=\"flex_column av_one_full  flex_column_div av-zero-column-padding first  avia-builder-el-10  el_after_av_one_full  el_before_av_one_full  column-top-margin\" style='border-radius:0px; '><section class=\"av_textblock_section \"  itemscope=\"itemscope\" itemtype=\"https:\/\/schema.org\/BlogPosting\" itemprop=\"blogPost\" ><div class='avia_textblock  '   itemprop=\"text\" ><h3>What we aim to do<\/h3>\n<\/div><\/section><\/div>\n<div class=\"flex_column av_one_full  flex_column_div av-zero-column-padding first  avia-builder-el-12  el_after_av_one_full  el_before_av_one_full  column-top-margin\" style='border-radius:0px; '><section class=\"av_textblock_section \"  itemscope=\"itemscope\" itemtype=\"https:\/\/schema.org\/BlogPosting\" itemprop=\"blogPost\" ><div class='avia_textblock  '   itemprop=\"text\" ><p><span style=\"font-weight: 400;\">We will develop an LLM-based system for grammatical analysis of multilingual corpora. Given a user query, the system will automatically determine the steps required to answer it and then execute them. 
This involves tasks such as retrieving relevant examples, analysing them (extracting patterns), and summarising the results.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In the dominant word order example for Slovene:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">the <\/span><b>retrieval<\/b><span style=\"font-weight: 400;\"> step would select only examples in Slovene containing a subject, a verb, and an object;\u00a0<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">the <\/span><b>pattern extraction<\/b><span style=\"font-weight: 400;\"> step would analyse the retrieved examples and count the occurrences of each possible word order (SVO, SOV, VOS, \u2026);<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">the <\/span><b>summarisation<\/b><span style=\"font-weight: 400;\"> step would provide a short answer and note any deviations, optionally displaying illustrative examples alongside.<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">Our goal is to create a system capable of answering diverse queries of varying complexity about languages contained in the massively multilingual UD dataset. 
Using the system, we will conduct linguistic research on corpus data and evaluate its potential for advancing our knowledge of the grammar of the world&#8217;s languages.<\/span><\/p>\n<\/div><\/section><\/div>\n<div class=\"flex_column av_one_full  flex_column_div av-zero-column-padding first  avia-builder-el-14  el_after_av_one_full  el_before_av_one_full  column-top-margin\" style='border-radius:0px; '><section class=\"av_textblock_section \"  itemscope=\"itemscope\" itemtype=\"https:\/\/schema.org\/BlogPosting\" itemprop=\"blogPost\" ><div class='avia_textblock  '   itemprop=\"text\" ><h3><b>How?<\/b><\/h3>\n<\/div><\/section><\/div>\n<div class=\"flex_column av_one_full  flex_column_div av-zero-column-padding first  avia-builder-el-16  el_after_av_one_full  el_before_av_one_full  column-top-margin\" style='border-radius:0px; '><p><section class=\"av_textblock_section \"  itemscope=\"itemscope\" itemtype=\"https:\/\/schema.org\/BlogPosting\" itemprop=\"blogPost\" ><div class='avia_textblock  '   itemprop=\"text\" ><p><span style=\"font-weight: 400;\">Figure 2 illustrates the initial system design using the dominant word order example. We will build our system using two key techniques:\u00a0<\/span><\/p>\n<\/div><\/section><br \/>\n<section class=\"av_textblock_section \"  itemscope=\"itemscope\" itemtype=\"https:\/\/schema.org\/BlogPosting\" itemprop=\"blogPost\" ><div class='avia_textblock  '   itemprop=\"text\" ><ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>Retrieval-augmented generation (RAG)<\/b><span style=\"font-weight: 400;\">. Although LLMs already contain some knowledge of linguistic concepts, that knowledge comes from unconstrained and potentially unreliable sources. RAG enables the LLMs to perform analysis on a data subset retrieved from a known and trustworthy resource. 
In Figure 2, the system retrieves a subset of UD data containing only examples relevant to the dominant word order query.<\/span><\/li>\n<\/ul>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>Function (tool) calling*<\/b>. Most off-the-shelf LLMs are convenient generalists: they can answer a diverse set of questions, but often with limited accuracy and reliability. Specialised analysis tools, by contrast, exist for many tasks, but using them requires tool-specific skills. Function calling lets the LLMs decide when and how to call specialised tools, combining the best of both worlds: the convenience of querying LLMs and the reliability of specialised analysis tools. In Figure 2, the system calls the specialised STARK analysis tool instead of performing the analysis using an LLM.<\/li>\n<\/ul>\n<p><i><span style=\"font-weight: 400;\">*The implementation of function calling is planned for the next iteration of our system.<\/span><\/i><\/p>\n<\/div><\/section><\/p><\/div>\n<div class=\"flex_column av_one_full  flex_column_div av-zero-column-padding first  avia-builder-el-19  el_after_av_one_full  el_before_av_one_fifth  column-top-margin\" style='border-radius:0px; '><div  class='avia-image-container  av-styling-    avia-builder-el-20  avia-builder-el-no-sibling  avia-align-center '  itemprop=\"image\" itemscope=\"itemscope\" itemtype=\"https:\/\/schema.org\/ImageObject\"  ><div class='avia-image-container-inner'><div class='avia-image-overlay-wrap'><img class='avia_image' src='https:\/\/www.cjvt.si\/llm4dh\/wp-content\/uploads\/sites\/32\/2025\/06\/Figure2.png' alt='' title='Figure2' height=\"353\" width=\"785\"  itemprop=\"thumbnailUrl\"  \/><\/div><\/div><\/div><\/div>\n<div class=\"flex_column av_one_fifth  flex_column_div av-zero-column-padding first  avia-builder-el-21  el_after_av_one_full  el_before_av_four_fifth  column-top-margin\" style='border-radius:0px; '><\/div>\n<div class=\"flex_column av_four_fifth  
flex_column_div av-zero-column-padding   avia-builder-el-22  el_after_av_one_fifth  el_before_av_one_full  column-top-margin\" style='border-radius:0px; '><section class=\"av_textblock_section \"  itemscope=\"itemscope\" itemtype=\"https:\/\/schema.org\/BlogPosting\" itemprop=\"blogPost\" ><div class='avia_textblock  '   itemprop=\"text\" ><p style=\"text-align: right;\"><em><span style=\"font-weight: 400;\">Figure 2: Initial design of the system for advanced grammatical analysis of multilingual corpora.<\/span><\/em><\/p>\n<\/div><\/section><\/div>\n<div class=\"flex_column av_one_full  flex_column_div av-zero-column-padding first  avia-builder-el-24  el_after_av_four_fifth  el_before_av_one_full  \" style='border-radius:0px; '><section class=\"av_textblock_section \"  itemscope=\"itemscope\" itemtype=\"https:\/\/schema.org\/BlogPosting\" itemprop=\"blogPost\" ><div class='avia_textblock  '   itemprop=\"text\" ><h3><b>Initial challenges<\/b><\/h3>\n<\/div><\/section><br \/>\n<section class=\"av_textblock_section \"  itemscope=\"itemscope\" itemtype=\"https:\/\/schema.org\/BlogPosting\" itemprop=\"blogPost\" ><div class='avia_textblock  '   itemprop=\"text\" ><p><b>Retrieval of relevant examples. <\/b><span style=\"font-weight: 400;\">Relevance is a general and sometimes vague term, and an LLM&#8217;s notion of relevance may not match the user&#8217;s. The system will therefore derive explicit relevance criteria from the user query. This will improve the retrieval accuracy and make the process more transparent to the user.<\/span><\/p>\n<p><b>Accuracy of off-the-shelf LLMs for linguistic tasks.<\/b><span style=\"font-weight: 400;\"> In our initial experiments with linguistic queries from the World Atlas of Language Structures (<\/span><a href=\"https:\/\/wals.info\/\"><span style=\"font-weight: 400;\">WALS<\/span><\/a><span style=\"font-weight: 400;\">), we have observed that LLMs struggle with grammatical tasks. 
To address this, we will develop better methods for incorporating external knowledge from UD and continue testing newer models as they improve.<\/span><\/p>\n<\/div><\/section><\/p><\/div>\n<div class=\"flex_column av_one_full  flex_column_div av-zero-column-padding first  avia-builder-el-27  el_after_av_one_full  el_before_av_one_full  column-top-margin\" style='border-radius:0px; '><section class=\"av_textblock_section \"  itemscope=\"itemscope\" itemtype=\"https:\/\/schema.org\/BlogPosting\" itemprop=\"blogPost\" ><div class='avia_textblock  '   itemprop=\"text\" ><h3><b>Citation:\u00a0<\/b><\/h3>\n<\/div><\/section><br \/>\n<section class=\"av_textblock_section \"  itemscope=\"itemscope\" itemtype=\"https:\/\/schema.org\/BlogPosting\" itemprop=\"blogPost\" ><div class='avia_textblock  '   itemprop=\"text\" ><p><span style=\"font-weight: 400;\">Klemen, M. (2025). Advanced grammatical analysis of multilingual corpora. Zenodo. <\/span><a href=\"https:\/\/doi.org\/10.5281\/zenodo.15646857\"><span style=\"font-weight: 400;\">https:\/\/doi.org\/10.5281\/zenodo.15646857<\/span><\/a><\/p>\n<\/div><\/section><\/p><\/div>\n<div class=\"flex_column av_one_full  flex_column_div av-zero-column-padding first  avia-builder-el-30  el_after_av_one_full  avia-builder-el-last  column-top-margin\" style='border-radius:0px; '><section class=\"av_textblock_section \"  itemscope=\"itemscope\" itemtype=\"https:\/\/schema.org\/BlogPosting\" itemprop=\"blogPost\" ><div class='avia_textblock  '   itemprop=\"text\" ><h3>References:<\/h3>\n<\/div><\/section><br \/>\n<section class=\"av_textblock_section \"  itemscope=\"itemscope\" itemtype=\"https:\/\/schema.org\/BlogPosting\" itemprop=\"blogPost\" ><div class='avia_textblock  '   itemprop=\"text\" ><p><span style=\"font-weight: 400;\">De Marneffe, M. C., Manning, C. D., Nivre, J., &amp; Zeman, D. (2021). Universal Dependencies. Computational Linguistics, 47(2), 255-308.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Levshina, N. 
(2022). Corpus-based typology: Applications, challenges and some solutions. Linguistic Typology, 26(1), 129-160.<\/span><\/p>\n<\/div><\/section><\/p><\/div>\n","protected":false},"excerpt":{"rendered":"<p>As part of the LLM4DH project, we will develop a novel approach to grammatical analysis of multilingual corpora by augmenting state-of-the-art LLMs with the Universal Dependencies (UD) data.<\/p>\n","protected":false},"author":19,"featured_media":1678,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_relevanssi_hide_post":"","_relevanssi_hide_content":"","_relevanssi_pin_for_all":"","_relevanssi_pin_keywords":"","_relevanssi_unpin_keywords":"","_relevanssi_related_keywords":"","_relevanssi_related_include_ids":"","_relevanssi_related_exclude_ids":"","_relevanssi_related_no_append":"","_relevanssi_related_not_related":"","_relevanssi_related_posts":"","_relevanssi_noindex_reason":"","inline_featured_image":false,"episode_type":"","audio_file":"","podmotor_file_id":"","podmotor_episode_id":"","cover_image":"","cover_image_id":"","duration":"","filesize":"","filesize_raw":"","date_recorded":"","explicit":"","block":"","itunes_episode_number":"","itunes_title":"","itunes_season_number":"","itunes_episode_type":"","footnotes":""},"categories":[84],"tags":[],"class_list":["post-1689","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-blog-posts"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v28.1 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Advanced grammatical analysis of multilingual corpora - LLM4DH<\/title>\n<meta name=\"description\" content=\"Advanced grammatical analysis of multilingual corporaAs part of the LLM4DH project, we will develop a novel approach to grammatical analysis of multilingual corpora by augmenting state-of-the-art LLMs with the Universal Dependencies (UD) data.\" 
\/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.cjvt.si\/llm4dh\/en\/advanced-grammatical-analysis-of-multilingual-corpora\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Advanced grammatical analysis of multilingual corpora - LLM4DH\" \/>\n<meta property=\"og:description\" content=\"Advanced grammatical analysis of multilingual corporaAs part of the LLM4DH project, we will develop a novel approach to grammatical analysis of multilingual corpora by augmenting state-of-the-art LLMs with the Universal Dependencies (UD) data.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.cjvt.si\/llm4dh\/en\/advanced-grammatical-analysis-of-multilingual-corpora\/\" \/>\n<meta property=\"og:site_name\" content=\"LLM4DH\" \/>\n<meta property=\"article:published_time\" content=\"2025-06-12T09:50:30+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-06-12T09:51:35+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.cjvt.si\/llm4dh\/wp-content\/uploads\/sites\/32\/2025\/06\/Figure2.png\" \/>\n\t<meta property=\"og:image:width\" content=\"785\" \/>\n\t<meta property=\"og:image:height\" content=\"353\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"saras\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"saras\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"10 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.cjvt.si\\\/llm4dh\\\/en\\\/advanced-grammatical-analysis-of-multilingual-corpora\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.cjvt.si\\\/llm4dh\\\/en\\\/advanced-grammatical-analysis-of-multilingual-corpora\\\/\"},\"author\":{\"name\":\"saras\",\"@id\":\"https:\\\/\\\/www.cjvt.si\\\/llm4dh\\\/en\\\/#\\\/schema\\\/person\\\/4d451cdaaa7aa1b00f756029e4b54aa7\"},\"headline\":\"Advanced grammatical analysis of multilingual corpora\",\"datePublished\":\"2025-06-12T09:50:30+00:00\",\"dateModified\":\"2025-06-12T09:51:35+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.cjvt.si\\\/llm4dh\\\/en\\\/advanced-grammatical-analysis-of-multilingual-corpora\\\/\"},\"wordCount\":2634,\"image\":{\"@id\":\"https:\\\/\\\/www.cjvt.si\\\/llm4dh\\\/en\\\/advanced-grammatical-analysis-of-multilingual-corpora\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.cjvt.si\\\/llm4dh\\\/wp-content\\\/uploads\\\/sites\\\/32\\\/2025\\\/06\\\/Figure2.png\",\"articleSection\":[\"Blog Posts\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.cjvt.si\\\/llm4dh\\\/en\\\/advanced-grammatical-analysis-of-multilingual-corpora\\\/\",\"url\":\"https:\\\/\\\/www.cjvt.si\\\/llm4dh\\\/en\\\/advanced-grammatical-analysis-of-multilingual-corpora\\\/\",\"name\":\"Advanced grammatical analysis of multilingual corpora - 
LLM4DH\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.cjvt.si\\\/llm4dh\\\/en\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/www.cjvt.si\\\/llm4dh\\\/en\\\/advanced-grammatical-analysis-of-multilingual-corpora\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/www.cjvt.si\\\/llm4dh\\\/en\\\/advanced-grammatical-analysis-of-multilingual-corpora\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.cjvt.si\\\/llm4dh\\\/wp-content\\\/uploads\\\/sites\\\/32\\\/2025\\\/06\\\/Figure2.png\",\"datePublished\":\"2025-06-12T09:50:30+00:00\",\"dateModified\":\"2025-06-12T09:51:35+00:00\",\"author\":{\"@id\":\"https:\\\/\\\/www.cjvt.si\\\/llm4dh\\\/en\\\/#\\\/schema\\\/person\\\/4d451cdaaa7aa1b00f756029e4b54aa7\"},\"description\":\"Advanced grammatical analysis of multilingual corporaAs part of the LLM4DH project, we will develop a novel approach to grammatical analysis of multilingual corpora by augmenting state-of-the-art LLMs with the Universal Dependencies (UD) data.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.cjvt.si\\\/llm4dh\\\/en\\\/advanced-grammatical-analysis-of-multilingual-corpora\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.cjvt.si\\\/llm4dh\\\/en\\\/advanced-grammatical-analysis-of-multilingual-corpora\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.cjvt.si\\\/llm4dh\\\/en\\\/advanced-grammatical-analysis-of-multilingual-corpora\\\/#primaryimage\",\"url\":\"https:\\\/\\\/www.cjvt.si\\\/llm4dh\\\/wp-content\\\/uploads\\\/sites\\\/32\\\/2025\\\/06\\\/Figure2.png\",\"contentUrl\":\"https:\\\/\\\/www.cjvt.si\\\/llm4dh\\\/wp-content\\\/uploads\\\/sites\\\/32\\\/2025\\\/06\\\/Figure2.png\",\"width\":785,\"height\":353},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.cjvt.si\\\/llm4dh\\\/en\\\/advanced-grammatical-analysis-of-multilingual-corpora\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1
,\"name\":\"Home\",\"item\":\"https:\\\/\\\/www.cjvt.si\\\/llm4dh\\\/en\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Advanced grammatical analysis of multilingual corpora\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.cjvt.si\\\/llm4dh\\\/en\\\/#website\",\"url\":\"https:\\\/\\\/www.cjvt.si\\\/llm4dh\\\/en\\\/\",\"name\":\"LLM4DH\",\"description\":\"Work site\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.cjvt.si\\\/llm4dh\\\/en\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.cjvt.si\\\/llm4dh\\\/en\\\/#\\\/schema\\\/person\\\/4d451cdaaa7aa1b00f756029e4b54aa7\",\"name\":\"saras\",\"url\":\"https:\\\/\\\/www.cjvt.si\\\/llm4dh\\\/en\\\/blog\\\/author\\\/saras\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Advanced grammatical analysis of multilingual corpora - LLM4DH","description":"Advanced grammatical analysis of multilingual corporaAs part of the LLM4DH project, we will develop a novel approach to grammatical analysis of multilingual corpora by augmenting state-of-the-art LLMs with the Universal Dependencies (UD) data.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.cjvt.si\/llm4dh\/en\/advanced-grammatical-analysis-of-multilingual-corpora\/","og_locale":"en_US","og_type":"article","og_title":"Advanced grammatical analysis of multilingual corpora - LLM4DH","og_description":"Advanced grammatical analysis of multilingual corporaAs part of the LLM4DH project, we will develop a novel approach to grammatical analysis of multilingual corpora by augmenting state-of-the-art LLMs with the Universal Dependencies (UD) 
data.","og_url":"https:\/\/www.cjvt.si\/llm4dh\/en\/advanced-grammatical-analysis-of-multilingual-corpora\/","og_site_name":"LLM4DH","article_published_time":"2025-06-12T09:50:30+00:00","article_modified_time":"2025-06-12T09:51:35+00:00","og_image":[{"width":785,"height":353,"url":"https:\/\/www.cjvt.si\/llm4dh\/wp-content\/uploads\/sites\/32\/2025\/06\/Figure2.png","type":"image\/png"}],"author":"saras","twitter_card":"summary_large_image","twitter_misc":{"Written by":"saras","Est. reading time":"10 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.cjvt.si\/llm4dh\/en\/advanced-grammatical-analysis-of-multilingual-corpora\/#article","isPartOf":{"@id":"https:\/\/www.cjvt.si\/llm4dh\/en\/advanced-grammatical-analysis-of-multilingual-corpora\/"},"author":{"name":"saras","@id":"https:\/\/www.cjvt.si\/llm4dh\/en\/#\/schema\/person\/4d451cdaaa7aa1b00f756029e4b54aa7"},"headline":"Advanced grammatical analysis of multilingual corpora","datePublished":"2025-06-12T09:50:30+00:00","dateModified":"2025-06-12T09:51:35+00:00","mainEntityOfPage":{"@id":"https:\/\/www.cjvt.si\/llm4dh\/en\/advanced-grammatical-analysis-of-multilingual-corpora\/"},"wordCount":2634,"image":{"@id":"https:\/\/www.cjvt.si\/llm4dh\/en\/advanced-grammatical-analysis-of-multilingual-corpora\/#primaryimage"},"thumbnailUrl":"https:\/\/www.cjvt.si\/llm4dh\/wp-content\/uploads\/sites\/32\/2025\/06\/Figure2.png","articleSection":["Blog Posts"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.cjvt.si\/llm4dh\/en\/advanced-grammatical-analysis-of-multilingual-corpora\/","url":"https:\/\/www.cjvt.si\/llm4dh\/en\/advanced-grammatical-analysis-of-multilingual-corpora\/","name":"Advanced grammatical analysis of multilingual corpora - 
LLM4DH","isPartOf":{"@id":"https:\/\/www.cjvt.si\/llm4dh\/en\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.cjvt.si\/llm4dh\/en\/advanced-grammatical-analysis-of-multilingual-corpora\/#primaryimage"},"image":{"@id":"https:\/\/www.cjvt.si\/llm4dh\/en\/advanced-grammatical-analysis-of-multilingual-corpora\/#primaryimage"},"thumbnailUrl":"https:\/\/www.cjvt.si\/llm4dh\/wp-content\/uploads\/sites\/32\/2025\/06\/Figure2.png","datePublished":"2025-06-12T09:50:30+00:00","dateModified":"2025-06-12T09:51:35+00:00","author":{"@id":"https:\/\/www.cjvt.si\/llm4dh\/en\/#\/schema\/person\/4d451cdaaa7aa1b00f756029e4b54aa7"},"description":"Advanced grammatical analysis of multilingual corporaAs part of the LLM4DH project, we will develop a novel approach to grammatical analysis of multilingual corpora by augmenting state-of-the-art LLMs with the Universal Dependencies (UD) data.","breadcrumb":{"@id":"https:\/\/www.cjvt.si\/llm4dh\/en\/advanced-grammatical-analysis-of-multilingual-corpora\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.cjvt.si\/llm4dh\/en\/advanced-grammatical-analysis-of-multilingual-corpora\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.cjvt.si\/llm4dh\/en\/advanced-grammatical-analysis-of-multilingual-corpora\/#primaryimage","url":"https:\/\/www.cjvt.si\/llm4dh\/wp-content\/uploads\/sites\/32\/2025\/06\/Figure2.png","contentUrl":"https:\/\/www.cjvt.si\/llm4dh\/wp-content\/uploads\/sites\/32\/2025\/06\/Figure2.png","width":785,"height":353},{"@type":"BreadcrumbList","@id":"https:\/\/www.cjvt.si\/llm4dh\/en\/advanced-grammatical-analysis-of-multilingual-corpora\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.cjvt.si\/llm4dh\/en\/"},{"@type":"ListItem","position":2,"name":"Advanced grammatical analysis of multilingual 
corpora"}]},{"@type":"WebSite","@id":"https:\/\/www.cjvt.si\/llm4dh\/en\/#website","url":"https:\/\/www.cjvt.si\/llm4dh\/en\/","name":"LLM4DH","description":"Work site","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.cjvt.si\/llm4dh\/en\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/www.cjvt.si\/llm4dh\/en\/#\/schema\/person\/4d451cdaaa7aa1b00f756029e4b54aa7","name":"saras","url":"https:\/\/www.cjvt.si\/llm4dh\/en\/blog\/author\/saras\/"}]}},"_links":{"self":[{"href":"https:\/\/www.cjvt.si\/llm4dh\/en\/wp-json\/wp\/v2\/posts\/1689","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.cjvt.si\/llm4dh\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.cjvt.si\/llm4dh\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.cjvt.si\/llm4dh\/en\/wp-json\/wp\/v2\/users\/19"}],"replies":[{"embeddable":true,"href":"https:\/\/www.cjvt.si\/llm4dh\/en\/wp-json\/wp\/v2\/comments?post=1689"}],"version-history":[{"count":3,"href":"https:\/\/www.cjvt.si\/llm4dh\/en\/wp-json\/wp\/v2\/posts\/1689\/revisions"}],"predecessor-version":[{"id":1693,"href":"https:\/\/www.cjvt.si\/llm4dh\/en\/wp-json\/wp\/v2\/posts\/1689\/revisions\/1693"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.cjvt.si\/llm4dh\/en\/wp-json\/wp\/v2\/media\/1678"}],"wp:attachment":[{"href":"https:\/\/www.cjvt.si\/llm4dh\/en\/wp-json\/wp\/v2\/media?parent=1689"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.cjvt.si\/llm4dh\/en\/wp-json\/wp\/v2\/categories?post=1689"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.cjvt.si\/llm4dh\/en\/wp-json\/wp\/v2\/tags?post=1689"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}