Challenge 1: Improving LLMs with linguistic resources and developing vision-language models

LLMs require large amounts of high-quality textual data, both for pretraining and for fine-tuning on specific tasks. High-quality lexicographic data can support pretraining by yielding several types of training data, most notably knowledge graphs and raw text. The information available in lexicographic resources includes lexical-semantic relations, sense inventories with definitions and information on sense distribution, cross-lingual connections, and the identification and description of idiomatic or figurative expressions.
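As a concrete illustration, the sketch below converts entries from NLTK's WordNet, used here as a stand-in for a richer lexicographic resource, into the two kinds of training data mentioned above: knowledge-graph triples and verbalized raw text. The relation names and verbalization templates are illustrative assumptions, not a fixed standard.

```python
# A minimal sketch: turning lexicographic entries into LLM training data.
# WordNet stands in for a richer dictionary; assumes nltk is installed and
# the corpus has been fetched via nltk.download('wordnet').
from nltk.corpus import wordnet as wn

def entry_to_triples(synset):
    """Emit knowledge-graph triples (head, relation, tail) for one sense."""
    name = synset.lemma_names()[0]
    for hyper in synset.hypernyms():
        yield (name, "is_a", hyper.lemma_names()[0])  # relation name is an illustrative choice
    for lemma in synset.lemmas():
        for ant in lemma.antonyms():
            yield (lemma.name(), "antonym_of", ant.name())

def entry_to_text(synset):
    """Verbalize a sense as raw pretraining text: definition plus usage examples."""
    name = synset.lemma_names()[0].replace("_", " ")
    sentences = [f"{name}: {synset.definition()}."]
    sentences += [f'Example: "{ex}"' for ex in synset.examples()]
    return " ".join(sentences)

for synset in wn.synsets("bank"):
    for triple in entry_to_triples(synset):
        print(triple)
    print(entry_to_text(synset))
```

The same transformation carries over to richer resources: sense-distribution information could weight how often each verbalization is sampled into the pretraining mix, and cross-lingual connections could produce parallel variants for less-resourced languages.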

This trove of information is not yet adequately exploited by LLMs. Incorporating it could reduce hallucinations, improve language proficiency in complex contexts and in less-resourced languages, and strengthen fine-tuning for important tasks such as commonsense reasoning and natural language inference.
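One concrete route to such fine-tuning data is to derive labelled examples directly from lexical relations: in an upward-monotone context, hypernymy licenses an entailment label ("I saw a sparrow" entails "I saw a bird"). The sketch below, again using WordNet with illustrative sentence templates and label names, generates natural language inference pairs along these lines.

```python
# A hedged sketch of deriving NLI fine-tuning pairs from lexical relations.
# Templates, labels, and the neutral reverse direction are illustrative
# assumptions; a production pipeline would filter senses and vary contexts.
from nltk.corpus import wordnet as wn

def nli_pairs_from_hypernymy(word, max_pairs=6):
    """Build (premise, hypothesis, label) tuples from WordNet hypernymy."""
    pairs = []
    for synset in wn.synsets(word, pos=wn.NOUN):
        term = synset.lemma_names()[0].replace("_", " ")
        for hyper in synset.hypernyms():
            parent = hyper.lemma_names()[0].replace("_", " ")
            # Hypernymy licenses entailment from specific to general ...
            pairs.append((f"I saw a {term}.", f"I saw a {parent}.", "entailment"))
            # ... but not the reverse, which we conservatively label neutral.
            pairs.append((f"I saw a {parent}.", f"I saw a {term}.", "neutral"))
            if len(pairs) >= max_pairs:
                return pairs
    return pairs

for premise, hypothesis, label in nli_pairs_from_hypernymy("sparrow"):
    print(f"{premise} -> {hypothesis} [{label}]")
```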