Large Language Models and Lexicography Workshop 2024
Date: 8 October 2024
Location: Hotel Croatia, Cavtat, Croatia
Length: half-day (09:00 – 13:00)
The workshop is co-located with the Euralex 2024 congress. More information is available on the Euralex 2024 Workshops page.
Format: open call for papers/presentations/demos + invited talks
Audience size: 30 participants
Description of the agenda
The workshop will explore the integration of large language models (LLMs) into lexicography. It aims to examine how these models can support linguistic analysis and the generation of dictionary data, and how automating lexicographic processes can enhance dictionary development.
Expected topics also include identifying new word usages and trends, the role of LLMs in multilingual lexicography, and the ethical implications of AI in lexicography, such as bias and cultural sensitivity, as well as any other topic related to the use of LLMs in lexicography. The workshop will be of interest to lexicographers and language technology experts, offering insights into trends in AI-assisted lexicography and preparing them for digital transformation.
Programme:
09:00-09:20 | Timotej Knez, Tim Prezelj and Slavko Žitnik | SemSex: Automated Assessment of Sex Education Representation in Slovene Curricula
09:20-09:40 | Carole Tiberius, Kris Heylen, Bram Vanroy, Vincent Vandeghinste, Jesse de Does and Job van Doeselaar | LLMs and evidence-based lexicography: pilot studies at INT
09:40-10:00 | Radovan Garabík and Vladimír Benko | An Experiment with LLM for Lexicography
10:00-10:20 | Iztok Kosem, Polona Gantar, Špela Arhar Holdt, Magdalena Gapsa, Karolina Zgaga and Simon Krek | AI in lexicography at the University of Ljubljana: case studies
10:20-10:40 | Marko Tadić | Can LLMs really generate new words?
10:40-11:00 | Coffee break
11:00-11:20 | Carole Tiberius, Elisabetta Ježek, Annachiara Clementelli and Lut Colman | Automating Corpus Pattern Analysis: a cross-lingual pilot study for Dutch and Italian
11:20-11:40 | Nataliia Cheilytko and Ruprecht von Waldenfels | Semantic Change and Lexical Variation in Ukrainian with Vector Representation and LLM
11:40-12:00 | Ivana Filipović Petrović and Slobodan Beliga | Lexicographic treatment of idioms and large language models: what will rise to the surface?
12:00-12:20 | Pauline Sander, Simon Hengchen, Wei Zhao, Xiaocheng Ma, Emma Sköldberg, Shafqat Virk and Dominik Schlechtweg | The DURel Annotation Tool: Using fine-tuned LLMs to discover non-recorded senses in multiple languages
12:20-12:40 | Gilles-Maurice de Schryver | The road towards fine-tuned LLMs for lexicography
12:40-13:00 | Discussion
The deadline for submitting extended abstracts (maximum 1500 words, excluding references, tables, and figures) is 3 June 2024 (extended to 10 June 2024). Authors will be notified of acceptance or rejection by 2 July 2024. The deadline for the final version of accepted abstracts is 2 September 2024 (extended to 11 September 2024).
Accepted abstracts will be published in the Book of Abstracts before the workshop.
Submission link: EasyChair (LLM-Lex 2024) abstract submission
Chair: Simon Krek
Organisation: Tina Munda (contact: tina.munda@cjvt.si)
Reviewers:
Polona Gantar, University of Ljubljana, Slovenia
Miloš Jakubíček, Lexical Computing, Brno, Czechia
Ilan Kernerman, Lexicala by K Dictionaries, Israel
Annette Klosa, Leibniz-Institut für Deutsche Sprache, Mannheim, Germany
Iztok Kosem, University of Ljubljana, Slovenia
Simon Krek, Jožef Stefan Institute, Ljubljana, Slovenia
Rusudan Makhachashvili, Borys Grinchenko Kyiv Metropolitan University, Ukraine
John McCrae, University of Galway, Ireland
Sanni Nimb, Society for Danish Language and Literature, Copenhagen, Denmark
Sussi Olsen, University of Copenhagen, Denmark
Bolette Sandford Pedersen, University of Copenhagen, Denmark
Ranka Stanković, University of Belgrade, Faculty of Mining and Geology, Serbia
Carole Tiberius, Instituut voor de Nederlandse Taal, Leiden, the Netherlands