Large Language Models and Lexicography Workshop 2024

Date: 8 October 2024

Location: Hotel Croatia, Cavtat, Croatia

Length: half-day (09:00 – 13:00)

The workshop is co-located with the Euralex 2024 congress. More information is available on the Euralex 2024 Workshops page.

Format: open call for papers/presentations/demos + invited talks
Audience size: 30 participants

Description of the agenda

The workshop examines the integration of large language models (LLMs) into lexicography. It aims to explore how these models support linguistic analysis and the generation of dictionary data, and how automating these processes can enhance dictionary development.

Expected topics also include identifying new word usages and trends, the role of LLMs in multilingual lexicography, and the ethical implications of AI in lexicography, including concerns about bias and cultural sensitivity. Any other topics related to the use of LLMs in lexicography are also welcome. The workshop will be of interest to lexicographers and language technology experts, offering insights into trends in AI-assisted lexicography and preparing them for digital transformation.

Programme:

09:00-09:20 Timotej Knez, Tim Prezelj and Slavko Žitnik SemSex: Automated Assessment of Sex Education Representation in Slovene Curricula
09:20-09:40 Carole Tiberius, Kris Heylen, Bram Vanroy, Vincent Vandeghinste, Jesse de Does and Job van Doeselaar LLMs and evidence-based lexicography: pilot studies at INT
09:40-10:00 Radovan Garabík and Vladimír Benko An Experiment with LLM for Lexicography
10:00-10:20 Iztok Kosem, Polona Gantar, Špela Arhar Holdt, Magdalena Gapsa, Karolina Zgaga, Simon Krek AI in lexicography at the University of Ljubljana: case studies
10:20-10:40 Marko Tadić Can LLMs really generate new words?
10:40-11:00 Coffee break
11:00-11:20 Carole Tiberius, Elisabetta Ježek, Annachiara Clementelli and Lut Colman Automating Corpus Pattern Analysis: a cross-lingual pilot study for Dutch and Italian
11:20-11:40 Nataliia Cheilytko and Ruprecht von Waldenfels Semantic Change and Lexical Variation in Ukrainian with Vector Representation and LLM
11:40-12:00 Ivana Filipović Petrović and Slobodan Beliga Lexicographic treatment of idioms and large language models: what will rise to the surface?
12:00-12:20 Pauline Sander, Simon Hengchen, Wei Zhao, Xiaocheng Ma, Emma Sköldberg, Shafqat Virk and Dominik Schlechtweg The DURel Annotation Tool: Using fine-tuned LLMs to discover non-recorded senses in multiple languages
12:20-12:40 Gilles-Maurice de Schryver The road towards fine-tuned LLMs for lexicography
12:40-13:00 Discussion

Extended abstracts of at most 1500 words, excluding references, tables, and figures, are due by June 3rd, 2024 (new deadline: June 10th). Authors will be notified of acceptance or rejection by July 2nd, 2024. Final versions of accepted abstracts are due by September 2nd, 2024 (new deadline: September 11th).

Accepted abstracts will be published in the Book of abstracts before the workshop.

Submission link: EasyChair (LLM-Lex 2024) abstract submission


Chair: Simon Krek

Organisation: Tina Munda (contact: tina.munda@cjvt.si)


Reviewers:

Polona Gantar, University of Ljubljana, Slovenia

Miloš Jakubíček, Lexical Computing, Brno, Czechia

Ilan Kernerman, Lexicala by K Dictionaries, Israel

Annette Klosa, Leibniz-Institut für Deutsche Sprache, Mannheim, Germany

Iztok Kosem, University of Ljubljana, Slovenia

Simon Krek, Jožef Stefan Institute, Ljubljana, Slovenia

Rusudan Makhachashvili, Borys Grinchenko Kyiv Metropolitan University, Ukraine

John McCrae, University of Galway, Ireland

Sanni Nimb, Society for Danish Language and Literature, Copenhagen, Denmark

Sussi Olsen, University of Copenhagen, Denmark

Bolette Sandford Pedersen, University of Copenhagen, Denmark

Ranka Stanković, University of Belgrade, Faculty of Mining and Geology, Serbia

Carole Tiberius, Instituut voor de Nederlandse Taal, Leiden, the Netherlands