Collocations as a Basis for Language Description: Semantic and Temporal Perspectives

National project J6-8255

The main objective of the proposed project is to conduct basic research into the semantic and temporal aspects of collocation, as well as into the statistics used for measuring it. These areas have so far been largely neglected in Slovenian linguistics and, to some extent, internationally. The second objective is the development and thorough linguistic evaluation of machine learning methods for analysing the Slovene language and extracting lexical information from corpora. In doing so, we want to introduce closer cooperation and synergy into the Slovene research environment between lexicography and linguistics on the one side, and computational linguistics and natural language processing on the other. The third objective is the systematic integration of results obtained from various user studies into the development of project methods and tools, together with the preparation of methodological descriptions for transferring project results into practice, in order to ensure their optimal applicability.

The project will address four different aspects of collocation: statistics for measuring collocation, semantic sets or categories of collocation, the role of collocation as a distinguishing characteristic between semantically related words (e.g. synonyms), and the role of collocation in detecting semantic and related changes in the use of words over time.
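
As an illustration of what a statistic for measuring collocation can look like, the following minimal sketch computes two association measures that are widely used in corpus linguistics, pointwise mutual information (PMI) and logDice. The choice of these two measures, the function names, and the counts in the usage note are assumptions made for the example only; the project description does not specify which statistics will be studied.

```python
import math


def pmi(f_xy: int, f_x: int, f_y: int, n: int) -> float:
    """Pointwise mutual information of a word pair.

    f_xy: co-occurrence frequency of the pair
    f_x, f_y: corpus frequencies of the two words
    n: corpus size (number of tokens or of candidate pairs)
    """
    return math.log2((f_xy * n) / (f_x * f_y))


def log_dice(f_xy: int, f_x: int, f_y: int) -> float:
    """logDice score: 14 + log2 of the Dice coefficient.

    Unlike PMI, the score does not depend on corpus size;
    its theoretical maximum is 14.
    """
    return 14 + math.log2(2 * f_xy / (f_x + f_y))
```

With invented counts, `pmi(8, 16, 32, 1024)` evaluates to 4.0 and `log_dice(8, 16, 16)` to 13.0. Both measures rank candidate word pairs by strength of association, but PMI tends to favour rare pairs, which is one reason why a comparison of such statistics is of linguistic interest.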

For Slovenian research, every aspect of the project is original, as the proposed studies and tools do not yet exist for Slovene. The research is also timely and relevant: several ongoing lexicographic and other projects in Slovenia would benefit significantly from the project results. Each work package is expected to bring important new knowledge to the field of language description, and to have an important influence on approaches to, and the analysis of, collocation in other disciplines such as linguistics and language learning.

Several aspects of the proposed research are likely to be of interest to the international research community, and will contribute to the development of new research directions in Slovenia as well as in other language communities, particularly those whose languages are morphologically rich. With the project results, we also aim to advance and stimulate research into theoretical and applied aspects of collocation, colligation, and multi-word expressions.