Analyzing materials in DH goes beyond text and often involves multimodal sources. The application of vision-language models (VLMs) in DH, particularly for less-resourced languages such as Slovenian, remains unexplored, leaving a gap in the effective analysis of visual materials. Because building VLMs from scratch requires large amounts of aligned language-image data, the challenge is to adapt existing VLMs to specific domains using smaller, specialized resources. In particular, there is a pressing need to adapt LLMs and VLMs to challenging tasks such as OCR and OCR post-correction, classification, and information retrieval in order to support DH research effectively.
We will adapt the VLM developed in T1.3 and apply it to several downstream DH tasks that will be used in WP5. We will fine-tune the model for historical image retrieval and analysis and for OCR, using aligned multilingual historical corpora as additional training data to obtain specialized VLMs. The downstream tasks will include classifying types of visual material in historical documents, content analysis of specific sources (such as images in historical newspapers), and topic-specific analysis (e.g., identifying and extracting thematic illustrations from the folkloristics domain that relate to conflicts and the role of the outlaw hero).