Nikola is a senior researcher at the Department of Knowledge Technologies at the Jožef Stefan Institute and at the Laboratory for Cognitive Modeling at the Faculty for Information and Communication Science, University of Ljubljana. He works mostly in natural language processing, computational linguistics and computational social science.
How to ensure that AI understands and speaks my small language as well?
In recent years, natural language understanding has improved tremendously, primarily thanks to the very successful application of deep learning and self-supervision to both text and speech data. Most of these improvements were obtained on English, for which huge quantities of raw language data and enormous processing power are available. In this talk we tackle the question of how to ensure that languages with fewer speakers, Croatian among them, successfully keep up with these developments.
We will present the current state of the art in processing Croatian, namely the CLASSLA-Stanza NLP pipeline, the BERTić language model, and the ParlaSpeech-HR speech-to-text model, and will discuss possible paths ahead. One of our main take-home messages is that intensive collaboration between industry, academia and the public sector is crucial to keeping small languages from falling drastically behind.