AI2FUTURE2023 – 2 days, 50+ moving & inspiring discussions and presentations

19th & 20th of October 2023, Kraš Auditorium, Zagreb, Croatia
Google DeepMind

Matko Bošnjak

Senior Research Scientist

Matko is a Senior Research Scientist at Google DeepMind, dedicating his work to addressing fundamental challenges in Machine Learning and Artificial Intelligence. He holds a PhD in Computer Science from University College London and a Dipl. Ing. degree (MS equivalent) from the University of Zagreb, gaining extensive research experience along the way in both academic and industry roles. These include research positions at the University of Porto and the Ruđer Bošković Institute, and industry positions at Google DeepMind and Microsoft.

Matko’s current research interests include representation learning, concept learning and scene understanding in vision-language models. In the past he also worked on graph neural networks, algorithmic priors, and neuro-symbolic computation and reasoning, as well as applied machine learning in natural language processing, microblogging, recommender systems, computational biology, bioinformatics and medicine. He consistently publishes his research findings in top-tier machine learning conferences and journals.

In addition to his technical contributions, Matko is actively engaged in knowledge dissemination in the area of Artificial Intelligence via mentoring, science education and popularisation, and has taught Machine Learning and Artificial Intelligence courses in the past.


Matko’s Idea

Demystifying Visual Language Models

Large Language Models (LLMs) have undoubtedly achieved remarkable successes in natural language understanding and generation. Equally striking, however, is that these accomplishments remain confined to the sole domain of natural language, leaving out the rich world of visual perception. This restriction limits their usefulness to language-related tasks and applications.

Visual Language Models (VLMs) address this limitation by integrating both text and visual inputs. By combining these two modalities, VLMs bridge the gap between vision and natural language, enabling them to holistically perceive and reason about the real world. This, in turn, expands their utility to a broader spectrum of tasks and applications.

In this talk, we will provide a brief overview of VLMs, covering their core principles, architectural foundations, tasks of interest and real-world applications, as well as offer a glimpse into the challenges and exciting research directions that lie ahead.