Academic Lectures

We have invited several high-level experts and research leaders to give lectures and share their knowledge with you. Here is a short overview of their backgrounds and abstracts.

Indrė Žliobaitė - Evaluation of machine-learned models
University of Helsinki Homepage, Google Scholar

Abstract: This course will focus on methodologies for evaluating machine-learned predictive models when data is not independently and identically distributed (IID). The IID assumption is classic in state-of-the-art machine learning; however, it is often violated in practical settings with contemporary data. Such settings include streaming data, evolving data, concept drift, transfer learning, geo-referenced data, and more. Can we provide any guarantees for model performance under such settings, and if so, how do we quantify the uncertainty? The course will recap the classical evaluation of machine-learned models under IID before discussing scenarios where the IID assumption is violated and the ground truth is largely absent. Concepts will be illustrated with case studies in knowledge discovery around change processes in nature and society.

Bio: Indrė Žliobaitė is a professor in the Department of Computer Science at the University of Helsinki, where she leads an interdisciplinary research group called Data Science and Evolution. She has contributed methods, tools, and perspectives for learning from evolving data and pioneered algorithmic approaches to fairness-aware machine learning. Her research program focuses on computational approaches for analyzing complex systems and understanding change processes in nature and society.

Dhruv Shah - Large-Scale Machine Learning for Robotics
UC Berkeley and Google DeepMind Homepage, Google Scholar, Photo

Abstract: Robot learning methods typically rely either on learning from large-scale simulation modeling and transferring to real-world settings or on collecting real-world interaction data on the target robot. While this paradigm has been successful for solving simple tasks in structured environments, it may fall short for tasks that are hard to simulate accurately (e.g. off-road racing) or where data collection is expensive (e.g. micro UAVs with <5 minute battery life). My research proposes an alternative paradigm of “cross-embodiment” robot learning, building algorithms and systems that can leverage internet-scale data to learn intelligent behaviors in unstructured, open-world environments. In this talk, I will discuss the unique challenges and opportunities that motivate building “robot foundation models”, and present the first instantiation of such a model for the task of visual navigation. I will then discuss how such a model can serve as a pre-trained backbone for a variety of downstream applications, such as autonomous off-road racing and socially-compliant navigation, as well as bootstrap learning for entirely new robots such as drones, quadrupeds, and manipulators. Finally, I will discuss how these robot foundation models can be combined with current vision and language foundation models, using a novel planning framework, to build robust robotic systems capable of “in-the-wild” deployment.

Bio: Dhruv Shah is a Senior Research Scientist at Google DeepMind and an incoming Assistant Professor at Princeton University. His research spans the fields of machine learning and robotics, with the goal of building autonomous robots that can combine large-scale learning with real-world deployment. Dhruv is a Microsoft Future Leader in Robotics & AI (2024) and a Berkeley Fellow, and his research has been nominated for and won several Best Paper Awards at premier robotics conferences such as RSS and ICRA. His work has also been featured in several media outlets, including IEEE Spectrum, TechXplore, Two Minute Papers, and ZDNet, along with several international venues. Earlier, Dhruv obtained a PhD from UC Berkeley, advised by Prof. Sergey Levine, and a B. Tech. from IIT Bombay.

Veli Mäkinen - Elements of Pangenomic Data Science: Foundations of compressed text indexing & Alignments on pangenome representations
University of Helsinki Homepage, Google Scholar

Abstract: The lectures introduce the audience to the analysis of massive genome sequencing datasets. The focus will be on compressed data structures that enable scalable analysis. To motivate the technical content, the lectures will start with an overview of standard genome analysis workflows for read alignment, variant calling, and haplotype assembly. Then the data structure prerequisites, including suffix trees, suffix arrays, the Burrows-Wheeler transform, rank and select, and wavelet trees, are covered, followed by a demonstration of how these structures are used in practical read alignment tools. After covering these classical elements, the lectures extend the techniques to pangenomic data: all known variants observed in a species (the pangenome) are used as the basis of the analyses, instead of a single reference genome. Various pangenome representations are introduced, including sets of strings, variation graphs, founder graphs, founder sequences, and elastic strings. Tailored data structures and algorithms that enable read alignment on these representations are covered, including efficient discovery of maximal exact matches as seeds with the so-called r-index, maximum flow-based path cover approximation, and sparse dynamic programming for chaining seeds. The lectures are based on a textbook: Mäkinen, Belazzougui, Cunial, Tomescu. Genome-Scale Algorithm Design: Bioinformatics in the era of high-throughput sequencing. Cambridge University Press, 2023, http://www.genome-scale.info

Bio: Veli Mäkinen is Professor of Computer Science at the University of Helsinki and Vice-Head of Department. He has supervised seven PhD theses and is currently supervising one. Veli Mäkinen started his career in string algorithms and compressed data structures. As of 2023, he has 120 peer-reviewed publications on these and related topics, with the focus shifting towards algorithmic bioinformatics, where high-throughput sequencing data analysis scenarios make near-linear time algorithms that work in small space an appealing target of study. His current research interests include algorithms and computational complexity when moving from sequences to variation graphs in pangenomics, and applications of index structures related to variants of the Burrows-Wheeler transform. Veli Mäkinen serves on several programme committees (e.g., ESA 2022, CPM 2023, WABI 2023, RECOMB 2024, ISMB 2024) and has given invited talks, e.g. at SPIRE 2019 and Genome Informatics 2023.

Tanel Alumäe - Foundation models in speech recognition
Tallinn University of Technology Homepage, Google Scholar

Abstract: Speech recognition technology converts spoken words into text. Recently, it has advanced rapidly thanks to large-scale training data, multilingual models, and self-supervised learning. This lecture will explore the main neural model architectures used in speech recognition. We will discuss how these models can be pretrained and finetuned for specific languages. Additionally, we will examine how text-based GPT-style large language models can be adapted for speech recognition. Finally, we will cover how to create a speech recognition system for any language using only textual training data.

Bio:

Pawel Sobocinski - Mathematical concepts for safe AI: probabilistic reasoning with string diagrams
Tallinn University of Technology

Abstract:

Bio:

21st Estonian Summer School on Computer and Systems Science
