Transformers
"Transformers" is shorthand for neural networks built on self-attention, with multiple attention heads and several layers, and for the last couple of years they have been behind state-of-the-art results in many AI applications. Think ChatGPT, translate.ut.ee / Google Translate, ner.tartunlp.ai, neurokone.ee -- all different applications of transformers. More recently they have also shown great results on image and speech data, so their future is looking bright.
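To give a taste of the core operation, here is a minimal NumPy sketch of single-head scaled dot-product self-attention; the names, shapes, and random toy data are purely illustrative, not course material:

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x: (seq_len, d_model) input token embeddings
    w_q, w_k, w_v: (d_model, d_k) learned projection matrices
    """
    q = x @ w_q  # queries
    k = x @ w_k  # keys
    v = x @ w_v  # values
    # similarity of every token to every other token, scaled by sqrt(d_k)
    scores = q @ k.T / np.sqrt(k.shape[-1])  # (seq_len, seq_len)
    # softmax over the key dimension (numerically stabilized)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # each output is a weighted mix of all value vectors
    return weights @ v

# Toy example: 5 tokens, 8-dim embeddings, one 4-dim head
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
out = self_attention(x, *(rng.normal(size=(8, 4)) for _ in range(3)))
print(out.shape)  # (5, 4)
```

A real transformer runs several such heads in parallel and stacks the result into multiple layers with feed-forward sublayers in between.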
In the lectures we will discuss the core transformer architecture, its applications and associated considerations. The practical part assumes that you are comfortable with Python and ML, and will teach you to train models in Colab, run things on HPC GPUs, etc.
There will be four homework assignments as well as a project, in which you are free to focus on whichever kind of data you want (text, speech, images, time series, etc.).
Instructors:
Mark Fishel (lectures)
Taido Purason, Martin Vainikko, Dmytro Pashchenko (labs)