Transformers / machine translation
"Transformer" is a short name for neural networks built from self-attention with multiple heads and stacked layers; for the last couple of years they have produced the state-of-the-art results in machine translation, text classification and several other tasks. Think GPT-1/2/3, BERT, as well as translate.ut.ee / Google Translate, ner.tartunlp.ai, neurokone.ee -- all different applications of transformers. Very recently they have also shown great results on image and speech data, so their future is looking bright.
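To make the "self-attention with multiple heads" part concrete, here is a minimal sketch of scaled dot-product self-attention (not taken from the course materials); it assumes PyTorch is installed, and all names and dimensions are purely illustrative.

```python
# Minimal multi-head self-attention sketch; dimensions are illustrative only.
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention for a single head.
    x: (seq_len, d_model); w_q/w_k/w_v: (d_model, d_head) projections."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v      # project into queries, keys, values
    scores = q @ k.T / k.shape[-1] ** 0.5    # how much each position attends to every other
    weights = F.softmax(scores, dim=-1)      # attention distribution per position
    return weights @ v                       # weighted sum of values

seq_len, d_model, d_head, n_heads = 5, 16, 8, 2
x = torch.randn(seq_len, d_model)            # toy "sentence" of 5 token vectors
heads = []
for _ in range(n_heads):                     # each head has its own projection matrices
    w_q, w_k, w_v = (torch.randn(d_model, d_head) for _ in range(3))
    heads.append(self_attention(x, w_q, w_k, w_v))
out = torch.cat(heads, dim=-1)               # concatenate heads: (seq_len, n_heads * d_head)
print(out.shape)                             # torch.Size([5, 16])
```

A real transformer layer stacks this with an output projection, residual connections, layer normalization and a feed-forward sublayer; we will go through those pieces in the lectures.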
In the lectures we will cover the core transformer architecture, with a focus on language models (BERT, XLM, GPT-n, etc.) and neural machine translation, plus at least a couple of lectures on more recent results of applying transformers to non-textual data.
The practical part assumes that you are comfortable with Python and will teach you how to train models in Colab, run jobs on HPC GPUs, etc.
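As a taste of the practical part, the first step in Colab or on an HPC GPU node is usually a quick check that a GPU is actually visible; a tiny sketch, assuming PyTorch:

```python
# Sanity check before training: is a GPU available to this session/job?
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Training will run on: {device}")
if device == "cuda":
    print(torch.cuda.get_device_name(0))  # e.g. the GPU model assigned by Colab or the cluster
```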
There will be 4 short homeworks as well as a project, where you are free to focus on whichever kind of data you want (text, speech, images, time series, etc.).
Instructors:
Mark Fishel (lectures)
Lisa Korotkova (labs)