Natural Language Processing - Courses - Institute of Computer Science

LTAT.01.001 Natural language processing

This course aims to provide a broad overview of the field of natural language processing. Although large language models (such as ChatGPT) have come to dominate the field in the last few years and a lot can be achieved by prompting LLMs, this course aims to give a deeper view of the basic concepts that are also relevant for understanding the recent advancements. The course first covers basic text processing methods and various ways of representing textual data. We will then look at the main task formulations used in natural language processing: text classification, sequence tagging and text generation.

There are many tasks in natural language processing. Some are lower-level text processing tasks, such as word tokenization or sentence segmentation. Then there are the natural language understanding tasks, such as sentiment analysis or extractive question answering. Finally, the tasks at the higher end of complexity assume abstract reasoning and response generation, such as machine translation, text summarization or engaging in dialogue.
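As a rough illustration of the lower-level tasks mentioned above, the following sketch does naive sentence segmentation and word tokenization with regular expressions. It is not part of the official course materials; the function names `split_sentences` and `tokenize` and the splitting rules are assumptions made for this example, and practical tools handle abbreviations, numbers and other edge cases that these rules ignore.

```python
import re


def split_sentences(text):
    """Naive segmentation: split on whitespace that follows '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Naive tokenization: words (with an optional apostrophe part) or single punctuation marks."""
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", sentence)


text = "NLP is fun. Isn't it? Yes!"
sentences = split_sentences(text)
tokens = [tokenize(s) for s in sentences]

for sentence, toks in zip(sentences, tokens):
    print(sentence, "->", toks)
```

A rule such as "split after a full stop" already fails on abbreviations like "e.g.", which is one reason why even these lower-level tasks are not trivial.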

One of the most important technical aspects in all these tasks is representation: how to represent the input data so that it conveys its meaning. Thus, a large part of the course deals with various ways of representing textual inputs, ranging from symbolic and count-based sparse representations to complex representations computed by powerful neural network architectures.
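To give a feel for what a count-based sparse representation looks like, here is a minimal bag-of-words sketch in plain Python. It is an illustration only, not course code: the helper name `bag_of_words`, the lowercasing and the whitespace tokenization are all assumptions made for the example.

```python
from collections import Counter


def bag_of_words(docs):
    """Return (vocabulary, count vectors) for a list of documents.

    Assumes lowercasing and whitespace tokenization, which is a simplification.
    """
    tokenized = [doc.lower().split() for doc in docs]
    # The vocabulary is the sorted set of all tokens seen in any document.
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for toks in tokenized:
        vec = [0] * len(vocab)
        for tok, count in Counter(toks).items():
            vec[index[tok]] = count
        vectors.append(vec)
    return vocab, vectors


docs = ["the cat sat", "the dog sat on the mat"]
vocab, vectors = bag_of_words(docs)
print(vocab)
for vec in vectors:
    print(vec)
```

Each document becomes a vector with one position per vocabulary word. With a realistic vocabulary most positions are zero, which is why such representations are called sparse.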

Course info

Lectures: Wednesdays at 16:15
- Delta building (Narva mnt 18), room 1019 (at least initially)
Practicums: Mondays at 16:15
- Delta building (Narva mnt 18), room 2045
Seminars: later in the semester, on both Mondays and Wednesdays at 16:15
Lecturer: Kairit Sirts (kairit.sirts@ut.ee)
TA: Emil Kalbaliyev (emil.kalbaliyev@ut.ee)
TA: Aleksei Dorkin (aleksei.dorkin@ut.ee)

It will also be possible to join the lectures via Zoom. The lectures will be recorded, and the recordings will be available via Moodle.

Assessment

Type	Points	Comment
Practicum exercises submission	10 points	10 practicums, 1 point each
Practical homeworks	35 points	3 individual practical homeworks
Seminar presentation	5 points	Based on given materials, done in groups
Project	35 points	Done individually or in a group
Theory test	20 points	At the end of the semester
Total	105 points

How to pass the course?

In order to pass the course, you have to obtain at least 51 points in total from any combination of course activities (homeworks, project, practicum exercises, seminar presentation, theory test).

None of the course activities are compulsory. However, most course activities can only be done at the scheduled times and cannot be made up later. Thus, we advise you to think carefully before deciding to skip any of the activities.

Plagiarism

As expected, plagiarism is not allowed. Homeworks, practicum exercises and the theory test are strictly individual work. Individual assignments can be discussed in groups, but your solution must be your own.

Using generative AI (ChatGPT)

Use of generative AI is allowed in accordance with the university guidelines. Please consult the guidelines carefully to see what counts as appropriate use and what constitutes plagiarism. In the end, you as a student are solely responsible for the content of your work.

Prerequisites

This course assumes knowledge from various areas. In the Study Information System, the required prerequisite course is Machine Learning (MTAT.03.227) and the recommended prerequisite courses are Language Technology (MTAT.06.045) and Artificial Intelligence I (MTAT.06.008). In practice, we also assume basic knowledge of higher mathematics (calculus, linear algebra, probability) and computer programming (Python). If you lack some of the required knowledge, it is your responsibility to acquire it at the level necessary for following this course. We can help you find suitable materials for obtaining the necessary background.

Natural Language Processing, spring 2024/25