You need a topic and a supervisor to pass this course.
Any data-mining-related topic that is sufficiently complex and has a university supervisor will do.
Normally you should choose your BSc or MSc thesis topic.
A beginning PhD student can pick a topic that brings them closer to their first article.

Kickoff slides

Natural language processing

General contact person: Mark Fishel
Minimum number of available topics: 5

Kairit Sirts
- Compressing fine-tuned input embeddings of NLP neural models with a projection matrix
- Neural models for Estonian text analysis
- Interpretable neural text classification models
Sven Laur
- Phrase similarity measures that are robust to word order (Hele-Andra Kuulmets)

General contact person: Jaak Vilo
Minimum number of available topics: 5

Dmytro Fishman
- Cell Phenotyping using Convolutional Neural Networks
- Astrocytes Segmentation in Brain Microscopy Images
- Using CapsuleNets for Human Tissue Segmentation
Leopold Parts
- Finding features that predict why a cancer requires a gene for growth
- Analysing CRISPR/Cas9 gRNA libraries
- Analysing CRISPR/Cas9 gene knockout experiments
Kaur Alasoo
- Analysis of genetic variants regulating gene expression in cis
- The effect of chromatin accessibility on gene expression
- Meta-analysis of trans-eQTLs across cell types and tissues
- Predicting cell-type-specific genetic effects with neural networks

General contact person: Raul Vicente
Minimum number of available topics: 5

General contact person: Meelis Kull
Minimum number of available topics: 3

General contact person: Sven Laur
Minimum number of available topics: 3

Marek Oja
- Recovery of disease treatment cases from EHR data
Sven Laur
- Analysis of medical procedure logs: fairness and anomalies
- Disease treatment trajectories
- Extraction of stroke related facts from EHR
- Extraction of diagnostic facts from medical imaging descriptions

General contact person: Sherif Sakr
Minimum number of available topics: 3

- Automated Selection and Optimization of Distributed Machine Learning Algorithms
- Declarative Querying of Distributed Graphs
- Complex Event Processing Over Event Intervals: The Case of Apache Flink
- Comparative Evaluation for the Performance of Big Stream Processing Systems
- Online Detection of Electrical Vehicle Charging Activity
- Auto Tuning of Flink Jobs: A Machine Learning Approach
- Fast Creation of Training Data using Weak Supervision
- Toward Interpretable Machine Learning Techniques
- Interpretability of automatically extracted machine learning features in medical images

Ezequiel Scott
- Understanding team performance in agile software development
  Mining software repositories means applying data mining techniques to software repositories in order to make use of development data. Developers today rely heavily on many kinds of repositories, such as source control and issue tracking repositories (e.g. Bitbucket, GitHub, Jira). These repositories contain a wealth of information that can be extracted, analyzed and explored to study various development phenomena. For example, some studies have explored the evolution of projects and the prediction of relevant issues. However, very little attention has been paid to the role of human factors in the data analyzed from software repositories. This is surprising, since human factors are involved in every software development process. The goal of this project is to use repository data about software developers to analyze their relationship with the team and their performance. We will provide a dataset of several software projects, and your task will be to calculate several performance metrics in the context of agile software development. In addition, you will use simple predictive models and/or statistics to describe team performance.
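  To give a rough idea of what calculating a performance metric from issue tracker data could look like, here is a minimal Python sketch. It is only an illustration: the record fields (`sprint`, `assignee`, `story_points`, `status`), the sample values and the two metrics chosen (sprint velocity and completion ratio) are assumptions, not the schema or the metrics of the dataset that will be provided.

```python
from collections import defaultdict

# Hypothetical issue records. The field names and values are made up for
# illustration and do not reflect the dataset provided in the project.
issues = [
    {"sprint": "S1", "assignee": "ann", "story_points": 3, "status": "Done"},
    {"sprint": "S1", "assignee": "bob", "story_points": 5, "status": "Done"},
    {"sprint": "S1", "assignee": "bob", "story_points": 2, "status": "To Do"},
    {"sprint": "S2", "assignee": "ann", "story_points": 8, "status": "Done"},
    {"sprint": "S2", "assignee": "bob", "story_points": 1, "status": "Done"},
]


def sprint_velocity(issues):
    """Sum the story points of completed issues for each sprint."""
    velocity = defaultdict(int)
    for issue in issues:
        if issue["status"] == "Done":
            velocity[issue["sprint"]] += issue["story_points"]
    return dict(velocity)


def completion_ratio(issues):
    """Completed story points divided by planned story points, per sprint."""
    planned = defaultdict(int)
    done = defaultdict(int)
    for issue in issues:
        planned[issue["sprint"]] += issue["story_points"]
        if issue["status"] == "Done":
            done[issue["sprint"]] += issue["story_points"]
    # Skip sprints with no planned points to avoid dividing by zero.
    return {s: done[s] / planned[s] for s in planned if planned[s] > 0}


velocity = sprint_velocity(issues)
ratio = completion_ratio(issues)
print(velocity)
print(ratio)
```

  Real repository exports are usually messier (missing estimates, reopened issues, issues moved between sprints), so a sketch like this would need to be adapted to whatever the provided dataset actually contains.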