Competitive Data Science
Wednesday 16.15-18.00 in Delta-1026 (Narva mnt 18) and online via BigBlueButton
Please register for the course "How to Win a Data Science Competition: Learn from Top Kagglers" using your university email either here or here, and try to solve the practical tasks from the "Introduction & Recap" module to estimate the effort required to complete the course.
Data science competitions provide a great opportunity to sharpen your programming skills, enhance your domain knowledge, and learn more about practical applications of machine learning algorithms. This fall we will follow the "How to Win a Data Science Competition: Learn from Top Kagglers" course together. It was created by Kaggle masters and grandmasters Dmitry Ulyanov, Mikhail Trofimov, Alexander Guschin, Dmitry Altukhov, and Marios Michailidis. Upon its completion, we will put our newly acquired skills into practice by participating in a real data science competition.
When you finish this class, you will (according to the Coursera description):
- Understand how to solve predictive modeling competitions efficiently and learn which of the acquired skills are applicable to real-world tasks.
- Learn how to preprocess the data and generate new features from various sources such as text and images.
- Learn advanced feature engineering techniques such as mean encoding, aggregated statistical measures, and nearest-neighbor features as a means to improve your predictions.
- Be able to build reliable cross-validation schemes that help you benchmark your solutions and avoid overfitting or underfitting when tested on unobserved (test) data.
- Gain experience in analyzing and interpreting data. You will become aware of inconsistencies, high noise levels, errors, and other data-related issues such as leakage, and you will learn how to overcome them.
- Acquire knowledge of different algorithms and learn how to efficiently tune their hyperparameters and achieve top performance.
- Master the art of combining different machine learning models and learn how to ensemble.
- Get exposed to past (winning) solutions and code and learn how to read them.
Additionally, we will discuss papers on performance tricks and winning solutions from past competitions together in class. In the second part of the course, you will participate in a real data science competition. See the Schedule section for more information about the syllabus.
Disclaimer
This course is meant to provide you with an opportunity for a deep dive into the world of competitive data science. It does not guarantee immediate Kaggle medals or visible results in any other form. All course participants are obliged to comply with the UT study regulations, the Coursera Honor Code, and the rules of the selected data science competition (i.e., private sharing is not allowed within the class).
Prerequisites
We recommend (but do not require, if you feel confident) taking at least one of the Data Science specialization courses (MTAT.03.227 Machine Learning, LTAT.01.001 Natural Language Processing, or LTAT.02.001 Neural Networks), and we expect all participants to have prior experience with Python (we will certainly use numpy, matplotlib, and pandas). This course will not focus on the theory behind the algorithms but rather on their practical applications and limitations.
Workload
The nominal workload for this course is 3 ECTS (78 hours, ~5-6 hours per week).
Attendance
Attendance is not compulsory. We designed the course so that it is possible to participate fully from home if necessary (the Coursera part is available online, and seminars will be either recorded in class or conducted via Zoom). However, we encourage participants to attend our in-person meetings whenever possible, because discussion is an essential part of the learning process.
Organization
We will use the flipped (reverse) classroom approach: participants read and watch the material at home and discuss it in class. Discussions will be led by the students themselves.
There will be a lot of self-organization in this course, be ready!
Grading
Students will need to complete the "How to Win a Data Science Competition" course, present one paper during a seminar, participate in a data science competition (either alone or in a team), and present an overview of their participation to the class.
Contacts
Mikhail Papkov (mikhail.papkov at ut.ee) - instructor
Prof. Raul Vicente (raul.vicente.zafra at ut.ee) - coordinating the course