The grade consists of 70% project and 30% exam. During the course, it may be possible to earn bonus points, e.g., by participating in in-class quizzes.
Project
The project should demonstrate an end-to-end data engineering lifecycle product.
The project is done in groups of 3 people. Timeline:
Date & time | Activity |
---|---|
2024-10-20 23:59 | Project proposal and group formation |
2024-12-10 23:59 | Poster submission |
2024-12-14 23:59 | Project submission |
2024-12-16 16:15 | Poster session (in person) |
All times are in Estonian local time.
Project proposal:
- 1 A4 page as a PDF
- Which datasets you will use
- At least 3 questions you want to answer
- (Optional) system/tool architecture diagram
Poster submission:
- 1 A0 poster as a PDF
Project submission:
- Git repository
- Executable and documented code
- Readme/walkthrough
Grading: Minimum grade to pass is 50% (35 points). To get the minimum grade, your project needs to:
- Join 2 source datasets;
  - The datasets need to be raw data that you clean and transform.
  - The datasets need to be separate. 2 tables/CSV files from a single source do not count as 2 datasets.
- Use Airflow for orchestration;
- Model the data using the star schema;
- Use DuckDB as data storage.
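As a rough illustration of the minimum requirements above, the sketch below cleans two separate raw sources, joins them, and loads the result into a small star schema (one fact table, two dimension tables). Everything in it is hypothetical: the datasets, the table and column names, and the helper functions are invented for illustration. To keep the example runnable without extra installs, it uses Python's built-in `sqlite3` as a stand-in for DuckDB; the SQL is kept plain on the assumption that it would carry over to a DuckDB connection, which is not verified here. Airflow orchestration is not shown.

```python
import sqlite3

# Hypothetical raw rows from two separate sources (invented for illustration).
raw_rentals = [  # source A: bike rentals (day, city, rentals)
    ("2024-10-01", " Tartu ", "120"),
    ("2024-10-01", "Tallinn", "340"),
    ("2024-10-02", "Tartu", None),  # missing value, dropped during cleaning
    ("2024-10-02", "Tallinn", "310"),
]
raw_weather = [  # source B: weather (day, city, temperature in Celsius)
    ("2024-10-01", "tartu", "9.5"),
    ("2024-10-01", "tallinn", "10.0"),
    ("2024-10-02", "tallinn", "8.0"),
]


def clean(rows):
    """Drop incomplete rows, normalise city names, cast the measure to float."""
    return [
        (day, city.strip().title(), float(value))
        for day, city, value in rows
        if day and city and value is not None
    ]


def build_star_schema(con, rentals, weather):
    """Load cleaned rows into staging tables, then build dimensions and the fact table."""
    con.execute("CREATE TABLE stg_rentals (day TEXT, city TEXT, rentals REAL)")
    con.execute("CREATE TABLE stg_weather (day TEXT, city TEXT, temp_c REAL)")
    con.executemany("INSERT INTO stg_rentals VALUES (?, ?, ?)", rentals)
    con.executemany("INSERT INTO stg_weather VALUES (?, ?, ?)", weather)

    # Dimension tables with explicitly generated surrogate keys.
    con.execute("CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, day TEXT)")
    con.execute("CREATE TABLE dim_city (city_id INTEGER PRIMARY KEY, city TEXT)")
    con.execute(
        "INSERT INTO dim_date SELECT ROW_NUMBER() OVER (ORDER BY day), day "
        "FROM (SELECT DISTINCT day FROM stg_rentals) AS d"
    )
    con.execute(
        "INSERT INTO dim_city SELECT ROW_NUMBER() OVER (ORDER BY city), city "
        "FROM (SELECT DISTINCT city FROM stg_rentals) AS c"
    )

    # Fact table: the join of the two sources, keyed by the dimensions.
    con.execute(
        "CREATE TABLE fact_rentals "
        "(date_id INTEGER, city_id INTEGER, rentals REAL, temp_c REAL)"
    )
    con.execute(
        "INSERT INTO fact_rentals "
        "SELECT d.date_id, c.city_id, r.rentals, w.temp_c "
        "FROM stg_rentals r "
        "JOIN stg_weather w ON w.day = r.day AND w.city = r.city "
        "JOIN dim_date d ON d.day = r.day "
        "JOIN dim_city c ON c.city = r.city"
    )


def run_demo():
    """Build the schema in memory and answer one example question per city."""
    con = sqlite3.connect(":memory:")
    build_star_schema(con, clean(raw_rentals), clean(raw_weather))
    rows = con.execute(
        "SELECT c.city, SUM(f.rentals), AVG(f.temp_c) "
        "FROM fact_rentals f JOIN dim_city c ON c.city_id = f.city_id "
        "GROUP BY c.city ORDER BY c.city"
    ).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    print(run_demo())
```

In an actual project, the cleaning, loading, and querying steps would typically become separate Airflow tasks, and the connection would point at a DuckDB database file instead of an in-memory SQLite database.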
Additional points:
- +5 points for any other tool or framework we cover during the course, e.g.:
  - dbt, MongoDB, Apache Iceberg, Neo4j, data governance, data privacy, Streamlit;
- +10 points for any tool outside the course;
  - It needs to be a state-of-the-art tool and approved by the instructors;
- + creativity points;
  - Ask the instructors: "Can we get extra points if we do this...?" The idea has to make sense and be valuable.
Tools or creativity points that need instructor approval must be checked with instructors at least one week before project submission, i.e., 2024-12-07 23:59.
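The approval cut-off above follows from the project submission deadline by simple date arithmetic. A minimal sketch, using naive datetimes (no time zone handling) and a hypothetical helper name `approval_deadline`:

```python
from datetime import datetime, timedelta

# Project submission deadline from the timeline (Estonian local time, naive).
PROJECT_SUBMISSION = datetime(2024, 12, 14, 23, 59)


def approval_deadline(submission, lead_time=timedelta(weeks=1)):
    """Return the latest moment to ask for instructor approval."""
    return submission - lead_time


if __name__ == "__main__":
    print(approval_deadline(PROJECT_SUBMISSION))  # 2024-12-07 23:59:00
```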
Exam
The exam takes place in person on 2nd December at 16:15 (lecture time), in room 1019 in Delta.
The exam is a pen-and-paper exam.
The duration is 1 hour 15 minutes.
Laptops, phones, books, and talking are not allowed. You may bring a single double-sided A4 sheet with handwritten notes only. Notes that are not handwritten will be taken away. Using any device or help that is not allowed during the exam will be considered academic fraud.
The questions in the exam are based on the course book, lectures, and practice sessions.
Example questions will be posted on Moodle about 1 week before the exam.
At least 50% (15 points) is needed to pass the exam.
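The point thresholds quoted on this page (35 for the project, 15 for the exam) follow from the 70/30 weighting and the 50% minimum. A minimal sketch of that arithmetic, assuming both parts must be passed separately and leaving bonus points out because the page does not say how they are counted; the function names are hypothetical:

```python
PROJECT_MAX = 70      # project weight in points
EXAM_MAX = 30         # exam weight in points
PASS_FRACTION = 0.5   # minimum share needed in each part


def pass_marks():
    """Return the minimum points needed for (project, exam)."""
    return PROJECT_MAX * PASS_FRACTION, EXAM_MAX * PASS_FRACTION


def passes_course(project_points, exam_points):
    """Assumption: each part must reach its own 50% threshold."""
    project_min, exam_min = pass_marks()
    return project_points >= project_min and exam_points >= exam_min


if __name__ == "__main__":
    print(pass_marks())           # (35.0, 15.0)
    print(passes_course(60, 14))  # False: exam below 15 points
```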