Course Content
This course introduces students to the principles and methods of advanced data management and processing. It covers techniques for storing and processing different kinds of data (structured, semi-structured, and unstructured) and the state of the art in big data processing systems (e.g., MapReduce, stream processing, graph data processing, and scalable machine learning and deep learning systems).
Learning Objectives
- Understand the Big Data Landscape, its challenges, and its technological stack
- Develop the Big Data intuition needed to identify when specific technologies are required
- Gain practical experience by experimenting with relevant technologies (i.e., Apache Spark and its ecosystem)
- Obtain an overview of industrial and academic R&D in the Big Data context.
Info
Lectures and Practices will be recorded and made available on Moodle.
Lectures cover theoretical concepts, and each lecture (including seminars) is followed by a small test.
- Lectures: Thursday 16:15 - 18:00
During the practices, you will see usage examples for the corresponding technology. The teacher will walk step by step through one or more example applications of Apache Spark.
- Practices: Friday 14:15 - 16:00
Zoom Link for Both Below (Authentication Needed)
- Practices Repository (Authentication Needed)
During the project office hours, the teacher will be online to help you find a solution to the project. The idea is to set aside course time in which you can work in groups.
- Project Template Repository (Authentication Needed)
Syllabus
- Introduction to Big Data
- Deployment Models
- Taming Data Volume with Apache Spark
- Taming Data Velocity with Spark Structured Streaming
- Taming Data Variety with GraphFrames
- Gaining Value with Spark MLlib
Grading
- 80% on mini projects (20% each)
- 10% final exam
- 10% mini-tests at the end of the lectures (attendance mandatory, including seminars)
Schedule
Projects
Project | Introduced | Complain-by | Deadline |
---|---|---|---|
Project 1 | 20.02.2025 | 23.02.2025 | 09.03.2025 |
Project 2 | 13.03.2025 | 16.03.2025 | 30.03.2025 |
Project 3 | 03.04.2025 | 06.04.2025 | 13.04.2025 |
Project 4 | 17.04.2025 | 20.04.2025 | 11.05.2025 |
Lectures
Lecture Date | Lecturer | Lecture Topic | Practice Date | Practice Lecturer | Practice Topic |
---|---|---|---|---|---|
13.02.2025 | Riccardo | BDM intro, RDD | 14.02.2025 | Kristo | Containers 101 |
20.02.2025 | Riccardo | Spark SQL, DF | 21.02.2025 | Riccardo | Spark SQL, DF |
27.02.2025 | Riccardo | Spark DF + Delta Lake | 28.02.2025 | Hasan | DF, Delta Lake |
06.03.2025 | Kristo | Project 1 | 07.03.2025 | Hasan | Project 1 |
13.03.2025 | Riccardo | Streaming I | 14.03.2025 | Hasan | Structured Streaming |
20.03.2025 | Riccardo | Streaming II | 21.03.2025 | Hasan | Structured Streaming |
27.03.2025 | Kristo | Project 2 | 28.03.2025 | Hasan | Project 2 |
03.04.2025 | Riccardo | Big graphs | 04.04.2025 | Kristo | GraphFrames |
10.04.2025 | Kristo | Project 3 | 11.04.2025 | Kristo | Project 3 |
17.04.2025 | Riccardo | Thesis Day | 18.04.2025 | HOLIDAY | |
24.04.2025 | Ahmed W. | Spark ML | 25.04.2025 | Ahmed W. | Spark ML |
01.05.2025 | HOLIDAY | | 02.05.2025 | HOLIDAY | |
08.05.2025 | Ahmed W. | Project 4 | 09.05.2025 | Ahmed W. | Project 4 |
15.05.2025 | TBA | Research in Big Data | 16.05.2025 | TBA | Research in Big Data |
22.05.2025 | Kristo | Research in Big Data | 23.05.2025 | Exam | Exam |
29.05.2025 | TBA | Research in Big Data | 30.05.2025 | TBA | Guest practice |
05.06.2025 | Exam Retake | | | | |