Institute of Computer Science

Big Data Management (LTAT.02.003), 2024/25 spring

Course Content

This course introduces students to the principles and methods of advanced data management and processing. It covers techniques for storing and processing different data types (structured, semi-structured, and unstructured) and the state of the art in several kinds of big data processing systems (e.g., MapReduce, stream processing, graph data processing, and scalable machine learning and deep learning systems).

Learning Objectives

  • Understand the Big Data landscape, its challenges, and its technological stack
  • Develop Big Data common sense to identify when specific technologies are needed
  • Gain practical experience by experimenting with relevant technologies (i.e., Apache Spark and its ecosystem)
  • Obtain an overview of industrial and academic R&D in the Big Data context

Info

Lectures and Practices will be recorded and made available on Moodle.

Lectures, including seminars, will cover theoretical concepts, and each will be followed by a small test.

  • Lectures: Thursday 16:15 - 18:00

During the practice sessions, you will see usage examples for the corresponding technology. The teacher will walk step by step through one or more example applications of Apache Spark.

  • Practices: Friday 14:15 - 16:00

Zoom Link for Both Below (Authentication Needed)

  • Practices Repository (Authentication Needed)

During the project office hours, the teacher will be online to help you work out a solution to the project. The idea is to set aside time within the course so that you can work in groups.

  • Project Template Repository (Authentication Needed)

Syllabus

  • Introduction to Big Data
  • Deployment Models
  • Taming Data Volume with Apache Spark
  • Taming Data Velocity with Spark Structured Streaming
  • Taming Data Variety with GraphFrames
  • Gaining Value with Spark MLlib

Grading

  • 80% mini projects (four projects, 20% each)
  • 10% final exam
  • 10% mini-tests at the end of the lectures (presence mandatory, including seminars)

Schedule

Projects

Project | Introduced | Complain-by | Deadline
Project 1 | 20.02.2025 | 23.02.2025 | 09.03.2025
Project 2 | 13.03.2025 | 16.03.2025 | 30.03.2025
Project 3 | 03.04.2025 | 06.04.2025 | 13.04.2025
Project 4 | 17.04.2025 | 20.04.2025 | 11.05.2025

Lectures

Lecture Date | Lecturer | Topic | Practice Date | Lecturer | Topic
13.02.2025 | Riccardo | BDM intro, RDD | 14.02.2025 | Kristo | Containers 101
20.02.2025 | Riccardo | Spark SQL, DF | 21.02.2025 | Riccardo | Spark SQL, DF
27.02.2025 | Riccardo | Spark DF + Delta Lake | 28.02.2025 | Hasan | DF, Delta Lake
06.03.2025 | Kristo | Project 1 | 07.03.2025 | Hasan | Project 1
13.03.2025 | Riccardo | Streaming I | 14.03.2025 | Hasan | Structured Streaming
20.03.2025 | Riccardo | Streaming II | 21.03.2025 | Hasan | Structured Streaming
27.03.2025 | Kristo | Project 2 | 28.03.2025 | Hasan | Project 2
03.04.2025 | Riccardo | Big graphs | 04.04.2025 | Kristo | GraphFrames
10.04.2025 | Kristo | Project 3 | 11.04.2025 | Kristo | Project 3
17.04.2025 | Riccardo | Thesis Day | 18.04.2025 | HOLIDAY | -
24.04.2025 | Ahmed W. | Spark ML | 25.04.2025 | Ahmed W. | Spark ML
01.05.2025 | HOLIDAY | - | 02.05.2025 | HOLIDAY | -
08.05.2025 | Ahmed W. | Project 4 | 09.05.2025 | Ahmed W. | Project 4
15.05.2025 | TBA | Research in Big Data | 16.05.2025 | TBA | Research in Big Data
22.05.2025 | Kristo | Research in Big Data | 23.05.2025 | Exam | Exam
29.05.2025 | TBA | Research in Big Data | 30.05.2025 | TBA | Guest practice
05.06.2025 | Exam Retake | - | - | - | -

For questions about course content and organization, please contact the course organizers.
The economic copyright of the study materials belongs to the University of Tartu. The study materials may be used for the purposes and under the conditions of free use of a work as provided in the Copyright Act. When using the study materials, the user is obliged to cite their author.
Use of the study materials for other purposes is permitted only with the prior written consent of the University of Tartu.