Data Engineering (Andmetehnika, LTAT.02.007), 2021/22 autumn
Institute of Computer Science (Arvutiteaduse instituut)


Course Objective

The course gives an overview of the foundational concepts of Data Engineering. It is tailored to 1st and 2nd year MSc students and PhD students who would like to strengthen their fundamental understanding of Data Engineering, i.e., Data Modelling, Collection, and Wrangling.

IMPORTANT

  • Lecturers will teach in class unless announced otherwise, and the lectures will be recorded.
  • Practices will be fully online and recorded.
  • Lectures are in Room 1037.
  • Weeks do not always follow a lecture/practice sequence: some weeks may have two lectures, and others two practice sessions in a row. See the tentative syllabus below. Any changes will be announced at least one session ahead and posted on the course Moodle page.
  • Time slots:
    • Mon. 16.15 - 18.00, weeks 2-16
    • Thu. 10.15 - 12.00, weeks 2-16
  • Link to the practice sessions: log in to the Courses site to see it.
  • Material for classes and practices will be listed on Moodle.

Prerequisites

Familiarity with the following concepts is strongly recommended to succeed in the course:

  • Algorithms and Data Structures
    • Graphs, Trees, Tables, Lists
  • Programming Languages
    • Java and Python
  • (Relational) Databases and Query Languages
    • SQL, JSONPath, and openCypher
    • Joins, aggregations, table definition, and manipulation (CREATE, UPDATE, INSERT, ALTER)
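
As a rough self-check for the SQL prerequisites listed above, the sketch below uses Python's built-in sqlite3 module to define two tables, insert and update rows, and run a join with an aggregation. The schema and data are hypothetical and are not taken from the course material; SQLite is used here only because it needs no setup.

```python
import sqlite3


def total_spent_per_customer():
    """Return (customer name, total order amount) pairs, ordered by name."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    cur = conn.cursor()

    # Table definition (hypothetical schema, for illustration only)
    cur.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    cur.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, "
        "customer_id INTEGER REFERENCES customer(id), "
        "amount REAL)"
    )

    # Insert
    cur.executemany("INSERT INTO customer VALUES (?, ?)",
                    [(1, "Ada"), (2, "Linus")])
    cur.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                    [(1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.5)])

    # Update
    cur.execute("UPDATE orders SET amount = 6.0 WHERE id = 2")

    # Join plus aggregation
    cur.execute(
        "SELECT c.name, SUM(o.amount) "
        "FROM customer c JOIN orders o ON o.customer_id = c.id "
        "GROUP BY c.name ORDER BY c.name"
    )
    rows = cur.fetchall()
    conn.close()
    return rows


print(total_spent_per_customer())
```

If each statement in this sketch reads naturally, the relational part of the prerequisites is probably covered.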

Syllabus (Tentative)

Note: The syllabus is subject to change and will be adjusted during August/September.

  • Introduction Lecture
    • What is (Big) Data?
    • The Role of the Data Engineer
    • From Data Warehouses to Data Lakes
  • Introduction Practice
    • Docker
    • Jupyter Notebooks

Part 1: Data Lifecycle

This part will be covered in ~two weeks.

  • Lectures
    • Data lifecycle, ETL/ELT, Data processing pipelines (Airflow)
    • Data Ingestion
    • Data Pre-processing
    • Data cleansing
  • Practice
    • Data cleansing
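
To give a flavour of what the data cleansing topic above usually involves, here is a minimal sketch in plain Python. It assumes hypothetical records with `name` and `email` fields (not taken from the course material) and applies three common steps: normalising values, dropping records with a missing key field, and removing duplicates.

```python
def clean_records(records):
    """Normalise fields, drop rows without an email, and deduplicate by email."""
    seen = set()
    cleaned = []
    for rec in records:
        # Normalise: trim whitespace, lowercase the email
        name = (rec.get("name") or "").strip()
        email = (rec.get("email") or "").strip().lower()
        if not email:
            continue  # missing key field: drop the record
        if email in seen:
            continue  # duplicate after normalisation: keep the first one
        seen.add(email)
        cleaned.append({"name": name, "email": email})
    return cleaned


# Hypothetical raw input with typical quality problems
raw = [
    {"name": "  Ada ", "email": "ADA@example.org"},
    {"name": "Ada", "email": "ada@example.org "},
    {"name": "Linus", "email": None},
    {"name": "Grace", "email": "grace@example.org"},
]

print(clean_records(raw))
```

Which steps are appropriate depends on the dataset; dropping incomplete rows, for example, is only one of several possible policies.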

Part 2: Data Modelling and Query Languages

This part will be covered in ~six weeks.

  • Lectures
    • Relational Data
    • Data Warehousing
      • Star and Snowflake schemas
    • NoSQL
      • Key-Value Stores
      • Document
      • Graph
  • Practice
    • Modelling and Querying Relational data: MySQL
    • Modelling and Querying Document data: MongoDB
    • Modelling and Querying Graph data: Cypher

Part 3: Data Stream Pipelines

This part will be covered in ~three weeks.

  • Lectures
    • Data streams - an introduction
    • Time-series, event modeling
    • Event sourcing, stream-table duality
    • Streaming pipelines
  • Practice
    • Data Ingestion with Apache Kafka
    • Data Processing with Kafka Streams/KSQL
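
The stream-table duality mentioned in the lecture topics above can be illustrated without any streaming system. The sketch below is plain Python and does not use the Kafka or Kafka Streams APIs. It assumes a changelog of hypothetical `(key, value)` events in which a `None` value marks a deletion: replaying the stream yields a table, and a table can be emitted again as a stream of its current entries.

```python
def stream_to_table(events):
    """Fold a changelog stream into a table holding the latest value per key."""
    table = {}
    for key, value in events:
        if value is None:
            table.pop(key, None)  # assumed convention: None deletes the key
        else:
            table[key] = value    # later events overwrite earlier ones
    return table


def table_to_stream(table):
    """Emit the current state of a table as a stream of (key, value) events."""
    return list(table.items())


# Hypothetical changelog: alice is updated, bob is inserted and then deleted
changelog = [("alice", 1), ("bob", 5), ("alice", 3), ("bob", None)]

table = stream_to_table(changelog)
print(table)
print(table_to_stream(table))
```

Replaying the stream produced by `table_to_stream` rebuilds the same table, which is the sense in which the two views are interchangeable.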

Contacts

  • Lecturers:
    • Riccardo Tommasini - riccardo.tommasini@ut.ee
    • Ahmed Awad - ahmed.awad@ut.ee
    • Feras Awaysheh - feras.awaysheh@ut.ee
  • Teaching Assistants
    • Mohamed Ragab
    • Fabiano Spiga
    • Kristo Raun

Recommended Books

  • Database System Concepts - Abraham Silberschatz, Henry F. Korth, S. Sudarshan
  • Designing Data-Intensive Applications - Martin Kleppmann
  • Designing Event-Driven Systems - Ben Stopford

For questions about course content and organisation, contact the course organisers.
The economic copyright of the study materials belongs to the University of Tartu (Tartu Ülikool). Use of the study materials is permitted for the purposes and under the conditions of free use of a work as provided in the Copyright Act. When using the study materials, the user is obliged to credit their author.
Use of the study materials for other purposes is permitted only with the prior written consent of the University of Tartu.