Institute of Computer Science

Data Engineering (LTAT.02.007), 2020/21 autumn

Course Objective

The course gives an overview of the foundational concepts of Data Engineering. It is tailored for 1st and 2nd year MSc students and PhD students who would like to strengthen their fundamental understanding of Data Engineering, i.e., Data Modelling, Collection, and Wrangling.

IMPORTANT

  • Lectures will be held in class unless announced otherwise; they will be recorded (NOT streamed).
  • Practices will be fully online and recorded.
  • Mon. 12.15 - 14.00, weeks 2-16: invite link (log into courses to see the link)
  • Tue. 10.15 - 12.00, weeks 2-16: invite link (log into courses to see the link)
  • Material for classes and practices will be listed on GitHub https://github.com/DataSystemsGroupUT/dataeng

Prerequisites

Familiarity with the following concepts is strongly recommended to succeed in the course:

  • Algorithms and Data Structures
    • Graphs, Trees, Tables, Lists
  • Programming Languages
    • Java and Python
  • (Relational) Databases and Query Languages
    • SQL, JSONPath, and openCypher
    • Joins, Aggregations, Table definition and manipulation (Create, Update, Insert, Alter)

Syllabus (Tentative)

Note: The syllabus is subject to change and will be adjusted during August/September.

  • Introduction Lecture
    • What is (Big) Data?
    • The Role of the Data Engineer
    • From Data Warehouses to Data Lakes
  • Introduction Practice
    • Docker
    • Jupyter Notebooks

Part 1: Data Modelling and Query Languages

  • Lecture
    • Relational Data
    • NoSQL
      • Key-Value Stores
      • Document
      • Graph
  • Data Warehousing
    • Star and Snowflake schemas
  • Practice
    • Modelling and Querying Relational data: MySQL
    • Modelling and Querying Key-Value data: Redis
    • Modelling and Querying Document data: MongoDB
    • Modelling and Querying Graph data: Cypher
  • Extras
    • Modelling and Querying RDF data: SPARQL
    • Domain-Driven Design: a summary
    • Event Sourcing: a summary

Part 2: (Big) Data Pipelines

  • Lecture
    • Big Data Systems Architectures
    • ETL and Data Pipelines
    • Best Practices and Anti-Patterns
    • Batch vs Streaming Processing
    • Data Replication
    • Data Partitioning
    • Transactions
  • Practice
    • Data Ingestion with Apache Kafka
    • Data Pipelines with Apache Airflow
    • Data Processing with Kafka Streams/KSQL
  • Extras
    • Data Pipelines with Luigi
    • Data Pipelines with Apache NiFi
    • Data Processing with Apache Flink

Part 3: Data Wrangling

  • Lecture
    • Data Cleansing
    • Augmentation
  • Practice
    • Cleansing examples using Python
    • Augmentation examples using Pandas and TensorFlow

Video Lectures

  • https://panopto.ut.ee/Panopto/Pages/Viewer.aspx?pid=95df00d4-62ff-49c7-ab66-ac2b008490b5

Slides

  • https://github.com/DataSystemsGroupUT/dataeng

Contacts

  • Lecturer:
    • Riccardo Tommasini - riccardo.tommasini@ut.ee
  • Teaching Assistants
    • Mohamed Ragab
    • Hassan Eldeeb
    • Fabiano Spiga

Recommended Books

  • Database System Concepts
  • Designing Data-Intensive Applications - Martin Kleppmann
  • Designing Event-Driven Systems