Institute of Computer Science (Arvutiteaduse instituut)

Andmetehnika mitteinformaatikutele (Data Engineering for Non-Computer Scientists, LTAT.02.026), spring 2024/25

Group Project: Implementing an ETL Process and Data Visualization

Objective: In groups of 2-4 students, implement an ETL (extract, transform, load) process that extracts data from one or more sources, transforms it, and loads it into an SQL or flat-file database. Then create a compelling data visualization using Tableau, Apache Superset, R ggplot2 (R Markdown), or Python plotnine.

Description:

Form groups of 2-4 students and choose one or more data sources from Homework 1.

Extract and transform the source data:

  • Pull in the selected data source(s) and preprocess the data as needed.
  • Perform data transformation and cleaning tasks to ensure data consistency and accuracy.
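The extract and transform steps above could be sketched as follows. This is a minimal illustration using only the Python standard library, not a required approach: the inline CSV text, the column names (`station`, `date`, `temp_c`), and the cleaning rules are all made-up assumptions standing in for whatever source your group picks from Homework 1.

```python
import csv
import io

# Hypothetical sample standing in for a real data source from Homework 1.
RAW_CSV = """station,date,temp_c
Tartu,2024-03-01,2.5
 tartu ,2024-03-02,
Tallinn,2024-03-01,1.0
Tallinn,2024-03-01,1.0
"""


def extract(text):
    """Read CSV text into a list of dicts (one per row)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Normalise names, drop rows with missing values, remove duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        station = row["station"].strip().title()
        temp = row["temp_c"].strip()
        if not temp:  # skip rows with a missing measurement
            continue
        record = (station, row["date"].strip(), float(temp))
        if record in seen:  # skip exact duplicates
            continue
        seen.add(record)
        cleaned.append(record)
    return cleaned


clean_rows = transform(extract(RAW_CSV))
print(clean_rows)
```

Keeping `extract` and `transform` as separate functions makes each step easy to describe in the README and to rerun on its own when the source data changes.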

Load the transformed data:

  • Load the cleaned and transformed data into an SQL or flat-file database.
  • Ensure proper indexing and organization of the data for efficient querying and analysis.
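As one possible illustration of the load step, the sketch below uses SQLite through Python's built-in `sqlite3` module. The table name `measurements`, its columns, and the sample records are hypothetical; which columns deserve an index depends on the queries your own analysis runs.

```python
import sqlite3

# Hypothetical cleaned records: (station, date, temperature in Celsius).
records = [
    ("Tartu", "2024-03-01", 2.5),
    ("Tallinn", "2024-03-01", 1.0),
    ("Tartu", "2024-03-02", 3.5),
]

conn = sqlite3.connect(":memory:")  # use a file path for a persistent database
conn.execute(
    """
    CREATE TABLE measurements (
        station TEXT NOT NULL,
        date    TEXT NOT NULL,
        temp_c  REAL NOT NULL,
        PRIMARY KEY (station, date)
    )
    """
)
conn.executemany("INSERT INTO measurements VALUES (?, ?, ?)", records)
# Index the column that the analysis is assumed to filter on.
conn.execute("CREATE INDEX idx_measurements_date ON measurements (date)")
conn.commit()

avg_by_station = conn.execute(
    "SELECT station, AVG(temp_c) FROM measurements "
    "GROUP BY station ORDER BY station"
).fetchall()
print(avg_by_station)
```

The composite primary key rejects a second measurement for the same station and date, so duplicates that slip past the cleaning step fail loudly at load time instead of silently skewing the averages.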

Create a data visualization:

  • Using Apache Superset, Tableau, R ggplot2 (R Markdown), or Python plotnine, create a compelling visualization that highlights key insights and tells a story with the data.
  • Ensure that the visualization is clear, concise, and visually appealing.

Focus on data management and data engineering aspects:

  • Ensure that your project follows data management and data engineering best practices.
  • Use a shared git repository for version control and collaboration among group members.
  • Maintain an up-to-date README that documents the project progress and provides a clear overview of the project structure.
  • Provide detailed documentation on how to use the pipeline from start to finish, including any prerequisites, installation steps, and instructions for running the ETL process and generating the visualization.
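Instructions for running the pipeline from start to finish tend to be simpler when the whole ETL process sits behind a single entry point. The sketch below shows one way that might look; the function name `run_pipeline`, the `cities` table, the file names, and the figures are all invented for illustration.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def run_pipeline(csv_path, db_path):
    """Run extract, transform, and load in order; return the number of rows loaded."""
    # Extract: read every row of the source file.
    with open(csv_path, newline="", encoding="utf-8") as handle:
        rows = list(csv.DictReader(handle))
    # Transform: trim names, convert types, drop rows without a population.
    cleaned = [
        (row["city"].strip(), int(row["population"]))
        for row in rows
        if row["population"].strip()
    ]
    # Load: rebuild the table so that re-running the pipeline is repeatable.
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("DROP TABLE IF EXISTS cities")
        conn.execute("CREATE TABLE cities (city TEXT PRIMARY KEY, population INTEGER)")
        conn.executemany("INSERT INTO cities VALUES (?, ?)", cleaned)
        conn.commit()
    finally:
        conn.close()
    return len(cleaned)


# Demo run in a throwaway directory with made-up figures.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "cities.csv"
    source.write_text("city,population\nAlpha,1000\nBeta,\nGamma,250\n", encoding="utf-8")
    loaded = run_pipeline(source, Path(tmp) / "project.db")
    print(f"Loaded {loaded} rows")
```

With a function like this, the README only needs to name the input file, the output database, and the one command or call that connects them.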

Final submission:

  • Submit the link to your shared git repository containing the ETL pipeline, visualization code, README, and documentation.
  • Include a brief report (around 500-700 words) describing the project's objectives, data sources, ETL process, and visualization, as well as any challenges you encountered and how you addressed them.

This group project aims to provide hands-on experience in implementing an ETL process and creating a data visualization while emphasizing the importance of data management and data engineering principles. Collaborate effectively within your group and ensure clear communication to complete the project successfully. Good luck!

Project due: 8 May 2025 (08.05.2025)

Institute of Computer Science, Faculty of Science and Technology, University of Tartu