Institute of Computer Science

Data engineering for Conversion Master's (LTAT.02.026), 2022/23 spring

HW2: ETL Process for Air Quality Data

In this homework assignment, you will build upon the ETL process that we started in the classroom using air quality data from http://airviro.klab.ee/. The goal is to extract, transform, and load the data into a structured format that allows for easy analysis.

You can use the existing repository at https://github.com/adlerpriit/ETL as inspiration and reference; it contains the steps to download and clean the hourly data. Your task is to process this data further and create tables with daily and monthly average values for all columns in the dataset.
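
As a rough illustration of the aggregation step described above, the following minimal sketch averages hourly rows per day and per month using only the Python standard library. It is not taken from the reference repository: the function name `average_by_period`, the `timestamp` column, and the `so2` and `pm10` columns are hypothetical placeholders, and the sketch assumes timestamps are ISO-8601 strings and that missing measurements are stored as `None`.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def average_by_period(rows, period="day", time_column="timestamp"):
    """Average every measurement column of hourly rows per day or per month.

    rows: iterable of dicts, each holding an ISO-8601 timestamp string under
    `time_column` and float-or-None values in all other columns.
    (Hypothetical helper; the column layout is an assumption.)
    """
    key_format = {"day": "%Y-%m-%d", "month": "%Y-%m"}[period]
    # groups[period_key][column] -> list of observed (non-missing) values
    groups = defaultdict(lambda: defaultdict(list))
    for row in rows:
        period_key = datetime.fromisoformat(row[time_column]).strftime(key_format)
        for column, value in row.items():
            if column == time_column:
                continue
            # Touch the column first so it appears in the output even when
            # every value in the period is missing.
            values = groups[period_key][column]
            if value is not None:
                values.append(value)
    return [
        {
            period: period_key,
            **{
                column: (mean(values) if values else None)
                for column, values in columns.items()
            },
        }
        for period_key, columns in sorted(groups.items())
    ]


# Made-up sample rows, only to show the shape of the input.
hourly_rows = [
    {"timestamp": "2023-03-01T00:00:00", "so2": 1.0, "pm10": 10.0},
    {"timestamp": "2023-03-01T01:00:00", "so2": 3.0, "pm10": None},
    {"timestamp": "2023-03-02T00:00:00", "so2": 5.0, "pm10": 20.0},
]
daily_averages = average_by_period(hourly_rows, "day")
monthly_averages = average_by_period(hourly_rows, "month")
```

Missing values are skipped rather than treated as zero, so a period's average reflects only the hours that were actually measured. Whether that is the right policy for your tables is a design decision worth stating in your README.md.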

While completing this assignment, pay close attention to the following aspects:

  • Data management principles: Ensure data integrity, accuracy, and consistency throughout the ETL process.
  • Variable and column names: Use clear, descriptive, and consistent names for variables and columns in your code and output tables.
  • File organization and hierarchy: Maintain a well-organized folder structure for your code, data, and documentation files.
  • Documentation: Write a comprehensive README.md file that explains each step of the ETL process and gives instructions on how to replicate it.
  • GitHub repository: Store and publish all your code in a GitHub repository. You may use a private repository if necessary, but make sure the instructor has access.
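
For the data integrity point in the list above, one possible sanity check is to look for duplicate and missing hours before aggregating. The sketch below is only an illustration under stated assumptions: the function name `check_hourly_series` is hypothetical, timestamps are assumed to be ISO-8601 strings that fall exactly on the hour, and the sample values are made up.

```python
from datetime import datetime, timedelta


def check_hourly_series(timestamps):
    """Return duplicate timestamps and missing hours in an hourly series.

    Hypothetical helper: assumes every timestamp is an ISO-8601 string
    that falls exactly on the hour.
    """
    parsed = sorted(datetime.fromisoformat(ts) for ts in timestamps)
    seen = set()
    duplicates = []
    for ts in parsed:
        if ts in seen:
            duplicates.append(ts)
        seen.add(ts)
    missing = []
    if parsed:
        # Walk hour by hour from the first to the last reading.
        expected = parsed[0]
        while expected <= parsed[-1]:
            if expected not in seen:
                missing.append(expected)
            expected += timedelta(hours=1)
    return {"duplicates": duplicates, "missing": missing}


report = check_hourly_series([
    "2023-03-01T00:00:00",
    "2023-03-01T01:00:00",
    "2023-03-01T01:00:00",  # duplicate reading
    "2023-03-01T03:00:00",  # the 02:00 reading is missing
])
```

A report like this can be logged during the transform step, so that gaps in the source data are documented instead of being silently averaged away.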

HW1: Data Source Exploration for Group Project

In this homework assignment, you will find a publicly available and reliable data source for a group project. After selecting a suitable data source, provide a concise overview of its key attributes, including:

  • Dataset name and a brief description
  • Type of data (e.g., tabular, time series, geospatial) and data types (e.g., numerical, categorical, text)
  • Purpose of the data and its potential use in a group project
  • Update frequency and historical data availability
  • Data ownership, licensing, and attribution requirements
  • Data size, scalability, and quality considerations
  • Accessibility (e.g., direct download, API) and any API usage information
  • Privacy, ethical concerns, and necessary steps to address them
  • Preprocessing and cleaning tasks required before analysis