Homework
HW3 (due 29.05)
Apache Spark DataFrames and SQL with Yelp Dataset
Dive into Big Data analysis with Apache Spark DataFrames and SQL on the Yelp dataset. Process and manipulate data in parallel: load the Yelp tables as DataFrames, extract user statistics, analyze businesses, and build pivot tables with both the Spark DataFrame API and Spark SQL. Make sure you have a working Spark environment, and submit your Python scripts and their outputs as deliverables. BigDataLab
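A minimal PySpark sketch of the kind of workflow involved (the file paths and column names such as review_count, city, and stars are assumptions based on the public Yelp dataset, not requirements of the assignment):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("YelpHW3").getOrCreate()

# Load Yelp tables as DataFrames (paths are assumptions)
users = spark.read.json("data/yelp_academic_dataset_user.json")
business = spark.read.json("data/yelp_academic_dataset_business.json")

# DataFrame API: simple user statistics
users.agg(
    F.count("*").alias("n_users"),
    F.avg("review_count").alias("avg_reviews_per_user"),
).show()

# Spark SQL: top cities by number of businesses
business.createOrReplaceTempView("business")
spark.sql("""
    SELECT city, COUNT(*) AS n_businesses
    FROM business
    GROUP BY city
    ORDER BY n_businesses DESC
    LIMIT 10
""").show()

# Pivot table: number of businesses per city and star rating
business.groupBy("city").pivot("stars").count().show(5)
```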
HW2 (due 10.04)
ETL Process for Air Quality Data
Perform an ETL (Extract, Transform, Load) process on air quality data from http://airviro.klab.ee/ and create tables with hourly, daily, and monthly average values for all columns in the dataset. Adhere to data management principles, maintain an organized file structure, and document the process in a README.md file. Publish the code on GitHub (private repositories are allowed). Further instructions
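A minimal sketch of the transform step in pandas, assuming the raw export is a CSV with a timestamp column named "time"; the actual column names and export format on airviro.klab.ee may differ:

```python
import pandas as pd

# Extract: read the raw export (file name and column name are assumptions)
raw = pd.read_csv("data/raw/airviro_export.csv", parse_dates=["time"])

# Transform: index by timestamp and keep the numeric measurement columns
df = raw.set_index("time").sort_index()
numeric = df.select_dtypes("number")

# Hourly, daily, and monthly averages over all columns
hourly = numeric.resample("h").mean()
daily = numeric.resample("D").mean()
monthly = numeric.resample("MS").mean()

# Load: write the aggregated tables to an organized output folder
for name, table in {"hourly": hourly, "daily": daily, "monthly": monthly}.items():
    table.to_csv(f"data/processed/{name}_averages.csv")
```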
HW1 (due 27.03)
Data Source Exploration for Group Project
Identify a suitable data source for a group project and briefly describe its key attributes, such as data type, purpose, update frequency, ownership, and other relevant aspects. Further instructions