- Create a dev container with VS Code: Run the 'Dev Containers: Add Dev Container Configuration Files...' command to set up a development container in Visual Studio Code, giving you a consistent, reproducible development environment.
- Open the ETL project or create a new one: Open your existing ETL project inside the dev container, or start a new project for working with the air quality data.
- Install Python packages: Install necessary Python packages, such as duckdb and pyarrow, to handle DuckDB and Parquet file operations.
- Write the air quality data as a Parquet file: Convert the air quality data to Parquet, a columnar format that stores data compactly and lets queries read only the columns they need.
- Query the Parquet file using DuckDB in Python: Use DuckDB to run SQL queries directly against the Parquet file from a Python script, without first loading the data into a separate database.
- Install R packages and query the Parquet file in R: Install the required R packages, such as duckdb and DBI, to work with the Parquet file. Run queries and analysis in R to show that the same Parquet file can be used across programming languages.