HW 08 (8.11) OLAP
1. Read: Jim Gray et al. “Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals”. Data Mining and Knowledge Discovery 1(1), 1997.
2. Rewrite the following query using the SQL 92 standard. That is, rewrite the GROUP BY CUBE(...) clause into an equivalent query using UNION and simple GROUP BY:
SELECT product, year, region, sum(price * vol) FROM Orders GROUP BY CUBE(product, year, region);
3. Use data sample from US census 2000 http://biit.cs.ut.ee/~vilo/edu/Data/census2000/extract_medium.csv.gz . Characterise the relationship between salary and income in relation to State and Age of a person. (hint: use pivot tables of Excel/OO/LibreOffice). If you use Excel, add heatmap on top of the pivot table. To illustrate some of the found relationships, add a screenshot.
4. Characterise the relationship between salary and income in relation to State, Education, and Sex of a person. How many individuals in each category?
5. Explain the terms slice and dice, drill-down and roll-up. Provide examples, e.g. from the census data analysis.
6. (Bonus 2p) More data and explanations is available at http://biit.cs.ut.ee/~vilo/edu/Data/census2000/ (2010 census - http://2010.census.gov/2010census/data/ ) Experiment with limitations of spreadsheet/pivot table Excel. Both on the amount of data that can be loaded in, as well as the speed of calculations.