HW 09 (15.11) R, Multidimensional Modeling
1. Revisiting the QQ-plot (R is recommended). Generate 1000 values from a uniform distribution with minimum -2 and maximum 2. Find the 0.01, 0.03, ..., 0.97, 0.99 quantiles of these values. Draw the kernel density estimate of these 1000 values and mark each quantile with a vertical line (50 lines in total) on the same plot as the curve. On a separate plot, draw the true density of the standard normal distribution and mark the same quantiles as vertical lines, this time calculated from the theoretical distribution. Place the two plots one above the other so that their x-axes share the same scale. Describe how the vertical lines are related to the QQ-plot.
2. Design an OLAP multidimensional model for web log analysis (think of a standard Apache-style web log). Which dimensions are there? Are there any slowly or rapidly changing dimensions? Would you prefer a star or a snowflake schema?
3. What interesting, non-trivial questions could you answer using your design from task 2?
4. Describe an application area and some dimension(s) where the dimension tables would be (much) larger than the fact tables. When does this situation arise?
5. Could you do anything about such a situation with a better design of the same or a similar multidimensional model?
6. (Bonus, 3p) Import the US census data shared last week (http://biit.cs.ut.ee/~vilo/edu/Data/census2000/ large? full?) into an actual database and OLAP system and apply OLAP in practice. Is there much speed gain compared to Excel pivot tables? (MDX, MS SQL, Postgres, ...)
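
Hint for task 1: the quantile computations can be sketched as follows. This is a minimal illustration in Python rather than the recommended R, and it covers only the numbers, not the plots. The helper `empirical_quantile`, the seed, and all variable names are assumptions made for illustration; in R, `quantile()` and `qnorm()` play the same roles.

```python
import random
from statistics import NormalDist


def empirical_quantile(sorted_values, p):
    """Quantile by linear interpolation between order statistics
    (the same rule as the default, type 7, of R's quantile())."""
    h = (len(sorted_values) - 1) * p
    lo = int(h)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (h - lo) * (sorted_values[hi] - sorted_values[lo])


random.seed(1)  # arbitrary seed (an assumption), only for reproducibility

# 1000 draws from the uniform distribution on [-2, 2], sorted once
sample = sorted(random.uniform(-2, 2) for _ in range(1000))

# Probabilities 0.01, 0.03, ..., 0.97, 0.99 (50 values)
probs = [(2 * k + 1) / 100 for k in range(50)]

# Positions of the vertical lines on the sample density plot
sample_q = [empirical_quantile(sample, p) for p in probs]

# Positions of the vertical lines on the standard normal density plot
normal_q = [NormalDist().inv_cdf(p) for p in probs]

# Pairing the two lists line by line gives one (x, y) pair per probability
qq_points = list(zip(normal_q, sample_q))
```

Each list holds the x-positions of the 50 vertical lines for one of the two plots, in the same order of probabilities.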