Institute of Computer Science
Data Mining (MTAT.03.183), 2012/13 fall

HW 09 (15.11) R, Multidimensional Modeling

1. Revisiting the QQ-plot (R is recommended). Generate 1000 values from a uniform distribution with minimum -2 and maximum 2. Find the 0.01, 0.03, ..., 0.97, 0.99 quantiles of these values. Draw the kernel density estimate of these 1000 values and mark each quantile with a vertical line (50 lines in total) on the same plot as the curve. On a separate plot, draw the true density of the standard normal distribution and mark the same quantile levels with vertical lines, this time computed from the theoretical distribution. Stack the two plots vertically, one above the other, so that their x-axes share the same scale. Describe how the vertical lines relate to the QQ-plot.
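The quantile computation in task 1 can be sketched as follows. This is a minimal illustration in Python using only the standard library, not a reference solution; the seed and all variable names are assumptions made for the example, and plotting is left out. In R the same steps would typically go through `runif`, `quantile`, `density`, `qnorm`, `abline(v = ...)` and `par(mfrow = c(2, 1))`.

```python
import random
from statistics import NormalDist, quantiles

random.seed(1)  # arbitrary seed, used only to make the run reproducible

# 1000 draws from the uniform distribution on [-2, 2]
sample = [random.uniform(-2.0, 2.0) for _ in range(1000)]

# Probability levels 0.01, 0.03, ..., 0.97, 0.99 (50 in total)
levels = [(2 * k + 1) / 100 for k in range(50)]

# quantiles(..., n=100) returns the 99 cut points at 0.01, 0.02, ..., 0.99;
# every second one corresponds to the levels above.
empirical = quantiles(sample, n=100)[::2]

# The same levels, computed from the theoretical standard normal distribution
theoretical = [NormalDist(0.0, 1.0).inv_cdf(p) for p in levels]

# Each (theoretical, empirical) pair is one point of the QQ-plot:
# the x-positions of the lines on the two stacked plots, matched by level.
qq_points = list(zip(theoretical, empirical))

for p, (t, e) in list(zip(levels, qq_points))[:3]:
    print(f"p={p:.2f}  normal={t:+.3f}  sample={e:+.3f}")
```

Plotting `empirical` against `theoretical` gives the QQ-plot of the sample against the standard normal distribution, which is one way to see how the two sets of vertical lines correspond.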

2. Design an OLAP multidimensional model for web log analysis. (Think of a standard Apache-style web log.) Which dimensions are there? Are there any slowly or rapidly changing dimensions? Do you prefer a star or a snowflake schema?
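As one possible starting point, a star schema for such a log could look like the sketch below, written in Python with the built-in `sqlite3` module. All table names, column names and sample rows are hypothetical and chosen only for illustration; the choice of dimensions (time, page, client, status) is an assumption, not the expected answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical star schema: one fact table (one row per request)
# surrounded by four denormalized dimension tables.
conn.executescript("""
CREATE TABLE dim_time   (time_id INTEGER PRIMARY KEY, day TEXT, hour INTEGER);
CREATE TABLE dim_page   (page_id INTEGER PRIMARY KEY, path TEXT, section TEXT);
CREATE TABLE dim_client (client_id INTEGER PRIMARY KEY, ip TEXT, country TEXT);
CREATE TABLE dim_status (status_id INTEGER PRIMARY KEY, code INTEGER, status_class TEXT);
CREATE TABLE fact_request (
    time_id    INTEGER REFERENCES dim_time(time_id),
    page_id    INTEGER REFERENCES dim_page(page_id),
    client_id  INTEGER REFERENCES dim_client(client_id),
    status_id  INTEGER REFERENCES dim_status(status_id),
    bytes_sent INTEGER
);
""")

# Made-up sample rows
conn.executemany("INSERT INTO dim_time VALUES (?, ?, ?)",
                 [(1, "2012-11-15", 13), (2, "2012-11-15", 14)])
conn.executemany("INSERT INTO dim_page VALUES (?, ?, ?)",
                 [(1, "/index.html", "home"), (2, "/hw/09", "homework")])
conn.executemany("INSERT INTO dim_client VALUES (?, ?, ?)",
                 [(1, "192.0.2.1", "EE"), (2, "192.0.2.2", "FI")])
conn.executemany("INSERT INTO dim_status VALUES (?, ?, ?)",
                 [(1, 200, "success"), (2, 404, "client error")])
conn.executemany("INSERT INTO fact_request VALUES (?, ?, ?, ?, ?)",
                 [(1, 1, 1, 1, 2326), (1, 2, 1, 1, 1500),
                  (2, 2, 2, 1, 1500), (2, 2, 2, 2, 0)])

# Roll-up: number of requests and bytes sent per site section and hour
rows = conn.execute("""
    SELECT p.section, t.hour, COUNT(*), SUM(f.bytes_sent)
    FROM fact_request AS f
    JOIN dim_page AS p ON p.page_id = f.page_id
    JOIN dim_time AS t ON t.time_id = f.time_id
    GROUP BY p.section, t.hour
    ORDER BY p.section, t.hour
""").fetchall()

for row in rows:
    print(row)
```

A snowflake variant would normalize the dimensions further, for example by moving `country` out of `dim_client` into its own table, at the cost of extra joins in queries like the one above.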

3. What interesting, non-trivial questions could you answer using your design from task 2?
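To make one candidate question concrete (what share of requests fail in each hour of the day?), here is a small sketch that computes it directly from raw log lines in Python. The regular expression assumes the Apache Common Log Format, and the log lines are made up for illustration; a real log may need a more tolerant pattern.

```python
import re
from collections import Counter

# Common Log Format: host ident user [time] "request" status bytes
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<day>[^:]+):(?P<hour>\d{2}):[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<bytes>\S+)'
)

# Made-up sample lines, not taken from any real server
lines = [
    '192.0.2.1 - - [15/Nov/2012:13:05:10 +0200] "GET /index.html HTTP/1.1" 200 2326',
    '192.0.2.2 - - [15/Nov/2012:13:40:02 +0200] "GET /missing HTTP/1.1" 404 209',
    '192.0.2.1 - - [15/Nov/2012:14:01:33 +0200] "GET /hw/09 HTTP/1.1" 200 1500',
    '192.0.2.3 - - [15/Nov/2012:14:02:00 +0200] "GET /hw/09 HTTP/1.1" 200 1500',
]

total = Counter()   # requests per hour
errors = Counter()  # requests with status >= 400 per hour

for line in lines:
    m = LOG_RE.match(line)
    if m is None:
        continue  # skip lines that do not match the assumed format
    hour = int(m.group("hour"))
    total[hour] += 1
    if int(m.group("status")) >= 400:
        errors[hour] += 1

# Fraction of failed requests for each hour that saw any traffic
error_share = {hour: errors[hour] / total[hour] for hour in sorted(total)}
print(error_share)
```

In a multidimensional model the same question becomes a roll-up of the fact table over the time and status dimensions, and it can then be sliced further by page or client.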

4. Describe an application area and some dimension(s) where the dimension tables would be (much) larger than the fact tables. When does this situation arise?

5. Could you do anything about such a situation with a better design of the same or a similar multidimensional model?

6. (Bonus, 3p) Import the US census data shared last week (http://biit.cs.ut.ee/~vilo/edu/Data/census2000/ large? full?) into an actual database and OLAP system and apply OLAP in practice. Is there much of a speed gain compared to Excel pivot tables? (MDX, MS SQL, Postgres, ... )

The proprietary copyrights of educational materials belong to the University of Tartu. The use of educational materials is permitted for the purposes and under the conditions provided for in the copyright law for the free use of a work. When using educational materials, the user is obligated to give credit to the author of the educational materials.
The use of educational materials for other purposes is allowed only with the prior written consent of the University of Tartu.