Institute of Computer Science
  1. Courses
  2. 2015/16 spring
  3. Data Mining (MTAT.03.183)

Data Mining 2015/16 spring


HW12 (16.05) - Clustering, PCA, OLAP

There is an image with shuffled rows - http://biit.cs.ut.ee/imgshuffle/index.cgi?fname=DM2016&dname=DM2016 With it you can:

  1. Access the RGB values here - http://biit.cs.ut.ee/imgshuffle/data/DM2016/DM2016.txt (uploaded file here)
  2. Re-order the row IDs in any order and upload them to the same webpage to view the image with its rows in that new order.

A new image is available for those who already know what the first image shows and still want the fun of finding out what is in a picture (txt file).

1. Apply any clustering techniques you wish (hierarchical, SOM, K-Means) and try to recover what is pictured in the image.
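One way to start on task 1 is sketched below: a plain K-Means (Lloyd's algorithm) written with the standard library only, applied to a toy stand-in for the image. It assumes each image row has already been parsed into a flat list of numeric RGB values; the layout of DM2016.txt is not assumed here, and the names `kmeans`, `rows`, `labels` and `order` are illustrative. Grouping row IDs by cluster label gives only a coarse ordering, so treat it as a first look rather than a full recovery.

```python
import random

def kmeans(rows, k, iters=20, seed=0):
    """Lloyd's k-means on equal-length numeric lists; returns one label per row."""
    rng = random.Random(seed)
    centers = [list(r) for r in rng.sample(rows, k)]
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assignment step: each row joins its nearest center (squared Euclidean).
        for i, row in enumerate(rows):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centers[c])),
            )
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [rows[i] for i in range(len(rows)) if labels[i] == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy stand-in for the shuffled image (made-up values): 5 dark and 5 bright rows.
dark = [[10 + i] * 6 for i in range(5)]
bright = [[200 + i] * 6 for i in range(5)]
rows = dark + bright
random.Random(1).shuffle(rows)

labels = kmeans(rows, k=2)
# Row ids grouped by cluster: one candidate ordering to upload.
order = sorted(range(len(rows)), key=lambda i: labels[i])
print(order)
```

With real image data a larger `k`, or a hierarchical method whose leaf order can be read off directly, is likely to give a more useful ordering.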

2. Use the same data matrix from task 1 and run a PCA analysis on it. Plot the first three principal components either as 2-dimensional plots (PC1-PC2, PC1-PC3, PC2-PC3) or as a 3D plot. Check out the PCA example.
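The PCA step in task 2 can be sketched without external libraries as follows. This is a minimal illustration, assuming the data matrix is a list of equal-length numeric rows; it finds the leading eigenvectors of the covariance matrix by power iteration with deflation, which is a stand-in for whatever PCA routine your toolkit provides. The function names and the toy `data` matrix are made up for the example.

```python
import math

def center(rows):
    """Subtract the column means so every column has mean zero."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]

def covariance(centered):
    """Sample covariance matrix (d x d) of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def top_components(cov, k, iters=200):
    """Leading k eigenvectors of cov via power iteration with deflation."""
    d = len(cov)
    cov = [row[:] for row in cov]  # work on a copy
    comps = []
    for _ in range(k):
        v = normalize([float(i + 1) for i in range(d)])  # arbitrary start vector
        for _ in range(iters):
            w = [sum(c * x for c, x in zip(row, v)) for row in cov]
            if not any(w):  # nothing left to find in this matrix
                break
            v = normalize(w)
        eigval = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        comps.append(v)
        # Deflation: remove the found direction so the next one can emerge.
        cov = [[cov[i][j] - eigval * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return comps

# Toy matrix (made-up values) whose variance is largest along x, then y, then z.
data = [[3, 0, 0], [-3, 0, 0], [0, 2, 0], [0, -2, 0], [0, 0, 1], [0, 0, -1]]
centered = center(data)
pcs = top_components(covariance(centered), k=3)
# Scores: coordinates of every row in the PC1, PC2, PC3 basis.
scores = [[sum(a * b for a, b in zip(row, pc)) for pc in pcs] for row in centered]
pc1_pc2 = [(s[0], s[1]) for s in scores]  # point list for the PC1-PC2 plot
```

The pairs `(s[0], s[1])`, `(s[0], s[2])` and `(s[1], s[2])` are the point sets for the three requested 2-dimensional plots; any plotting tool can draw them.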

3. Grab the US census data (e.g. the medium-size file) here - http://biit.cs.ut.ee/~vilo/edu/Data/census2000/ Make a pivot table summarizing people's earnings by various variables, e.g. gender and education level. Make sure to apply a heatmap on top of the pivot table.
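As an illustration of what the pivot table in task 3 computes, here is a small sketch using only the standard library. The records are made up, and the column names `sex`, `education` and `earnings` are hypothetical; the real census files may name and code these fields differently. The `heat` function rescales cell values to the range 0 to 1, which is the number a heatmap would turn into a colour.

```python
from collections import defaultdict

# Made-up records; "sex", "education" and "earnings" are hypothetical column names.
records = [
    {"sex": "F", "education": "Bachelor", "earnings": 40000},
    {"sex": "F", "education": "Bachelor", "earnings": 50000},
    {"sex": "F", "education": "Master", "earnings": 60000},
    {"sex": "M", "education": "Bachelor", "earnings": 42000},
    {"sex": "M", "education": "Master", "earnings": 58000},
    {"sex": "M", "education": "Master", "earnings": 62000},
]

def pivot_mean(records, row_key, col_key, value_key):
    """Mean of value_key for every (row_key, col_key) combination."""
    sums, counts = defaultdict(float), defaultdict(int)
    for r in records:
        cell = (r[row_key], r[col_key])
        sums[cell] += r[value_key]
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}

def heat(table):
    """Min-max rescale cell values to [0, 1] for colouring."""
    lo, hi = min(table.values()), max(table.values())
    span = (hi - lo) or 1.0  # avoid dividing by zero when all cells are equal
    return {cell: (v - lo) / span for cell, v in table.items()}

table = pivot_mean(records, "sex", "education", "earnings")
shades = heat(table)
for (sex, edu), mean in sorted(table.items()):
    print(f"{sex:2} {edu:9} {mean:9.0f}  shade={shades[(sex, edu)]:.2f}")
```

A spreadsheet pivot table with conditional formatting does the same two steps: aggregate per cell, then colour by relative value.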

4. On the same data, try to visualize other relationships, for example those based on ancestry, industry, marital status and education.

5. Read the Jim Gray article on the Data Cube abstraction. Describe the key operators from this article using examples based on the census data above (tasks 3-4). (An alternative list of operations - https://en.wikipedia.org/wiki/OLAP_cube#Operations )
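To make the operators in task 5 concrete, the sketch below illustrates the idea of a cube aggregate and a roll-up aggregate on made-up records, using an `ALL` placeholder for a dimension that has been aggregated away. It is an informal reading of the data cube idea, not the article's formal definition, and the column names are hypothetical.

```python
from itertools import combinations

ALL = "ALL"  # placeholder for "aggregated over every value of this dimension"

# Made-up records; the column names are hypothetical.
records = [
    {"sex": "F", "education": "Bachelor", "earnings": 40000},
    {"sex": "F", "education": "Bachelor", "earnings": 50000},
    {"sex": "F", "education": "Master", "earnings": 60000},
    {"sex": "M", "education": "Bachelor", "earnings": 42000},
    {"sex": "M", "education": "Master", "earnings": 58000},
    {"sex": "M", "education": "Master", "earnings": 62000},
]

def cube(records, dims, value_key):
    """Sum value_key for every subset of dims (2**len(dims) group-bys)."""
    totals = {}
    for r in records:
        for n in range(len(dims) + 1):
            for kept in combinations(dims, n):
                key = tuple(r[d] if d in kept else ALL for d in dims)
                totals[key] = totals.get(key, 0) + r[value_key]
    return totals

def rollup(records, dims, value_key):
    """Sum value_key for prefix groupings only: (d1, d2), (d1, ALL), (ALL, ALL)."""
    totals = {}
    for r in records:
        for n in range(len(dims) + 1):
            key = tuple(r[d] if i < n else ALL for i, d in enumerate(dims))
            totals[key] = totals.get(key, 0) + r[value_key]
    return totals

dims = ["sex", "education"]
c = cube(records, dims, "earnings")
ru = rollup(records, dims, "earnings")

# Slice: fix one dimension to a single value and keep the rest of the cube.
female_slice = {key: v for key, v in c.items() if key[0] == "F"}
print(c[(ALL, ALL)], c[("F", ALL)], c[(ALL, "Master")])
```

Here the cube contains the sub-total `(ALL, "Master")`, while the roll-up over `sex` then `education` does not, which is the difference between the two operators that the example is meant to show.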

6. (Bonus 2p) Attempt a TSP (travelling salesman problem) approach or other techniques to recover the original image from tasks 1-2 as well as possible.
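The TSP view of task 6 treats each image row as a city and the distance between two rows as their pixel-wise difference, so a short tour places similar rows next to each other. The sketch below uses the greedy nearest-neighbour heuristic, which is only one simple TSP heuristic and not an optimal solver. The toy rows, the permutation `perm` and the function names are made up, and rows are assumed to be flat numeric lists.

```python
def distance(a, b):
    """Squared Euclidean distance between two rows of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def greedy_path(rows, start=0):
    """Nearest-neighbour heuristic: always step to the closest unvisited row."""
    path = [start]
    unvisited = set(range(len(rows))) - {start}
    while unvisited:
        last = rows[path[-1]]
        nxt = min(unvisited, key=lambda i: distance(last, rows[i]))
        path.append(nxt)
        unvisited.remove(nxt)
    return path

# Toy "image" (made-up values): a smooth vertical gradient, 6 rows of 4 values.
original = [[i * 10] * 4 for i in range(6)]
perm = [3, 0, 5, 1, 4, 2]             # shuffled position -> original row id
shuffled = [original[i] for i in perm]

# Starting from the shuffled position of the top row is an assumption that
# only works here because the answer is known; on real data, try several starts.
path = greedy_path(shuffled, start=perm.index(0))
recovered = [perm[i] for i in path]
print(recovered)
```

On a real image the greedy path can jump when it runs into a dead end, so comparing several start rows or refining the tour with a local search such as 2-opt may help.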

7. (Bonus 2p) Load the same census data sets (you can attempt the larger ones, too) into a database and run SQL queries that produce the same summaries as the pivot tables.
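For task 7, a `GROUP BY` query produces the same kind of summary as a pivot table, one output row per cell. The sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table name `census`, its columns and the inserted rows are all made up for illustration; the real files need their own schema and loading step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
# Hypothetical schema; the real census columns may differ.
conn.execute("CREATE TABLE census (sex TEXT, education TEXT, earnings REAL)")
conn.executemany(
    "INSERT INTO census VALUES (?, ?, ?)",
    [
        ("F", "Bachelor", 40000),
        ("F", "Bachelor", 50000),
        ("F", "Master", 60000),
        ("M", "Bachelor", 42000),
        ("M", "Master", 58000),
        ("M", "Master", 62000),
    ],
)

# One result row per (sex, education) cell, like a flattened pivot table.
summary = conn.execute(
    "SELECT sex, education, AVG(earnings), COUNT(*) "
    "FROM census GROUP BY sex, education ORDER BY sex, education"
).fetchall()

for sex, education, mean_earnings, n in summary:
    print(sex, education, mean_earnings, n)
conn.close()
```

Dropping `education` from both the `SELECT` list and the `GROUP BY` clause gives the per-gender sub-totals, which corresponds to rolling up one level.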
