Institute of Computer Science
Data Mining (MTAT.03.183)

Data Mining 2012/13 fall

HW 04 (11.10) Clustering

1. Perform a "Single Link" clustering of the 2-D data from slide 28, using Euclidean distance as the distance measure. Draw a dendrogram (tree) in which the height of each node equals the distance at which the two clusters were merged. Hint: first plot the points in 2-D, then simulate the algorithm by hand. (Solutions on paper are OK :-)

Point	X	Y
A	2	4
B	7	3
C	3	5
D	5	3
E	7	4
F	6	8
G	6	5
H	8	4
I	2	5
J	3	7
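
To check a hand simulation, the procedure can be sketched in a few lines of Python. This is a naive illustration rather than a reference solution: the function name `single_link` and the tie-breaking order are assumptions of this sketch, so merges that happen at equal distances may appear in a different order than in a manual run.

```python
import math
from itertools import combinations

# Points copied from the table above.
points = {
    "A": (2, 4), "B": (7, 3), "C": (3, 5), "D": (5, 3), "E": (7, 4),
    "F": (6, 8), "G": (6, 5), "H": (8, 4), "I": (2, 5), "J": (3, 7),
}


def link_distance(a, b):
    """Single link: the smallest Euclidean distance between any two members."""
    return min(math.dist(points[p], points[q]) for p in a for q in b)


def single_link(names):
    """Naive agglomerative clustering; returns (cluster, cluster, height) merges."""
    clusters = [frozenset([name]) for name in names]
    merges = []
    while len(clusters) > 1:
        # Pick the closest pair of clusters; ties are broken by list order.
        a, b = min(combinations(clusters, 2), key=lambda pair: link_distance(*pair))
        merges.append((sorted(a), sorted(b), link_distance(a, b)))
        # Replace the two merged clusters with their union.
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


merges = single_link(points)
for a, b, height in merges:
    print(f"{''.join(a)} + {''.join(b)} merged at height {height:.3f}")
```

Each printed height is the level at which the corresponding node is drawn in the dendrogram.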

2. A problem with association analysis is that it often produces far too many association rules. Propose a distance measure for comparing association rules. Envision a hierarchical clustering procedure based on that measure. How would you present the clustering result to end users (e.g. make one "PowerPoint slide" with a sketch)? Discuss the strengths and weaknesses of your solution.
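
As a starting point for task 2, here is one possible distance, sketched in Python: the Jaccard distance between the item sets of two rules. It is only an illustration, not the expected answer. The rule representation (an antecedent/consequent pair of sets), the helper names, and the example rules are all assumptions of this sketch. Note that this measure ignores rule direction and support, which is one of the weaknesses worth discussing.

```python
def rule_items(rule):
    """All items that occur in a rule, given as an (antecedent, consequent) pair."""
    antecedent, consequent = rule
    return frozenset(antecedent) | frozenset(consequent)


def jaccard_distance(rule_a, rule_b):
    """1 - |intersection| / |union| of the two rules' item sets."""
    items_a, items_b = rule_items(rule_a), rule_items(rule_b)
    union = items_a | items_b
    if not union:
        return 0.0  # two empty rules are treated as identical
    return 1.0 - len(items_a & items_b) / len(union)


# Hypothetical example rules, not taken from any course data set.
rules = [
    ({"bread"}, {"butter"}),
    ({"bread", "milk"}, {"butter"}),
    ({"beer"}, {"chips"}),
]

for i in range(len(rules)):
    for j in range(i + 1, len(rules)):
        print(i, j, round(jaccard_distance(rules[i], rules[j]), 3))
```

A matrix of such pairwise distances can be fed to any hierarchical clustering routine.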

3. Compare the UPGMA and WPGMA hierarchical clustering methods. In which situations would you recommend using one over the other?
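
The two methods differ only in how the distance from a newly merged cluster to every other cluster is updated. The following Python sketch shows the two update rules side by side. The function names and the example numbers are assumptions chosen for illustration.

```python
def upgma_update(d_ik, d_jk, n_i, n_j):
    """UPGMA: average weighted by cluster sizes, so every original point counts equally."""
    return (n_i * d_ik + n_j * d_jk) / (n_i + n_j)


def wpgma_update(d_ik, d_jk):
    """WPGMA: plain average, so both merged clusters count equally regardless of size."""
    return (d_ik + d_jk) / 2


# Hypothetical case: cluster i has 9 points and lies at distance 2.0 from cluster k,
# cluster j is a single point at distance 10.0 from k; i and j are being merged.
d_upgma = upgma_update(2.0, 10.0, n_i=9, n_j=1)
d_wpgma = wpgma_update(2.0, 10.0)
print(d_upgma, d_wpgma)
```

With very unequal cluster sizes, the single point pulls the WPGMA distance much further than the UPGMA distance. When the two merged clusters have the same size, both rules give the same value.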

4. Listen to the presentation by Tamara Munzner: Keynote on Visualization Principles - http://vizbi.org/Videos/26205288 (use the PDF slide-deck from there as well http://bit.ly/nCJM5U ). Which aspects were most interesting or striking to you?

5. Revisit task 2 in light of the lessons from Tamara Munzner's presentation. Improve your "PowerPoint" visualisation, using some ideas from the presentation to "add spice".

6. Bonus (2p) Read about document similarity measures: http://infolab.stanford.edu/~ullman/mmds/ch3.pdf Summarise the chapter contents and main structure on 1 page.

7. Bonus (2p) Use MeV, R, or any other tool that offers hierarchical clustering to cluster some data of interest to you. If you have nothing else, use the 2x2 contingency table generated last week; in that case, also add one or more columns of "interestingness" scores to the data.

The proprietary copyrights of educational materials belong to the University of Tartu. The use of educational materials is permitted for the purposes and under the conditions provided for in the copyright law for the free use of a work. When using educational materials, the user is obligated to give credit to the author of the educational materials.
The use of educational materials for other purposes is allowed only with the prior written consent of the University of Tartu.