Data Mining - Courses - Institute of Computer Science

HW 9 (due April 19th) Clustering continued...

1. Use the data set D1.txt. Visualise this data using some density representation, e.g. a 2-D density-based heatmap or a 3-D density plot ("mountains").
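One possible starting point for the 2-D heatmap is to bin the points into a regular grid and count the points per cell. The sketch below is only an illustration: the function name `density_grid`, the bin count, and the synthetic `demo_points` are assumptions, and the synthetic points stand in for D1.txt, whose exact file format is not described here.

```python
import random


def density_grid(points, bins=20):
    """Count points per cell of a bins x bins grid spanning the data range."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, y_min = min(xs), min(ys)
    # Fall back to 1.0 so a zero-width range does not divide by zero.
    x_span = (max(xs) - x_min) or 1.0
    y_span = (max(ys) - y_min) or 1.0
    grid = [[0] * bins for _ in range(bins)]
    for x, y in points:
        # Points on the upper edge are clamped into the last cell.
        col = min(int((x - x_min) / x_span * bins), bins - 1)
        row = min(int((y - y_min) / y_span * bins), bins - 1)
        grid[row][col] += 1
    return grid


# Synthetic stand-in for D1.txt (assumption): two Gaussian blobs.
rng = random.Random(0)
demo_points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]
demo_points += [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(500)]
demo_grid = density_grid(demo_points, bins=10)
```

The resulting grid of counts can be handed to any plotting tool that draws a matrix as colours or as a surface; a kernel density estimate would give a smoother picture than raw counts.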

2. Overlay the "cluster centers" on a 2-D scatterplot.

This would look something like the example figure (not reproduced here). Make your own version.

Now perform K-means clustering with the above 12 points as starting points, and plot the final cluster centers in the same way. Plot both the initial and the final states (how each center changes). (Optional: plot the full trajectory of the center movements through the intermediate K-means steps.)
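To plot how each center moves, it helps to record the centers after every iteration. The following is a minimal sketch of K-means with fixed starting points that keeps such a history; the function name `kmeans_with_trajectory` and the tiny demo data are hypothetical, and the two demo centers merely stand in for the 12 starting points of the task.

```python
import math


def kmeans_with_trajectory(points, initial_centers, max_iter=100):
    """Run K-means from given starting centers and record every step."""
    centers = [tuple(c) for c in initial_centers]
    trajectory = [list(centers)]  # trajectory[0] is the initial state
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        # Update step: move each center to the mean of its points.
        # A center that attracted no points stays where it is.
        new_centers = []
        for center, group in zip(centers, groups):
            if group:
                mean = tuple(sum(coord) / len(group) for coord in zip(*group))
                new_centers.append(mean)
            else:
                new_centers.append(center)
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
        trajectory.append(list(centers))
    return centers, trajectory


# Hypothetical demo data: two obvious groups and two starting centers.
demo_data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
demo_final, demo_trajectory = kmeans_with_trajectory(
    demo_data, [(0.0, 0.0), (10.0, 10.0)])
```

Drawing a line through `trajectory[0][k]`, `trajectory[1][k]`, ... for each center index `k` gives the optional full-trajectory plot; the first and last entries give the initial and final states.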

3. and 4. Implement a SOM algorithm yourself in a compact, simple style (Python, C, Java, whatever): iteratively fetch samples in random order, find the closest cluster center (the "winner"), move that winner s% closer to the current sample, and move the immediate neighbours of the winner t% closer. Experiment with s and t, and lower these rates as the algorithm proceeds. Goal: try to visualise the "trajectory" of how the SOM cluster centers "move" during the process, as in task 2.
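The update rule described above could be sketched roughly as follows. Several choices here are assumptions rather than part of the task: the centers are arranged in a 1-D chain, so the "immediate neighbours" of center i are centers i-1 and i+1; s and t are given as fractions (0.5 means 50%); and both rates are multiplied by a hypothetical `decay` factor after each pass over the data. A 2-D grid topology or a different cooling schedule would be equally valid.

```python
import math
import random


def train_som(points, centers, s=0.5, t=0.25, decay=0.9, epochs=20, seed=0):
    """Simple SOM on a 1-D chain of centers; returns centers and history."""
    rng = random.Random(seed)
    centers = [list(c) for c in centers]
    trajectory = [[tuple(c) for c in centers]]
    order = list(points)
    for _ in range(epochs):
        rng.shuffle(order)  # fetch samples in random order
        for p in order:
            # The winner is the center closest to the current sample.
            winner = min(range(len(centers)),
                         key=lambda i: math.dist(p, centers[i]))
            # Winner moves by fraction s, chain neighbours by fraction t.
            for idx, rate in ((winner, s), (winner - 1, t), (winner + 1, t)):
                if 0 <= idx < len(centers):
                    centers[idx] = [c + rate * (x - c)
                                    for c, x in zip(centers[idx], p)]
        trajectory.append([tuple(c) for c in centers])
        # Lower the speed after every pass over the data.
        s *= decay
        t *= decay
    return centers, trajectory


# Hypothetical demo: uniform points in the unit square, 5 chained centers.
demo_rng = random.Random(1)
som_points = [(demo_rng.random(), demo_rng.random()) for _ in range(200)]
som_start = [(0.1 * i, 0.5) for i in range(5)]
som_final, som_trajectory = train_som(som_points, som_start)
```

The history is recorded once per pass here to keep it small; recording after every sample gives a finer trajectory at the cost of a much longer list.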

5. Now consider density-based clustering, namely DBSCAN. Experiment with DBSCAN or some other density-based clustering algorithm on this data. Use different parameters and try to visualise the outcomes. Discuss the findings and briefly compare the strengths and weaknesses of DBSCAN versus the K-means and SOM algorithms.
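For experimenting with the parameters, a small brute-force DBSCAN may be enough. The sketch below is an illustrative version under stated assumptions: the names `dbscan`, `eps`, `min_pts`, and `NOISE` are chosen here, the neighbourhood of a point includes the point itself, and the neighbour search is quadratic, so it only suits small data sets. A library implementation would normally be preferable for real use.

```python
import math

NOISE = -1  # label used for points that belong to no cluster


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE marks outliers."""
    labels = [None] * len(points)

    def region(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned or marked as noise
        neighbours = region(i)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1  # point i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region(j)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is core: keep expanding
    return labels


# Hypothetical demo: two tight squares of points and one far outlier.
db_points = [(0, 0), (0, 1), (1, 0), (1, 1),
             (10, 10), (10, 11), (11, 10), (11, 11),
             (50, 50)]
db_labels = dbscan(db_points, eps=1.5, min_pts=3)
```

Unlike K-means and the SOM sketch, no number of clusters is passed in: it follows from `eps` and `min_pts`, and some points may end up in no cluster at all, which is one of the differences worth discussing.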

6. (Bonus 2p) Look at the "Complex Heatmaps" tools for R (Bioconductor package). Briefly describe the main functionality. Generate some "complex" data yourself and demonstrate how to use the "complex" features of this package. Present the "answer" as a brief tutorial with an example.

Data Mining 2014/15 spring
