HW 05 (18.10)
1. Use the data from last week's task 1 and simulate the K-means algorithm. Use the initial centers (2,6), (2,8), and (5,8). Explain each step. Then simulate K-medoids on the same data, starting from points D, E, and H as the initial cluster centers. The data:
Point  X  Y
A      2  4
B      7  3
C      3  5
D      5  3
E      7  4
F      6  8
G      6  5
H      8  4
I      2  5
J      3  7
2. Install and run mldemos and try out clustering with k-means. Identify situations where k-means clearly does not cluster the data as expected, compared to the “true” clustering you would expect. Take screenshots and discuss why this happens.
3. Once you have identified why such unpleasant situations arise, can you propose a remedy? Propose some heuristics for overcoming these issues.
4. I have downloaded a small dataset from http://www.imf.org/external/pubs/ft/weo/2012/02/weodata/index.aspx - the file Attach:DM_2010_IMF.xls, which has 5 attributes per country. Cluster it and describe the issues you encounter while doing so.
5. During the lecture we described the SOM clustering method and its principle. Implement the SOM algorithm in a modified form that has only a 1-dimensional "grid", e.g. one with 30, 100, or n grid elements. Take each new data point and assign it to the most similar grid point, then update that point and a nearby range of other grid points. Outline the exact algorithm in pseudocode.
Comment from TA: See e.g. http://en.wikipedia.org/wiki/Pseudocode if you don't know what pseudocode is.
6. (Bonus 2p) Implement your 1-D SOM algorithm yourself and apply it (for example, to the country statistics data set above).
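As a starting point for tasks 5 and 6, the 1-D SOM update described above (find the most similar grid point, then pull it and its neighbours towards the data point) could be sketched roughly as follows. This is only one possible reading, not a reference solution. The names train_som_1d and best_matching_unit, the Gaussian neighbourhood, the linear decay of learning rate and radius, and the initialisation from random data points are all assumptions made for illustration. The points from the task 1 table serve as toy input, since the country data is not reproduced here.

```python
import math
import random

# Points A..J from the task 1 table, used here only as toy input.
POINTS = [(2, 4), (7, 3), (3, 5), (5, 3), (7, 4),
          (6, 8), (6, 5), (8, 4), (2, 5), (3, 7)]


def best_matching_unit(nodes, x):
    """Return the index of the grid node whose weights are closest to x."""
    def sq_dist(w):
        return sum((wi - xi) ** 2 for wi, xi in zip(w, x))
    return min(range(len(nodes)), key=lambda i: sq_dist(nodes[i]))


def train_som_1d(data, n_nodes=30, epochs=100, lr0=0.5, seed=0):
    """Train a 1-D SOM and return its n_nodes weight vectors (hypothetical helper)."""
    rng = random.Random(seed)
    # Assumption: initialise every grid node from a randomly chosen data point.
    nodes = [list(rng.choice(data)) for _ in range(n_nodes)]
    radius0 = n_nodes / 2.0
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)  # present the data points in random order
        for x in order:
            # Assumption: learning rate and radius shrink linearly over time.
            decay = 1.0 - step / total_steps
            lr = lr0 * decay
            radius = max(radius0 * decay, 0.5)
            bmu = best_matching_unit(nodes, x)
            for i, w in enumerate(nodes):
                # Gaussian weight by distance along the 1-D grid (index distance).
                h = math.exp(-((i - bmu) ** 2) / (2.0 * radius ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (x[d] - w[d])
            step += 1
    return nodes
```

Calling train_som_1d(POINTS, n_nodes=10) returns ten weight vectors ordered along the grid. Real data with attributes on very different scales (such as the country statistics) would likely need normalising first, which this sketch does not do.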