Assignment 10 (6 and 7 May): Clustering
Use the following points:
Point  X  Y
A      3  4
B      2  7
C      1  3
D      2  4
E      3  6
F      5  3
G      4  2
H      2  2
I      5  5
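To check hand-computed distances for the linkage tasks, a short Python sketch can list all 36 pairwise Euclidean distances between these points. It needs Python 3.8+ for `math.dist`; the names `POINTS` and `distance_matrix` are illustrative only and not part of the assignment.

```python
import math
from itertools import combinations

# The nine points from the assignment, as (x, y) coordinates.
POINTS = {
    "A": (3, 4), "B": (2, 7), "C": (1, 3),
    "D": (2, 4), "E": (3, 6), "F": (5, 3),
    "G": (4, 2), "H": (2, 2), "I": (5, 5),
}


def distance_matrix(points):
    """Return {(p, q): Euclidean distance} for every unordered pair p < q."""
    return {
        (p, q): math.dist(points[p], points[q])
        for p, q in combinations(sorted(points), 2)
    }


if __name__ == "__main__":
    dists = distance_matrix(POINTS)
    # Closest pairs first: single linkage considers candidate merges in this order.
    for (p, q), d in sorted(dists.items(), key=lambda item: item[1]):
        print(f"{p}-{q}: {d:.3f}")
```

Sorting the distances shows that A-D is the closest pair (distance 1), followed by four pairs at √2 ≈ 1.414: B-E, C-D, C-H and F-G.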
1. Create a single-linkage clustering using Euclidean distance. Pen and paper (or Excel) should suffice.
2. Create a complete-linkage clustering and compare it with the result of task 1.
3. Cluster the data into 3 clusters with k-means (k = 3).
4. Try to find two different initializations (initial choices of the centers) that lead k-means to different clusterings. If needed, vary k.
5. Imagine a very large dataset (100,000 objects, each a vector of length 100). Estimate the number of operations needed to cluster this data with complete linkage.
6. Bonus (1 point): Propose a combined clustering method that speeds up clustering of the data from task 5, assuming the data contain about 100 dense clusters (which k-means can probably identify) and that the goal is an overview of those 100 clusters and of the relationships between them. (Hint: combine k-means and hierarchical clustering.)
7. Bonus (2 points): Select a project topic and identify one or two key questions for your project. Report them in a project plan of at most one page.
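For tasks 1 and 2, the merge sequence can be checked with a naive agglomerative clustering sketch in Python. It recomputes every cluster-to-cluster distance at each step, which is only sensible for tiny inputs like these nine points. The function name `agglomerate` and the tie-breaking rule (the first pair found wins) are assumptions of this sketch; the point table is repeated so the snippet runs on its own.

```python
import math
from itertools import combinations

# The nine points from the assignment, as (x, y) coordinates.
POINTS = {
    "A": (3, 4), "B": (2, 7), "C": (1, 3),
    "D": (2, 4), "E": (3, 6), "F": (5, 3),
    "G": (4, 2), "H": (2, 2), "I": (5, 5),
}


def agglomerate(points, linkage="single"):
    """Naive agglomerative clustering with Euclidean distance.

    linkage="single" uses the minimum pairwise distance between two clusters,
    linkage="complete" the maximum. Returns the merges in order as
    (cluster_a, cluster_b, height) tuples; clusters are frozensets of labels.
    """
    combine = {"single": min, "complete": max}[linkage]
    clusters = [frozenset([label]) for label in sorted(points)]
    merges = []
    while len(clusters) > 1:
        best = None
        # Recompute all cluster-to-cluster distances from scratch
        # (about n^3 work overall, fine for nine points).
        for a, b in combinations(clusters, 2):
            d = combine(math.dist(points[p], points[q]) for p in a for q in b)
            if best is None or d < best[2]:  # ties: the first pair found wins
                best = (a, b, d)
        a, b, d = best
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)
        merges.append((a, b, d))
    return merges


if __name__ == "__main__":
    for method in ("single", "complete"):
        print(f"{method} linkage:")
        for a, b, d in agglomerate(POINTS, method):
            print(f"  {''.join(sorted(a))} + {''.join(sorted(b))} at height {d:.3f}")
```

On these points, single linkage joins everything by height 2, while complete linkage ends by joining {B, E} to the other seven points at height √29 ≈ 5.39; cutting the complete-linkage tree into three clusters gives {B, E}, {A, C, D, H} and {F, G, I}.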
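For tasks 3 and 4, a minimal sketch of Lloyd's k-means (the standard k-means iteration) in Python takes the initial centers explicitly, so that different starting points can be compared. The two initializations used below (centers at points E, C, F and at B, E, G) are illustrative choices, not prescribed by the assignment, and the names `kmeans` and `sse` are hypothetical helpers.

```python
import math

# The nine points from the assignment, as (x, y) coordinates.
POINTS = {
    "A": (3, 4), "B": (2, 7), "C": (1, 3),
    "D": (2, 4), "E": (3, 6), "F": (5, 3),
    "G": (4, 2), "H": (2, 2), "I": (5, 5),
}


def kmeans(points, initial_centers, max_iter=100):
    """Lloyd's k-means starting from the given centers.

    Returns (clusters, centers): clusters[i] is the sorted list of labels
    assigned to centers[i], in the same order as initial_centers.
    """
    centers = [tuple(c) for c in initial_centers]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: nearest center; min() breaks ties by lower index.
        new_assignment = {
            label: min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            for label, p in points.items()
        }
        if new_assignment == assignment:
            break  # no point changed cluster, so the centers are stable
        assignment = new_assignment
        # Update step: each center moves to the mean of its points
        # (a center with no points keeps its previous position).
        for i in range(len(centers)):
            members = [points[lab] for lab, c in assignment.items() if c == i]
            if members:
                centers[i] = tuple(sum(axis) / len(members) for axis in zip(*members))
    clusters = [sorted(lab for lab, c in assignment.items() if c == i)
                for i in range(len(centers))]
    return clusters, centers


def sse(points, clusters, centers):
    """Within-cluster sum of squared distances to the cluster centers."""
    return sum(
        math.dist(points[label], centers[i]) ** 2
        for i, members in enumerate(clusters)
        for label in members
    )


if __name__ == "__main__":
    # Two illustrative initializations, each using three of the data points.
    for start in ("ECF", "BEG"):
        clusters, centers = kmeans(POINTS, [POINTS[s] for s in start])
        print(start, clusters, f"SSE = {sse(POINTS, clusters, centers):.3f}")
```

Starting from E, C and F, the run converges to {B, E}, {A, C, D, H}, {F, G, I} with a within-cluster sum of squares of about 11.08; starting from B, E and G, it converges to {B}, {A, D, E, I}, {C, F, G, H} with 18.5. The second run is stuck in a worse local optimum, which is why k-means is commonly restarted from several initializations and the lowest-SSE result kept.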
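For task 5, one way to approach the estimate is an order-of-magnitude count for the standard (unoptimized) agglomerative algorithm with n = 100,000 objects of dimension d = 100. The sketch below assumes roughly 3d arithmetic operations per distance (subtract, square, accumulate) and a full scan of the remaining distance matrix before each merge; other cost models shift the constants but not the orders of magnitude.

```latex
\begin{aligned}
n &= 10^{5}, \qquad d = 100 \\
\text{pairs} &= \binom{n}{2} = \frac{n(n-1)}{2} \approx 5 \times 10^{9} \\
\text{distance matrix} &\approx 3d \binom{n}{2} \approx 1.5 \times 10^{12}\ \text{operations}
  \quad (\approx 40\ \text{GB as 8-byte floats}) \\
\text{naive merging} &\approx \sum_{m=2}^{n} \binom{m}{2} = \binom{n+1}{3} \approx \frac{n^{3}}{6}
  \approx 1.7 \times 10^{14}\ \text{comparisons} \\
\text{priority-queue merging} &\approx n^{2} \log_{2} n \approx 1.7 \times 10^{11}
\end{aligned}
```

With a priority-queue implementation the merging cost falls below the cost of computing the distance matrix, so the quadratic number of pairwise distances, and the tens of gigabytes needed to store them, become the bottleneck.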
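For the bonus task 6, one possible scheme following the hint (an assumption, not the only answer) is to run k-means with K = 100 first and then build a complete-linkage tree over the 100 centroids only. A rough cost comparison, assuming I k-means iterations, about 3d operations per point-to-center distance, and an arbitrary illustrative I = 20:

```latex
\begin{aligned}
\text{k-means:} &\quad 3d \cdot n \cdot K \cdot I = 3 \cdot 100 \cdot 10^{5} \cdot 100 \cdot I = 3 \times 10^{9}\, I \\
\text{linkage on centroids:} &\quad 3d \binom{K}{2} + \frac{K^{3}}{6} \approx 1.5 \times 10^{6} + 1.7 \times 10^{5} \approx 1.7 \times 10^{6} \\
\text{total for } I = 20: &\quad \approx 6 \times 10^{10}
  \quad \text{vs.} \quad \gtrsim 10^{12} \ \text{for complete linkage on all } n \text{ points}
\end{aligned}
```

The tree over the centroids then gives the requested overview of how the 100 dense clusters relate to each other, and neither step needs the n × n distance matrix.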