## Assignment 10 (6,7 May) Clustering

Use the following points:

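| Point | X | Y |
|-------|---|---|
| A | 3 | 4 |
| B | 2 | 7 |
| C | 1 | 3 |
| D | 2 | 4 |
| E | 3 | 6 |
| F | 5 | 3 |
| G | 4 | 2 |
| H | 2 | 2 |
| I | 5 | 5 |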
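
Both linkage methods in tasks 1 and 2 start from the pairwise Euclidean distances between the points above. As an optional aid for checking hand calculations, the following is a minimal sketch of that computation. It assumes Python 3.8 or later (for `math.dist`); the names `points` and `distance_matrix` are illustrative and not part of the assignment.

```python
import math

# Points from the assignment (label -> (x, y)).
points = {
    "A": (3, 4), "B": (2, 7), "C": (1, 3),
    "D": (2, 4), "E": (3, 6), "F": (5, 3),
    "G": (4, 2), "H": (2, 2), "I": (5, 5),
}


def distance_matrix(pts):
    """Return {(p, q): Euclidean distance} for every ordered pair of labels."""
    return {(p, q): math.dist(pts[p], pts[q]) for p in pts for q in pts}


dist = distance_matrix(points)

# Print the matrix with row and column labels, rounded to two decimals.
labels = list(points)
print("   " + " ".join(f"{q:>5}" for q in labels))
for p in labels:
    print(f"{p:>2} " + " ".join(f"{dist[p, q]:5.2f}" for q in labels))
```

The matrix is symmetric with zeros on the diagonal, so only the upper triangle needs to be copied to paper or a spreadsheet.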

1. Create a single-linkage clustering (using Euclidean distance). Pen and paper, Excel, or a similar tool should suffice.

2. Create a complete-linkage clustering. Compare the result with task 1.

3. Cluster the data with k-means into 3 clusters.

4. Try to find two different initializations (initial selections of centers) for which k-means produces different clusterings. If needed, vary k.

5. Imagine a very large dataset (100,000 objects, each a vector of length 100). Estimate the number of operations needed to cluster the data using complete linkage.

6. **Bonus(1p)** Propose a combined clustering approach that speeds up the clustering of the data in task 5, under the assumption that there are about 100 dense clusters (which you can probably identify with k-means) but the goal is to get an overview of those 100 clusters and the relationships between them. (Hint: combine k-means and hierarchical clustering.)

7. **Bonus(2p)** Select a project topic and identify 1-2 key questions for your project. Report them in a project plan of less than one page.
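
For tasks 3 and 4, it can help to experiment with different initial centers. The following is a minimal sketch of the standard k-means iteration (Lloyd's algorithm) with the initial centers passed in explicitly. It assumes Python 3.8 or later; the function name `kmeans`, the `max_iter` parameter, and the two starting configurations are illustrative choices, not prescribed by the assignment.

```python
import math

# Points from the assignment (label -> (x, y)).
points = {
    "A": (3, 4), "B": (2, 7), "C": (1, 3),
    "D": (2, 4), "E": (3, 6), "F": (5, 3),
    "G": (4, 2), "H": (2, 2), "I": (5, 5),
}


def kmeans(data, centers, max_iter=100):
    """Run k-means from the given initial centers.

    Returns (labels, centers), where labels[i] is the index of the
    center assigned to data[i].
    """
    centers = [tuple(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in data
        ]
        if new_labels == labels:
            break  # assignments are stable, so the run has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(len(centers)):
            members = [p for p, l in zip(data, labels) if l == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers


data = list(points.values())
names = list(points)

# Two illustrative starting configurations, chosen arbitrarily.
for start in (["A", "B", "C"], ["G", "H", "I"]):
    labels, centers = kmeans(data, [points[s] for s in start])
    print("start:", start, "->", dict(zip(names, labels)))
```

Whether two runs end in different clusterings depends on the chosen starts, so trying several configurations (and several values of k) is part of the exercise.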