Assignment 10 (6,7 May) Clustering

Use points:

	X	Y
 A	3	4
 B	2	7
 C	1	3
 D	2	4
 E	3	6
 F	5	3
 G	4	2
 H	2	2
 I	5	5
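
To check hand-computed merges for tasks 1 and 2, a minimal sketch of naive agglomerative clustering may help. It is an illustration, not a reference solution: the names `POINTS` and `agglomerate` are made up here, and ties are broken by iteration order, so other equally valid merge orders can exist.

```python
from itertools import combinations
from math import dist

# Points copied from the table above.
POINTS = {
    "A": (3, 4), "B": (2, 7), "C": (1, 3), "D": (2, 4), "E": (3, 6),
    "F": (5, 3), "G": (4, 2), "H": (2, 2), "I": (5, 5),
}


def agglomerate(points, linkage=min):
    """Naive agglomerative clustering with Euclidean distance.

    linkage=min gives single linkage, linkage=max gives complete linkage.
    Returns the merges in order as (cluster_a, cluster_b, distance).
    """
    clusters = [(name,) for name in sorted(points)]
    merges = []
    while len(clusters) > 1:
        best = None
        # Examine every pair of current clusters and keep the closest one.
        for a, b in combinations(clusters, 2):
            d = linkage(dist(points[p], points[q]) for p in a for q in b)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        # Replace the two merged clusters with their union.
        merged = tuple(sorted(a + b))
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
        merges.append((a, b, d))
    return merges


if __name__ == "__main__":
    for label, fn in (("single", min), ("complete", max)):
        print(label)
        for a, b, d in agglomerate(POINTS, fn):
            print(f"  {''.join(a)} + {''.join(b)} at {d:.3f}")
```

Passing `min` or `max` as the linkage function keeps the two variants side by side, which makes the comparison asked for in task 2 easier to see.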

1. Create a single linkage clustering (using Euclidean distance). Paper and pen or Excel ... should suffice.

2. Create a complete linkage clustering. Compare with 1.

3. Cluster the data with k-means into 3 clusters.

4. Try to find 2 different starting points (initial selections of centers) that produce different clusterings in k-means. If needed, vary k.

5. Imagine a very large dataset (100,000 objects, vectors of length 100). Estimate the number of operations needed to cluster the data using complete linkage.

6. Bonus(1p) Propose a combined clustering that provides a speedup for the data in task 5, under the assumption that there are about 100 dense clusters (which you can probably identify by k-means) but the goal is to get an overview of those 100 clusters and the relationships between them. (hint - combine k-means and hierarchical clustering)

7. Bonus(2p) Select a project topic and identify 1-2 key questions for your project. Report them in a project plan of less than 1 page.
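
For tasks 3 and 4, one way to experiment with starting points is a small k-means (Lloyd's algorithm) that takes the initial centers explicitly. This is a sketch under assumptions: the names `POINTS` and `kmeans` are made up here, ties go to the lowest-index center, and an empty cluster simply keeps its old center.

```python
from math import dist

# Points copied from the assignment table.
POINTS = {
    "A": (3, 4), "B": (2, 7), "C": (1, 3), "D": (2, 4), "E": (3, 6),
    "F": (5, 3), "G": (4, 2), "H": (2, 2), "I": (5, 5),
}


def kmeans(points, centers, max_iter=100):
    """Run Lloyd's algorithm from the given starting centers.

    Returns (groups, centers): groups[i] lists the point names assigned
    to centers[i] in the final assignment step.
    """
    centers = [tuple(map(float, c)) for c in centers]
    groups = [[] for _ in centers]
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        groups = [[] for _ in centers]
        for name in sorted(points):
            nearest = min(range(len(centers)),
                          key=lambda i: dist(points[name], centers[i]))
            groups[nearest].append(name)
        # Update step: move each center to the mean of its points.
        new_centers = []
        for group, old in zip(groups, centers):
            if group:
                mean_x = sum(points[n][0] for n in group) / len(group)
                mean_y = sum(points[n][1] for n in group) / len(group)
                new_centers.append((mean_x, mean_y))
            else:
                new_centers.append(old)  # assumption: keep an empty cluster's center
        if new_centers == centers:
            break  # converged: centers did not move
        centers = new_centers
    return groups, centers


if __name__ == "__main__":
    # Two arbitrary starts chosen for illustration; compare the resulting groups.
    for start in (["A", "B", "C"], ["F", "G", "H"]):
        groups, centers = kmeans(POINTS, [POINTS[s] for s in start])
        print(start, "->", groups)
```

Running it with several different starts (and different k) shows whether the final groups depend on the initial centers.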
