Data Mining - Courses - Institute of Computer Science

HW11 (25.04) - ML, Clustering, projects...

1. Load the dataset from here (or csv version). The data points belong to two classes, positives and negatives. Notice that the two classes are not linearly separable in the original 2D feature space. However, by applying the "kernel trick" we can map the original feature space into a higher-dimensional one where the classes become linearly separable. Try to come up with new feature(s) based on X and Y such that the given points become separable. (Hint)
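To illustrate the idea of adding a feature, here is a minimal sketch. The actual dataset is not reproduced here, so the code assumes hypothetical data in which negatives lie near the origin and positives lie on an outer ring; `make_rings` and `radial_feature` are illustrative names, and the squared-radius feature is only one possible choice that may or may not suit the real data.

```python
import math
import random

def make_rings(n=50, seed=0):
    """Hypothetical data: negatives near the origin, positives on an outer ring."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        angle = rng.uniform(0, 2 * math.pi)
        r_in = rng.uniform(0.0, 1.0)   # radius for the inner (negative) class
        r_out = rng.uniform(2.0, 3.0)  # radius for the outer (positive) class
        points.append((r_in * math.cos(angle), r_in * math.sin(angle), -1))
        points.append((r_out * math.cos(angle), r_out * math.sin(angle), +1))
    return points

def radial_feature(x, y):
    """Candidate new feature z = x^2 + y^2 (squared distance from the origin)."""
    return x * x + y * y

points = make_rings()
z_neg = [radial_feature(x, y) for x, y, label in points if label == -1]
z_pos = [radial_feature(x, y) for x, y, label in points if label == +1]

# In the new feature z, one threshold separates the classes
# whenever the largest negative value is below the smallest positive value.
separable = max(z_neg) < min(z_pos)
print("separable by a threshold on z:", separable)
```

On such ring-shaped data no straight line in (X, Y) separates the classes, but a single threshold on z does. For the real dataset, plot the points first and pick a feature that matches the shape you see.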

2. Load the dataset from this Excel file (or csv version). Your task is to simulate hierarchical clustering with each of the following linkage criteria:

Single link (min distance) clustering
Complete link (max distance) clustering

Use common sense: there is no need to calculate ALL distances. Draw by hand to save time.
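To check a hand-drawn result, the two linkage rules above can be sketched as a naive agglomerative procedure. This is a minimal sketch, not an efficient implementation; the five points are hypothetical (they are not the homework data), and `linkage_distance` and `agglomerate` are illustrative names.

```python
import math

def linkage_distance(a, b, points, mode):
    """Distance between clusters a and b (lists of point indices)."""
    dists = [math.dist(points[i], points[j]) for i in a for j in b]
    # Single link uses the closest pair, complete link the farthest pair.
    return min(dists) if mode == "single" else max(dists)

def agglomerate(points, mode):
    """Naive agglomerative clustering; returns the merges in order."""
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = linkage_distance(clusters[p], clusters[q], points, mode)
                if best is None or d < best[0]:
                    best = (d, p, q)
        d, p, q = best
        merges.append((sorted(clusters[p]), sorted(clusters[q]), round(d, 3)))
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return merges

# Hypothetical points on a line, chosen so the two criteria disagree.
points = [(0, 0), (1, 0), (2.6, 0), (4.5, 0), (5, 0)]
single = agglomerate(points, "single")
complete = agglomerate(points, "complete")
for name, merges in (("single", single), ("complete", complete)):
    print(name)
    for left, right, dist in merges:
        print("  merge", left, "+", right, "at distance", dist)
```

Both criteria start with the same two merges, but the third merge differs: single link attaches point 2 to the left cluster, while complete link attaches it to the right one. This is the kind of difference to look for when drawing the two dendrograms by hand.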

3. Use the same data, and use the first 4 points as the initial cluster centers for K-means (so K = 4). Simulate K-means using Euclidean distance. Again, use common sense and approximate distances where needed. When in serious doubt, you can rely on more precise calculations.
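For the cases of serious doubt, the simulation can be cross-checked with a short sketch of the standard K-means iteration (Lloyd's algorithm) that takes the first K points as initial centers. The eight points below are hypothetical stand-ins for the homework data, and `kmeans` is an illustrative helper, not a library function.

```python
import math

def kmeans(points, k, max_iter=100):
    """K-means where the first k points serve as the initial centers."""
    centers = [tuple(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center (Euclidean).
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the algorithm has converged
        assignment = new_assignment
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centers, assignment

# Hypothetical data; the first 4 points act as the initial centers.
points = [(0, 0), (1, 0), (10, 0), (11, 0),
          (0, 1), (1, 1), (10, 1), (11, 1)]
centers, assignment = kmeans(points, k=4)
print("centers:", centers)
print("assignment:", assignment)
```

Exact ties in the assignment step are broken in favor of the lower-numbered center here; when simulating by hand, any consistent tie-breaking rule is fine.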

4. Form a team of one to four people. Select a project topic. Identify a data set and define the scope of the project. Add your project description as a single slide in this file - https://docs.google.com/presentation/d/1ARpew6odg24QB4cnJ3wmr6qeAkNcajfVLxJn6mYu5Lk/edit

5. Apply descriptive statistics techniques to describe your selected data set.
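As a starting point, a few common summary statistics for one numeric column can be computed with Python's standard `statistics` module. This is only a sketch: the values below are made up, `describe` is an illustrative helper, and which statistics are appropriate depends on your own data set.

```python
import statistics

def describe(values):
    """Basic descriptive statistics for one numeric column."""
    q1, q2, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "q1": q1,
        "q3": q3,
    }

# Made-up values standing in for one column of a project data set.
values = [2, 4, 4, 5, 7, 9, 11]
summary = describe(values)
for name, value in summary.items():
    print(name, "=", value)
```

For categorical columns, frequency counts (for example with `collections.Counter`) play the same role.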

If you do not have a team yet for tasks 4 and 5, use the opportunity to attract team members by presenting your data in an interesting way in the practice session.

Data Mining 2015/16 spring