Data Mining (MTAT.03.183), 2016/17 spring

HW8. Machine Learning II (16.04)

Use the following data in the next three tasks to predict the "play" column.

Outlook	Temp	Humidity	Windy	play
Sunny	Hot	High	FALSE	No
Sunny	Hot	High	TRUE	No
Overcast	Hot	High	FALSE	Yes
Rainy	Mild	High	FALSE	Yes
Rainy	Cool	Normal	FALSE	Yes
Rainy	Cool	Normal	TRUE	No
Overcast	Cool	Normal	TRUE	Yes
Sunny	Mild	High	FALSE	No
Sunny	Cool	Normal	FALSE	Yes
Rainy	Mild	Normal	FALSE	Yes
Sunny	Mild	Normal	TRUE	Yes
Overcast	Mild	High	TRUE	Yes
Overcast	Hot	Normal	FALSE	Yes
Rainy	Mild	High	TRUE	No

Classify the following example by manually simulating the three algorithms listed in tasks 1-3:

Sunny	Cool	High	TRUE	???
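
As a way to check the hand calculation for the ID3 task, the table above can be encoded directly and the information gain of each attribute computed for the root split. This is a minimal sketch in plain Python, not a full ID3 implementation; the names `ROWS`, `entropy`, and `information_gain` are illustrative choices, and the entropy is taken in bits (log base 2), which is an assumption about the convention used in the course.

```python
from collections import Counter
from math import log2

COLUMNS = ["Outlook", "Temp", "Humidity", "Windy", "play"]

# The 14 training rows from the table above.
ROWS = [
    ("Sunny", "Hot", "High", False, "No"),
    ("Sunny", "Hot", "High", True, "No"),
    ("Overcast", "Hot", "High", False, "Yes"),
    ("Rainy", "Mild", "High", False, "Yes"),
    ("Rainy", "Cool", "Normal", False, "Yes"),
    ("Rainy", "Cool", "Normal", True, "No"),
    ("Overcast", "Cool", "Normal", True, "Yes"),
    ("Sunny", "Mild", "High", False, "No"),
    ("Sunny", "Cool", "Normal", False, "Yes"),
    ("Rainy", "Mild", "Normal", False, "Yes"),
    ("Sunny", "Mild", "Normal", True, "Yes"),
    ("Overcast", "Mild", "High", True, "Yes"),
    ("Overcast", "Hot", "Normal", False, "Yes"),
    ("Rainy", "Mild", "High", True, "No"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attr_index, label_index=-1):
    """Entropy of the labels minus the weighted entropy after splitting on one attribute."""
    labels = [row[label_index] for row in rows]
    groups = {}
    for row in rows:
        groups.setdefault(row[attr_index], []).append(row[label_index])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Gain of every attribute at the root; ID3 splits on the largest one.
gains = {name: information_gain(ROWS, i) for i, name in enumerate(COLUMNS[:-1])}
for name, gain in sorted(gains.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {gain:.3f}")
```

To continue the tree by hand, the same function can be applied to the subset of rows in each branch, for example `information_gain([r for r in ROWS if r[0] == "Sunny"], 2)` for Humidity within the Sunny branch.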

1. Build a decision tree simulating ID3 with Information Gain. Classify the example.

2. Build a Naïve Bayes classifier. Classify the example.

3. Classify the example using the k-nearest neighbour method with k = 1, 3, and 5 (1-NN, 3-NN, 5-NN). As a distance measure you can, for example, count how many variables differ (Hamming distance, which equals the Manhattan distance over 0/1 mismatch indicators); optionally, let ordinal values such as temperature differ by more than one step (e.g. hot-cool = 2, hot-mild = 1).

4. Use the "Bank Marketing" data set. Apply a decision tree and a Naïve Bayes classifier to this data. Perform 5-fold cross-validation (withholding 20% of the data each time) and report the goodness of the resulting classifiers.

5. Apply the Random Forest (RF) method to the same data set. Report its quality in the same way as in task 4. Draw an ROC curve for the RF classifier. What is the basis for scoring and ordering the predictions in order to obtain the ROC curve?

6. (Bonus 1p) Extract the most relevant features from the Bank Marketing data by inspecting the random forest, decision tree, and Naïve Bayes classifiers. Justify and interpret these features.
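
For the ROC question in task 5, it may help to see how an ROC curve arises from any ranking of the predictions. The sketch below is plain Python and assumes each test example has a numeric score for the positive class; for a random forest this is commonly the fraction of trees voting for that class (or the averaged per-tree class probability), but that choice is an assumption here, not something the assignment prescribes. The `scores` and `labels` values are made-up toy numbers, not results from the Bank Marketing data, and `roc_points` and `auc` are illustrative names.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points obtained by lowering the threshold one distinct score at a time.

    labels must be 1 (positive) or 0 (negative) and contain at least one of each.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    # Order predictions from the most to the least confident positive.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Examples with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical scores (e.g. fraction of trees voting "yes") and true labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]

points = roc_points(scores, labels)
for fpr, tpr in points:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
print(f"AUC={auc(points):.3f}")
```

A classifier that outputs only hard class labels yields a single point besides the two corners, which is why a graded score is needed to trace the full curve.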
