Institute of Computer Science
Data Mining (MTAT.03.183)

Data Mining 2016/17 spring

HW8. Machine Learning II (16.04)

Use the following data in the next three tasks to predict the "play" column.

Outlook	Temp	Humidity	Windy	play
Sunny	Hot	High	FALSE	No
Sunny	Hot	High	TRUE	No
Overcast	Hot	High	FALSE	Yes
Rainy	Mild	High	FALSE	Yes
Rainy	Cool	Normal	FALSE	Yes
Rainy	Cool	Normal	TRUE	No
Overcast	Cool	Normal	TRUE	Yes
Sunny	Mild	High	FALSE	No
Sunny	Cool	Normal	FALSE	Yes
Rainy	Mild	Normal	FALSE	Yes
Sunny	Mild	Normal	TRUE	Yes
Overcast	Mild	High	TRUE	Yes
Overcast	Hot	Normal	FALSE	Yes
Rainy	Mild	High	TRUE	No

Classify the following example by manually simulating each of the three algorithms in tasks 1-3:

Sunny	Cool	High	TRUE	???

1. Build a decision tree simulating ID3 with Information Gain. Classify the example.
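
To check the hand calculation for the first split, the information gain of each attribute can be computed as in the minimal sketch below. It assumes entropy in bits (log base 2) and covers only the root split. The names `ROWS`, `entropy`, and `information_gain` are illustrative and not part of the assignment.

```python
from collections import Counter
from math import log2

COLUMNS = ["Outlook", "Temp", "Humidity", "Windy"]
# The 14 training rows from the table above; the last field is "play".
ROWS = [
    ("Sunny", "Hot", "High", "FALSE", "No"),
    ("Sunny", "Hot", "High", "TRUE", "No"),
    ("Overcast", "Hot", "High", "FALSE", "Yes"),
    ("Rainy", "Mild", "High", "FALSE", "Yes"),
    ("Rainy", "Cool", "Normal", "FALSE", "Yes"),
    ("Rainy", "Cool", "Normal", "TRUE", "No"),
    ("Overcast", "Cool", "Normal", "TRUE", "Yes"),
    ("Sunny", "Mild", "High", "FALSE", "No"),
    ("Sunny", "Cool", "Normal", "FALSE", "Yes"),
    ("Rainy", "Mild", "Normal", "FALSE", "Yes"),
    ("Sunny", "Mild", "Normal", "TRUE", "Yes"),
    ("Overcast", "Mild", "High", "TRUE", "Yes"),
    ("Overcast", "Hot", "Normal", "FALSE", "Yes"),
    ("Rainy", "Mild", "High", "TRUE", "No"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, col):
    """Label entropy minus the weighted entropy after splitting on column `col`."""
    groups = {}
    for r in rows:
        groups.setdefault(r[col], []).append(r[-1])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([r[-1] for r in rows]) - remainder


gains = {name: information_gain(ROWS, i) for i, name in enumerate(COLUMNS)}
best = max(gains, key=gains.get)
for name, gain in gains.items():
    print(f"{name}: {gain:.3f}")
print("root split:", best)
```

ID3 would then repeat the same computation inside each branch of the chosen attribute, using only the rows that reach that branch, until every subset is pure or no attributes remain.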

2. Build a Naïve Bayes classifier. Classify the example.
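
The Naïve Bayes calculation can be cross-checked with a short sketch like the one below. It assumes plain relative frequencies with no Laplace smoothing, which is one of several reasonable choices for the manual simulation. The names `ROWS`, `QUERY`, and `naive_bayes_scores` are illustrative.

```python
from collections import Counter

# The 14 training rows from the table above; the last field is "play".
ROWS = [
    ("Sunny", "Hot", "High", "FALSE", "No"),
    ("Sunny", "Hot", "High", "TRUE", "No"),
    ("Overcast", "Hot", "High", "FALSE", "Yes"),
    ("Rainy", "Mild", "High", "FALSE", "Yes"),
    ("Rainy", "Cool", "Normal", "FALSE", "Yes"),
    ("Rainy", "Cool", "Normal", "TRUE", "No"),
    ("Overcast", "Cool", "Normal", "TRUE", "Yes"),
    ("Sunny", "Mild", "High", "FALSE", "No"),
    ("Sunny", "Cool", "Normal", "FALSE", "Yes"),
    ("Rainy", "Mild", "Normal", "FALSE", "Yes"),
    ("Sunny", "Mild", "Normal", "TRUE", "Yes"),
    ("Overcast", "Mild", "High", "TRUE", "Yes"),
    ("Overcast", "Hot", "Normal", "FALSE", "Yes"),
    ("Rainy", "Mild", "High", "TRUE", "No"),
]
QUERY = ("Sunny", "Cool", "High", "TRUE")  # the example to classify


def naive_bayes_scores(rows, query):
    """Return the unnormalised P(class) * prod_i P(value_i | class) per class."""
    class_counts = Counter(r[-1] for r in rows)
    scores = {}
    for label, n in class_counts.items():
        score = n / len(rows)  # class prior
        for i, value in enumerate(query):
            matches = sum(1 for r in rows if r[-1] == label and r[i] == value)
            score *= matches / n  # no smoothing: a zero count zeroes the score
        scores[label] = score
    return scores


scores = naive_bayes_scores(ROWS, QUERY)
total = sum(scores.values())
posterior = {label: s / total for label, s in scores.items()}
prediction = max(posterior, key=posterior.get)
print(posterior, "->", prediction)
```

Normalising the two scores so that they sum to one gives posterior probabilities, which makes the hand-computed fractions easy to compare against.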

3. Classify the example using the k-nearest neighbour method with k = 1, 3, and 5 (1-nn, 3-nn, and 5-nn). As a distance measure you can, for example, count how many variables differ (a Manhattan distance in which each mismatch counts as 1), optionally letting ordinal values such as hot and cool differ by a larger "distance" (e.g. hot-cool=2, hot-mild=1).
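
A minimal sketch of the mismatch-counting variant follows. It assumes every differing attribute contributes 1 to the distance (no ordinal weighting) and breaks distance ties by table order, which is an arbitrary choice. Several rows tie at the same distance, so the 3-nn and 5-nn answers can depend on how ties are handled. The names `mismatch_distance` and `knn_predict` are illustrative.

```python
from collections import Counter

# The 14 training rows from the table above; the last field is "play".
ROWS = [
    ("Sunny", "Hot", "High", "FALSE", "No"),
    ("Sunny", "Hot", "High", "TRUE", "No"),
    ("Overcast", "Hot", "High", "FALSE", "Yes"),
    ("Rainy", "Mild", "High", "FALSE", "Yes"),
    ("Rainy", "Cool", "Normal", "FALSE", "Yes"),
    ("Rainy", "Cool", "Normal", "TRUE", "No"),
    ("Overcast", "Cool", "Normal", "TRUE", "Yes"),
    ("Sunny", "Mild", "High", "FALSE", "No"),
    ("Sunny", "Cool", "Normal", "FALSE", "Yes"),
    ("Rainy", "Mild", "Normal", "FALSE", "Yes"),
    ("Sunny", "Mild", "Normal", "TRUE", "Yes"),
    ("Overcast", "Mild", "High", "TRUE", "Yes"),
    ("Overcast", "Hot", "Normal", "FALSE", "Yes"),
    ("Rainy", "Mild", "High", "TRUE", "No"),
]
QUERY = ("Sunny", "Cool", "High", "TRUE")  # the example to classify


def mismatch_distance(a, b):
    """Number of attributes on which the two examples differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(rows, query, k):
    """Majority vote among the k closest rows.

    sorted() is stable, so rows at equal distance keep their table order;
    this tie-breaking rule is an assumption of the sketch.
    """
    ranked = sorted(rows, key=lambda r: mismatch_distance(r[:-1], query))
    votes = Counter(r[-1] for r in ranked[:k])
    return votes.most_common(1)[0][0]


distances = [mismatch_distance(r[:-1], QUERY) for r in ROWS]
predictions = {k: knn_predict(ROWS, QUERY, k) for k in (1, 3, 5)}
print(distances)
print(predictions)
```

Printing the full distance list makes the ties visible, which is useful when justifying the choice of neighbours in the written solution.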

4. Use the data set "Bank Marketing". Apply a decision tree and a Naïve Bayes classifier to this data. Perform 5-fold cross-validation (withholding 20% of the data each time) and report the quality of the resulting classifiers.
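
The mechanics of 5-fold cross-validation can be sketched in plain Python as below. This is only an illustration of the splitting scheme: `k_fold_indices`, `cross_validate`, and the placeholder `withheld_fraction` are hypothetical names, and the row count of 100 is an arbitrary stand-in, not the size of the Bank Marketing data. In practice `evaluate` would train a classifier on the training rows and return a quality measure on the withheld rows.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k near-equal test folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(n, k, evaluate, seed=0):
    """Call evaluate(train_idx, test_idx) once per fold and collect the results."""
    folds = k_fold_indices(n, k, seed)
    results = []
    for i, test_idx in enumerate(folds):
        # Every fold except the i-th one forms the training set.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        results.append(evaluate(train_idx, test_idx))
    return results


def withheld_fraction(train_idx, test_idx):
    """Placeholder for 'train and score a model': reports the withheld share."""
    return len(test_idx) / (len(train_idx) + len(test_idx))


folds = k_fold_indices(100, 5)
fractions = cross_validate(n=100, k=5, evaluate=withheld_fraction)
print(fractions)
```

Each row is used for testing exactly once and for training in the other four rounds, so the five per-fold scores can be averaged into a single estimate.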

5. Apply the Random Forest method to the same data set. Report the quality in the same way as in task 4. Draw an ROC curve for the RF classifier. What is the basis for scoring and ordering the predictions in order to obtain the ROC curve?
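
How an ROC curve arises from ordered predictions can be sketched as follows. The sketch assumes each prediction carries a numeric score for the positive class, for instance the fraction of trees voting for it, which is one common choice for a random forest. The scores and labels below are made-up illustration values, not results on the Bank Marketing data, and `roc_points` and `auc` are illustrative names.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points.

    Predictions are sorted by decreasing score and the decision threshold
    is lowered one distinct score at a time; labels are 1 (positive) or 0.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Examples with equal scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Made-up scores (e.g. share of trees voting "yes") and true labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
points = roc_points(scores, labels)
print(points)
print("AUC:", auc(points))
```

Plotting the returned points with the false positive rate on the x-axis and the true positive rate on the y-axis gives the ROC curve; only the ordering of the scores matters, not their absolute values.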

6. (Bonus 1p) Extract the most relevant features from the Bank Marketing data above by inspecting the random forest, decision tree, and Naïve Bayes classifiers. Justify and interpret these features.
