Institute of Computer Science
Data Mining (MTAT.03.183)

Data Mining 2016/17 spring

HW7. Machine Learning I (09.04)

1. Four classifiers produced the following results (counts of true positives, true negatives, false positives and false negatives):

	TP	TN	FP	FN
A	225	100	175	100
B	180	200	75	145
C	100	650	200	50
D	120	500	350	30

From this table, calculate the precision, recall, accuracy and F1-score (F-measure) for each classifier. Based on these measures, which classifiers are the "best"?
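The four measures follow directly from the counts in the table. The snippet below is a minimal sketch using the standard textbook definitions; the function name `classification_metrics` and the dictionary layout are illustrative choices, not something prescribed by the course.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return precision, recall, accuracy and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)            # share of predicted positives that are correct
    recall = tp / (tp + fn)               # share of actual positives that are found
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}


# Counts copied from the table above, in the order TP, TN, FP, FN.
counts = {
    "A": (225, 100, 175, 100),
    "B": (180, 200, 75, 145),
    "C": (100, 650, 200, 50),
    "D": (120, 500, 350, 30),
}

for name, (tp, tn, fp, fn) in counts.items():
    metrics = classification_metrics(tp, tn, fp, fn)
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

Note that the four classifiers were evaluated on data sets of different size and class balance, so the numbers are not directly comparable without further thought.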

2. Plot these classifiers as points in ROC space.

a) Discuss the "goodness" of each classifier A, B, C, D using the scores from Task 1 and their positions in ROC space.

b) Discuss how an imbalance between the numbers of positive and negative examples can affect the goodness measures.
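A point in ROC space is the pair (false positive rate, true positive rate). The sketch below computes that pair for classifier C and then, as a purely hypothetical experiment, multiplies the number of negative examples by ten while keeping the error rates fixed. The helper names `roc_point` and `precision` are illustrative.

```python
def roc_point(tp, tn, fp, fn):
    """Return (FPR, TPR), the coordinates of a classifier in ROC space."""
    fpr = fp / (fp + tn)   # x-axis: share of negatives wrongly flagged
    tpr = tp / (tp + fn)   # y-axis: share of positives correctly found
    return fpr, tpr


def precision(tp, fp):
    return tp / (tp + fp)


# Classifier C from the table in Task 1.
tp, tn, fp, fn = 100, 650, 200, 50
print("original:  ", roc_point(tp, tn, fp, fn), precision(tp, fp))

# Hypothetical data set: ten times as many negatives, same error rates.
print("imbalanced:", roc_point(tp, tn * 10, fp * 10, fn), precision(tp, fp * 10))
```

The ROC coordinates stay the same because each rate is computed within a single class, whereas precision mixes both classes and drops sharply as negatives start to dominate.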

3. Use the information about the different classifiers from this file here. Each classifier outputs a score on which its "prediction" is based. A smaller score means a higher "probability" of the prediction according to the method, i.e. a smaller rank. Identify the order in which each classifier would rank the data. Plot all three classifiers as ROC curves. Calculate the AUC value for each of them and compare the classifiers.
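One way to obtain a ROC curve from scores is to sort the examples from smallest to largest score and lower the decision threshold one example at a time. The sketch below does this in plain Python and integrates the curve with the trapezoidal rule. The toy `scores` and `labels` are made up for illustration and are not taken from the homework file; the code also assumes labels are coded 1 (positive) and 0 (negative) and that both classes are present.

```python
def roc_curve_points(scores, labels):
    """ROC points when a SMALLER score means 'more likely positive'."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels))   # ascending: most confident first
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last example in a group of tied scores.
        last_of_tie = i + 1 == len(ranked) or ranked[i + 1][0] != score
        if last_of_tie:
            points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical toy data: 3 positives and 3 negatives.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
labels = [1, 1, 0, 1, 0, 0]
curve = roc_curve_points(scores, labels)
print(curve)
print("AUC =", round(auc(curve), 4))
```

The list of points can be passed directly to a plotting library to draw the curve, one line per classifier.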

4. Use the same data as in Task 3. Consider different costs for Type I and Type II errors. First use costs of 10 and 100, then 100 and 10; similarly 40 and 60, then 60 and 40. Identify the optimal "cutoff" for each of the three classifiers under each of these pricing schemes. Highlight the cutoffs on the ROC curves.
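A cost-optimal cutoff can be found by trying every distinct score as a threshold and summing the cost of the resulting errors. The sketch below assumes that a Type I error is a false positive and a Type II error is a false negative, and that an example is predicted positive when its score is at or below the cutoff. The function name `best_cutoff` and the toy data are hypothetical.

```python
def best_cutoff(scores, labels, cost_fp, cost_fn):
    """Return (cutoff, total_cost) minimising cost_fp * FP + cost_fn * FN.

    An example is predicted positive when score <= cutoff.
    A cutoff of -inf means 'predict everything negative'.
    """
    candidates = [float("-inf")] + sorted(set(scores))
    best = None
    for cutoff in candidates:
        fp = sum(1 for s, y in zip(scores, labels) if s <= cutoff and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s > cutoff and y == 1)
        cost = cost_fp * fp + cost_fn * fn
        if best is None or cost < best[1]:   # keep the first minimum found
            best = (cutoff, cost)
    return best


# Hypothetical toy data, not the homework file.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
labels = [1, 1, 0, 1, 0, 0]

for cost_fp, cost_fn in [(10, 100), (100, 10), (40, 60), (60, 40)]:
    print((cost_fp, cost_fn), best_cutoff(scores, labels, cost_fp, cost_fn))
```

When false negatives are expensive the chosen cutoff moves towards larger scores (more examples predicted positive), and the reverse when false positives are expensive.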

5. The three classifiers were evaluated individually in Task 3. Try to build an ensemble learner out of the three. Try out two of the following four methods:

a) take the simple sum of the three scores

b) scale the scores so that the "best" classifiers are weighted more

c) convert the scores to ranks and take the sum of the three ranks

d) use the sum of the two best (or worst) ranks (out of three) as the new rank order

Visualise the ensembles on the same ROC plot as the three individual classifiers. Which "ensemble" is your favourite?
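Methods a), c) and d) can be sketched in a few lines. The code below is an illustration under stated assumptions: rank 1 goes to the smallest score, ties are broken by position in the list, and the "two best" ranks are read as the two smallest ranks of each example. All function names and the three toy score lists are hypothetical.

```python
def to_ranks(scores):
    """Rank 1 = smallest score. Ties are broken by list position (an assumption)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0] * len(scores)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


def sum_of_scores(*score_lists):
    """Method a): add the raw scores of each example."""
    return [sum(values) for values in zip(*score_lists)]


def sum_of_ranks(*score_lists):
    """Method c): convert each classifier's scores to ranks, then add."""
    rank_lists = [to_ranks(scores) for scores in score_lists]
    return [sum(values) for values in zip(*rank_lists)]


def sum_of_two_best_ranks(*score_lists):
    """Method d): add the two smallest ranks of each example."""
    rank_lists = [to_ranks(scores) for scores in score_lists]
    return [sum(sorted(values)[:2]) for values in zip(*rank_lists)]


# Hypothetical scores from three classifiers on four examples.
c1 = [0.1, 0.5, 0.3, 0.9]
c2 = [20, 10, 40, 30]
c3 = [3, 1, 2, 4]

print(sum_of_scores(c1, c2, c3))
print(sum_of_ranks(c1, c2, c3))
print(sum_of_two_best_ranks(c1, c2, c3))
```

Each function returns a new list of scores in which smaller still means "more likely positive", so the result can be evaluated with the same ROC and AUC procedure as the individual classifiers. In this toy example the raw sum is dominated by the classifier with the largest score scale, which is one motivation for scaling or ranking first.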

6. (Bonus 1p) Can you build an ensemble, using some clear rules based on the three classifier outputs, that would beat all three individual classifiers as well as methods a-d from Task 5?

The proprietary copyrights of educational materials belong to the University of Tartu. The use of educational materials is permitted for the purposes and under the conditions provided for in the copyright law for the free use of a work. When using educational materials, the user is obligated to give credit to the author of the educational materials.
The use of educational materials for other purposes is allowed only with the prior written consent of the University of Tartu.