HW7. Machine Learning I (09.04)
1. Four classifiers produced the following confusion-matrix counts:
               TP    TN    FP    FN
Classifier A  225   100   175   100
Classifier B  180   200    75   145
Classifier C  100   650   200    50
Classifier D  120   500   350    30
Based on these counts, calculate the precision, recall, accuracy, and F1-score (F-measure) for each classifier. Based on those measures, which classifiers are the "best"?
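The four measures follow directly from the confusion-matrix counts. Below is a minimal sketch of the computation; the `counts` dictionary simply transcribes the table above, and the function and variable names are my own choices, not part of the assignment.

```python
# Confusion-matrix counts from the table: classifier -> (TP, TN, FP, FN).
counts = {
    "A": (225, 100, 175, 100),
    "B": (180, 200,  75, 145),
    "C": (100, 650, 200,  50),
    "D": (120, 500, 350,  30),
}

def metrics(tp, tn, fp, fn):
    precision = tp / (tp + fp)                    # TP / predicted positives
    recall = tp / (tp + fn)                       # TP / actual positives
    accuracy = (tp + tn) / (tp + tn + fp + fn)    # correct / all examples
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, accuracy, f1

for name, (tp, tn, fp, fn) in counts.items():
    p, r, a, f = metrics(tp, tn, fp, fn)
    print(f"{name}: precision={p:.3f} recall={r:.3f} accuracy={a:.3f} F1={f:.3f}")
```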
2. Plot the four classifiers in "ROC space".
a) Discuss the "goodness" of each classifier A, B, C, D using the scores from Task 1 and their positions in ROC space.
b) Discuss how the imbalance between differently labeled examples can affect these goodness measures.
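In ROC space, each of the four classifiers is a single point (FPR, TPR). A short sketch, reusing the `counts` dictionary from the Task 1 example and assuming matplotlib is available:

```python
import matplotlib.pyplot as plt

for name, (tp, tn, fp, fn) in counts.items():
    fpr = fp / (fp + tn)    # false positive rate = FP / (FP + TN)
    tpr = tp / (tp + fn)    # true positive rate  = TP / (TP + FN)
    plt.scatter(fpr, tpr)
    plt.annotate(name, (fpr, tpr))

plt.plot([0, 1], [0, 1], "k--", label="random guessing")  # chance diagonal
plt.xlabel("FPR")
plt.ylabel("TPR")
plt.legend()
plt.show()
```

Points above the diagonal are better than random guessing; points closer to the top-left corner (high TPR, low FPR) are better.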
3. Use the information about the different classifiers from this file here. Each classifier outputs a score on which its "prediction" is based: a smaller score means a higher "probability" of the positive class according to that method, i.e. a smaller (better) rank. Identify the order in which each classifier would classify the data. Plot all three classifiers as ROC curves. Calculate the AUC value for each and compare the classifiers.
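One way to do this, assuming you have parsed the file into a true 0/1 label array and a score array per classifier (the file format is not specified here, so `labels`, `scores`, and the `classifiers` loader are placeholders). Because a smaller score means a more confident positive prediction, the scores are negated before being handed to scikit-learn, which expects larger = more positive:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

def plot_roc(labels, scores, name):
    # Negate the scores: here smaller score = more positive, while
    # scikit-learn treats larger scores as more positive.
    neg = -np.asarray(scores, dtype=float)
    fpr, tpr, _ = roc_curve(labels, neg)
    auc = roc_auc_score(labels, neg)
    plt.plot(fpr, tpr, label=f"{name} (AUC = {auc:.3f})")
    return auc

# Hypothetical usage once the file has been parsed:
# for name, (labels, scores) in classifiers.items():
#     plot_roc(labels, scores, name)
# plt.plot([0, 1], [0, 1], "k--")
# plt.xlabel("FPR"); plt.ylabel("TPR"); plt.legend(); plt.show()
```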
4. Use the same data as in Task 3. Consider different prices for Type I and Type II errors: first use prices of 10 and 100, then 100 and 10; similarly 40 and 60, then 60 and 40. Identify the optimal "cutoff" for each of the three classifiers under these pricing schemes and highlight the cutoffs on the ROC curves.
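A hedged sketch of one approach: for a given pair of error prices, scan the ROC thresholds and pick the cutoff with the lowest total cost. Here `c_fp` is the price of a Type I error (false positive) and `c_fn` of a Type II error (false negative); `labels` and `scores` are the placeholder arrays from the Task 3 sketch.

```python
import numpy as np
from sklearn.metrics import roc_curve

def optimal_cutoff(labels, scores, c_fp, c_fn):
    labels = np.asarray(labels)
    n_pos = labels.sum()            # number of actual positives
    n_neg = len(labels) - n_pos     # number of actual negatives
    # Negated scores again, since smaller score = more positive.
    fpr, tpr, thresholds = roc_curve(labels, -np.asarray(scores, dtype=float))
    # Total expected cost at each threshold:
    # FP errors cost c_fp each, FN errors cost c_fn each.
    cost = c_fp * fpr * n_neg + c_fn * (1 - tpr) * n_pos
    best = np.argmin(cost)
    # -thresholds[best] converts back to the original score scale.
    return fpr[best], tpr[best], -thresholds[best], cost[best]

# Hypothetical usage, highlighting the optimal point on an existing ROC plot:
# for c_fp, c_fn in [(10, 100), (100, 10), (40, 60), (60, 40)]:
#     f, t, cut, c = optimal_cutoff(labels, scores, c_fp, c_fn)
#     plt.scatter(f, t, marker="x", s=100)
```

Intuitively, when Type II errors are expensive (e.g. 10 vs. 100) the optimal cutoff moves toward the top-right of the curve (fewer missed positives), and vice versa.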
5. Three different classifiers were presented individually in Task 3. Try to build an ensemble learner out of the three. Try out two of the following four methods:
a) take the simple sum of the three scores;
b) scale the scores so that the "best" classifiers are weighted more heavily;
c) convert the scores to ranks and take the sum of the three ranks;
d) use the sum of the two best (or worst) ranks (out of three) as the new rank order.
Visualise the ensembles on the same ROC plot together with the three individual classifiers (a sketch of the combination rules follows below). Which "ensemble" is your favourite?
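A possible realization of the four combination rules, assuming the three classifiers' scores for the same examples are stacked in a `(3, n_samples)` array `S` with the "smaller = more positive" convention from Task 3. The weights in (b) are purely illustrative; in practice you would base them on, e.g., the individual AUC values.

```python
import numpy as np
from scipy.stats import rankdata

def ensembles(S, weights=(1.0, 1.0, 1.0)):
    # Per-classifier ranks of the samples (rank 1 = most confident positive).
    ranks = np.vstack([rankdata(row) for row in S])
    w = np.asarray(weights)[:, None]
    return {
        "a) sum of scores":          S.sum(axis=0),
        "b) weighted sum of scores": (w * S).sum(axis=0),
        "c) sum of ranks":           ranks.sum(axis=0),
        # np.sort ascending, so [:2] picks the two *best* (smallest) ranks.
        "d) sum of two best ranks":  np.sort(ranks, axis=0)[:2].sum(axis=0),
    }
```

Each returned array is again a "smaller = more positive" score, so the ROC/AUC helper from the Task 3 sketch can be applied to it unchanged and plotted alongside the three individual curves.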
6. (Bonus, 1p) Can you build an ensemble using some clear rules based on the three classifier outputs that beats all three individual classifiers, as well as methods a-d from Task 5?