HW 11 (due May 3rd) Classification - scores
This time we will look at abstract classifiers and how to measure their performance.
In this Excel file there are 200 "test" cases for which we know the true label {pos, neg} (column G, Actual), together with 5 pos/neg predictions (columns labeled A, B, C, D, E). In addition, there are some numeric scores, explained in the later tasks below.
Please also check the background materials, for example:
- http://fouryears.eu/2011/10/12/roc-area-under-the-curve-explained/ and
- http://en.wikipedia.org/wiki/Receiver_operating_characteristic
Note: use whatever tools are most appropriate for you. It can be Excel, but it can just as well be a script in any of your favorite programming languages.
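If you go the scripting route, a minimal loading sketch in Python/pandas might look like the one below; the file name and the column headers (A-E, Actual, and the score columns) are assumptions here and should be adapted to the actual workbook.

```python
import pandas as pd

# Hypothetical file name and header names -- adjust to the actual workbook.
df = pd.read_excel("hw11_classifiers.xlsx")

y_true = (df["Actual"] == "pos").astype(int)   # encode the true label: 1 = pos, 0 = neg
preds = {c: (df[c] == "pos").astype(int)       # 0/1 predictions of classifiers A..E
         for c in ["A", "B", "C", "D", "E"]}
```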
1. Create the 5 confusion matrices measuring the "quality" of these five hypothetical classifiers. From these tables calculate the following scores: Sensitivity, Specificity, Precision, Recall, Accuracy, F1-score, and diagnostic odds ratio (see the Wikipedia article on ROC). Briefly characterise each classifier, and state which one(s) you would consider the "best" and by which criterion.
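As a starting point, here is a sketch of the confusion-matrix counts and the requested scores, using the standard definitions (Sensitivity = Recall = TP/(TP+FN), Specificity = TN/(TN+FP), Precision = TP/(TP+FP), Accuracy = (TP+TN)/N, F1 = 2*Precision*Recall/(Precision+Recall), DOR = (TP*TN)/(FP*FN)) and the y_true/preds variables from the loading sketch above:

```python
import numpy as np

def confusion_scores(y_true, y_pred):
    """Confusion-matrix counts and the scores asked for in Task 1."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    sens = tp / (tp + fn)                       # sensitivity = recall
    spec = tn / (tn + fp)                       # specificity
    prec = tp / (tp + fp)                       # precision
    acc  = (tp + tn) / (tp + tn + fp + fn)      # accuracy
    f1   = 2 * prec * sens / (prec + sens)      # F1-score
    dor  = (tp * tn) / (fp * fn)                # diagnostic odds ratio
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn,
            "Sensitivity": sens, "Specificity": spec, "Precision": prec,
            "Recall": sens, "Accuracy": acc, "F1": f1, "DOR": dor}

for name, y_pred in preds.items():              # preds from the loading sketch
    print(name, confusion_scores(y_true, y_pred))
```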
2. Using the data in the file, draw the following ROC curves:
a) Individual ROC curves (each in a separate image) for classifiers A, B, C, D, E.
b) For classifiers A and B together (on the same image).
c) For all 5 classifiers together.
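Note that A-E output only hard pos/neg labels, so each individual ROC "curve" reduces to the three points (0,0), (FPR, TPR), (1,1). A plotting sketch (matplotlib, reusing y_true and preds from above) could be:

```python
import numpy as np
import matplotlib.pyplot as plt

def roc_points(y_true, y_pred):
    """Three-point ROC of a hard pos/neg classifier."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tpr = np.sum((y_pred == 1) & (y_true == 1)) / np.sum(y_true == 1)
    fpr = np.sum((y_pred == 1) & (y_true == 0)) / np.sum(y_true == 0)
    return [0.0, fpr, 1.0], [0.0, tpr, 1.0]

plt.figure()
for name in ["A", "B", "C", "D", "E"]:          # part c); restrict the list for a) and b)
    x, y = roc_points(y_true, preds[name])
    plt.plot(x, y, marker="o", label=name)
plt.plot([0, 1], [0, 1], "k--", label="random")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.savefig("roc_all_five.png")
```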
3. For all the ROC curves of Task 2, calculate the ROC-AUC scores. Which classifiers are best according to ROC-AUC?
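For such a three-point curve the area works out to (Sensitivity + Specificity) / 2, which is also what scikit-learn returns when the 0/1 predictions are passed in as "scores"; for example:

```python
from sklearn.metrics import roc_auc_score

for name in ["A", "B", "C", "D", "E"]:
    # For 0/1 "scores" this equals (sensitivity + specificity) / 2.
    print(name, roc_auc_score(y_true, preds[name]))
```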
4. Now use the numerical weights AScore1 and AScore2 to reorder the classifier A results. The higher the value, the "more likely" the case is predicted "pos". Two instances of the same classifier, both thresholded at 0.5, can thus give different prediction "scores". Draw on the same plot the two resulting ROC curves for classifier A, one sorted on AScore1 and the other on AScore2. Calculate both AUC scores. Describe your observations.
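A possible sketch with scikit-learn, assuming AScore1 and AScore2 are the numeric column headers in the workbook:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

plt.figure()
for col in ["AScore1", "AScore2"]:               # assumed column headers
    fpr, tpr, _ = roc_curve(y_true, df[col])     # ROC from the continuous scores
    auc = roc_auc_score(y_true, df[col])
    plt.plot(fpr, tpr, label=f"A ranked by {col}, AUC = {auc:.3f}")
plt.plot([0, 1], [0, 1], "k--")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.savefig("roc_A_by_scores.png")
```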
5. Combine the first five classifiers (the simple binary classifiers A-E) by "some" weighting (e.g. put more trust in the "better" classifiers, or treat them equally by performing a plain majority vote or requiring that 2 or 3 of them agree). Assess the quality of that combined classifier: the confusion matrix and scores, the ROC curve, and the ROC-AUC score.
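One of the simplest admissible combinations is a plain, unweighted majority vote over A-E; a sketch (reusing confusion_scores, y_true, and preds from above):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

votes = np.column_stack([np.asarray(preds[c]) for c in ["A", "B", "C", "D", "E"]])
vote_fraction = votes.mean(axis=1)                    # share of classifiers voting "pos"
majority_pred = (vote_fraction >= 0.5).astype(int)    # "pos" when at least 3 of 5 agree

print(confusion_scores(y_true, majority_pred))        # confusion matrix and scores
print("ROC-AUC:", roc_auc_score(y_true, vote_fraction))
```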
6. Bonus (2p): Continue combining the classifiers into "ensembles", but now also use the numeric quality scores AScore1, AScore2, DScore (for classifier D), and the mystical new ClassifierF. Try to use all this information to build an even better (if possible) classifier based on these 5+4 classifiers in total. Create the respective weighting scheme, calculate the confusion matrix and scores, draw the ROC curve, and calculate the ROC-AUC score.
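One illustrative (not prescribed) weighting scheme is to min-max normalise the four numeric columns and average them together with the five 0/1 votes; AScore1, AScore2, DScore, and ClassifierF are assumed to be numeric columns in the workbook:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def minmax(x):
    """Rescale a numeric column to the [0, 1] range."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

numeric = [minmax(df[c]) for c in ["AScore1", "AScore2", "DScore", "ClassifierF"]]
binary  = [np.asarray(preds[c], dtype=float) for c in ["A", "B", "C", "D", "E"]]
combined = np.mean(numeric + binary, axis=0)          # equal weights over all 5+4 inputs

print("ROC-AUC:", roc_auc_score(y_true, combined))
print(confusion_scores(y_true, (combined >= 0.5).astype(int)))
```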
7. Bonus (2p): For all the 5+4 classifiers in their respective ROC "order", and for the two best ensembles from Tasks 5 and 6, calculate the "cutoff" at which the respective classifier achieves a 5%, 10%, and 25% False Discovery Rate (FDR). FDR is explained in the Wikipedia articles on ROC and FDR. Briefly describe your process.
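One way to read off such cutoffs, taking FDR = FP / (FP + TP) among the cases called "pos" above a given score threshold (i.e. 1 - Precision), is to scan the score-sorted cases cumulatively; a sketch:

```python
import numpy as np

def fdr_cutoffs(y_true, score, targets=(0.05, 0.10, 0.25)):
    """Loosest score cutoff keeping FDR = FP/(FP+TP) at or below each target."""
    score = np.asarray(score, dtype=float)
    order = np.argsort(-score)                   # rank cases from highest to lowest score
    y = np.asarray(y_true)[order]
    tp = np.cumsum(y == 1)                       # cumulative true positives at each rank
    fp = np.cumsum(y == 0)                       # cumulative false positives at each rank
    fdr = fp / (fp + tp)
    cutoffs = {}
    for t in targets:
        ok = np.where(fdr <= t)[0]
        cutoffs[t] = score[order][ok[-1]] if len(ok) else None
    return cutoffs

print(fdr_cutoffs(y_true, df["AScore1"]))        # e.g. classifier A ranked by AScore1
```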