Institute of Computer Science
Data Mining (MTAT.03.183), spring 2016/17

HW7. Machine Learning I (09.04)

1. Four classifiers produced the following confusion-matrix counts:

	TP	TN	FP	FN
A	225	100	175	100
B	180	200	75	145
C	100	650	200	50
D	120	500	350	30

Based on this table, calculate the precision, recall, accuracy and F1-score (F-measure) for each classifier. Judging by those scores, which classifiers are the "best"?
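As a starting point, the four measures can be computed directly from the counts. The sketch below uses the standard definitions (precision = TP/(TP+FP), recall = TP/(TP+FN), accuracy = (TP+TN)/total, F1 = 2TP/(2TP+FP+FN)). The names `COUNTS` and `metrics` are illustrative only and not part of the assignment.

```python
# Confusion-matrix counts copied from the table in Task 1: (TP, TN, FP, FN).
COUNTS = {
    "A": (225, 100, 175, 100),
    "B": (180, 200, 75, 145),
    "C": (100, 650, 200, 50),
    "D": (120, 500, 350, 30),
}


def metrics(tp, tn, fp, fn):
    """Return precision, recall, accuracy and F1 for one confusion matrix."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # F1 is the harmonic mean of precision and recall,
    # which simplifies to 2TP / (2TP + FP + FN).
    f1 = 2 * tp / (2 * tp + fp + fn)
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}


if __name__ == "__main__":
    for name, counts in COUNTS.items():
        rounded = {k: round(v, 3) for k, v in metrics(*counts).items()}
        print(name, rounded)
```

Which classifier counts as "best" still depends on which measure matters for the application, so the numbers alone do not settle the question.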

2. Plot these classifiers as points in "ROC space" (false positive rate on the x-axis, true positive rate on the y-axis).

a) Discuss the "goodness" of each classifier A, B, C, D using the scores from Task 1 and ROC space.

b) Discuss how class imbalance (unequal numbers of positive and negative examples) can affect these goodness measures.
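Each classifier from Task 1 corresponds to a single point in ROC space. A minimal sketch for obtaining the coordinates, together with the share of positive examples in each test set (useful for the imbalance discussion), might look like this. The helper names `roc_point` and `positive_fraction` are made up for illustration.

```python
# Confusion-matrix counts copied from the table in Task 1: (TP, TN, FP, FN).
COUNTS = {
    "A": (225, 100, 175, 100),
    "B": (180, 200, 75, 145),
    "C": (100, 650, 200, 50),
    "D": (120, 500, 350, 30),
}


def roc_point(tp, tn, fp, fn):
    """Return (FPR, TPR), i.e. the (x, y) coordinates in ROC space."""
    tpr = tp / (tp + fn)  # true positive rate (same as recall)
    fpr = fp / (fp + tn)  # false positive rate
    return fpr, tpr


def positive_fraction(tp, tn, fp, fn):
    """Share of actual positives among all examples."""
    return (tp + fn) / (tp + tn + fp + fn)


if __name__ == "__main__":
    for name, counts in COUNTS.items():
        fpr, tpr = roc_point(*counts)
        share = positive_fraction(*counts)
        print(f"{name}: FPR={fpr:.3f} TPR={tpr:.3f} positives={share:.1%}")
```

The resulting (FPR, TPR) pairs can be drawn as a scatter plot with any plotting tool. Points above the diagonal from (0, 0) to (1, 1) do better than random guessing. Note that A and B were evaluated on a different class mix than C and D, which matters for precision and accuracy but not for the ROC coordinates.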

3. Use the information about different classifiers from the linked file. Each classifier outputs a score on which its "prediction" is based. A smaller score means a higher "probability" assigned by the method to the prediction, i.e. a smaller rank. Identify the order in which each classifier would classify the data. Plot all three classifiers as ROC curves. Calculate the AUC value for each and compare the classifiers.
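One way to build a ROC curve from scores is to sort the examples by score and sweep the cutoff through the sorted list. The sketch below assumes that a smaller score means "more likely positive" and that true labels are coded as 1 (positive) and 0 (negative); the actual file layout is not shown here, so the sample data in the usage part are invented. The function names are illustrative.

```python
def roc_points(scores, labels):
    """Return ROC points [(fpr, tpr), ...] for one classifier.

    Assumption: SMALLER scores are ranked first (more likely positive),
    and labels are 1 for positive, 0 for negative.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels))  # ascending score = best rank first
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        current = ranked[i][0]
        # Consume all examples tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == current:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


if __name__ == "__main__":
    # Invented toy data, not taken from the homework file.
    scores = [0.1, 0.2, 0.3, 0.4]
    labels = [1, 0, 1, 0]
    pts = roc_points(scores, labels)
    print(pts)
    print("AUC =", auc(pts))
```

Running the same two functions on each of the three score columns gives three point lists that can be drawn as lines on one plot and three AUC values to compare.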

4. Use the same data as in Task 3. Consider different costs for Type I and Type II errors. First use costs of 10 and 100, then 100 and 10; similarly 40 and 60, then 60 and 40. Identify the optimal "cutoff" for each of the three classifiers under each of these cost schemes. Highlight the cutoffs on the ROC curves.
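A cost-based cutoff can be found by trying every distinct score as a threshold and keeping the one with the lowest total cost. The sketch below assumes that a Type I error is a false positive and a Type II error is a false negative, that an example is predicted positive when its score is at or below the cutoff (smaller score = more likely positive), and that labels are 1/0. The function name and the toy data are invented for illustration.

```python
def best_cutoff(scores, labels, cost_fp, cost_fn):
    """Return (cutoff, total_cost, fp, fn) minimising the total error cost.

    Assumptions: predict positive when score <= cutoff;
    cost_fp is the Type I price, cost_fn is the Type II price.
    """
    # -inf stands for "predict nothing as positive".
    candidates = [float("-inf")] + sorted(set(scores))
    best = None
    for cutoff in candidates:
        fp = sum(1 for s, y in zip(scores, labels) if s <= cutoff and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s > cutoff and y == 1)
        cost = cost_fp * fp + cost_fn * fn
        if best is None or cost < best[1]:
            best = (cutoff, cost, fp, fn)
    return best


if __name__ == "__main__":
    # Invented toy data, not taken from the homework file.
    scores = [0.1, 0.2, 0.3, 0.4, 0.5]
    labels = [1, 0, 1, 1, 0]
    for cost_fp, cost_fn in [(10, 100), (100, 10), (40, 60), (60, 40)]:
        print((cost_fp, cost_fn), best_cutoff(scores, labels, cost_fp, cost_fn))
```

The (fp, fn) counts at the chosen cutoff convert to one (FPR, TPR) point, which is the spot to highlight on that classifier's ROC curve. When several cutoffs share the minimum cost, this sketch keeps the smallest one.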

5. Three different classifiers were evaluated individually in Task 3. Try to build an ensemble learner from the three. Try out two of the following four methods:

a) take the simple sum of the three scores;

b) scale the scores so that the "best" classifiers are given more weight;

c) convert the scores to ranks and take the sum of the three ranks;

d) use the sum of the two best (or worst) ranks (out of three) as the new rank order.

Visualise the ensembles on the same ROC plot as the three individual classifiers' curves. Which "ensemble" is your favourite?
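The four combination methods could be sketched as follows. Each function takes a list of three score lists (one per classifier, aligned by example) and returns one combined score per example, where smaller still means "ranked first". The weights in method b) are left to the caller, since the task does not say how to choose them; all function names and the toy data are illustrative assumptions.

```python
def to_ranks(scores):
    """Convert scores to ranks (1 = smallest score); ties get the mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of examples tied with position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def sum_scores(score_lists):
    """Method a): plain sum of the scores for each example."""
    return [sum(column) for column in zip(*score_lists)]


def weighted_sum(score_lists, weights):
    """Method b): weighted sum; larger weight = more influence."""
    return [sum(w * s for w, s in zip(weights, column))
            for column in zip(*score_lists)]


def sum_ranks(score_lists):
    """Method c): sum of the per-classifier ranks."""
    return sum_scores([to_ranks(s) for s in score_lists])


def sum_best_two_ranks(score_lists):
    """Method d): sum of the two smallest (best) ranks per example."""
    rank_columns = zip(*[to_ranks(s) for s in score_lists])
    return [sum(sorted(column)[:2]) for column in rank_columns]


if __name__ == "__main__":
    # Invented toy scores for three classifiers on three examples.
    s1, s2, s3 = [0.3, 0.1, 0.2], [0.2, 0.1, 0.9], [0.5, 0.4, 0.1]
    print(sum_ranks([s1, s2, s3]))
    print(sum_best_two_ranks([s1, s2, s3]))
```

Because every method returns a single score per example with the same "smaller is better" orientation, the output can be fed into the same ROC and AUC procedure used for the individual classifiers. Summing raw scores (methods a and b) is only meaningful if the three classifiers score on comparable scales, which is one reason to consider the rank-based variants.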

6. (Bonus 1p) Can you build an ensemble, using some clear rules based on the three classifier outputs, that would beat all three individual classifiers as well as ensembles a) to d) from Task 5?
