Machine Learning (MTAT.03.227), spring 2013/14

III. Performance evaluation measures

Given by Sven Laur

Brief summary: Principles of experiment design. Machine learning as minimisation of future costs. Overview of standard loss functions. Stochastic estimation of future costs by random sampling (Monte Carlo integration). Theoretical limitations. Standard validation methods: holdout, randomised holdout, cross-validation, leave-one-out, bootstrapping. Advantages and drawbacks of standard validation methods.
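
The validation methods named in the summary can be sketched with NumPy alone. This is a minimal illustration and not the lecture's own code: the helper names (fit_poly, squared_loss, holdout, k_fold, bootstrap_oob), the polynomial least-squares learner and the squared loss are all assumptions made for the example. Each function estimates the expected future loss by averaging losses over points that were not used for training.

```python
import numpy as np

def fit_poly(x, y, degree):
    """Least-squares polynomial fit (assumed learner); returns a predictor."""
    coefs = np.polyfit(x, y, degree)
    return lambda x_new: np.polyval(coefs, x_new)

def squared_loss(y_true, y_pred):
    """Pointwise squared loss (assumed loss function)."""
    return (y_true - y_pred) ** 2

def holdout(x, y, degree, test_fraction=0.3, rng=None):
    """Holdout: one random split into a training part and a test part."""
    rng = np.random.default_rng(rng)
    idx = rng.permutation(len(x))
    n_test = max(1, int(round(test_fraction * len(x))))
    test, train = idx[:n_test], idx[n_test:]
    model = fit_poly(x[train], y[train], degree)
    return float(squared_loss(y[test], model(x[test])).mean())

def k_fold(x, y, degree, k=10, rng=None):
    """k-fold cross-validation; k = len(x) gives leave-one-out."""
    rng = np.random.default_rng(rng)
    folds = np.array_split(rng.permutation(len(x)), k)
    losses = np.empty(len(x))
    for test in folds:
        train = np.setdiff1d(np.arange(len(x)), test)
        model = fit_poly(x[train], y[train], degree)
        losses[test] = squared_loss(y[test], model(x[test]))
    return float(losses.mean())

def bootstrap_oob(x, y, degree, n_rounds=200, rng=None):
    """Bootstrap: train on a resample drawn with replacement,
    test on the out-of-bag points that the resample missed."""
    rng = np.random.default_rng(rng)
    n = len(x)
    estimates = []
    for _ in range(n_rounds):
        train = rng.integers(0, n, size=n)
        oob = np.setdiff1d(np.arange(n), train)
        if len(oob) == 0:
            continue  # extremely unlikely; nothing left to test on
        model = fit_poly(x[train], y[train], degree)
        estimates.append(squared_loss(y[oob], model(x[oob])).mean())
    return float(np.mean(estimates))
```

Randomised holdout is obtained in this sketch by calling holdout repeatedly with different seeds and averaging the results. Setting k to the sample size in k_fold gives leave-one-out.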

Slides: PDF

Videos:

  • Lecture on UTTV (2013)
  • Screencast on cross-validation

Literature:

  • Davison and Hinkley: Bootstrap Methods and Their Application
  • Molinaro, Simon and Pfeiffer: Prediction Error Estimation: A Comparison of Resampling Methods
  • Arlot and Celisse: A survey of cross-validation procedures for model selection
  • Efron: Estimating the Error Rate of a Prediction Rule: Improvement on Cross-Validation
  • Efron and Tibshirani: Improvements on Cross-Validation: The .632+ Bootstrap Method
  • Härdle: Applied Nonparametric Regression: Choosing the smoothing parameter (Chapter 5)
  • Yang: Can the Strengths of AIC and BIC Be Shared?
  • van Erven, Grünwald and de Rooij: Catching Up Faster by Switching Sooner: A Prequential Solution to the AIC-BIC Dilemma

Complementary exercises:

  • Generate data from a simple linear or polynomial regression model, apply various validation methods and report the results:
    • Did the training method choose the correct model?
    • Are there any differences when the correct model is not feasible?
    • Estimate the bias and variance of the training method
    • Did the validation method correctly estimate the expected losses?
  • Try various classification and linear regression methods together with various validation methods and report the results on the following datasets:
    • Iris dataset
    • Computer Hardware Data Set
    • Housing Data Set
    • Datasets for testing linear regression models
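
One possible starting point for the first exercise is sketched below. Everything specific in it is an assumption made for illustration: the cubic ground truth TRUE_COEFS, the noise level NOISE_SD, the sample sizes, the candidate degrees 0 to 8, and the use of 10-fold cross-validation as the validation method. A large fresh sample stands in for the expected future loss, which is itself only a Monte Carlo approximation.

```python
import numpy as np

rng = np.random.default_rng(42)
TRUE_COEFS = [0.5, -1.0, 0.0, 2.0]  # hypothetical truth: 0.5x^3 - x^2 + 2
NOISE_SD = 0.3                      # hypothetical noise level

def sample(n):
    """Draw n points from the assumed polynomial regression model."""
    x = rng.uniform(-2, 2, n)
    y = np.polyval(TRUE_COEFS, x) + rng.normal(0, NOISE_SD, n)
    return x, y

def mse(coefs, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((y - np.polyval(coefs, x)) ** 2))

def cv_mse(x, y, degree, k=10):
    """k-fold cross-validation estimate of the expected squared loss."""
    folds = np.array_split(rng.permutation(len(x)), k)
    sq_err = np.empty(len(x))
    for test in folds:
        train = np.setdiff1d(np.arange(len(x)), test)
        coefs = np.polyfit(x[train], y[train], degree)
        sq_err[test] = (y[test] - np.polyval(coefs, x[test])) ** 2
    return float(sq_err.mean())

x, y = sample(50)                # small training sample
x_big, y_big = sample(100_000)   # large fresh sample approximates future loss

degrees = range(0, 9)
results = {}
for d in degrees:
    coefs = np.polyfit(x, y, d)
    results[d] = {
        "train": mse(coefs, x, y),           # optimistic training error
        "cv": cv_mse(x, y, d),               # validation estimate
        "future": mse(coefs, x_big, y_big),  # approximate expected loss
    }

best_by_train = min(results, key=lambda d: results[d]["train"])
best_by_cv = min(results, key=lambda d: results[d]["cv"])

for d in degrees:
    r = results[d]
    print(f"degree {d}: train={r['train']:.3f} "
          f"cv={r['cv']:.3f} future={r['future']:.3f}")
print("degree chosen by training error:", best_by_train)
print("degree chosen by cross-validation:", best_by_cv)
```

Comparing the cv and future columns indicates how well the validation method estimated the expected loss. Restricting degrees to range(0, 3) excludes the true cubic and so covers the case where the correct model is not feasible. Repeating the whole experiment over many independently drawn training samples would give rough estimates of the bias and variance of the training method.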

Free implementations:

  • The boot package in R
  • Some methods in the rminer package in R