Exam organisation

The exam is written. You will get 16 short questions, each of which asks you to:

  • define a certain concept or notion
  • explain why this notion is useful
  • describe how it is implemented in GNU R.

To get 1 point for a question, you have to give adequate answers to two of the three sub-questions. Answers should be short: at most five sentences each.

Notions and concepts

Decision trees

  • Entropy
  • Supervised learning
  • Unsupervised learning
  • Association rule
  • Decision tree
  • ID3 algorithm
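A minimal Python sketch (not GNU R) of entropy and of the information-gain criterion that ID3 uses to choose the splitting attribute. The function names `entropy` and `information_gain` and the toy weather data are hypothetical and only illustrate the computation.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on the attribute at index `attr`."""
    n = len(labels)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    # Weighted entropy that remains after the split
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

rows = [("sunny", "hot"), ("sunny", "mild"), ("rain", "hot"), ("rain", "mild")]
labels = ["yes", "yes", "no", "no"]
print(information_gain(rows, labels, 0))  # 1.0: outlook separates the classes perfectly
print(information_gain(rows, labels, 1))  # 0.0: temperature carries no information here
```

ID3 would pick attribute 0 at the root, then recurse on each branch with the remaining attributes.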

Linear models

  • Univariate linear regression
  • Multivariate linear regression
  • Polynomial regression
  • Residuals and mean square error
  • Diagnostic methods for linear regression
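In GNU R, `lm()` fits linear models. The Python sketch below only spells out the closed-form least-squares solution for univariate regression together with the residuals and the mean square error; the function names are hypothetical. The MSE here divides by n; the unbiased residual variance estimate for this model would divide by n − 2 instead.

```python
def fit_univariate(x, y):
    """Least-squares intercept and slope for y ≈ intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = s_xy / s_xx  # requires at least two distinct x values
    intercept = mean_y - slope * mean_x
    return intercept, slope

def residuals_and_mse(x, y, intercept, slope):
    """Residuals y_i - ŷ_i and their mean square (divided by n)."""
    res = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    return res, sum(r * r for r in res) / len(res)

x = [0, 1, 2, 3]
y = [1, 3, 5, 7]
b0, b1 = fit_univariate(x, y)
print(b0, b1)  # 1.0 2.0
```

Plotting the residuals against the fitted values is one of the usual diagnostics: a visible pattern suggests the linear model is misspecified.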

Performance evaluation measures

  • Experiment design: training, validation and test data
  • Expected loss and empirical loss
  • Test error on holdout data
  • Cross-validation
  • Bootstrap sample and its usage
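A bootstrap sample has the same size as the original data and is drawn from it uniformly with replacement. One common usage is estimating the standard error of a statistic from its spread across many such samples. The Python sketch below (not GNU R) assumes independent observations; the function names and data are hypothetical.

```python
import random
import statistics

def bootstrap_sample(data, rng):
    """Draw len(data) observations uniformly at random, with replacement."""
    return [rng.choice(data) for _ in range(len(data))]

def bootstrap_standard_error(data, statistic, n_boot=1000, seed=0):
    """Estimate the standard error of `statistic` from its spread over bootstrap samples."""
    rng = random.Random(seed)
    estimates = [statistic(bootstrap_sample(data, rng)) for _ in range(n_boot)]
    return statistics.stdev(estimates)

data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7]
print(bootstrap_standard_error(data, statistics.mean))
```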

Introduction to optimisation

  • Local and global minimum
  • Difference between discrete and continuous optimisation
  • Difference between unconstrained and constrained optimisation
  • Gradient and partial derivatives
  • Gradient descent algorithm
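Gradient descent moves against the gradient, the vector of partial derivatives, with a fixed step size. This is a minimal Python sketch with a constant learning rate and a caller-supplied gradient. The quadratic test function is a hypothetical example with a single global minimum at (3, −1).

```python
def gradient_descent(grad, x0, learning_rate=0.1, n_steps=100):
    """Repeatedly step against the gradient: x <- x - learning_rate * grad(x)."""
    x = list(x0)
    for _ in range(n_steps):
        g = grad(x)
        x = [xi - learning_rate * gi for xi, gi in zip(x, g)]
    return x

# Hypothetical f(x, y) = (x - 3)^2 + 2 (y + 1)^2 with partial derivatives
# df/dx = 2 (x - 3) and df/dy = 4 (y + 1); its global minimum is at (3, -1).
def grad_f(p):
    x, y = p
    return [2 * (x - 3), 4 * (y + 1)]

x_min = gradient_descent(grad_f, [0.0, 0.0])
print(x_min)  # approximately [3.0, -1.0]
```

The step size matters. For this function, any `learning_rate` above 0.5 makes the y-coordinate overshoot with growing amplitude, so the iteration diverges instead of converging.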

Linear classification

  • Binary classification problem
  • Decision border
  • Linear discriminant function (general form of a linear classifier)
  • Perceptron algorithm
  • Fisher discriminant
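A minimal Python sketch of the perceptron algorithm for binary classification, assuming labels coded as −1/+1. The learned linear discriminant is f(x) = w·x + b, and its decision border is the hyperplane w·x + b = 0. Convergence is only guaranteed for linearly separable data; otherwise the loop simply stops after `max_epochs`. The names and the toy data are hypothetical.

```python
def perceptron(X, y, max_epochs=100):
    """Perceptron training for labels in {-1, +1}; returns weights w and bias b."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            activation = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * activation <= 0:  # misclassified or lying on the border
                w = [wj + yi * xj for wj, xj in zip(w, xi)]
                b += yi
                mistakes += 1
        if mistakes == 0:  # a full pass without errors: the data are separated
            break
    return w, b

def predict(w, b, x):
    """Sign of the linear discriminant w·x + b."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else -1

X = [[2, 1], [3, 2], [-1, -1], [-2, -3]]
y = [1, 1, -1, -1]
w, b = perceptron(X, y)
print(w, b)
```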

Feed-forward neural networks for prediction tasks

  • Linear neuron and linear regression
  • Sigmoid neuron and logistic regression
  • Feed-forward neural network
  • Hidden layer and output neuron
  • Backpropagation algorithm and its connection to gradient descent

Basics of probabilistic modelling

  • Approximation algorithms and 95% confidence intervals
  • Binomial distribution and its connection to test error
  • Univariate Gaussian distribution
  • Bayes formula
  • Naive Bayes classifier

Maximum likelihood and maximum a posteriori estimates

  • Model likelihood
  • Maximum likelihood principle
  • Maximum a posteriori principle
  • Ridge regression
  • L1-regularisation and its effect

Model-based clustering techniques

  • Hierarchical clustering
  • Difference between soft and hard clustering algorithms
  • Clustering as a minimisation problem
  • K-means algorithm and its underlying probabilistic model
  • Gaussian mixture models


  • What quantities define a mixture model
  • How to estimate the distribution parameters
  • Why observations are assigned different weights
  • EM algorithm as a two-step optimisation technique
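K-means (Lloyd's algorithm) alternates two steps: a hard assignment of every point to its nearest centre, then a move of every centre to the mean of its points. Each step does not increase the within-cluster sum of squared distances, and the scheme is a hard-assignment counterpart of the E- and M-steps of EM. GNU R provides this as `kmeans()`. The plain-Python sketch below assumes the caller supplies the initial centres; the names and data are hypothetical.

```python
def kmeans(points, centers, n_iter=100):
    """Lloyd's algorithm: alternate hard assignment and mean update until stable."""
    centers = [tuple(c) for c in centers]
    for _ in range(n_iter):
        # Assignment step: nearest centre by squared Euclidean distance
        clusters = [[] for _ in centers]
        for p in points:
            dists = [sum((pi - ci) ** 2 for pi, ci in zip(p, c)) for c in centers]
            clusters[dists.index(min(dists))].append(p)
        # Update step: centre becomes the mean of its points (unchanged if empty)
        new_centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centers)
        ]
        if new_centers == centers:  # no centre moved: converged
            break
        centers = new_centers
    return centers, clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = kmeans(points, [(0, 0), (10, 10)])
print(centers)
```

The result depends on the initial centres, which is why implementations typically restart from several random initialisations.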

Principal component analysis

  • Principal component analysis
  • Linear projection with maximal variance
  • Component loadings and their interpretation
  • Diagram for explained variance
  • Eigenvalues and eigenvectors
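The first principal component is the eigenvector of the sample covariance matrix with the largest eigenvalue. Its entries are the component loadings, and that eigenvalue divided by the total variance (the trace) is the share of variance it explains. GNU R's `prcomp()` computes PCA via a singular value decomposition. This plain-Python sketch uses power iteration instead, assuming the leading eigenvalue is strictly the largest and the start vector is not orthogonal to its eigenvector. The names and data are hypothetical.

```python
import math

def covariance_matrix(data):
    """Sample covariance matrix (divisor n - 1) of a list of equal-length rows."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centred = [[row[j] - means[j] for j in range(d)] for row in data]
    return [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
            for i in range(d)]

def leading_component(cov, n_iter=1000, tol=1e-12):
    """Power iteration: unit eigenvector with the largest eigenvalue, and that eigenvalue."""
    d = len(cov)
    v = [1.0] * d  # assumed not orthogonal to the leading eigenvector
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        converged = max(abs(a - b) for a, b in zip(w, v)) < tol
        v = w
        if converged:
            break
    # Rayleigh quotient v^T C v equals the eigenvalue when v is a unit eigenvector
    eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    return v, eigenvalue

data = [(-2, 0.1), (-1, -0.1), (0, 0), (1, 0.1), (2, -0.1)]
cov = covariance_matrix(data)
loadings, lam = leading_component(cov)
explained = lam / sum(cov[i][i] for i in range(len(cov)))
print(loadings, explained)
```

Here almost all of the variance lies along the first coordinate, so the loading on it is close to ±1 and the explained-variance share is close to 1.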

Statistical learning theory

  • Bias of a model class
  • Bias variance trade-off in model selection
  • Asymptotic consistency
  • Optimism as the difference between test and training error
  • What guarantees SLT bounds give us

Support Vector Machines

  • Geometric margin and stability of classification results
  • Maximal margin classifier
  • Minimisation task for finding the maximal margin
  • Soft margin classifier
  • Support vectors

Kernel methods

  • Non-linear feature map
  • Kernel trick
  • Polynomial and exponential kernels
  • Necessary and sufficient properties for kernels
  • Dual representation and dual coordinates

Basics of ensemble methods

  • Ensemble learning and its benefits
  • Bagging and its connection to bias-variance decomposition
  • Random forest algorithm
  • Cascading classifiers and the AdaBoost algorithm
  • Robustness of bagging and boosting algorithms