# Exam organisation

The exam is in written form. You will get 16 short questions. Each of them asks you to

- define a certain concept or notion,
- explain why this notion is useful,
- describe how it is implemented in GNU R.

To get 1 point for a question you have to give an adequate answer to two of the sub-questions. Answers should be short: up to five sentences long.

# Notions and Concepts

## Decision trees

- Entropy
- Supervised learning
- Unsupervised Learning
- Association rule
- Decision tree
- ID3 algorithm
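
A minimal base-R illustration on the built-in `iris` data: entropy computed from class frequencies, and a tree fitted with the `rpart` package (`rpart` implements CART rather than ID3, but it is the standard tree learner shipped with R):

```r
## Entropy (base 2) of a class label vector
entropy <- function(y) {
  p <- table(y) / length(y)
  -sum(p * log2(p))
}
entropy(iris$Species)  # log2(3) for three equally frequent classes

## Fit a classification tree
library(rpart)
fit <- rpart(Species ~ ., data = iris, method = "class")
print(fit)
```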

## Linear models

- Univariate linear regression
- Multivariate linear regression
- Polynomial regression
- Residuals and mean square error
- Diagnostic methods for linear regression
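
All of these notions surface in base R's `lm()`; a short sketch on the built-in `mtcars` data:

```r
fit1 <- lm(mpg ~ wt, data = mtcars)           # univariate regression
fit2 <- lm(mpg ~ wt + hp, data = mtcars)      # multivariate regression
fit3 <- lm(mpg ~ poly(wt, 2), data = mtcars)  # polynomial regression

mean(residuals(fit2)^2)  # mean square error of the residuals

par(mfrow = c(2, 2))
plot(fit2)  # standard diagnostic plots (residuals vs fitted, Q-Q, ...)
```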

## Performance evaluation measures

- Experiment design: training, validation and test data
- Expected loss and empirical loss
- Test error on holdout data
- Cross-validation
- Bootstrap sample and its usage
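
A base-R sketch of a holdout split, 5-fold cross-validation and a bootstrap sample (the 70/30 ratio and fold count are arbitrary illustrative choices):

```r
set.seed(1)
n <- nrow(mtcars)

## Holdout: train on 70%, measure test error on the remaining 30%
idx <- sample(n, round(0.7 * n))
fit <- lm(mpg ~ wt, data = mtcars[idx, ])
mean((mtcars$mpg[-idx] - predict(fit, mtcars[-idx, ]))^2)

## 5-fold cross-validation
folds <- sample(rep(1:5, length.out = n))
cv_mse <- sapply(1:5, function(k) {
  f <- lm(mpg ~ wt, data = mtcars[folds != k, ])
  mean((mtcars$mpg[folds == k] - predict(f, mtcars[folds == k, ]))^2)
})
mean(cv_mse)

## Bootstrap sample: n observations drawn with replacement
boot <- mtcars[sample(n, replace = TRUE), ]
```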

## Introduction to optimisation

- Local and global minimum
- Difference between discrete and continuous optimisation
- Difference between unconstrained and constrained optimisation
- Gradient and partial derivatives
- Gradient descent algorithm
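
A minimal gradient descent sketch in base R on a simple quadratic (the step size 0.1 is an arbitrary illustrative choice; base R also ships general-purpose optimisers in `optim()`):

```r
## f(x, y) = (x - 1)^2 + 2 * (y + 2)^2, minimised at (1, -2)
grad <- function(w) c(2 * (w[1] - 1), 4 * (w[2] + 2))  # vector of partial derivatives

w <- c(0, 0)  # starting point
eta <- 0.1    # learning rate
for (i in 1:100) {
  w <- w - eta * grad(w)  # step against the gradient
}
w  # close to the global minimum (1, -2)
```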

## Linear classification

- Binary classification problem
- Decision border
- Linear discriminant function (general form of a linear classifier)
- Perceptron algorithm
- Fisher discriminant
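
A minimal perceptron sketch in base R on a linearly separable subset of `iris`; the `perceptron()` helper below is written here for illustration (the closely related Fisher discriminant is available as `lda()` in the MASS package):

```r
## Perceptron for labels y in {-1, +1}; a column of ones absorbs the bias
perceptron <- function(X, y, epochs = 100) {
  X <- cbind(1, X)
  w <- rep(0, ncol(X))
  for (e in 1:epochs) {
    for (i in 1:nrow(X)) {
      if (y[i] * sum(w * X[i, ]) <= 0)  # misclassified point
        w <- w + y[i] * X[i, ]          # move the decision border towards it
    }
  }
  w
}

X <- as.matrix(iris[, c("Petal.Length", "Petal.Width")])
y <- ifelse(iris$Species == "setosa", 1, -1)  # setosa vs the rest is separable
w <- perceptron(X, y)
```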

## Feed-forward neural networks for prediction tasks

- Linear neuron and linear regression
- Sigmoid neuron and logistic regression
- Feed-forward neural network
- Hidden layer and output neuron
- Backpropagation algorithm and its connection to gradient descent
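
A one-hidden-layer network can be fitted with the `nnet` package (hidden layer size and iteration count below are arbitrary illustrative choices; the weights are trained by gradient-based minimisation of the loss):

```r
library(nnet)
set.seed(1)
fit <- nnet(Species ~ ., data = iris, size = 3,  # 3 hidden neurons
            maxit = 200, trace = FALSE)
table(predict(fit, iris, type = "class"), iris$Species)  # confusion matrix
```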

## Basics of probabilistic modelling

- Approximation algorithms and 95% confidence intervals
- Binomial distribution and its connection to test error
- Univariate Gaussian distribution
- Bayes formula
- Naive Bayes classifier
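
A short sketch: confidence intervals for a hypothetical test error of 12 mistakes out of 100 test cases, and a naive Bayes classifier from the `e1071` package:

```r
## 95% confidence intervals for the error probability
binom.test(12, 100)$conf.int  # exact binomial interval
prop.test(12, 100)$conf.int   # normal-approximation interval

## Naive Bayes: numeric features are modelled with per-class Gaussians
library(e1071)
fit <- naiveBayes(Species ~ ., data = iris)
predict(fit, head(iris))
```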

## Maximum likelihood and maximum a posteriori estimates

- Model likelihood
- Maximum likelihood principle
- Maximum a posteriori principle
- Ridge regression
- L1-regularisation and its effect
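
Ridge (L2) and lasso (L1) regression are available, for example, in the `glmnet` package; a sketch on `mtcars` (the penalty value `s = 1` is an arbitrary illustrative choice):

```r
library(glmnet)
x <- as.matrix(mtcars[, c("wt", "hp", "disp")])
y <- mtcars$mpg
ridge <- glmnet(x, y, alpha = 0)  # alpha = 0: ridge penalty
lasso <- glmnet(x, y, alpha = 1)  # alpha = 1: lasso penalty
coef(lasso, s = 1)  # L1 effect: some coefficients are exactly zero
```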

## Model-based clustering techniques

- Hierarchical clustering
- Difference between soft and hard clustering algorithms
- Clustering as minimisation problem
- K-means algorithm and its underlying probabilistic model
- Gaussian mixture models
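
Hierarchical clustering and k-means ship with base R; soft clustering with Gaussian mixtures is available e.g. in the `mclust` package. A short sketch on `iris`:

```r
hc <- hclust(dist(iris[, 1:4]))         # hierarchical, hard assignments
plot(hc)                                # dendrogram
km <- kmeans(iris[, 1:4], centers = 3)  # k-means, hard assignments
km$cluster

library(mclust)
gmm <- Mclust(iris[, 1:4], G = 3)  # Gaussian mixture with 3 components
head(gmm$z)  # soft assignments: posterior membership probabilities
```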

## Expectation-maximisation

- What quantities define a mixture model
- How one estimates the distribution parameters
- Why observations are assigned different weights
- EM algorithm as two-step optimisation technique
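
A minimal base-R sketch of EM for a two-component univariate Gaussian mixture on simulated data; the E-step assigns each observation a soft weight, the M-step re-estimates the parameters by weighted maximum likelihood:

```r
set.seed(1)
x <- c(rnorm(100, 0, 1), rnorm(100, 4, 1))  # simulated mixture data
pi1 <- 0.5; mu <- c(-1, 1); s <- c(1, 1)    # initial guesses

for (it in 1:50) {
  ## E-step: posterior weight of component 1 for each observation
  d1 <- pi1 * dnorm(x, mu[1], s[1])
  d2 <- (1 - pi1) * dnorm(x, mu[2], s[2])
  g <- d1 / (d1 + d2)
  ## M-step: weighted maximum likelihood updates
  pi1   <- mean(g)
  mu[1] <- sum(g * x) / sum(g)
  mu[2] <- sum((1 - g) * x) / sum(1 - g)
  s[1]  <- sqrt(sum(g * (x - mu[1])^2) / sum(g))
  s[2]  <- sqrt(sum((1 - g) * (x - mu[2])^2) / sum(1 - g))
}
c(pi1, mu, s)  # should approach 0.5, (0, 4) and (1, 1)
```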

## Principal component analysis

- Principal component analysis
- Linear projection with maximal variance
- Component loadings and their interpretation
- Diagram for explained variance
- Eigenvalues and eigenvectors
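
All of this is covered by base R's `prcomp()`; a sketch on `iris`:

```r
pc <- prcomp(iris[, 1:4], scale. = TRUE)  # PCA on standardised features
pc$rotation    # loadings: eigenvectors of the correlation matrix
pc$sdev^2      # corresponding eigenvalues (variances along the components)
summary(pc)    # proportion of variance explained per component
screeplot(pc)  # diagram of explained variance
```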

## Statistical learning theory

- Bias of a model class
- Bias variance trade-off in model selection
- Asymptotic consistency
- Optimism as the difference between test and training error
- What guarantees SLT bounds give us
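
Optimism can be seen directly in a small simulation: training error keeps dropping as model complexity grows, while test error eventually rises (the data-generating function and polynomial degrees below are arbitrary illustrative choices):

```r
set.seed(1)
n <- 50
x  <- runif(n); y  <- sin(2 * pi * x)  + rnorm(n, sd = 0.3)  # training data
xt <- runif(n); yt <- sin(2 * pi * xt) + rnorm(n, sd = 0.3)  # test data

for (d in c(1, 3, 9, 15)) {
  fit <- lm(y ~ poly(x, d))
  tr <- mean(residuals(fit)^2)
  te <- mean((yt - predict(fit, data.frame(x = xt)))^2)
  cat(sprintf("degree %2d: train %.3f  test %.3f  optimism %.3f\n",
              d, tr, te, te - tr))
}
```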

## Support Vector Machines

- Geometric margin and stability of classification results
- Maximal margin classifier
- Minimisation task for finding maximal margin
- Soft margin classifier
- Support vectors
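
A sketch with `svm()` from the `e1071` package on a two-class subset of `iris` (the `cost` value, which controls the softness of the margin, is an arbitrary illustrative choice):

```r
library(e1071)
d <- iris[iris$Species != "virginica", ]
d$Species <- droplevels(d$Species)
fit <- svm(Species ~ Petal.Length + Petal.Width, data = d,
           kernel = "linear", cost = 10)  # large cost ~ hard margin
fit$index  # indices of the support vectors in the training data
```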

## Kernel methods

- Non-linear feature map
- Kernel trick
- Polynomial and exponential kernels
- Necessary and sufficient properties for kernels
- Dual representation and dual coordinates
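
The kernel trick in miniature, in base R: the quadratic polynomial kernel evaluated directly agrees with an inner product under an explicit feature map (packages such as `kernlab` exploit exactly this to train non-linear models without ever forming the feature space):

```r
## k(x, z) = (x . z + 1)^2 for 2-dimensional inputs
k_poly <- function(x, z) (sum(x * z) + 1)^2

## The matching explicit feature map: (1, sqrt(2) x1, sqrt(2) x2, x1^2, x2^2, sqrt(2) x1 x2)
phi <- function(x) c(1, sqrt(2) * x, x^2, sqrt(2) * x[1] * x[2])

x <- c(1, 2); z <- c(3, -1)
k_poly(x, z)          # 4
sum(phi(x) * phi(z))  # 4 as well: same inner product, computed in feature space
```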

## Basics of ensemble methods

- Ensemble learning and its benefits
- Bagging and its connection to bias-variance decomposition
- Random forest algorithm
- Cascading classifiers and the AdaBoost algorithm
- Robustness of bagging and boosting algorithms
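
A sketch with the `randomForest` package (AdaBoost-style boosting lives in separate CRAN packages such as `adabag` and `gbm`):

```r
library(randomForest)
fit <- randomForest(Species ~ ., data = iris, ntree = 500)
fit  # the printed OOB error is estimated from the bootstrap samples used by bagging
```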