# Exam organisation

The exam is in written form. You will get 16 short questions. Each of them asks you to

- define a certain concept or notion,
- explain why this notion is useful,
- describe how it is implemented in GNU R.

To get 1 point for a question you have to give an adequate answer to two of the sub-questions. Answers should be short: up to five sentences long.

# Notions and Concepts

## Decision trees

- Entropy
- Supervised learning
- Unsupervised Learning
- Association rule
- Decision tree
- ID3 algorithm
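
A minimal base-R illustration on the built-in `iris` data: entropy computed from class frequencies, and a tree fitted with the `rpart` package (`rpart` implements CART rather than ID3, but it is the standard tree learner shipped with R):

```r
## Entropy (base 2) of a class label vector
entropy <- function(y) {
  p <- table(y) / length(y)
  -sum(p * log2(p))
}
entropy(iris$Species)  # log2(3) for three equally frequent classes

## Fit a classification tree
library(rpart)
fit <- rpart(Species ~ ., data = iris, method = "class")
print(fit)
```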

## Linear models

- Univariate linear regression
- Multivariate linear regression
- Polynomial regression
- Residuals and mean square error
- Diagnostic methods for linear regression
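
All of these notions surface in base R's `lm()`; a short sketch on the built-in `mtcars` data:

```r
fit1 <- lm(mpg ~ wt, data = mtcars)           # univariate regression
fit2 <- lm(mpg ~ wt + hp, data = mtcars)      # multivariate regression
fit3 <- lm(mpg ~ poly(wt, 2), data = mtcars)  # polynomial regression

mean(residuals(fit2)^2)  # mean square error of the residuals

par(mfrow = c(2, 2))
plot(fit2)  # standard diagnostic plots (residuals vs fitted, Q-Q, ...)
```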

## Performance evaluation measures

- Experiment design: training, validation and test data
- Expected loss and empirical loss
- Test error on holdout data
- Cross-validation
- Bootstrap sample and its usage
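
A base-R sketch of a holdout split, 5-fold cross-validation and a bootstrap sample (the 70/30 ratio and fold count are arbitrary illustrative choices):

```r
set.seed(1)
n <- nrow(mtcars)

## Holdout: train on 70%, measure test error on the remaining 30%
idx <- sample(n, round(0.7 * n))
fit <- lm(mpg ~ wt, data = mtcars[idx, ])
mean((mtcars$mpg[-idx] - predict(fit, mtcars[-idx, ]))^2)

## 5-fold cross-validation
folds <- sample(rep(1:5, length.out = n))
cv_mse <- sapply(1:5, function(k) {
  f <- lm(mpg ~ wt, data = mtcars[folds != k, ])
  mean((mtcars$mpg[folds == k] - predict(f, mtcars[folds == k, ]))^2)
})
mean(cv_mse)

## Bootstrap sample: n observations drawn with replacement
boot <- mtcars[sample(n, replace = TRUE), ]
```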

## Introduction to optimisation

- Local and global minimum
- Difference between discrete and continuous optimisation
- Difference between unconstrained and constrained optimisation
- Gradient and partial derivatives
- Gradient descent algorithm
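
A minimal gradient descent sketch in base R on a simple quadratic (the step size 0.1 is an arbitrary illustrative choice; base R also ships general-purpose optimisers in `optim()`):

```r
## f(x, y) = (x - 1)^2 + 2 * (y + 2)^2, minimised at (1, -2)
grad <- function(w) c(2 * (w[1] - 1), 4 * (w[2] + 2))  # vector of partial derivatives

w <- c(0, 0)  # starting point
eta <- 0.1    # learning rate
for (i in 1:100) {
  w <- w - eta * grad(w)  # step against the gradient
}
w  # close to the global minimum (1, -2)
```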

## Linear classification

- Binary classification problem
- Decision border
- Linear discriminant function (general form of a linear classifier)
- Perceptron algorithm
- Fisher discriminant
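
A minimal perceptron sketch in base R on a linearly separable subset of `iris`; the `perceptron()` helper below is written here for illustration (the closely related Fisher discriminant is available as `lda()` in the MASS package):

```r
## Perceptron for labels y in {-1, +1}; a column of ones absorbs the bias
perceptron <- function(X, y, epochs = 100) {
  X <- cbind(1, X)
  w <- rep(0, ncol(X))
  for (e in 1:epochs) {
    for (i in 1:nrow(X)) {
      if (y[i] * sum(w * X[i, ]) <= 0)  # misclassified point
        w <- w + y[i] * X[i, ]          # move the decision border towards it
    }
  }
  w
}

X <- as.matrix(iris[, c("Petal.Length", "Petal.Width")])
y <- ifelse(iris$Species == "setosa", 1, -1)  # setosa vs the rest is separable
w <- perceptron(X, y)
```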

## Feed-forward neural networks for prediction tasks

- Linear neuron and linear regression
- Sigmoid neuron and logistic regression
- Feed-forward neural network
- Hidden layer and output neuron
- Backpropagation algorithm and its connection to gradient descent
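
A one-hidden-layer network can be fitted with the `nnet` package (hidden layer size and iteration count below are arbitrary illustrative choices; the weights are trained by gradient-based minimisation of the loss):

```r
library(nnet)
set.seed(1)
fit <- nnet(Species ~ ., data = iris, size = 3,  # 3 hidden neurons
            maxit = 200, trace = FALSE)
table(predict(fit, iris, type = "class"), iris$Species)  # confusion matrix
```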

## Basics of probabilistic modelling

- Approximation algorithms and 95% confidence intervals
- Binomial distribution and its connection to test error
- Univariate Gaussian distribution
- Bayes formula
- Naive Bayes classifier
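
A short sketch: confidence intervals for a hypothetical test error of 12 mistakes out of 100 test cases, and a naive Bayes classifier from the `e1071` package:

```r
## 95% confidence intervals for the error probability
binom.test(12, 100)$conf.int  # exact binomial interval
prop.test(12, 100)$conf.int   # normal-approximation interval

## Naive Bayes: numeric features are modelled with per-class Gaussians
library(e1071)
fit <- naiveBayes(Species ~ ., data = iris)
predict(fit, head(iris))
```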

## Maximum likelihood and maximum a posteriori estimates

- Model likelihood
- Maximum likelihood principle
- Maximum a posteriori principle
- Ridge regression
- L1-regularisation and its effect
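
Ridge (L2) and lasso (L1) regression are available, for example, in the `glmnet` package; a sketch on `mtcars` (the penalty value `s = 1` is an arbitrary illustrative choice):

```r
library(glmnet)
x <- as.matrix(mtcars[, c("wt", "hp", "disp")])
y <- mtcars$mpg
ridge <- glmnet(x, y, alpha = 0)  # alpha = 0: ridge penalty
lasso <- glmnet(x, y, alpha = 1)  # alpha = 1: lasso penalty
coef(lasso, s = 1)  # L1 effect: some coefficients are exactly zero
```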

## Model-based clustering techniques

- Hierarchical clustering
- Difference between soft and hard clustering algorithms
- Clustering as minimisation problem
- K-means algorithm and its underlying probabilistic model
- Gaussian mixture models
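
Hierarchical clustering and k-means ship with base R; soft clustering with Gaussian mixtures is available e.g. in the `mclust` package. A short sketch on `iris`:

```r
hc <- hclust(dist(iris[, 1:4]))         # hierarchical, hard assignments
plot(hc)                                # dendrogram
km <- kmeans(iris[, 1:4], centers = 3)  # k-means, hard assignments
km$cluster

library(mclust)
gmm <- Mclust(iris[, 1:4], G = 3)  # Gaussian mixture with 3 components
head(gmm$z)  # soft assignments: posterior membership probabilities
```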

## Expectation-maximisation

- What quantities define a mixture model
- How one estimates the distribution parameters
- Why observations are assigned different weights
- EM algorithm as two-step optimisation technique
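
A minimal base-R sketch of EM for a two-component univariate Gaussian mixture on simulated data; the E-step assigns each observation a soft weight, the M-step re-estimates the parameters by weighted maximum likelihood:

```r
set.seed(1)
x <- c(rnorm(100, 0, 1), rnorm(100, 4, 1))  # simulated mixture data
pi1 <- 0.5; mu <- c(-1, 1); s <- c(1, 1)    # initial guesses

for (it in 1:50) {
  ## E-step: posterior weight of component 1 for each observation
  d1 <- pi1 * dnorm(x, mu[1], s[1])
  d2 <- (1 - pi1) * dnorm(x, mu[2], s[2])
  g <- d1 / (d1 + d2)
  ## M-step: weighted maximum likelihood updates
  pi1   <- mean(g)
  mu[1] <- sum(g * x) / sum(g)
  mu[2] <- sum((1 - g) * x) / sum(1 - g)
  s[1]  <- sqrt(sum(g * (x - mu[1])^2) / sum(g))
  s[2]  <- sqrt(sum((1 - g) * (x - mu[2])^2) / sum(1 - g))
}
c(pi1, mu, s)  # should approach 0.5, (0, 4) and (1, 1)
```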

## Principal component analysis

- Principal component analysis
- Linear projection with maximal variance
- Component loadings and their interpretation
- Diagram for explained variance
- Eigenvalues and eigenvectors
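
All of this is covered by base R's `prcomp()`; a sketch on `iris`:

```r
pc <- prcomp(iris[, 1:4], scale. = TRUE)  # PCA on standardised features
pc$rotation    # loadings: eigenvectors of the correlation matrix
pc$sdev^2      # corresponding eigenvalues (variances along the components)
summary(pc)    # proportion of variance explained per component
screeplot(pc)  # diagram of explained variance
```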

## Statistical learning theory

- Bias of a model class
- Bias variance trade-off in model selection
- Asymptotic consistency
- Optimism as the difference between test and training error
- What guarantees SLT bounds give us
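
Optimism can be seen directly in a small simulation: training error keeps dropping as model complexity grows, while test error eventually rises (the data-generating function and polynomial degrees below are arbitrary illustrative choices):

```r
set.seed(1)
n <- 50
x  <- runif(n); y  <- sin(2 * pi * x)  + rnorm(n, sd = 0.3)  # training data
xt <- runif(n); yt <- sin(2 * pi * xt) + rnorm(n, sd = 0.3)  # test data

for (d in c(1, 3, 9, 15)) {
  fit <- lm(y ~ poly(x, d))
  tr <- mean(residuals(fit)^2)
  te <- mean((yt - predict(fit, data.frame(x = xt)))^2)
  cat(sprintf("degree %2d: train %.3f  test %.3f  optimism %.3f\n",
              d, tr, te, te - tr))
}
```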

## Support Vector Machines

- Geometric margin and stability of classification results
- Maximal margin classifier
- Minimisation task for finding maximal margin
- Soft margin classifier
- Support vectors
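
A sketch with `svm()` from the `e1071` package on a two-class subset of `iris` (the `cost` value, which controls the softness of the margin, is an arbitrary illustrative choice):

```r
library(e1071)
d <- iris[iris$Species != "virginica", ]
d$Species <- droplevels(d$Species)
fit <- svm(Species ~ Petal.Length + Petal.Width, data = d,
           kernel = "linear", cost = 10)  # large cost ~ hard margin
fit$index  # indices of the support vectors in the training data
```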

## Kernel methods

- Non-linear feature map
- Kernel trick
- Polynomial and exponential kernels
- Necessary and sufficient properties for kernels
- Dual representation and dual coordinates
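
The kernel trick in miniature, in base R: the quadratic polynomial kernel evaluated directly agrees with an inner product under an explicit feature map (packages such as `kernlab` exploit exactly this to train non-linear models without ever forming the feature space):

```r
## k(x, z) = (x . z + 1)^2 for 2-dimensional inputs
k_poly <- function(x, z) (sum(x * z) + 1)^2

## The matching explicit feature map: (1, sqrt(2) x1, sqrt(2) x2, x1^2, x2^2, sqrt(2) x1 x2)
phi <- function(x) c(1, sqrt(2) * x, x^2, sqrt(2) * x[1] * x[2])

x <- c(1, 2); z <- c(3, -1)
k_poly(x, z)          # 4
sum(phi(x) * phi(z))  # 4 as well: same inner product, computed in feature space
```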

## Basics of ensemble methods

- Ensemble learning and its benefits
- Bagging and its connection to bias-variance decomposition
- Random forest algorithm
- Cascading classifiers and the AdaBoost algorithm
- Robustness of bagging and boosting algorithms
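
A sketch with the `randomForest` package (AdaBoost-style boosting lives in separate CRAN packages such as `adabag` and `gbm`):

```r
library(randomForest)
fit <- randomForest(Species ~ ., data = iris, ntree = 500)
fit  # the printed OOB error is estimated from the bootstrap samples used by bagging
```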