## Machine Learning and other math subjects

### Basics of Supervised Machine Learning

Two basic supervised machine learning tasks are classification and prediction. In this block, machine learning is formulated as a minimisation task: a learning algorithm must find a configuration that minimises a certain objective function. The following figure captures the most important mathematical concepts used in the first block, together with a list of university courses that define or study these concepts. Subjects with a red background are essential; orange subjects are good to know in order to fully appreciate the beauty of the mathematical proofs; light green marks subjects that are needed if you want to derive new results; dark green marks truly advanced treatments or applications.
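To make the "learning as minimisation" view concrete, here is a minimal sketch (not code from the course, and the function names `mse` and `fit` are my own): a line y = w·x + b is learned by gradient descent on the mean squared error objective.

```python
def mse(w, b, data):
    """Mean squared error of the model y = w*x + b on (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def fit(data, lr=0.01, steps=2000):
    """Find the configuration (w, b) that approximately minimises the objective."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        # Gradient of the MSE objective with respect to w and b.
        gw = sum(2 * (w * x + b - y) * x for x, y in data) / n
        gb = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * gw
        b -= lr * gb
    return w, b

# Toy data generated by y = 2x + 1; the minimiser recovers these parameters.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = fit(data)
print(round(w, 2), round(b, 2))  # → 2.0 1.0
```

The same pattern underlies far more complex learners: only the model family and the objective change, while "minimise the objective over configurations" stays fixed.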

### Model based reasoning

The second block mostly discusses how to choose a good objective function so that the optimisation procedure yields a reasonable output. It turns out that it pays off to treat everything as a random process when searching for a good objective function to minimise. Moreover, this viewpoint lets us naturally embed our background knowledge into the machine learning method.
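A small sketch of this idea, under my own choice of example (a biased coin; the helper names are hypothetical): treating the data as random turns "choose an objective" into "minimise the negative log-likelihood", and background knowledge enters as an extra prior term.

```python
import math

def nll(p, heads, tails):
    """Negative log-likelihood of observing the data under coin bias p."""
    return -(heads * math.log(p) + tails * math.log(1 - p))

def nll_with_prior(p, heads, tails, a=2, b=2):
    """Adding a Beta(a, b) prior term encodes the background belief that
    the coin is probably close to fair (maximum a posteriori estimation)."""
    return nll(p, heads, tails) - (a - 1) * math.log(p) - (b - 1) * math.log(1 - p)

# Minimise both objectives over a grid of candidate biases: the plain NLL
# yields the observed frequency, while the prior pulls the estimate towards 1/2.
grid = [i / 1000 for i in range(1, 1000)]
mle = min(grid, key=lambda p: nll(p, heads=8, tails=2))
mapped = min(grid, key=lambda p: nll_with_prior(p, heads=8, tails=2))
print(mle, mapped)  # → 0.8 0.75
```

The objective function is no longer an arbitrary choice: it follows from the assumed random process, and the prior term is exactly where background knowledge is embedded.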

### Instance based methods

The third block revisits classification and prediction tasks from a different perspective. Namely, there is an inevitable trade-off between plasticity and stability in machine learning tasks. If we are willing to learn complex models, then we need many data points to adequately fix the model parameters. As a result, the training error can be much smaller than the future performance of the algorithm on unseen data. This issue is further studied by Statistical Learning Theory, which has led to the development of Support Vector Machines and other kernel methods. In layman's terms, these methods do not try to fit parameters to the data. Instead, they keep a few very informative data points and use them to make future predictions.
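A minimal sketch of the instance-based idea (a nearest-neighbour rule of my own construction, not an SVM): instead of fitting parameters, the method stores a few points and predicts from the closest stored one.

```python
def nearest_neighbour(stored, x):
    """Predict the label of x from the closest stored (point, label) pair."""
    point, label = min(stored, key=lambda pl: abs(pl[0] - x))
    return label

# Toy 1-D data: negatives cluster near 0, positives near 10. Keeping just
# one informative point per class already fixes the decision boundary,
# much like support vectors summarise it in an SVM.
stored = [(1.0, "neg"), (9.0, "pos")]
print(nearest_neighbour(stored, 2.5), nearest_neighbour(stored, 8.0))  # → neg pos
```

Kernel methods refine this picture: the "distance" is replaced by a kernel, and the informative points (support vectors) are selected by the optimisation rather than by hand.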

### Ensemble methods and prediction averaging

The last block discusses another important issue. Namely, relying on a single classifier or predictor often leads to sub-optimal results: a particular model might be correct only in a limited domain or only for a certain sub-population in the data. Hence, better results can be obtained by training a collection of predictors and then outputting a consolidated prediction.
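The consolidation step can be sketched as follows (a toy majority vote of my own; the three rules are hypothetical classifiers, each correct only on part of the input range):

```python
from collections import Counter

def majority_vote(predictors, x):
    """Consolidate an ensemble's outputs by majority vote."""
    votes = [predict(x) for predict in predictors]
    return Counter(votes).most_common(1)[0][0]

# Three simple threshold classifiers that disagree near the boundary;
# the ensemble's vote is more robust than any single member.
ensemble = [
    lambda x: "pos" if x > 3 else "neg",
    lambda x: "pos" if x > 5 else "neg",
    lambda x: "pos" if x > 7 else "neg",
]
print(majority_vote(ensemble, 6.0))  # → pos  (two of three vote "pos")
```

For regression tasks the same idea applies with averaging in place of voting; bagging and boosting are systematic ways of producing the collection of predictors.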