HW08 (due Nov 20th) Machine Learning IV
Exercise 1 (EX1) (1 point)
Attempt to solve the spiral problem (the bottom-right dataset of the four provided) in TensorFlow Playground using the smallest possible number of neurons and layers. Please keep the default values for the training-to-test ratio, noise, and batch size.
Consider the task solved if the test loss shown at the top right of the website is below 0.1 and no point of either class lies in a region of the other colour, meaning that all training points are correctly classified.
(a) Describe the search process you used to find the architecture (combination of neurons and layers) that solves the problem. Did you try different architectures randomly or systematically? Which changes to the architecture did you try?
(b) Report the total number of different architectures you tried before reaching the desired result. Take a screenshot showing that your network has solved the problem. In the report, state separately the epoch your network reached and the training and test loss it achieved.
Read the supporting text below the demo and answer the following questions.
(c) What do the colours orange and blue stand for (c1) in the connections between neurons; (c2) inside the neurons; (c3) in the output image (the one with the spiral)?
(d) Which input features did you choose for your architecture? Why are they important for your network to solve the problem?
(e) If you have changed any other hyper-parameters such as the learning rate, activation function, regularization or regularization rate, then please explain the reasons and implications of this change.
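For part (a), one systematic alternative to random trial and error is to list candidate architectures from smallest to largest and try them in that order, stopping at the first one that meets the success criterion of this exercise. The sketch below only generates such an ordering and encodes the criterion; it does not train anything. The function names and the limits `max_layers` and `max_neurons` are assumptions chosen for illustration, not settings taken from TensorFlow Playground.

```python
from itertools import product


def candidate_architectures(max_layers=3, max_neurons=4):
    """Return hidden-layer layouts sorted by total neurons, then by depth."""
    layouts = []
    for depth in range(1, max_layers + 1):
        # Each layout is a tuple of neurons per hidden layer, e.g. (4, 2).
        layouts.extend(product(range(1, max_neurons + 1), repeat=depth))
    return sorted(layouts, key=lambda layout: (sum(layout), len(layout)))


def is_solved(test_loss, misclassified_training_points):
    """Criterion from this exercise: test loss below 0.1 and no misclassified training points."""
    return test_loss < 0.1 and misclassified_training_points == 0


# Print the first few layouts to try, smallest first (assumed limits).
for layout in candidate_architectures(max_layers=2, max_neurons=3)[:6]:
    print(layout, "total neurons:", sum(layout))
```

Working through such a list in order also makes the count asked for in part (b) easy to report, since every tried architecture has a position in the list.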
Exercise 2 (EX2) (1 point)
Read the article by Domingos, "A Few Useful Things to Know About Machine Learning" (Communications of the ACM, Vol. 55, No. 10, pp. 78-87, doi: 10.1145/2347736.2347755), available via the ACM Digital Library or as a PDF.
Make a numbered list of 10 key messages, each with a supporting example or clarification of 1-2 sentences. The result should read like a short summary of the article. Please choose the messages so that:
- you have at least 1 message from each page of the article (please indicate the page number next to each message);
- at least 3 of your chosen messages have not been covered in our lectures (please indicate which ones).
Exercise 3 (EX3) (2 points)
Participate in the Kaggle inClass competition that we have prepared for our course. Kaggle is the most well-known platform for machine learning competitions, where data scientists team up with colleagues to solve various data science challenges. In our competition there is only one person you can rely on: yourself. You are allowed to make at most 10 submissions per day. Each submission is a CSV file containing your predictions for the test data. You are provided with an example submission file showing how to format the CSV file; please follow this format, otherwise your submission will not be graded. We suggest using caret, which offers a long list of models to choose from. You should also have all the necessary knowledge and skills from the previous homeworks. Please complete the following objectives:
- Register for the competition and create a one-person team (please name it using your first name and surname).
- To get 1 point, make 3 submissions obtained with different learning algorithms. Please give the submissions meaningful names (like "linearSVM" or "KNN with K=1") and include in the report a screenshot of your submissions that makes clear which ones are yours. Remember that Kaggle expects class probabilities for each row in the test file, not binary class labels.
- To get another point, make a submission that reaches an AUC score of 0.92 or higher (again, include a screenshot). In order to claim this point, describe your current best solution.
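The reminder about class probabilities matters because AUC is a ranking measure: it is the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one, with ties counted as one half. The minimal sketch below computes AUC directly from that definition. It is written in Python for illustration rather than in caret, the labels and scores are invented, the quadratic pairwise loop is chosen for clarity rather than speed, and it is not the competition's scoring code.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Invented example data: true labels and predicted probabilities of class 1.
labels = [1, 1, 1, 0, 0, 0]
probabilities = [0.9, 0.6, 0.4, 0.55, 0.3, 0.1]

# Thresholding at 0.5 discards the ordering within each predicted class.
hard_labels = [1 if p >= 0.5 else 0 for p in probabilities]

auc_probabilities = auc(labels, probabilities)
auc_hard_labels = auc(labels, hard_labels)
print(auc_probabilities, auc_hard_labels)
```

In this toy example the probabilities rank 8 of the 9 positive-negative pairs correctly, while the thresholded labels earn credit for only 6 of 9, because every pair that ends up with the same hard label becomes a tie.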
Bonus exercise 1 (BEX1) (1 point)
As is usual in Kaggle competitions, we have split the test data into a public and a private fold, and you do not know which fold each test instance belongs to. Until the deadline you will see only the results on the public test fold. Before the deadline you should select up to 2 submissions for judging. The final leaderboard will be based on the predictions on the private test fold. Finish in the top 10 of the private leaderboard in Kaggle inClass to earn 1 bonus point.
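The public/private mechanism can be illustrated with a small sketch. Everything specific in it is an assumption: the 50/50 split, the seed, the toy data and the helper names `split_public_private` and `accuracy_on` are hypothetical, and accuracy is used instead of AUC only to keep the example short. The actual split used for this competition is not known to participants.

```python
import random


def split_public_private(row_ids, public_fraction=0.5, seed=0):
    """Randomly assign each test row to the public or the private fold (hypothetical helper)."""
    shuffled = list(row_ids)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * public_fraction)
    return set(shuffled[:cut]), set(shuffled[cut:])


def accuracy_on(fold, truth, predictions):
    """Share of correct predictions among the rows in the given fold."""
    return sum(truth[i] == predictions[i] for i in fold) / len(fold)


# Invented toy data: 10 test rows; the predictions for rows 7-9 are wrong.
truth = {i: i % 2 for i in range(10)}
predictions = {i: (i % 2 if i < 7 else 1 - i % 2) for i in range(10)}

public, private = split_public_private(truth.keys(), public_fraction=0.5, seed=0)
print("public score: ", accuracy_on(public, truth, predictions))
print("private score:", accuracy_on(private, truth, predictions))
```

Because the same predictions are scored on two different subsets of rows, the public score seen before the deadline and the private score used for the final ranking can differ.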
Bonus exercise 2 (BEX2) (2 points)
Finish in the top 3 of the private leaderboard in Kaggle inClass (these points are not added to those of the previous bonus).