Data Mining - Courses - Institute of Computer Science

HW 12 (due May 17th) Machine learning...

Use this example data about products, their features and sales: ML_Example.xlsx

Your task is to fill in the "gaps" using simple "manual" reasoning:

1. Build a decision tree using ID3 (step by step; be ready to explain every detail) and classify the rows p21-p24.

2. Classify the same rows using the Bayes and Naive Bayes methods. Discuss both methods and explain exactly how you can apply them manually.

3. Use probabilistic arguments to fill in the gaps for p25-p27. (By popular demand from the TAs: "probabilistic arguments" could be almost anything: k-NN, Bayes, various conditional probabilities, etc.)

4. Suppose you are given the task of classifying texts (e.g. sorting e-mail as spam or not spam). Is it a good idea to apply the k-nearest neighbors algorithm? How could you apply it? If your training set is very large, how would you solve the algorithm's performance problems? Be reasonably brief: no more than two or three short paragraphs in total.

5. Read the article by Domingos, "A few useful things to know about machine learning" (Communications of the ACM, Vol. 55, No. 10, pages 78-87, doi:10.1145/2347736.2347755; available via the ACM Digital Library or at https://courses.cs.ut.ee/MTAT.03.183/2012_fall/uploads/Main/domingos.pdf). Make a list of the key messages, each with a supporting 1-2 sentence example or clarification (i.e. a short summary of the article).

6. Bonus (2p): Install Weka (http://www.cs.waikato.ac.nz/ml/weka/). Use the three data sets: the "Titanic" data, the "Iris" data and the "letter.arff" data. Apply decision trees (J48 is Weka's implementation of C4.5) and two other machine learning classification methods. Describe the process and findings.

7. Bonus (2p): Use Weka on some example data (or a subset of the data) from your own chosen project, and briefly describe the process and results. If the group builds the necessary dataset together, each group member should apply a different machine learning method (trees, rules, SVM, random forests, Bayes, NaiveBayes, etc.).
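For task 1, the arithmetic behind each ID3 step (class entropy, information gain of every remaining attribute, split on the largest gain, recurse on each branch) can be checked with a short Python sketch. The rows and the attribute names `price`, `color`, `size` and `sales` are hypothetical, made up for illustration and not taken from ML_Example.xlsx. Breaking ties in gain by attribute order, and falling back to the majority label for unseen values, are assumptions of this sketch rather than part of ID3 itself.

```python
import math
from collections import Counter

# Hypothetical toy rows (NOT from ML_Example.xlsx): categorical features + class "sales".
ROWS = [
    {"price": "low",  "color": "red",  "size": "small", "sales": "high"},
    {"price": "low",  "color": "blue", "size": "large", "sales": "high"},
    {"price": "high", "color": "red",  "size": "small", "sales": "low"},
    {"price": "high", "color": "blue", "size": "large", "sales": "low"},
    {"price": "mid",  "color": "red",  "size": "large", "sales": "high"},
    {"price": "mid",  "color": "blue", "size": "small", "sales": "low"},
]
TARGET = "sales"


def entropy(rows):
    """Shannon entropy (bits) of the class distribution in rows."""
    counts = Counter(r[TARGET] for r in rows)
    n = len(rows)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def info_gain(rows, attr):
    """Entropy before the split minus the weighted entropy of the subsets."""
    n = len(rows)
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r for r in rows if r[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(rows) - remainder


def id3(rows, attrs):
    """Return a leaf label, or a tuple (attribute, {value: subtree}, default label)."""
    labels = Counter(r[TARGET] for r in rows)
    if len(labels) == 1 or not attrs:
        return labels.most_common(1)[0][0]  # pure node or no attributes left
    best = max(attrs, key=lambda a: info_gain(rows, a))  # first attr wins ties
    rest = [a for a in attrs if a != best]
    branches = {v: id3([r for r in rows if r[best] == v], rest)
                for v in {r[best] for r in rows}}
    return (best, branches, labels.most_common(1)[0][0])


def classify(tree, row):
    """Walk the tree; unseen attribute values fall back to the node's majority label."""
    while isinstance(tree, tuple):
        attr, branches, default = tree
        tree = branches.get(row[attr], default)
    return tree


tree = id3(ROWS, ["price", "color", "size"])
for a in ["price", "color", "size"]:
    print(a, round(info_gain(ROWS, a), 3))
print(classify(tree, {"price": "mid", "color": "red", "size": "small"}))
```

On this toy data the class entropy is 1 bit; `price` has gain 2/3 bit against about 0.082 for `color` and `size`, so it becomes the root, and only the `mid` branch needs a further split.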
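For task 2, a minimal sketch of how the two methods differ when computed by hand, again on hypothetical rows that are not from ML_Example.xlsx. Here "full" Bayes estimates P(x | c) as the fraction of class-c rows that match x on every attribute, while Naive Bayes assumes the attributes are conditionally independent given the class and multiplies per-attribute estimates. The Laplace smoothing parameter `alpha` is an added assumption, used so that a single unseen value does not zero out the whole product.

```python
from collections import Counter

# Hypothetical toy rows (NOT from ML_Example.xlsx).
ROWS = [
    {"price": "low",  "color": "red",  "size": "small", "sales": "high"},
    {"price": "low",  "color": "blue", "size": "large", "sales": "high"},
    {"price": "high", "color": "red",  "size": "small", "sales": "low"},
    {"price": "high", "color": "blue", "size": "large", "sales": "low"},
    {"price": "mid",  "color": "red",  "size": "large", "sales": "high"},
    {"price": "mid",  "color": "blue", "size": "small", "sales": "low"},
]
TARGET = "sales"
FEATURES = ["price", "color", "size"]


def class_rows(rows):
    """Group training rows by class label."""
    groups = {}
    for r in rows:
        groups.setdefault(r[TARGET], []).append(r)
    return groups


def full_bayes(rows, x):
    """Unnormalised P(c) * P(x | c), where P(x | c) is the share of class-c
    rows matching x on ALL features (no independence assumption)."""
    n = len(rows)
    scores = {}
    for c, members in class_rows(rows).items():
        match = sum(all(r[f] == x[f] for f in FEATURES) for r in members)
        scores[c] = len(members) / n * match / len(members)
    return scores


def naive_bayes(rows, x, alpha=1.0):
    """Unnormalised P(c) * prod_f P(x_f | c), Laplace-smoothed with alpha."""
    n = len(rows)
    scores = {}
    for c, members in class_rows(rows).items():
        score = len(members) / n  # prior P(c)
        for f in FEATURES:
            n_values = len({r[f] for r in rows})  # distinct values of f in training
            hits = sum(r[f] == x[f] for r in members)
            score *= (hits + alpha) / (len(members) + alpha * n_values)
        scores[c] = score
    return scores


def normalise(scores):
    """Turn unnormalised scores into posteriors (left as-is if all are zero)."""
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()} if total else scores


query = {"price": "high", "color": "blue", "size": "small"}
print(full_bayes(ROWS, query))  # both 0: no row matches on all three attributes
print(normalise(naive_bayes(ROWS, query)))
```

For this query the full-Bayes scores are both 0 and decide nothing, while Naive Bayes still gives 0.09 for `low` against about 0.013 for `high`. This illustrates why the independence assumption helps on small tables, at the cost of ignoring interactions between attributes.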
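For task 3, two of the "probabilistic arguments" the TAs mention can be sketched the same way: a k-NN vote that measures the distance between categorical rows as the number of differing attributes (overlap or Hamming distance), and a direct conditional-probability estimate P(class | some attributes) from counts. The rows and attribute names are hypothetical; the distance measure and k=3 are assumptions, and ties in distance are broken by row order because Python's `sorted` is stable.

```python
from collections import Counter

# Hypothetical toy rows (NOT from ML_Example.xlsx).
ROWS = [
    {"price": "low",  "color": "red",  "size": "small", "sales": "high"},
    {"price": "low",  "color": "blue", "size": "large", "sales": "high"},
    {"price": "high", "color": "red",  "size": "small", "sales": "low"},
    {"price": "high", "color": "blue", "size": "large", "sales": "low"},
    {"price": "mid",  "color": "red",  "size": "large", "sales": "high"},
    {"price": "mid",  "color": "blue", "size": "small", "sales": "low"},
]
TARGET = "sales"
FEATURES = ["price", "color", "size"]


def hamming(a, b):
    """Number of attributes on which two rows differ (overlap distance)."""
    return sum(a[f] != b[f] for f in FEATURES)


def knn_predict(rows, x, k=3):
    """Majority label among the k rows closest to x; distance ties keep row order."""
    nearest = sorted(rows, key=lambda r: hamming(r, x))[:k]
    votes = Counter(r[TARGET] for r in nearest)
    return votes.most_common(1)[0][0], votes


def cond_prob(rows, label, given):
    """Estimate P(TARGET = label | given attributes) by counting matching rows.

    Returns None when no training row matches, i.e. the estimate is undefined."""
    matching = [r for r in rows if all(r[f] == v for f, v in given.items())]
    if not matching:
        return None
    return sum(r[TARGET] == label for r in matching) / len(matching)


query = {"price": "low", "color": "red", "size": "large"}
print(knn_predict(ROWS, query))
print(cond_prob(ROWS, "high", {"color": "red"}))
```

Conditioning on fewer attributes gives estimates backed by more rows but ignores the rest of the row; conditioning on all of them quickly runs out of matching rows, which is the same trade-off as in the Bayes example.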
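For task 4, one possible way to apply k-NN to text is sketched below under stated assumptions: each message becomes a bag-of-words term-frequency vector (plain whitespace tokenization, no stemming or TF-IDF weighting), neighbours are ranked by cosine similarity, and the label is a majority vote. For the performance question, the sketch keeps an inverted index from each term to the training documents containing it, so a query is compared only with documents that share at least one term instead of with the whole training set. The class name `TextKNN` and the training messages are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Very simple tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


class TextKNN:
    """k-NN over term-frequency vectors with cosine similarity and an inverted index."""

    def __init__(self, k=3):
        self.k = k
        self.vectors = []               # term-frequency Counter per training doc
        self.norms = []                 # Euclidean norm of each vector
        self.labels = []
        self.index = defaultdict(set)   # term -> ids of docs containing it

    def add(self, text, label):
        vec = Counter(tokenize(text))
        doc_id = len(self.vectors)
        self.vectors.append(vec)
        self.norms.append(math.sqrt(sum(v * v for v in vec.values())))
        self.labels.append(label)
        for term in vec:
            self.index[term].add(doc_id)

    def predict(self, text):
        q = Counter(tokenize(text))
        q_norm = math.sqrt(sum(v * v for v in q.values()))
        # Only docs sharing at least one term with the query are scored.
        candidates = set().union(*(self.index.get(t, set()) for t in q))
        if not candidates or q_norm == 0:
            return None  # no overlapping vocabulary: no neighbours to vote
        scored = []
        for d in candidates:
            dot = sum(q[t] * self.vectors[d][t] for t in q)
            scored.append((dot / (q_norm * self.norms[d]), d))
        top = sorted(scored, reverse=True)[: self.k]
        votes = Counter(self.labels[d] for _, d in top)
        return votes.most_common(1)[0][0]


# Hypothetical training messages, made up for illustration.
TRAIN = [
    ("win money now", "spam"),
    ("cheap money offer now", "spam"),
    ("win a free prize", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
    ("project meeting notes", "ham"),
]
clf = TextKNN(k=3)
for text, label in TRAIN:
    clf.add(text, label)
print(clf.predict("free money offer"))  # spam
```

The inverted index helps most when messages are short relative to the vocabulary; a query containing very common words still touches many documents, which is one reason stop words are often removed before indexing.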

Data Mining 2014/15 spring
