
Data Mining (Andmekaeve) 2014/15 spring


HW 12 (due May 17th) Machine learning...

Use this example data about products, their features and sales: ML_Example.xlsx

Your task is to fill in the "gaps" - by simple "manual" argumentation:

1. Build a decision tree using ID3 (step by step; be ready to explain every detail) and classify rows p21-p24.

2. Classify the same examples using the Bayes and Naive Bayes methods. Discuss the methods and explain exactly how to apply them manually.

3. Use probabilistic arguments to fill in the gaps for p25-p27. (By popular demand from the TAs: "probabilistic arguments" could be almost anything: k-NN, Bayes, various conditional probabilities.)

4. Suppose you are given the task of classifying texts (e.g. sorting e-mail as spam or not). Is it a good idea to apply the k-nearest neighbors algorithm? How could you apply it? If your training set is very large, how would you solve the algorithm's performance problems? Be reasonably brief: no more than two or three short paragraphs in total.

5. Read the article by Domingos: A few useful things to know about machine learning (Communications of the ACM, Vol. 55 No. 10, Pages 78-87, doi:10.1145/2347736.2347755, via ACM Digital Library; https://courses.cs.ut.ee/MTAT.03.183/2012_fall/uploads/Main/domingos.pdf ). Make a list of the key messages, each with a supporting 1-2 sentence example or clarification (a short summary of the article).

6. Bonus (2p) Install Weka - http://www.cs.waikato.ac.nz/ml/weka/ Use the following data sets: the "Titanic" data, the "Iris" data and the "letter.arff" data. Apply decision trees (J48 is the same as C4.5) and two other machine learning classification methods. Describe the process and findings.

7. Bonus (2p) Use Weka on some example data (or a subset of the data) from your own chosen project, and briefly describe the process and results. If the group generates the necessary dataset together, then every group member should apply different machine learning methods (trees, rules, SVM, random forests, Bayes, NaiveBayes, etc.).
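
As a cross-check for the manual work in task 1, the attribute-selection step of ID3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rows, the attribute names (`color`, `size`) and the target column (`sold`) are hypothetical and are not taken from ML_Example.xlsx. At each node, ID3 picks the attribute with the highest information gain, i.e. the largest reduction in entropy of the class label.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Hypothetical toy rows, NOT the contents of ML_Example.xlsx.
toy_rows = [
    {"color": "red", "size": "big", "sold": "yes"},
    {"color": "red", "size": "small", "sold": "yes"},
    {"color": "blue", "size": "big", "sold": "no"},
    {"color": "blue", "size": "small", "sold": "no"},
]

# ID3 would split the root on the attribute with the largest gain.
gains = {a: information_gain(toy_rows, a, "sold") for a in ("color", "size")}
best = max(gains, key=gains.get)
print(gains, best)
```

In this toy data `color` separates the classes perfectly while `size` tells nothing, so `color` would become the root. The same computation is then repeated recursively on each branch with the remaining attributes.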

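For task 2, the Naive Bayes calculation can be sketched in the same way. Again, the data and column names below are hypothetical, and the add-one (Laplace) smoothing is an assumption of this sketch rather than something the assignment prescribes; without it, a value never seen with a class would force that class's probability to zero.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, target):
    """Count classes and per-class attribute values from categorical rows."""
    class_counts = Counter(r[target] for r in rows)
    value_counts = defaultdict(Counter)  # (attribute, class) -> value counts
    domains = defaultdict(set)           # attribute -> set of observed values
    for r in rows:
        for attr, value in r.items():
            if attr == target:
                continue
            value_counts[(attr, r[target])][value] += 1
            domains[attr].add(value)
    return class_counts, value_counts, domains


def posterior(model, query):
    """Normalized P(class | query), assuming attributes are independent given the class."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    scores = {}
    for cls, n_cls in class_counts.items():
        score = n_cls / total  # prior P(class)
        for attr, value in query.items():
            # Add-one smoothing (an assumption of this sketch).
            count = value_counts[(attr, cls)][value]
            score *= (count + 1) / (n_cls + len(domains[attr]))
        scores[cls] = score
    norm = sum(scores.values())
    return {cls: s / norm for cls, s in scores.items()}


# Hypothetical toy rows, NOT the contents of ML_Example.xlsx.
toy_rows = [
    {"color": "red", "size": "big", "sold": "yes"},
    {"color": "red", "size": "small", "sold": "yes"},
    {"color": "blue", "size": "big", "sold": "no"},
    {"color": "blue", "size": "small", "sold": "no"},
]

model = train_naive_bayes(toy_rows, "sold")
result = posterior(model, {"color": "red", "size": "big"})
prediction = max(result, key=result.get)
print(result, prediction)
```

The "full" Bayes method would instead estimate P(attributes | class) from rows matching the whole attribute combination at once, which quickly runs out of matching rows on a small table; Naive Bayes trades that for the independence assumption.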