Institute of Computer Science
Data Mining (MTAT.03.183)

Data Mining 2012/13 fall

HW 03 - 27.09.2012

Please review (or learn) the basics of probability and conditional probability. Example materials:

  • Probability and Statistics intro
  • Conditional Probability (Wikipedia)
  • Conditional probability

The attached data set contains 1000 cases of 5 dice throws each (values 1..6). File: 02_dice.txt

Your task is to study the dice T1..T5. Are they "loaded"? Are there any "dependencies" between them?

1. Calculate the probability distributions of different values for each die. (count the frequency of each outcome)
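
A minimal sketch of the counting step in Python. The layout of 02_dice.txt is not described here, so the code assumes one case per line with five whitespace-separated integers and an optional header line; the sample rows below are made up, and the function names are only illustrative.

```python
from collections import Counter

def parse_throws(text):
    """Parse rows of whitespace-separated integers.

    Lines containing anything non-numeric (e.g. a header) are skipped.
    The file layout is an assumption, not taken from the assignment.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if fields and all(f.isdigit() for f in fields):
            rows.append([int(f) for f in fields])
    return rows

def marginal_distributions(rows):
    """Return one {value: relative frequency} dict per column (die)."""
    n = len(rows)
    return [
        {value: count / n for value, count in sorted(Counter(col).items())}
        for col in zip(*rows)
    ]

# Made-up sample standing in for the real file contents.
sample = """T1 T2 T3 T4 T5
1 2 3 4 5
1 3 3 6 5
2 2 3 4 1
1 2 6 4 5
"""

rows = parse_throws(sample)
dists = marginal_distributions(rows)
for i, dist in enumerate(dists, start=1):
    print(f"T{i}: {dist}")
```

For a fair die each value should appear with relative frequency close to 1/6; with 1000 cases some deviation from that is expected by chance alone.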

2. Calculate the conditional probabilities of the last two dice given the outcomes of T1, T2, T3, i.e. the frequencies of each outcome given the outcomes of the other dice (T1, T2):

  • P(T4|T1), P(T5|T1), P(T1|T4), P(T1|T5)
  • P( T4 | T1,T2 ), P( T5 | T1,T2 )

3. Simulate FP-Growth on the following transaction data set (order items by frequency). Manually calculate all frequent itemsets with support at least 2.

 1: {a, d, e} 
 2: {b, c, d} 
 3: {a, c, e} 
 4: {a, c, d, e} 
 5: {a, e} 
 6: {a, c, d} 
 7: {b, c} 
 8: {a, c, d, e} 
 9: {b, c, e} 
 10: {a, d, e}
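
As a check on the first FP-Growth step only (item supports and the frequency ordering of each transaction), a short Python sketch follows. Breaking ties between equally frequent items alphabetically is an assumption; any fixed tie-break works as long as it is applied consistently.

```python
from collections import Counter

# The ten transactions from the task, in order.
transactions = [
    {"a", "d", "e"},
    {"b", "c", "d"},
    {"a", "c", "e"},
    {"a", "c", "d", "e"},
    {"a", "e"},
    {"a", "c", "d"},
    {"b", "c"},
    {"a", "c", "d", "e"},
    {"b", "c", "e"},
    {"a", "d", "e"},
]
min_support = 2

# Pass 1: count how many transactions contain each item.
support = Counter(item for t in transactions for item in t)
frequent = {item: c for item, c in support.items() if c >= min_support}

def order_key(item):
    # Descending support, then alphabetical (assumed tie-break).
    return (-frequent[item], item)

# Pass 2: drop infrequent items and sort each transaction by that order.
ordered = [
    sorted((item for item in t if item in frequent), key=order_key)
    for t in transactions
]

print(dict(sorted(support.items())))
for tid, items in enumerate(ordered, start=1):
    print(tid, items)
```

The ordered transactions are what gets inserted into the FP-tree; building the tree and mining the conditional pattern bases is left as the manual part of the task.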

4. Generate 1000 "random" 2x2 contingency tables, each distributing 1000 elements into the cells f11, f10, f01, f00. Choose the randomness so that the cells are not too evenly distributed and some tables are likely to contain more extreme values. Calculate the Piatetsky-Shapiro, correlation, and J-measure values for each table. Identify the best 2x2 tables in your data.
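
One possible way to set this up in Python, as a sketch rather than the expected solution. The assumptions: cell probabilities are drawn from a Dirichlet distribution with a small shape parameter (0.5) so that skewed tables are common; Piatetsky-Shapiro is taken as P(A,B) - P(A)P(B), correlation as the phi coefficient, and the J-measure in its rule form for A -> B with natural logarithms. Check these definitions against the lecture slides before relying on them.

```python
import math
import random
from collections import Counter

def random_table(rng, n=1000, shape=0.5):
    """Return (f11, f10, f01, f00) summing to n.

    Normalised gamma variates give Dirichlet-distributed cell
    probabilities; shape < 1 makes extreme tables more likely.
    """
    weights = [rng.gammavariate(shape, 1.0) for _ in range(4)]
    cells = Counter(rng.choices(range(4), weights=weights, k=n))
    return tuple(cells[i] for i in range(4))

def _term(joint, cond, prior):
    # joint * log(cond / prior), with the usual convention 0 * log(0) = 0.
    return joint * math.log(cond / prior) if joint > 0 else 0.0

def measures(f11, f10, f01, f00):
    """Return (Piatetsky-Shapiro, phi correlation, J-measure)."""
    n = f11 + f10 + f01 + f00
    pab, pa, pb = f11 / n, (f11 + f10) / n, (f11 + f01) / n
    ps = pab - pa * pb
    denom = math.sqrt(pa * (1 - pa) * pb * (1 - pb))
    phi = ps / denom if denom > 0 else float("nan")  # undefined margins
    pa_notb = f10 / n
    j = 0.0
    if pa > 0:
        j = _term(pab, pab / pa, pb) + _term(pa_notb, pa_notb / pa, 1 - pb)
    return ps, phi, j

rng = random.Random(0)  # fixed seed so the run is repeatable
tables = [random_table(rng) for _ in range(1000)]
scored = [(t, measures(*t)) for t in tables]

best_ps = max(scored, key=lambda s: s[1][0])
best_phi = max((s for s in scored if not math.isnan(s[1][1])),
               key=lambda s: s[1][1])
best_j = max(scored, key=lambda s: s[1][2])
print("best PS :", best_ps)
print("best phi:", best_phi)
print("best J  :", best_j)
```

Here "best" is read as the largest value of each measure; ranking by absolute value instead would also surface strongly negative associations.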

5. Plot the values of the above three measures against each other (3 pairwise comparisons) and try to characterise verbally how and why the measures differ from each other.

6. (Bonus 1p) Eliminate from the above 1000 tables those with support less than 1%, 5%, 10%, 20%, 50%. How do the comparisons of measures from task 5 change?
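
The filtering step might look like the sketch below. It assumes that "support" means the fraction of elements in cell f11 (both items present); the four tables are made up purely to show the thresholds at work.

```python
def filter_by_support(tables, min_support):
    """Keep tables (f11, f10, f01, f00) whose support f11 / N is at
    least min_support. Support = f11 / N is an assumed definition."""
    kept = []
    for f11, f10, f01, f00 in tables:
        n = f11 + f10 + f01 + f00
        if f11 / n >= min_support:
            kept.append((f11, f10, f01, f00))
    return kept

# Made-up tables with supports 0.005, 0.06, 0.25 and 0.6.
tables = [
    (5, 495, 300, 200),
    (60, 440, 100, 400),
    (250, 250, 250, 250),
    (600, 100, 100, 200),
]

for threshold in (0.01, 0.05, 0.10, 0.20, 0.50):
    kept = filter_by_support(tables, threshold)
    print(f"min support {threshold:.0%}: {len(kept)} tables kept")
```

Re-running the pairwise comparisons on each filtered subset then shows how the relationship between the measures depends on the support threshold.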

7. (Bonus 1p) Listen to the presentation by Peter Donnelly: http://www.ted.com/talks/peter_donnelly_shows_how_stats_fool_juries.html "Extract" from it, in a formal way, the examples of statistical argumentation (coin tosses, HIV, and cot death).

The proprietary copyrights of educational materials belong to the University of Tartu. The use of educational materials is permitted for the purposes and under the conditions provided for in the copyright law for the free use of a work. When using educational materials, the user is obligated to give credit to the author of the educational materials.
The use of educational materials for other purposes is allowed only with the prior written consent of the University of Tartu.