
HW05 (13.03) - Q-Q plots and frequent itemsets

Whenever you produce a plot, also provide your interpretation: what do you conclude from it?

1. Use the child height/weight data and study it with Q-Q plots, one specific age and gender at a time.

Compare the heights of underweight and very overweight children.
Compare one of the attributes (height, weight, BMI) between boys and girls (select either the younger or the older age group).
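
A two-sample Q-Q plot pairs the matching quantiles of two groups; if the points lie near the line y = x, the two distributions are similar. The following is only a minimal sketch of that idea using the Python standard library. The helper name `qq_points` and the two samples are hypothetical: the heights are synthetic numbers generated for illustration, not the course data.

```python
import random
from statistics import quantiles


def qq_points(sample_a, sample_b, n=20):
    """Pair up matching quantiles of two samples for a two-sample Q-Q plot.

    The samples may differ in size; each is reduced to the same n - 1 cut points.
    """
    q_a = quantiles(sample_a, n=n, method="inclusive")
    q_b = quantiles(sample_b, n=n, method="inclusive")
    return list(zip(q_a, q_b))


# Synthetic, made-up heights in cm (NOT the course data).
rng = random.Random(0)
group_a = [rng.gauss(120, 6) for _ in range(200)]
group_b = [rng.gauss(123, 6) for _ in range(150)]

points = qq_points(group_a, group_b)
for a, b in points:
    # Each row is one point of the Q-Q plot: x = quantile of A, y = quantile of B.
    print(f"{a:7.2f} {b:7.2f}")
```

The printed pairs can be pasted into any plotting tool as a scatter plot. Points that run parallel to y = x but shifted suggest a difference in location; a different slope suggests a difference in spread.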

2. Fix the age (look at one specific age only, i.e. exactly the same number of months). Compare the distributions of height and BMI using two Q-Q plots: one for boys, one for girls. (Hopefully different students will pick different ages.)

3. Compare the height of overweight children against a theoretical normal distribution. You may limit this to a certain age group and gender.
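
For a normal Q-Q plot, the sorted observations are plotted against the quantiles of a standard normal distribution; roughly normal data fall close to a straight line with intercept equal to the mean and slope equal to the standard deviation. The sketch below assumes the plotting positions (i - 0.5)/n, which is one common convention among several. The function name `normal_qq_points` and the sample are hypothetical, and the heights are synthetic.

```python
import random
from statistics import NormalDist, mean, stdev


def normal_qq_points(sample):
    """Return (theoretical, observed) pairs for a normal Q-Q plot."""
    observed = sorted(sample)
    n = len(observed)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting positions (i - 0.5) / n: an assumed, commonly used convention.
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, observed))


# Synthetic, made-up heights in cm (NOT the course data).
rng = random.Random(1)
heights = [rng.gauss(135, 7) for _ in range(100)]

pts = normal_qq_points(heights)
m, s = mean(heights), stdev(heights)
for z, h in pts[:5]:
    # The reference line for comparison is y = m + s * z.
    print(f"z={z:6.2f}  observed={h:7.2f}  reference={m + s * z:7.2f}")
```

Systematic curvature away from the reference line, for example at both tails, is the usual sign that the normal model fits poorly.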

4. Use the example from lecture slides:

B C A F H
F E C H
E D B
A C H F 
E F A
D H B
E C F B D 
A H C E 
G A E
B H E

Following the Apriori principle, enumerate all itemsets with support of 0.3 or higher and report the support of each. (This is probably easiest with pen and paper, or with a simple text editor and Unix command-line tools.)
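
The Apriori principle says that every subset of a frequent itemset is itself frequent, so candidates of size k only need to be built from frequent itemsets of size k - 1. The following is a minimal, unoptimized sketch of that level-wise search for checking a hand-computed answer, not a reference implementation. The names `support` and `apriori` are chosen here for illustration.

```python
from itertools import combinations

# The ten transactions from the lecture example.
TRANSACTIONS = [set(line.split()) for line in """
B C A F H
F E C H
E D B
A C H F
E F A
D H B
E C F B D
A H C E
G A E
B H E
""".strip().splitlines()]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def apriori(transactions, min_support):
    """Return {frozenset: support} for all itemsets meeting min_support."""
    items = sorted(set().union(*transactions))
    level = [frozenset([i]) for i in items]  # candidate 1-itemsets
    frequent = {}
    k = 1
    while level:
        # Count step: keep the candidates that reach the threshold.
        current = {}
        for cand in level:
            s = support(cand, transactions)
            if s >= min_support:
                current[cand] = s
        frequent.update(current)
        # Join step: merge frequent k-itemsets into (k + 1)-item candidates.
        k += 1
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k - 1)-subset must itself be frequent.
        level = [c for c in candidates
                 if all(frozenset(sub) in current for sub in combinations(c, k - 1))]
    return frequent


result = apriori(TRANSACTIONS, 0.3)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print("".join(sorted(itemset)), s)
```

The prune step is what distinguishes this from brute-force enumeration: a candidate is counted only if all of its smaller subsets already passed the threshold.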

5. Calculate the support and confidence of every possible association rule from the above example that has exactly one item on the left and one item on the right (e.g. A->E). Make two 8x8 tables, (A..H) x (A..H): one for support and one for confidence. Be clever and write a simple script to calculate them. Color the tables as heatmaps (e.g. in Excel).
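
For a rule X->Y, support is the fraction of transactions containing both X and Y, and confidence is that count divided by the number of transactions containing X. One possible shape for such a script is sketched below. The function name `rule_tables` is hypothetical, and leaving the diagonal (X->X) empty is an assumption of this sketch rather than a requirement of the task.

```python
ITEMS = "ABCDEFGH"

# The ten transactions from the lecture example.
TRANSACTIONS = [set(line.split()) for line in """
B C A F H
F E C H
E D B
A C H F
E F A
D H B
E C F B D
A H C E
G A E
B H E
""".strip().splitlines()]


def rule_tables(transactions, items):
    """Return (support, confidence) dicts keyed by (left, right) item pairs."""
    n = len(transactions)
    supp, conf = {}, {}
    for x in items:
        count_x = sum(x in t for t in transactions)
        for y in items:
            if x == y:
                continue  # assumption: skip the trivial rule X->X
            count_xy = sum(x in t and y in t for t in transactions)
            supp[(x, y)] = count_xy / n
            conf[(x, y)] = count_xy / count_x if count_x else 0.0
    return supp, conf


def print_table(table, items, title):
    """Print a tab-separated grid: rows are left sides, columns are right sides."""
    print(title)
    print("\t" + "\t".join(items))
    for x in items:
        cells = ["" if x == y else f"{table[(x, y)]:.2f}" for y in items]
        print(x + "\t" + "\t".join(cells))


supp, conf = rule_tables(TRANSACTIONS, ITEMS)
print_table(supp, ITEMS, "support")
print_table(conf, ITEMS, "confidence")
```

The output is tab-separated, so it can be pasted into a spreadsheet and colored with conditional formatting. The support table is symmetric, while the confidence table is not, because the denominator depends on the left-hand item.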

Based on the tables from task 5, which rules are the “most interesting”?

Andmekaeve (Data Mining), spring 2015/16
