HW 6 (due March 29th): Descriptive analysis and visualisation
Exploratory data mining: look at the data from different viewpoints. If the data set is too big, first take a random sample so that your code runs efficiently while you develop it. Only once it works should you apply it to the larger data.
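One possible way to follow the sampling advice above is reservoir sampling, which draws a uniform random sample in a single pass without loading the whole file. The sketch below is only an illustration: the function name sample_rows, the column names and the synthetic in-memory "file" are all made up for this example and are not taken from the actual data set.

```python
import csv
import io
import random


def sample_rows(lines, k, seed=0):
    """Return the header and a uniform random sample of up to k data rows.

    Uses reservoir sampling (Algorithm R): the input is read once and
    only k rows are kept in memory at any time.
    """
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    reader = csv.reader(lines)
    header = next(reader)
    reservoir = []
    for i, row in enumerate(reader):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Row i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = row
    return header, reservoir


# Hypothetical stand-in for a large CSV file; the columns are assumptions.
fake_csv = io.StringIO(
    "age_months,height_cm,weight_kg\n"
    + "".join(f"{60 + i % 12},{110 + i % 9},{18 + i % 5}\n" for i in range(1000))
)
header, sample = sample_rows(fake_csv, k=50)
print(header, len(sample))
```

With a real file you would pass an open file object instead of the StringIO; once the analysis code works on the sample, the same code can be pointed at the full data.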
1. Study the National Child Measurement Programme - England, 2013-14 data set. The data are available here: http://www.hscic.gov.uk/catalogue/PUB16070/nat-chil-meas-prog-eng-2013-2014-guid.zip
Focus on child height, weight, BMI and age. Calculate the average height, weight and BMI for every age in the data. Plot height, weight or BMI (on the y-axis) against age (on the x-axis), overlaying the average value for each age as a line.
2. Normalise height, weight and BMI by age (subtract from every value the average for its age group). Make the same plots again. Identify which children would be classified as overweight or underweight. (Optional: calculate the age-group averages separately for boys and girls. How different are they?)
3. Try plotting the same data in different views, e.g. (height*weight vs height/weight), (log(height*weight) vs log(height/weight)), or (height*weight vs BMI). Can you interpret these graphs?
4. How would the graphs from task 3 change if you applied them to the age-normalised height and weight? (Instead of "average height 0", normalise the height to 150 cm and the weight to the respective average weight.)
5. Watch the video presentation by Tamara Munzner, Keynote on Visualization Principles, and read the slides (http://www.cs.ubc.ca/~tmm/talks/vizbi11/vizbi11.pdf). Summarise the key take-home messages from her presentation.
6. (Bonus, 2p) Test some of the ideas from Munzner's presentation and try to visualise last week's iris data set "better".
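The per-age averaging in task 1 and the normalisation in task 2 can be sketched as follows. This is a minimal illustration, not a reference solution: the records are invented toy values, the function names are hypothetical, and BMI is computed from the standard definition (weight in kg divided by squared height in metres), which is an assumption in case you prefer it over a BMI column already present in the data.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical toy records: (age in years, height in cm, weight in kg).
records = [
    (5, 110.0, 19.0),
    (5, 114.0, 21.0),
    (10, 138.0, 32.0),
    (10, 142.0, 36.0),
    (10, 140.0, 31.0),
]


def bmi(height_cm, weight_kg):
    """Body mass index: weight in kg divided by squared height in metres."""
    return weight_kg / (height_cm / 100.0) ** 2


def age_means(rows):
    """Map each age to the mean (height, weight, BMI) of that age group."""
    groups = defaultdict(list)
    for age, h, w in rows:
        groups[age].append((h, w, bmi(h, w)))
    # zip(*vals) turns the list of triples into three columns.
    return {
        age: tuple(fmean(col) for col in zip(*vals))
        for age, vals in groups.items()
    }


def normalise(rows, means):
    """Subtract the age-group mean from each child's height, weight and BMI."""
    out = []
    for age, h, w in rows:
        mean_h, mean_w, mean_bmi = means[age]
        out.append((age, h - mean_h, w - mean_w, bmi(h, w) - mean_bmi))
    return out


means = age_means(records)
normalised = normalise(records, means)
for age in sorted(means):
    print(age, means[age])
```

After normalisation the values in each age group are centred on zero, so children of different ages can be compared on the same scale. In pandas the same idea is usually written with groupby on the age column followed by transform("mean").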