HW 7 (due April 5th) Visualisation, regression, clustering ...
1. Take a look at the following three charts:
- http://www.billboard.com/biz/articles/news/digital-and-mobile/5827354/the-download-hits-middle-age-and-it-shows
- http://junkcharts.typepad.com/.a/6a00d8341e992c53ef019b021a098f970b-pi (source)
- http://www.technologyreview.com/graphiti/520491/mobile-makeover/
Choose one of them and answer the following questions:
- What is the key message of the chart?
- Is it easy to grasp?
- Is it possible to improve it according to Tamara’s suggestions? How?
- (Optional) Redraw it (either on paper or using some visualization tool) according to your suggestions.
2. Some people claim that vodka affects driving performance. An experiment was therefore conducted to verify the effect of vodka on emergency reaction capabilities during driving. The experimental protocol was as follows: ten drivers were each tested twice, once after having two glasses of vodka and once after having two glasses of water. The two tests took place on two different days to give the vodka a chance to wear off. Half of the drivers were given vodka first and half were given water first. The scores of the 10 drivers are shown in the table below. The first number for each driver is their performance in the "water" condition, the second in the "vodka" condition. Higher scores reflect better reactivity.
Driver  Water  Vodka
d1      16     11
d2      15     12
d3      11     10
d4      20     16
d5      19     14
d6      14     10
d7      13     11
d8      15     15
d9      14     11
d10     16     13
Looking at the slides from the last lesson (Prediction, Regression), explain in a few words whether any of the presented statistical techniques is applicable to this problem. Explain why and in what way.
3. If the t-test is applicable, can we affirm with 95% confidence that vodka had a significant effect on driving ability? Report the t and p values that justify this conclusion and show clearly how the t value is calculated.
For a more basic explanation of when the t-test applies and how it is computed, you can also check some YouTube videos explaining the basics, like:
Check the video and state the formulas and calculation principles.
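As a starting point for the hand calculation, here is a minimal Python sketch of one possible approach. It assumes a paired (dependent-samples) t-test is the appropriate choice, since each driver was measured under both conditions; whether that assumption holds is part of what the task asks you to justify. The function name paired_t is illustrative only, and the critical value 2.262 is the standard two-sided 5% table entry for 9 degrees of freedom. The standard library has no t-distribution, so the exact p value is left to R or a statistics package.

```python
import math
from statistics import mean, stdev

# Scores from the table in task 2 (drivers d1..d10).
water = [16, 15, 11, 20, 19, 14, 13, 15, 14, 16]
vodka = [11, 12, 10, 16, 14, 10, 11, 15, 11, 13]


def paired_t(a, b):
    """Return (t, degrees of freedom) for a paired t-test.

    Assumes a and b hold two measurements of the same subjects,
    in the same order.
    """
    if len(a) != len(b):
        raise ValueError("paired samples must have equal length")
    diffs = [x - y for x, y in zip(a, b)]   # per-driver difference
    n = len(diffs)
    d_bar = mean(diffs)                     # mean difference
    s_d = stdev(diffs)                      # sample std. dev. (n - 1)
    se = s_d / math.sqrt(n)                 # standard error of the mean
    return d_bar / se, n - 1


t_value, df = paired_t(water, vodka)

# Two-sided critical value for alpha = 0.05 and df = 9, from a t-table.
T_CRIT_95_DF9 = 2.262
significant = abs(t_value) > T_CRIT_95_DF9

print(f"t = {t_value:.3f}, df = {df}, significant at 95%: {significant}")
```

The steps mirror the formula t = mean(d) / (s_d / sqrt(n)), where d is the per-driver difference between the water and vodka scores.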
4. Use R (or some other statistics package) to perform the t-test on the same data. Next, take last week's data on kids' height and weight and analyse one sample from a younger and one from an older age group. How large a sample would you need to state with 99% certainty that the two samples differ in a) height and b) BMI?
5. State some motivations, applications and examples of where and how one would use the t-test and ANOVA in the real world. Think of very specific analysis scenarios, such as a supermarket chain analysing how its customers behave or how well its different regions or shops sell. What would statistical testing allow you to say about the data?
6. (Bonus, 2p) Remind yourself of the formulae for a line in space and for a hyperplane in 3- or higher-dimensional space. Write the formulae for the distance from a point to a plane in the 2-D, 3-D and general n-dimensional cases. For 2-D and 3-D, give a concrete example. (Motivation: 1) compare this to linear regression: are the two minimising the same distance? and 2) later we will need these for so-called linear separators, used for example in support vector machines (SVM).)
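To check your concrete examples numerically, the following Python sketch may help. It assumes the hyperplane is written as w·x + b = 0, so that the perpendicular distance from a point x is |w·x + b| / ||w||. The function name and the two example planes are illustrative choices, not taken from the lecture material.

```python
import math


def point_hyperplane_distance(w, b, x):
    """Perpendicular distance from point x to the hyperplane w.x + b = 0.

    w and x are sequences of equal length n (n = 2 gives a line,
    n = 3 a plane, larger n a general hyperplane).
    """
    if len(w) != len(x):
        raise ValueError("w and x must have the same dimension")
    norm = math.sqrt(sum(wi * wi for wi in w))
    if norm == 0:
        raise ValueError("normal vector w must be non-zero")
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


# 2-D example: line 3x + 4y - 10 = 0, point (0, 0).
d2 = point_hyperplane_distance([3, 4], -10, [0, 0])

# 3-D example: plane x + 2y + 2z - 3 = 0, point (1, 1, 1).
d3 = point_hyperplane_distance([1, 2, 2], -3, [1, 1, 1])

print(d2, d3)
```

Note that this is the perpendicular distance to the plane. Ordinary least-squares regression is usually stated in terms of vertical residuals, which is worth keeping in mind for the comparison in the motivation.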