Institute of Computer Science

Data Mining (MTAT.03.183) 2014/15 spring


HW 7 (due April 5th) Visualisation, regression, clustering ...

1. Take a look at the following three charts:

  • http://www.billboard.com/biz/articles/news/digital-and-mobile/5827354/the-download-hits-middle-age-and-it-shows
  • http://junkcharts.typepad.com/.a/6a00d8341e992c53ef019b021a098f970b-pi (source)
  • http://www.technologyreview.com/graphiti/520491/mobile-makeover/

Choose one of them and answer the following questions:

  • What is the key message of the chart?
  • Is it easy to grasp?
  • Is it possible to improve it according to Tamara’s suggestions? How?
  • (Optional) Redraw it (either on paper or using some visualization tool) according to your suggestions.

2. Some people claim that vodka affects driving performance. An experiment is therefore conducted to verify the effect of vodka on emergency reaction capabilities during driving. The experimental protocol is as follows: ten drivers are each tested twice, once after drinking two glasses of vodka and once after drinking two glasses of water. The two tests took place on different days to give the vodka time to wear off. Half of the drivers were given vodka first and half were given water first. The scores of the 10 subjects are shown in the following table. The first number for each driver is their performance in the "water" condition, the second in the "vodka" condition. Higher scores reflect better reactivity.

Driver	Water	Vodka
d1	16	11
d2	15	12
d3	11	10
d4	20	16
d5	19	14
d6	14	10
d7	13	11
d8	15	15
d9	14	11
d10	16	13

Looking at the slides from the last lecture (Prediction, Regression), explain briefly whether any of the statistical techniques presented there is applicable to this problem. Explain why and how.

3. If the t-test is applicable, can we state with 95% confidence that vodka had a significant effect on driving ability? Report the t and p values that would support this hypothesis and show clearly how the t value is calculated.
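The t value calculation asked for above can be sketched with the Python standard library. This is a minimal sketch, not the course's reference solution: it takes the per-driver differences (water minus vodka), their mean and sample standard deviation, the standard error sd / sqrt(n), and t = mean / SE with n - 1 degrees of freedom. Computing the p-value by numerically integrating the Student t density (Simpson's rule) is an implementation choice of this sketch, and the critical value 2.262 (two-sided, 5%, df = 9) is taken from a standard t table. The helper names `paired_t`, `t_pdf` and `two_sided_p` are hypothetical.

```python
import math

# Scores from the table above for drivers d1..d10.
WATER = [16, 15, 11, 20, 19, 14, 13, 15, 14, 16]
VODKA = [11, 12, 10, 16, 14, 10, 11, 15, 11, 13]


def paired_t(a, b):
    """Return (mean difference, t statistic, degrees of freedom) for a paired t-test."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    # Sample standard deviation of the differences (n - 1 in the denominator).
    sd_d = math.sqrt(sum((d - mean_d) ** 2 for d in diffs) / (n - 1))
    se = sd_d / math.sqrt(n)  # standard error of the mean difference
    return mean_d, mean_d / se, n - 1


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=10_000):
    """Two-sided p-value, 1 - 2 * (integral of the t density from 0 to |t|),
    using Simpson's rule with an even number of steps."""
    h = abs(t) / steps
    s = t_pdf(0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1 - 2 * s * h / 3)


mean_d, t, df = paired_t(WATER, VODKA)
p = two_sided_p(t, df)
T_CRIT = 2.262  # two-sided 5% critical value for df = 9, from a standard t table
print(f"mean difference = {mean_d:.2f}, t = {t:.3f}, df = {df}, p = {p:.5f}")
print("significant at the 95% level" if abs(t) > T_CRIT else "not significant at the 95% level")
```

For the table above the differences are 5, 3, 1, 4, 5, 4, 2, 0, 3, 3, giving a mean difference of 3.0, a sample standard deviation of about 1.63 and t ≈ 5.81 with 9 degrees of freedom, well above 2.262.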

For a more basic explanation of when the t-test applies and how it is computed, you can also watch YouTube videos covering the basics, for example:

Watch the video and state the formulas and calculation principles.

4. Use R (or another statistics package) to perform the t-test on the same data. Next, take samples from last week's children's height and weight data and analyse a sample from a younger and an older age group. How large a sample would you need to state with 99% certainty that the two groups differ in a) height and b) BMI?
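The second part of question 4 can be sketched as follows, assuming the younger and older groups are independent samples (so Welch's unpaired t-test applies rather than the paired test) and using the normal-approximation formula for the sample size per group, n = 2 (z_{1-α/2} + z_{1-β})² σ² / Δ², with α = 0.01 for 99% certainty. The 80% power, the difference Δ and the standard deviation σ are assumptions you would replace with estimates from last week's data; the numbers in the example are hypothetical, not taken from that dataset. BMI is weight in kilograms divided by height in metres squared. The helper names are hypothetical.

```python
import math
from statistics import NormalDist, mean, variance


def bmi(weight_kg, height_m):
    """Body-mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two independent samples."""
    va, vb = variance(a) / len(a), variance(b) / len(b)  # sample variances use n - 1
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def n_per_group(delta, sigma, alpha=0.01, power=0.8):
    """Approximate sample size per group to detect a mean difference `delta`
    between two groups with common standard deviation `sigma`
    (normal approximation; power=0.8 is an assumed default)."""
    z = NormalDist().inv_cdf
    n = 2 * (z(1 - alpha / 2) + z(power)) ** 2 * sigma ** 2 / delta ** 2
    return math.ceil(n)


# Hypothetical heights (cm) of a younger and an older group, for illustration only.
younger = [128.0, 131.5, 126.2, 135.1, 129.8]
older = [141.3, 138.9, 145.0, 139.7, 143.2]
t_h, df_h = welch_t(younger, older)
print(f"Welch t = {t_h:.2f}, df = {df_h:.1f}")

# Hypothetical: detect a 3 cm difference when the SD is 6 cm, alpha = 0.01, 80% power.
print(n_per_group(delta=3.0, sigma=6.0))
```

In R, the corresponding base functions are `t.test` (with `paired = TRUE` for the vodka data) and `power.t.test`; the latter uses the t distribution instead of the normal approximation, so for small samples it gives slightly larger sizes than this sketch.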

5. Describe some motivations, applications and examples of where and how one would use the t-test and ANOVA in real-world applications. Think of very specific analysis scenarios, such as a supermarket chain analysing how its customers behave or how well its different regions or shops sell. What would statistical testing allow you to tell about the data?
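To make the supermarket scenario concrete: a one-way ANOVA asks whether several group means (for example, the average daily sales of several shops) differ by more than the variation within each group would explain. The sketch below computes the F statistic as the between-group mean square divided by the within-group mean square. The shop figures are invented for illustration, `one_way_anova_f` is a hypothetical helper, and comparing F with a critical value of the F(k - 1, n - k) distribution (or computing a p-value) is left to a statistics package.

```python
def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over the given groups."""
    values = [x for g in groups for x in g]
    k, n = len(groups), len(values)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, k - 1, n - k


# Hypothetical daily sales (thousand euros) for three shops of one chain.
shop_a = [12.1, 11.8, 13.0, 12.4]
shop_b = [10.9, 11.5, 10.2, 11.1]
shop_c = [12.8, 13.5, 12.9, 13.9]
f_stat, df1, df2 = one_way_anova_f(shop_a, shop_b, shop_c)
print(f"F = {f_stat:.2f} with ({df1}, {df2}) degrees of freedom")
```

With exactly two groups, F equals the square of the pooled two-sample t statistic, which is why ANOVA is often described as the generalisation of the t-test to more than two groups.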

6. (Bonus, 2p) Remind yourself of the formulae for a line in the plane and a hyperplane in 3- or higher-dimensional space. Write the formulae for the distance from a point to a plane in the 2-D, 3-D and general n-dimensional cases. For 2-D and 3-D, work through a concrete example. (Motivation: 1) compare this to linear regression: do the two minimise the same distance? 2) Later we will need these for so-called linear separators, used for example in support vector machines (SVM).)
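The distance formulae can be checked numerically. In 2-D the distance from (x₀, y₀) to the line ax + by + c = 0 is |ax₀ + by₀ + c| / √(a² + b²); in 3-D the distance from (x₀, y₀, z₀) to the plane ax + by + cz + d = 0 is |ax₀ + by₀ + cz₀ + d| / √(a² + b² + c²); in n dimensions the distance from x to the hyperplane w·x + b = 0 is |w·x + b| / ‖w‖. The sketch below compares this perpendicular distance with the vertical residual that ordinary least-squares regression squares and sums. The example line, plane and point are arbitrary choices for illustration, and the function names are hypothetical.

```python
import math


def point_hyperplane_distance(w, b, x):
    """Perpendicular distance from point x to the hyperplane w . x + b = 0 (any dimension)."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    return abs(dot + b) / math.sqrt(sum(wi * wi for wi in w))


def vertical_residual(slope, intercept, x, y):
    """Vertical distance |y - (slope * x + intercept)|, the residual used by least squares."""
    return abs(y - (slope * x + intercept))


# 2-D: line 3x + 4y - 5 = 0 and point (0, 0).
d2 = point_hyperplane_distance([3, 4], -5, [0, 0])
# 3-D: plane x + 2y + 2z - 9 = 0 and point (0, 0, 0).
d3 = point_hyperplane_distance([1, 2, 2], -9, [0, 0, 0])
# The same 2-D line written as a regression line: y = -0.75x + 1.25.
r = vertical_residual(-0.75, 1.25, 0, 0)
print(d2, d3, r)
```

The 2-D example gives a perpendicular distance of 1.0 but a vertical residual of 1.25: for a line y = mx + c the two differ by the factor √(1 + m²), so least-squares regression does not minimise the same distance that a geometric point-to-line (or SVM margin) computation uses.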

The proprietary copyrights of educational materials belong to the University of Tartu. The use of educational materials is permitted for the purposes and under the conditions provided for in the copyright law for the free use of a work. When using educational materials, the user is obligated to give credit to the author of the educational materials.
The use of educational materials for other purposes is allowed only with the prior written consent of the University of Tartu.