HW01 (14.02) - Introduction, probability ...
1. Read the Todd Schneider Nov 17th 2015 blog post: "Analyzing 1.1 Billion NYC Taxi and Uber Trips, with a Vengeance" http://toddwschneider.com/posts/analyzing-1-1-billion-nyc-taxi-and-uber-trips-with-a-vengeance/
2. Identify the technical data analysis and visualisation methods used in the above research and present them as a concise list. Follow that with an extended list that gives, for each method, an example sentence or illustration from the analysis.
3. Identify the business value that data mining and analysis of this type of data could bring. Can you estimate the value in $$$ for Uber or NYC if actions were taken based on this analysis?
4. We will need to review some probability theory. Read the first chapter, on probability theory, from the MathWiki web site: http://mathwiki.cs.ut.ee/start. We recommend that you solve all the given exercises for practice. Play with the dice simulation. Explain how increasing the number of rolls changes the observed probability distribution of the die.
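As a starting point for the dice question, here is a minimal sketch in Python. It is not the MathWiki simulation itself: the function names `roll_frequencies` and `max_deviation` are made up for illustration, and a fair six-sided die is assumed.

```python
import random
from collections import Counter


def roll_frequencies(n_rolls, seed=0):
    """Roll a fair six-sided die n_rolls times; return the relative frequency of each face."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = Counter(rng.randint(1, 6) for _ in range(n_rolls))
    return {face: counts[face] / n_rolls for face in range(1, 7)}


def max_deviation(freqs):
    """Largest absolute gap between an observed frequency and the ideal 1/6."""
    return max(abs(f - 1 / 6) for f in freqs.values())


if __name__ == "__main__":
    # Compare the observed distribution with the uniform one for growing sample sizes.
    for n in (10, 100, 10_000, 100_000):
        freqs = roll_frequencies(n)
        print(n, round(max_deviation(freqs), 4))
```

With few rolls the observed frequencies can be far from 1/6; as the number of rolls grows, the typical deviation shrinks roughly in proportion to 1/sqrt(n).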
Solve the following tasks from MathWiki, describe your solutions in detail (not just the answers), and make sure you can explain them to an audience:
A company makes computer discs. It tested a random sample of discs from a large batch and found that the probability of any disc being defective is 0.025. Bob buys two discs. Calculate the probability that
- both discs are defective;
- only one disc is defective.
- The company found 4 defective discs in the sample they tested. How many discs were likely tested?
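One possible way to check a hand calculation for the disc task is sketched below. It assumes the two discs are defective independently of each other (reasonable for a large batch) and that the quoted 0.025 is the defect rate observed in the tested sample. The helper `binomial_pmf` is a name chosen here for illustration.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials, each with success chance p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


p_defective = 0.025  # from the task statement
n_bought = 2         # Bob buys two discs

# Assumption: the discs are independent draws from the large batch.
p_both = binomial_pmf(2, n_bought, p_defective)  # 0.025 * 0.025
p_one = binomial_pmf(1, n_bought, p_defective)   # 2 * 0.025 * 0.975

# Assumption: 0.025 is the observed rate, i.e. 4 defective / sample_size.
sample_size = 4 / p_defective

print(p_both, p_one, sample_size)
```

The factor 2 in the "only one" case comes from the two orderings: either the first disc or the second disc can be the defective one.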
5. At the exam there is a 0.8 probability that a student has prepared and a 0.2 probability that they have not. Those who have prepared have a 0.7 probability of success; those who have not prepared have a 0.4 probability of success. What is the probability that a randomly selected student will succeed?
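The exam task is an instance of the law of total probability: split on whether the student prepared, then weight each conditional success probability by the probability of its case. A short sketch, with variable names chosen here for illustration:

```python
p_prepared = 0.8                  # P(prepared), from the task
p_success_given_prepared = 0.7    # P(success | prepared)
p_success_given_unprepared = 0.4  # P(success | not prepared)

# Law of total probability:
# P(success) = P(success | prepared) * P(prepared)
#            + P(success | not prepared) * P(not prepared)
p_success = (
    p_success_given_prepared * p_prepared
    + p_success_given_unprepared * (1 - p_prepared)
)

print(round(p_success, 4))
```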
6. Bonus (1p): Run a computational experiment by generating data and simulating the tasks from 4 and 5. Report your findings. Show computationally that the theory holds.
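A possible shape for such an experiment is sketched below, using only Python's standard `random` module. The function names, seeds, and trial counts are arbitrary choices for illustration. A simulation can only support the theory with growing evidence; it does not prove it in the mathematical sense.

```python
import random


def simulate_discs(n_trials, p_defective=0.025, seed=1):
    """Estimate P(both defective) and P(only one defective) for a purchase of two discs."""
    rng = random.Random(seed)
    both = one = 0
    for _ in range(n_trials):
        # Each disc is defective independently with probability p_defective.
        defects = sum(rng.random() < p_defective for _ in range(2))
        if defects == 2:
            both += 1
        elif defects == 1:
            one += 1
    return both / n_trials, one / n_trials


def simulate_exam(n_trials, seed=2):
    """Estimate P(success) for a randomly selected student."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(n_trials):
        prepared = rng.random() < 0.8          # 0.8 chance the student prepared
        p_success = 0.7 if prepared else 0.4   # conditional success chance
        successes += rng.random() < p_success
    return successes / n_trials


if __name__ == "__main__":
    for n in (1_000, 100_000):
        print(n, simulate_discs(n), simulate_exam(n))
```

Comparing the estimates for several values of `n_trials` against the hand-calculated probabilities shows how closely the simulated frequencies approach the theoretical values as the sample grows.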