HW1. The beginning
This first homework is introductory. Every task is worth 1 point unless marked otherwise.
EX1. Familiarize yourself with the course and do all the necessary administrative things.
- Read all the material on the course page, definitely:
- Make sure you are registered on Piazza (if you are not, you lose points for this task).
The next two exercises are about trying out Python and R and doing some trivial data manipulation with them, so that you can try both and decide which setup you would like to use during the course. Even if you already know what you want to use, it is still good to know the basics of both.
Since you submit only one report, you have to include either your R code in your Python report or your Python code in your R report. Because the tasks are easy, you do not need to show the output of both languages; for the "second" language you can show just the code.
EX2. Consult the R tutorial (or any other material about R) if needed and complete the following tasks in R. First, set up the working environment. Then read in the abalone dataset (a short description of the variables is here) and complete the next steps (show the code and the answers; no interpretation is needed this time):
EX3. Consult the Python tutorial (or any other material about Python) if needed and complete the same tasks as in EX2 in Python. Comment on what you thought about R and Python and which one you expect to use as your working environment for the rest of the course.
EX4. Read and analyze the following blog post. Select the two visualizations you consider the worst and explain why they are bad. Then select the two visualizations you consider the best or most interesting and comment on what is good about them.
EX5. We will need to review some probability theory. Read the first chapter on probability theory on the MathWiki website: http://mathwiki.cs.ut.ee/start. We recommend that you solve all the given exercises for practice. Play with the dice simulation. Explain how increasing the number of rolls changes the observed probability distribution of the outcomes.
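If you would like to experiment outside the MathWiki page, the dice simulation can be approximated in a few lines of Python. This is only a minimal sketch and not the MathWiki simulation itself: the function name `empirical_distribution`, the choice of a fair six-sided die, and the roll counts are all assumptions made for illustration.

```python
import random
from collections import Counter


def empirical_distribution(n_rolls, sides=6, seed=None):
    """Roll a fair die n_rolls times and return the relative frequency of each face.

    Hypothetical helper for illustration; a fair die is assumed.
    """
    rng = random.Random(seed)  # seeded generator so runs can be repeated
    counts = Counter(rng.randint(1, sides) for _ in range(n_rolls))
    # Include faces that never came up, with frequency 0.
    return {face: counts[face] / n_rolls for face in range(1, sides + 1)}


if __name__ == "__main__":
    # Compare the frequencies for a few (arbitrarily chosen) numbers of rolls.
    for n in (10, 100, 10_000):
        freqs = empirical_distribution(n, seed=1)
        print(n, {face: round(p, 3) for face, p in freqs.items()})
```

Running it for several roll counts and comparing each frequency with 1/6 is one way to gather evidence for your explanation.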
Solve the following tasks from MathWiki, describe your solutions in detail (not just the answers), and make sure you can explain them to an audience:
A company makes computer discs. It tested a random sample of discs from a large batch and found that the probability of any disc being defective is 0.025. Bob buys two discs. Calculate the probability that
- both discs are defective;
- only one disc is defective.
The company found 4 defective discs in the sample it tested. How many discs were likely tested?
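Problems of this kind are commonly handled with the binomial distribution, assuming the items are independent and each has the same probability of the outcome of interest. The sketch below shows the general formulas on a fair coin rather than on the disc numbers, so the task itself is left to you; the function names `binomial_pmf` and `expected_count` are hypothetical helpers, not part of the course material.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each succeeding with probability p (independence is an assumption)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def expected_count(n, p):
    """Expected number of successes in n independent trials."""
    return n * p


if __name__ == "__main__":
    # Illustration with two tosses of a fair coin (not the disc problem).
    print(binomial_pmf(2, 2, 0.5))  # both tosses are heads
    print(binomial_pmf(1, 2, 0.5))  # exactly one toss is heads
    print(expected_count(10, 0.5))  # expected heads in ten tosses
```

Note that "exactly one success in two trials" counts both orderings, which is what the `comb(n, k)` factor accounts for.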