Data Mining (Andmekaeve, MTAT.03.183), spring 2016/17

HW1. The beginning

This is the first homework, and it is mostly introductory. Each task (unless marked otherwise) is worth 1 point.

EX1. Familiarize yourself with the course and do all the necessary administrative things.

  • Read all the material on the course page, in particular:
    • GENERAL INFO
    • HOMEWORK FORMAT
    • ABOUT PLAGIARISM
    • SUBMISSION RULES
    • SOFTWARE, especially the section on HOW TO CHOOSE A LANGUAGE
  • Make sure you are registered on Piazza (if you are not, you will lose points for this task).

The next two exercises are about trying out Python and R and doing some simple data manipulation with them. The aim is for you to try both and decide which setup you would like to use during the course. Even if you already know which one you want to use, it is still good to know the basics of both.

Since you submit only one report, you have to include either your R code in your Python report or your Python code in your R report. Because the tasks are easy, you don't need to show the output of both languages; for the "second" language, showing the code is enough.

EX2. Consult the R tutorial (or any other material about R) if needed and complete the following tasks in R. First, set up the working environment. Then read in the abalone dataset (a short description of the variables is available here) and complete the following steps (show the code and the answers; no interpretation is needed this time):

a. What are the column names of the dataset?
b. How many observations (i.e. rows) are in this data frame?
c. Print the first 4 lines of the dataset. What are the values of the feature rings for the printed observations?
d. Extract the last 3 rows of the data frame. What is the weight of these abalones?
e. What is the value of diameter in row 755?
f. How many missing values are in the height column?
g. What is the mean of the height column? Exclude missing values from this calculation.
h. Extract the subset of rows of the data frame where gender is M and weight is below 0.75. What is the mean of diameter in this subset?
i. What is the most frequent rings value?
j. What is the minimum of length when rings is equal to 18?

EX3. Consult the Python tutorial (or any other material about Python) if needed and complete the same tasks as in EX2 in Python. Comment on what you thought of R and Python and which working environment you think you will use for the rest of the course.
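As a starting point, here is a minimal pandas sketch of tasks a to j. It assumes the dataset loads into a DataFrame whose columns are named gender, length, diameter, height, weight and rings, as in the task text; the actual file name, separator and column names of the course's copy are not given here, so check them in task a. The function name answer_tasks is hypothetical. One detail matters when porting from R: R numbers rows from 1 while pandas' iloc counts from 0, so row 755 is iloc[754]. Also, pandas' mean() skips missing values by default, whereas in R you need mean(x, na.rm = TRUE).

```python
import pandas as pd


def answer_tasks(df: pd.DataFrame) -> dict:
    """Answer EX2 tasks a-j for an abalone DataFrame.

    Assumption: columns are named gender, length, diameter, height, weight, rings.
    """
    answers = {}
    answers["a_columns"] = list(df.columns)                     # a. column names
    answers["b_n_rows"] = len(df)                               # b. number of rows
    answers["c_first4_rings"] = df.head(4)["rings"].tolist()    # c. rings of first 4 rows
    answers["d_last3_weight"] = df.tail(3)["weight"].tolist()   # d. weight of last 3 rows
    # e. row 755 counted from 1 (as in R) is position 754 in pandas
    answers["e_diameter_row755"] = df["diameter"].iloc[754]
    # f. number of missing (NaN) heights
    answers["f_height_missing"] = int(df["height"].isna().sum())
    # g. mean height; pandas skips NaN by default (skipna=True)
    answers["g_height_mean"] = df["height"].mean()
    # h. males with weight below 0.75, then mean diameter of that subset
    subset = df[(df["gender"] == "M") & (df["weight"] < 0.75)]
    answers["h_diameter_mean"] = subset["diameter"].mean()
    # i. most frequent rings value; mode() returns all values in case of a tie
    answers["i_most_frequent_rings"] = df["rings"].mode().tolist()
    # j. minimum length among rows where rings == 18
    answers["j_min_length_rings18"] = df.loc[df["rings"] == 18, "length"].min()
    return answers
```

Usage, with a hypothetical file name: answer_tasks(pd.read_csv("abalone.csv")). If missing heights are encoded as something other than NA or an empty field, pass that marker to read_csv through its na_values parameter.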

EX4. Read and analyze the following blog post. Select the two worst visualizations (in your opinion) and comment on why they are the worst. Then select the two best or most interesting visualizations (in your opinion) and comment on what is good about them.

EX5. We will need to revise some probability theory. Read the first chapter, on probability theory, of the MathWiki website: http://mathwiki.cs.ut.ee/start. We recommend that you solve all the given exercises for practice. Play with the simulation of a die. Explain how increasing the number of rolls changes the observed distribution of the die's outcomes.
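The MathWiki page has its own interactive simulation; the sketch below is an independent Python stand-in (not the MathWiki code) that rolls a simulated fair die with a fixed seed and reports, for several numbers of rolls, the relative frequency of each face and the largest gap from the theoretical 1/6. With only a few rolls the frequencies fluctuate strongly; as the number of rolls grows they settle close to 1/6 each, which is the law of large numbers at work. The function names roll_frequencies and max_deviation are illustrative.

```python
import random
from collections import Counter


def roll_frequencies(n_rolls: int, seed: int = 0) -> dict:
    """Roll a simulated fair six-sided die n_rolls times.

    Returns the relative frequency of each face 1..6.
    """
    if n_rolls <= 0:
        raise ValueError("n_rolls must be positive")
    rng = random.Random(seed)  # fixed seed so the runs are reproducible
    counts = Counter(rng.randint(1, 6) for _ in range(n_rolls))
    return {face: counts[face] / n_rolls for face in range(1, 7)}


def max_deviation(freqs: dict) -> float:
    """Largest absolute gap between an observed frequency and 1/6."""
    return max(abs(f - 1 / 6) for f in freqs.values())


if __name__ == "__main__":
    for n in (10, 100, 1_000, 100_000):
        freqs = roll_frequencies(n)
        rounded = {face: round(f, 3) for face, f in freqs.items()}
        print(n, rounded, round(max_deviation(freqs), 4))
```

Running it prints one line per number of rolls; the exact values depend on the seed, but the maximum deviation shrinks roughly like 1 / sqrt(n) as n grows.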

Solve the following tasks from MathWiki, describe your solutions in detail (not just the answers), and make sure you can explain them to the audience:

A company makes computer discs. It tested a random sample of discs from a large batch and found that the probability of any disc being defective is 0.025. Bob buys two discs. Calculate the probability that

  1. both discs are defective;
  2. exactly one disc is defective.
  3. The company found 4 defective discs in the sample they tested. How many discs were most likely tested?
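A worked version of the three parts, assuming (as the task implies but does not state) that the two discs are defective independently of each other, each with probability 0.025. For part 3 the sketch reads 0.025 as the proportion of defective discs observed in the tested sample, so the sample size follows from 4 / 0.025. Exact fractions keep the results free of floating-point rounding.

```python
from fractions import Fraction

# Assumption: the two discs are defective independently,
# each with probability p = 0.025 (taken from the task).
p = Fraction(25, 1000)  # exactly 1/40
q = 1 - p               # probability that a disc is not defective

# Part 1: both discs defective, P(D and D) = p * p
both_defective = p * p
# Part 2: exactly one defective, P(D, not D) + P(not D, D) = 2 * p * q
exactly_one = p * q + q * p
# Part 3: 4 defective discs at a defect rate of p means about n = 4 / p tested
likely_tested = 4 / p

print(both_defective, float(both_defective))  # 1/1600 0.000625
print(exactly_one, float(exactly_one))        # 39/800 0.04875
print(likely_tested)                          # 160
```

Note that part 2 counts two orderings (first disc defective or second disc defective), which is why the product p * q appears twice.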

Institute of Computer Science, Faculty of Science and Technology, University of Tartu
The economic copyrights to the study materials belong to the University of Tartu. The study materials may be used for the purposes and under the conditions of free use of works provided for in the Copyright Act. When using the study materials, the user must cite their author.
Use of the study materials for any other purpose is permitted only with the prior written consent of the University of Tartu.