Homework 1 (HW01) - Introduction
Exercise 1 (EX1) (1 point) Administrative requirements
Familiarize yourself with the course and do all the necessary administrative things.
Read all the material on the course page, definitely:
Make sure you are registered on Piazza and Gradescope (if you are not, you will lose points for this task).
Exercise 2 (EX2) (1 point) Three R tutorials
To get you started with the R programming language, we have selected 5 short tutorials, available here on the course homepage. For more specific functionality you should be able to find help in the documentation and other available materials. If not, you are welcome to post on Piazza and ask us! For this exercise, please study the contents of the first 3 tutorials (Introduction to R, knitr and dplyr). With the knowledge gained, please solve the following task.
Read in the abalone dataset, available here as CSV (a short description of the variables is here as TXT), and complete the steps below. In the PDF that you submit as your homework, please show the code and the answer for each subtask. There is no need for explanatory text, except for the last subtask (j). Note that the dataset can be read in with the command data <- read.csv("path/file.csv")
a. What are the column names of the dataset?
b. How many observations (i.e. rows) are in this data frame?
c. Print the first 4 rows of the dataset. What are the values of the feature rings for the printed observations?
d. Extract the last 3 rows of the data frame. What are the weights of these abalones?
e. What is the value of diameter in row 577?
f. What is the mean of the height column?
g. Extract the subset of rows of the data frame where gender is M and weight values are below 0.75. What is the mean of diameter in this subset? Solve this subtask without using the dplyr package.
h. Now do the same as in the previous subtask (g), but use the %>% operator from the dplyr package.
i. What is the minimum of length when rings is equal to 18?
j. Is the weight of abalones related to how many rings they have? Please provide evidence and explanatory text with the conclusion. Hint: consider using the dplyr commands group_by and summarise.
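The commands mentioned above (read.csv, base R subsetting, the %>% operator, group_by and summarise) can be sketched on a small made-up data frame. This is only an illustration under stated assumptions: the data frame toy, its columns group, size and count, and the temporary file are hypothetical and are not taken from the abalone dataset, so the column names and thresholds in your own solution will differ.

```r
# Toy data frame standing in for a real dataset (hypothetical values,
# not taken from the abalone file).
toy <- data.frame(
  group = c("A", "B", "A", "B", "A"),
  size  = c(1.0, 2.0, 3.0, 4.0, 5.0),
  count = c(10, 20, 10, 20, 30)
)

# Round-trip through a temporary CSV file to show read.csv() in action.
path <- tempfile(fileext = ".csv")
write.csv(toy, path, row.names = FALSE)
data <- read.csv(path)

# Basic inspection with base R.
names(data)       # column names
nrow(data)        # number of rows
head(data, 2)     # first two rows
tail(data, 2)     # last two rows
data[3, "size"]   # a single cell: row 3, column "size"

# Base R subsetting: rows where group is "A" and size is below 4.
base_subset <- data[data$group == "A" & data$size < 4, ]
base_mean <- mean(base_subset$size)

# The same filter written as a dplyr pipeline.
library(dplyr)
dplyr_mean <- data %>%
  filter(group == "A", size < 4) %>%
  summarise(m = mean(size)) %>%
  pull(m)

# Per-group summary with group_by() and summarise():
# one row per distinct value of count.
by_count <- data %>%
  group_by(count) %>%
  summarise(mean_size = mean(size), n = n())
```

Both base_mean and dplyr_mean are computed from the same two rows, so they should agree. A grouped table like by_count is one possible kind of evidence for a question about how two variables are related; a plot or a correlation could serve as well.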
Exercise 3 (EX3) (1 point) TED talks
Look at the descriptions of 10 famous TED talks on data science. Please watch at least 3 of these talks. For each of those three talks, answer the following 3 questions (altogether we expect 3x3=9 answers):
a. Was it interesting to you, and what did you like or dislike about it?
b. What was the key message that you would pass on to a friend?
c. What was the best visualisation in your opinion and why was it so appealing to you?
Please don’t forget to include the titles of the videos that you watched in your answers. A full score will be given only for comprehensive answers. For example, the answers "I liked the talk" and "I liked the talk because it was about data mining" are not good enough.
Exercise 4 (EX4) (1 point) Data for your life
Please think about how you could gather or find data and use or analyse it to improve something in your everyday life, for example something related to home, friends, travel, school or work. Alternatively, the idea could improve the life of someone you know personally. Please describe your idea and its potential impact in 100-300 words.