HW03 (28.02) - Descriptive statistics (KDE, boxplots)
1. Kernel density estimation: use the two data sets klient1.txt and klient3.txt. The data represent the times of the week at which two different groups of people go shopping, collected over an entire year. Plot the density distribution of both data sets. Choose an informative kernel width and justify your choice. Briefly characterise the two data sets (klient1 and klient3).
2. Study the data in product_time_shop.txt. It contains information about a number of shops and the times at which particular products (items) were sold during the week. Describe the data: which products and shops are there, and how many purchases of each product were made in each shop?
3. Draw boxplots that allow comparing different weekdays, shops, and product sales. Find meaningful illustrations that support conclusions about 1) different weekdays, 2) different products, 3) different shops. State your hypothesis first and then carry out the corresponding analysis of the data.
4. Use the same data as in task 2. Explore the data and identify whether any shop has run out of a popular product during the day (which shops, products, and days?). Draw density plots to convince the reader or the shop manager. Formulate the principles of an automated procedure for identifying (all) such events.
5. Calculate the number of purchases of each product in each shop for every day and every hour. Make a table with one row per product, shop, and day, and one column per hour; each cell holds the number of purchases in that hour. Draw a heatmap version of the sales data, e.g. using Excel "Conditional formatting" => "Color Scales".
6. Bonus (2p). Clearly, people visit shops more frequently at certain times. This can obscure the analysis.
- Describe the overall visiting behaviour of customers based on the data from task 2.
- Normalise the frequency of purchases so that it reflects the relative share of purchases of each product (state how).
- Identify whether different products are purchased at different relative frequencies over the weekdays.
- Does this normalisation help in task 4 to identify when a shop has run out of a product?
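
Hint for task 1: the effect of the kernel width can be seen by evaluating the same density estimate with several bandwidths. The following is a minimal sketch using only the Python standard library. It assumes the data are times of the week in hours (0 to 168); the actual layout of klient1.txt is not specified here, so the sample below is synthetic and the helper names (`gaussian_kde`, `silverman_bandwidth`, `count_peaks`) are illustrative only. Silverman's rule of thumb is used as one possible starting point, not as the required choice.

```python
import math
import random
import statistics

def gaussian_kde(samples, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate at each grid point."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
        for x in grid
    ]

def silverman_bandwidth(samples):
    """Silverman's rule of thumb: 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    n = len(samples)
    sd = statistics.stdev(samples)
    q1, _median, q3 = statistics.quantiles(samples, n=4)
    return 0.9 * min(sd, (q3 - q1) / 1.34) * n ** (-0.2)

def count_peaks(density):
    """Count strict local maxima of a density evaluated on a grid."""
    return sum(
        1
        for i in range(1, len(density) - 1)
        if density[i - 1] < density[i] > density[i + 1]
    )

# Synthetic stand-in for one client file (ASSUMPTION: hours since Monday 00:00).
# Shoppers come around 18:00 on one of the five working days.
random.seed(0)
samples = [random.gauss(18.0, 1.5) + 24 * random.choice(range(5)) for _ in range(300)]

step = 0.5
grid = [i * step for i in range(337)]  # 0 .. 168 hours in half-hour steps

h_rule = silverman_bandwidth(samples)
bandwidths = {"narrow": 0.5, "rule of thumb": h_rule, "wide": 12.0}
densities = {name: gaussian_kde(samples, grid, h) for name, h in bandwidths.items()}

for name, density in densities.items():
    print(f"{name:>13}: h = {bandwidths[name]:5.2f} h, peaks = {count_peaks(density)}")
```

The rule of thumb assumes roughly unimodal data, so for weekly shopping times with daily peaks it tends to oversmooth; a very narrow width, on the other hand, shows noise from individual visits. Comparing the plots for a few widths is one way to justify the final choice.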
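
Hint for tasks 4 to 6: the hourly table, one possible normalisation, and a simple rule for flagging candidate stock-outs can be sketched as follows. This is only an illustration under stated assumptions: the records are taken to be (shop, product, day, hour) tuples, which may not match the real layout of product_time_shop.txt; the data are a toy example; and the function names and the `min_quiet_hours` threshold are hypothetical. The normalisation used here is the number of purchases of a product in an hour divided by all purchases in the same shop in that hour.

```python
from collections import defaultdict

HOURS = range(24)

def hourly_table(purchases):
    """One row per (shop, product, day); one column per hour with purchase counts."""
    table = defaultdict(lambda: [0] * 24)
    for shop, product, day, hour in purchases:
        table[(shop, product, day)][hour] += 1
    return dict(table)

def shop_totals(table):
    """All purchases per (shop, day) and hour, summed over products."""
    totals = defaultdict(lambda: [0] * 24)
    for (shop, _product, day), counts in table.items():
        for hour in HOURS:
            totals[(shop, day)][hour] += counts[hour]
    return dict(totals)

def relative_share(table, totals):
    """Share of each product among all purchases in the same shop, day and hour."""
    shares = {}
    for (shop, product, day), counts in table.items():
        total = totals[(shop, day)]
        shares[(shop, product, day)] = [
            counts[h] / total[h] if total[h] else 0.0 for h in HOURS
        ]
    return shares

def possible_stockouts(table, totals, min_quiet_hours=3):
    """Flag rows where a product stops selling while the shop keeps selling.

    Returns {(shop, product, day): hour of the last sale}. The threshold
    min_quiet_hours is an arbitrary illustrative choice.
    """
    flagged = {}
    for (shop, product, day), counts in table.items():
        last_sale = max(h for h in HOURS if counts[h] > 0)
        busy_after = [h for h in HOURS if h > last_sale and totals[(shop, day)][h] > 0]
        if len(busy_after) >= min_quiet_hours:
            flagged[(shop, product, day)] = last_sale
    return flagged

# Toy data (ASSUMPTION, not taken from the real file): milk sells all day,
# bread sales stop after 13:00.
purchases = []
for hour in range(8, 20):
    purchases += [("shop_A", "milk", "Mon", hour)] * 3
    if hour <= 13:
        purchases += [("shop_A", "bread", "Mon", hour)] * 2

table = hourly_table(purchases)
totals = shop_totals(table)
shares = relative_share(table, totals)
flagged = possible_stockouts(table, totals)

for key, counts in table.items():
    print(key, counts[8:20])  # rows like these can be pasted into a spreadsheet
print("candidate stock-outs:", flagged)
```

A flagged row is only a candidate: a product may simply be bought in the morning. Comparing the same product across other days and shops, for example through its relative share per hour, helps separate the two cases.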