Bonus exercise on Tableau (up to 3 bonus points)
Introduction and materials
The New York City Taxi & Limousine Commission publishes a monthly trip-level dataset recording how often people take yellow or green taxis and for-hire vehicles in NYC. See more here.
At the time of writing this assignment, the trip data is available from Jan 2009 to Jun 2017. In addition, they provide location data for taxi zone boundaries, so you can easily plot trips on a map, and a helpful data dictionary with definitions.
I like the NYC taxi dataset because a lot of people are using it and sharing their knowledge. So it's a great source for getting inspired and learning new tools and methods.
A few published examples on Tableau Public:
- https://public.tableau.com/profile/curtis.harris#!/vizhome/ChicagoTaxiRides-2H2016/ChicagoTaxiRides
- https://public.tableau.com/profile/curtis.harris#!/vizhome/TheSlowDeclineoftheNewYorkCityYellowTaxi/IVHAR
- https://public.tableau.com/profile/adrien.charles#!/vizhome/Taxi-F1/Tableaudebord2
Exploration of NYC Taxi Data with R code
An open-source exploration of publicly available taxi and Uber data:
Kaggle competition for building a model that predicts the total ride duration of taxi trips in NYC:
The task
1. Briefly explore what others have done so far with NYC taxi data.
2. Determine (write down) a few broad questions or goals for your own NYC taxi data discovery.
For example, do people take taxis more in winter (is there a seasonal effect)? What are the most common routes in the mornings? Do people take more taxis on rainy days? Is biking faster than taking a cab? When? On which routes? (For the last two questions you need additional data; hint: it's publicly available.)
3. Get the data and start your journey.
In June 2017 alone, there were 26.5 million taxi trips, so you might decide to limit the dataset you are going to work with. For a quick and easy start, Mirko has combined the June 2017 data of 3 datasets (yellow taxis, green taxis and for-hire vehicles) together with the location shapefiles: DOWNLOAD THE DATA AS TABLEAU WORKBOOK (400 MB).
Please note that the for-hire vehicle data only includes pickup and drop-off time and location, not trip distance or cost. The prepared workbook might be a good starting point for your exploration, or you might decide to work with other periods, limit the data further, etc.
4. We expect you to spend an hour or so playing with the data, making discoveries and asking new questions.
5. If you feel unsure about how to use Tableau, watch some Tableau video tutorials.
6. Document your process.
- Either use the quick and dirty route ("Worksheet > Copy > Image") to copy your graphs as images.
- Or use Tableau's built-in feature called Story Points to document your analysis (watch the 5 min video on it) and print to PDF later on. Use the Paper Size option "Unspecified" to fit the entire view/graphs on a page in the PDF.
7. We expect the final deliverable of your journey to be somewhere between what you saw others have done and what Indu Khatri, a master's student from Toronto, shared publicly back in Feb 2017: https://www.kdnuggets.com/2017/02/data-science-nyc-taxi-trips.html.
It's fully up to you to decide which questions you are going to explore and how you document them. Being a master's student myself, I know that the ride is not going to be smooth or follow a clear path, so try to embrace the journey with its ups and downs. If you need any guidance, feel free to drop an e-mail to mirko.kand@gmail.com. Don't expect a prompt reply, but I will do my best and take time for that each night.
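If in step 3 you decide to work with the raw monthly CSV files instead of the prepared workbook, one possible way to limit the dataset before loading it into Tableau is to draw a uniform random sample of rows. The sketch below is only an illustration, not part of the assignment: it uses reservoir sampling so the full file never has to fit in memory, and the function name `sample_rows`, the sample size and the tiny in-memory demo data are all assumptions made for the example.

```python
import csv
import io
import random


def sample_rows(lines, k, seed=0):
    """Return (header, rows) with at most k rows sampled uniformly at random.

    `lines` is any iterable of CSV text lines, e.g. an open file.
    Uses reservoir sampling, so only k rows are kept in memory at a time.
    """
    reader = csv.reader(lines)
    header = next(reader)           # first line is assumed to be the header
    rng = random.Random(seed)       # fixed seed makes the sample repeatable
    reservoir = []
    seen = 0                        # number of data rows read so far
    for row in reader:
        if not row:
            continue                # skip blank lines
        seen += 1
        if len(reservoir) < k:
            reservoir.append(row)   # fill the reservoir first
        else:
            j = rng.randrange(seen) # replace an old row with probability k/seen
            if j < k:
                reservoir[j] = row
    return header, reservoir


# Tiny in-memory demo (made-up data, not real trip records).
demo = io.StringIO("pickup,dropoff\n1,2\n3,4\n5,6\n")
header, rows = sample_rows(demo, k=2)
print(header, len(rows))  # ['pickup', 'dropoff'] 2
```

For a real file you would pass a handle opened with `open(path, newline="")` instead of the demo text and write the returned rows out with `csv.writer`, then connect Tableau to the smaller file.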
Grading
We will give out:
- 3 bonus points for exceptionally good submissions,
- 2 bonus points for very good submissions, and
- 1 bonus point for good enough submissions.
Here are the general guidelines that we will use in our subjective grading of this bonus task.
We will award 1 bonus point if:
- Your report clearly defines what goal(s) you have set yourself
- Your report clearly explains what you have done to achieve your goal(s)
- Your report contains at least 3 figures
- We understand what you have written
- You have spent at least 1 hour working with Tableau to achieve your goals (not including the time it takes to set up Tableau)
We will award 2 bonus points if:
- You have satisfied the requirements for 1 bonus point listed above
- Your report tells a nice story
- Your report contains interesting information and insights about the data, not just boring facts
We will award 3 bonus points if:
- You have satisfied the requirements for 2 bonus points listed above
- From our point of view, your report is excellent and stands out from other solutions