First steps of the project
The goal of this homework is to get you started with your project. Make sure that you know all the requirements for projects. Task 1 and Task 2 cover the first two steps in CRISP-DM: business understanding and data understanding. Task 3 asks you to set up your project and make a work plan.
This homework should be done in project teams, not individually, and each project team should make only one submission. All members of the team will receive the same number of points for this homework.
Task 1. Business understanding (1 point)
NB! Don't forget to mention your project title and team members at the beginning of the report.
Developing a business understanding within CRISP-DM consists of four tasks: identifying your business goals, assessing your situation, defining your data-analysis, data-mining or machine-learning goals, and producing your project plan. For this exercise, please develop a business understanding of your project. According to CRISP-DM, you should report the following:
- Identifying your business goals
  - Background
  - Business goals
  - Business success criteria
- Assessing your situation
  - Inventory of resources
  - Requirements, assumptions, and constraints
  - Risks and contingencies
  - Terminology
  - Costs and benefits
- Defining your data-mining goals
  - Data-mining goals
  - Data-mining success criteria
Please follow the structure given above and cover all of these aspects in your report. Consult this PDF file with the chapter on Embracing the Data-Mining Process for more information on each of the deliverables. Keep the report concise and feel free to state that some aspect is not relevant to your project. If your project is not meant to benefit a ‘business’, please specify who will benefit from the project and perform business understanding from their perspective. For instance, this could be one or more individuals, organizations, or societies.
Task 2. Data understanding (2 points)
Data understanding within CRISP-DM consists of four tasks: gathering data, describing data, exploring data and verifying data quality. For this exercise, please develop a data understanding of your project. Report the results of the tasks according to the following structure:
- Gathering data
  - Outline data requirements
  - Verify data availability
  - Define selection criteria
- Describing data
- Exploring data
- Verifying data quality
Consult the given book chapter to understand what is expected under each of these deliverables. Take inspiration from when describing and exploring the data. As a result of this exercise, you should have gathered and understood the data. You should have decided which parts of the data you are potentially going to use and understood the meaning of all fields within these parts. Note that data cleaning is part of the data preparation step in CRISP-DM, but you might choose to do some of it already during this task.
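As an illustration of the describing-data and verifying-data-quality steps, here is a minimal sketch that counts missing values, distinct values and duplicate rows per field. It uses only the Python standard library. The function name `profile_rows`, the `MISSING` marker set and the sample CSV are hypothetical and stand in for your own data; treat it as one possible starting point, not a required method.

```python
import csv
import io
from collections import Counter

# Strings treated as "missing". This set is an assumption; adjust it to your data.
MISSING = {"", "na", "n/a", "nan", "null", "none", "?"}


def profile_rows(rows):
    """Summarise a list of dict rows: per-field statistics and duplicate rows."""
    fields = list(rows[0].keys()) if rows else []
    summary = {}
    for field in fields:
        values = [(row.get(field) or "").strip() for row in rows]
        present = [v for v in values if v.lower() not in MISSING]
        summary[field] = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
            "most_common": Counter(present).most_common(1),
        }
    # Count rows that exactly repeat an earlier row.
    seen = Counter(tuple(row.get(f) for f in fields) for row in rows)
    duplicates = sum(count - 1 for count in seen.values())
    return {"rows": len(rows), "fields": summary, "duplicate_rows": duplicates}


# Hypothetical sample standing in for a real data file.
SAMPLE_CSV = """id,age,city
1,34,Tartu
2,,Tallinn
3,29,Tartu
3,29,Tartu
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
report = profile_rows(rows)
print("rows:", report["rows"], "duplicate rows:", report["duplicate_rows"])
for name, stats in report["fields"].items():
    print(name, stats)
```

To run this on a real file, replace the `io.StringIO(SAMPLE_CSV)` source with an opened file handle. Fields with many missing values or unexpected duplicates are natural candidates to mention under "Verifying data quality".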
Task 3. Setting up and planning your project (1 point)
Please perform the following tasks:
- Create a project repository on either GitHub or Bitbucket.
- Register your project by adding a new entry to the List of projects. Please follow the instructions given there on slide 2; this helps to keep the list tidy. In your homework report, include a direct link to the slide with your project (it can be copied from the address bar of your web browser). Add this link to the front page of your project repository as well.
- Make a detailed plan of your project with a list of tasks. Specify how many hours each team member is going to contribute to each task.
- Add the results from business understanding, data understanding and planning to your project repository. Report the links to where these results are listed.
- Prepare to pitch your project. The slide of your project within the List of projects will be shown and you will be given 90 seconds to explain to others what your project is about and what your plan is. This will be followed by 90 seconds of questions and discussion about your project. If you think it helps the presentation, feel free to add a visualization to your slide, or to add an extra slide to the List of projects right after your slide.
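The work plan with hours per team member and task can be kept in any form. As one possible sketch, the snippet below stores the plan as a nested dictionary and totals the hours per member and per task, which makes it easy to check that the workload is balanced. The task names, member names and hour values are hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical plan: task name -> planned hours per team member.
PLAN = {
    "Data cleaning": {"Member A": 10, "Member B": 5},
    "Modelling": {"Member A": 8, "Member B": 12},
    "Poster and presentation": {"Member A": 4, "Member B": 4},
}


def hours_per_member(plan):
    """Total planned hours for each team member across all tasks."""
    totals = defaultdict(int)
    for hours in plan.values():
        for member, h in hours.items():
            totals[member] += h
    return dict(totals)


def hours_per_task(plan):
    """Total planned hours for each task across all members."""
    return {task: sum(hours.values()) for task, hours in plan.items()}


member_totals = hours_per_member(PLAN)
task_totals = hours_per_task(PLAN)

for task, total in task_totals.items():
    print(f"{task}: {total} h")
for member, total in member_totals.items():
    print(f"{member}: {total} h")
```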