First steps of the project
Deadline: Monday, Dec 2, at noon (12:00)
The goal of this homework is to get you started with your project. Make sure that you know all the requirements for projects. Task 1 asks you to set up a repository. Tasks 2 and 3 are about the first two steps in CRISP-DM: business and data understanding. Task 4 asks you to make a work plan.
This homework should be done in project teams, not individually. All members of the team will receive the same number of points for this homework. Only one project team member has to fill out the Google Form with the project information.
Task 1. Setting up (0.25 points)
- Create a project repository, either in GitHub or Bitbucket, if you have not done this already.
- Make sure that the instructors have access to the repository. Invite us by our usernames. If that does not work, invite us by e-mail.
- Markus Haug - haugmarkus (GitHub) and (Bitbucket)
- Friedrich Krull - kr3ca (GitHub) and kr3ca (Bitbucket)
- Carel Kuusk - CarelKuusk (GitHub) and (Bitbucket)
- Victor Pinheiro - victorhenriquecp (GitHub) and victorhenriquecp (Bitbucket)
- Hasan Tanvir - hasantanvir79 (GitHub) and (Bitbucket)
- Add reports from the following subtasks (business understanding, data understanding and planning) to the repository as a single separate PDF file named GROUP_NR_report.pdf (e.g. "A0_report.pdf").
- Also add the link to the repository to the report.
Task 2. Business understanding (0.5 points)
NB! Don't forget to mention your project title and team members at the beginning of the report.
Developing a business understanding within CRISP-DM consists of four tasks: identifying your business goals, assessing your situation, defining your data-analysis, data-mining or machine learning goals and producing your project plan. For this exercise, please develop a business understanding of your project. According to CRISP-DM, you should report the following:
- Identifying your business goals
- Background
- Business goals
- Business success criteria
- Assessing your situation
- Inventory of resources
- Requirements, assumptions, and constraints
- Risks and contingencies
- Terminology
- Costs and benefits
- Defining your data-mining goals
- Data-mining goals
- Data-mining success criteria
Please follow this structure and cover all these aspects in your report. Consult this PDF-file with a chapter on Embracing the Data-Mining Process for more information on each of the deliverables. Keep the report concise, and feel free to state that some aspect is irrelevant to your project. If your project does not benefit a ‘business’, please specify who will benefit from it and perform business understanding from their perspective. For instance, this could be one or more individuals, organizations, or societies. Please focus on the goals you plan to contribute to directly, not generic ones (like making the world better).
The report of task 2 should be 400-800 words.
Task 3. Data understanding (1 point)
Data understanding within CRISP-DM consists of four tasks: gathering data, describing data, exploring data, and verifying data quality. For this exercise, please develop a data understanding of your project. Report the results of the tasks according to the following structure:
- Gathering data
- Outline data requirements
- Verify data availability
- Define selection criteria
- Describing data
- Exploring data
- Verifying data quality
Consult the above-given book chapter to understand what is expected under all these deliverables. Take inspiration from when describing and exploring the data. As a result of this exercise, you should have gathered and understood the data. You should have decided which parts of the data you will use and understood the meaning of all fields within these parts. Note that data cleaning is part of the data preparation step in CRISP-DM, but you might choose to do some of it already during this task.
The report of task 3 should be 400-800 words.
Task 4. Planning your project (0.25 points)
Please perform the following tasks:
- Make a detailed plan of your project with a list of tasks. There should be at least five tasks. Specify how many hours each team member will contribute to each task.
- List the methods and tools that you plan to use. Add any comments about the tasks that you think are important to clarify.
The report of task 4 should be 100-300 words.
- One project team member has to fill out the Google Form with the project information.
- Google Form