Customer Segmentation using RFM Analysis
Dataset: “online retail” under http://archive.ics.uci.edu/ml/datasets.html
RFM stands for the three dimensions:
- Recency – How recently did the customer purchase?
- Frequency – How often do they purchase?
- Monetary Value – How much do they spend?
The resulting segments can be ordered from most valuable (highest recency, frequency, and value) to least valuable (lowest recency, frequency, and value). Note that identifying the most valuable RFM segments this way can capitalize on chance relationships in the data used for this analysis, so the ranking may not generalize to new data.
Step 1: Data Cleaning (2 pt)
Delete all rows with a negative Quantity or Price. We also need to delete rows with a missing (NA) customer ID.
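The cleaning rule above can be sketched in plain Python as follows. This is only an illustration under assumptions: the column names `CustomerID`, `Quantity`, and `UnitPrice` and the sample rows are hypothetical stand-ins, and a missing ID is assumed to appear as `None`, an empty string, or the string `"NA"`.

```python
# Hypothetical sample rows; the real data set has more columns and many more rows.
rows = [
    {"CustomerID": "17850", "Quantity": 6, "UnitPrice": 2.55},
    {"CustomerID": "17850", "Quantity": -1, "UnitPrice": 2.55},  # negative quantity
    {"CustomerID": None, "Quantity": 3, "UnitPrice": 1.25},      # missing customer ID
    {"CustomerID": "13047", "Quantity": 2, "UnitPrice": -4.00},  # negative price
    {"CustomerID": "13047", "Quantity": 8, "UnitPrice": 3.39},
]

# Assumed encodings of a missing customer ID.
MISSING = (None, "", "NA")


def clean(rows):
    """Keep rows that have a customer ID and non-negative Quantity and UnitPrice."""
    return [
        r for r in rows
        if r["CustomerID"] not in MISSING
        and r["Quantity"] >= 0
        and r["UnitPrice"] >= 0
    ]


cleaned = clean(rows)
print(len(cleaned), "of", len(rows), "rows kept")
```

Whether zero quantities or zero prices should also be dropped is a separate decision; the sketch removes only strictly negative values, as the step states.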
Step 2: Recode variables (2 pt)
We should do some recoding and convert character variables to factors.
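As a rough illustration of this kind of recoding, the sketch below parses a date string into a `datetime` and maps a character variable to integer codes, which is loosely analogous to a factor. The column names `InvoiceDate` and `Country`, the date format, and the sample rows are assumptions made for the example, not part of the assignment.

```python
from datetime import datetime

# Hypothetical sample rows with a date stored as text and a character variable.
rows = [
    {"InvoiceDate": "12/1/2010 8:26", "Country": "United Kingdom"},
    {"InvoiceDate": "12/1/2010 8:45", "Country": "France"},
    {"InvoiceDate": "12/2/2010 9:01", "Country": "United Kingdom"},
]


def recode(rows, date_format="%m/%d/%Y %H:%M"):
    """Parse dates and encode Country as integer codes (factor-like levels)."""
    # Sorted unique values play the role of factor levels.
    levels = sorted({r["Country"] for r in rows})
    code_of = {level: i for i, level in enumerate(levels)}
    recoded = [
        {
            "InvoiceDate": datetime.strptime(r["InvoiceDate"], date_format),
            "Country": code_of[r["Country"]],
        }
        for r in rows
    ]
    return recoded, levels


recoded, levels = recode(rows)
print(levels)
print(recoded[0])
```

The codes here start at 0, whereas factor codes in R start at 1; only the mapping between levels and codes matters for the analysis.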
Step 3: Calculate RFM (3 pt)
To implement the RFM analysis, we need to further process the data set with the following steps:
Step 3.1: Find the most recent purchase date for each customer ID and calculate the number of days from that date to now (or to some other reference date), to get the Recency data
Step 3.2: Count the number of transactions of each customer, to get the Frequency data
Step 3.3: Sum the amount of money a customer spent and divide it by Frequency, to get the average amount per transaction, which is the Monetary data.
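Steps 3.1 to 3.3 can be sketched as a single pass over the transactions. This is a minimal sketch under assumptions: the tuple layout, the sample data, and the reference date are hypothetical, and a "transaction" is assumed to mean a distinct invoice number.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions: (customer_id, invoice_no, invoice_date, amount)
transactions = [
    ("A", "536365", date(2011, 11, 1), 20.0),
    ("A", "536366", date(2011, 12, 1), 40.0),
    ("B", "536367", date(2011, 10, 15), 15.0),
]


def compute_rfm(transactions, reference_date):
    """Return {customer: {"recency", "frequency", "monetary"}}."""
    last_purchase = {}
    invoices = defaultdict(set)
    total_spent = defaultdict(float)

    for customer, invoice, day, amount in transactions:
        # Step 3.1: track the most recent purchase date per customer.
        if customer not in last_purchase or day > last_purchase[customer]:
            last_purchase[customer] = day
        # Step 3.2: collect distinct invoices (assumed to be the transactions).
        invoices[customer].add(invoice)
        # Step 3.3: accumulate total spending.
        total_spent[customer] += amount

    rfm = {}
    for customer, last_day in last_purchase.items():
        frequency = len(invoices[customer])
        rfm[customer] = {
            "recency": (reference_date - last_day).days,
            "frequency": frequency,
            "monetary": total_spent[customer] / frequency,  # average per transaction
        }
    return rfm


rfm = compute_rfm(transactions, reference_date=date(2011, 12, 10))
for customer, values in sorted(rfm.items()):
    print(customer, values)
```

With this sample, customer A has a recency of 9 days, a frequency of 2, and a monetary value of 30.0. A fixed reference date keeps the result reproducible, whereas using the current date changes Recency every time the analysis is rerun.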
Step 4: Plot the histograms and, if skewed, use a log scale to normalize (3 pt)
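One way to check for skew and see the effect of a log scale, sketched with the standard library only. The sample values are hypothetical, the text histogram stands in for a real plot, and `log1p` (the log of 1 + x) is an assumed choice that tolerates zero values.

```python
import math


def skewness(values):
    """Population skewness: third central moment over the cubed standard deviation.

    Assumes the values are not all identical (otherwise the variance is zero).
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def histogram(values, bins=5):
    """Count values in equal-width bins between the minimum and maximum."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid zero width when all values are equal
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # put the maximum in the last bin
        counts[idx] += 1
    return counts


# Hypothetical Monetary values with one very large spender.
monetary = [5.0, 8.0, 10.0, 12.0, 15.0, 20.0, 30.0, 1000.0]
logged = [math.log1p(v) for v in monetary]

for label, data in (("raw", monetary), ("log", logged)):
    print(label, "skewness = %.2f" % skewness(data))
    for count in histogram(data):
        print("  " + "#" * count)
```

A positive skewness well above zero indicates a long right tail. The log transform reduces the skew in this sample but does not remove it entirely, so the histogram should still be inspected after transforming.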