Segmenting an existing customer base can serve many purposes: personalize marketing programs, campaigns and communication, predict movement between segments, better customer value predictions.
How can you segment your existing customer base? Here I show you how I segmented a real three-years e-commerce customer base using Recency, Frequency and Monetary Value. I went through the following steps:
- Define segments and segmentation method
- Loading and wrangling Data
- Segmenting: assign any Customer to a Segment
- Present Data and act upon it
I used Python for the Loading, Transformation and Computing operations. You can use other data processing oriented languages like R, or maybe even a combination of SQL and Excel as well.
Define segments and segmentation method
RFM is a widely used and cost-effective behavior-based segmentation methodology. It can be built starting from the order/sales database. Contrary to other statistical approaches (i.e. k-means, clustering etc.) computational costs of RFM are low, and it can easily be applied to multi-million record databases. Due to its practicality and simplicity, I’ll then use RFM as a segmentation methodology, substituting its typical “nnn” customer ranking with a Managerial classification into pre-defined segments.
Orders and Customer Base come from the Brazilian e-commerce public dataset of 100k orders made at Olist Store in the 2016-2018 period, available at the Kaggle website. It’s real commercial data, with customer info anonymised. Here’s the Database schema for the dataset provided:

Segments Definition
RFM uses three variables: Recency, Frequency, Monetary Value in order to segment the Customer Base but every business can adapt its own business logic building different segments with different parameters. I segmented the dataset as follows:
- Inactive – customers with no orders in the last two years
- Cold – no orders in the last year, with at least one order in the last two years
- Warm – last purchase between six months and one year ago
- Active Hi Value – at least one purchase in the last six months, average order value below the overall average, more than one order overall
- Active Low Value – at least one purchase in the last six months, and average order value below the overall average, more than one order overall
- New – first and unique purchase made in the last six months

Loading and wrangling Data
Info needed to start the segmentation are:
- Customer ID
- Order Date
- Number of Orders made
- Value of purchases
Our dataset does not provide directly the info needed, thus we have to compute them transforming the original data after the extraction.
The first operation is to clean the orders table removing all the records with status “canceled”. After that we can merge the orders and customers tables:
#join (merge) the two datasets
tbl_order = pd.merge(order_items, orders_customer[['order_id','customer_id', 'order_purchase_timestamp']],
how ='inner', on ='order_id')
tbl_order = tbl_order[['order_id', 'customer_id','order_purchase_timestamp', 'price','freight_value']]
We need now to compute Recency as the number of days passed from any order date. The dataset ends at October 2018, so I used 31 Oct 18 as a baseline date. Now we can calculate our RFM base table:
# Computing RFM indicators
rfm = tbl_order.groupby('customer_id').aggregate(
{'customer_id': 'min','days_passed': ['min', 'max', 'count'], 'total_amount': ['mean','sum']
})
rfm.columns = ['customer_id', 'recency','first_purchase', 'frequency','avg_amount', 'tot_purchases']
Segmenting – assigning Customers to Segments
We are now ready to translate the segmentation business logic in python code :
# segmentation
avg_purchase=rfm['avg_amount'].mean()
rfm['segment'] = np.nan
rfm.loc[rfm['recency']>365*2, 'segment']= 'inactive'
rfm.loc[(rfm['recency']<=365*2) & (rfm['recency']>365), 'segment'] = 'cold'
rfm.loc[(rfm['recency']<=365) & (rfm['recency']>365/2), 'segment'] = 'warm'
rfm.loc[(rfm['recency']<=365/2) & (rfm['tot_purchases']>=avg_purchase), 'segment'] = 'active hi value'
rfm.loc[(rfm['recency']<=365/2) & (rfm['tot_purchases']<avg_purchase), 'segment'] = 'active low value'
#OVERWRITE SOME OF THE PREVIOUS 'ACTIVE LOW/HI VALUE"
rfm.loc[(rfm['recency']<=365/2) & (rfm['frequency']==1), 'segment'] = 'new'
Present Data and Act Upon it

Customers by Segment Total Value by Segment
Once the segmentation is completed you should start to:
- Monitor your segment mix over time and double-check any variation in its composition (i.e an increment on the percentage of inactive customers, etc)
- fine-tune your offer and campaigns segment by segment, i.e win-back campaigns aimed at “inactive” customers, welcome series aimed at “new” segment, etc.
- check and monitor variations on response rates segment by segment
- re-run segmentation at least monthly
- use segmentation to predict churn, mobility between segments, CLV Customer Lifetime Value. I will write another post on using Machine Learning to develop this kind of predictions