Forecasting Customer Lifetime Value With Machine Learning

Paper Type: Free Essay | Subject: Computer Science
Word count: 4250 words | Published: 08 Feb 2020



Abstract

With the rapid advancement of technical infrastructure and the exponential growth of data, many businesses have adopted machine learning, especially in predictive analytics. However, the majority of businesses still have an incomplete understanding of the actual possibilities of machine learning. In this research paper, we present a case study of an empirical machine learning approach to predicting customer lifetime value for an online retailer.

The main purpose of this paper is to help businesses understand machine learning techniques, such as supervised algorithms and their terminology, in relation to a business problem, and to show the importance of domain knowledge. It helps businesses understand which algorithms suit a given problem and what outcomes to expect from them. Finally, a set of recommendations is provided for designing customer-centric marketing using machine learning.

Keywords: Machine Learning, Predictive Analytics, Supervised Algorithms.

Introduction

For the past two decades, the internet has revolutionised the way businesses traditionally functioned. Today, numerous businesses use the internet to sell products online. Customers are central to any business.

According to the Pareto principle (the 80/20 rule), 80 percent of revenue derives from 20 percent of customers (Richard Koch, 2008). Customer lifetime value is an important metric for measuring the success of any customer-centric business. It also helps business people understand customer behaviour, namely how interested customers are in the products and services of the company. It therefore indicates to firms how their business is performing.

With the evolution of the internet, online retail stores have grown enormously over the past decade compared with well-established traditional stores. This has not only changed how customers interact with businesses but has also raised the bar and increased sales exponentially in a short period. According to a 2018 article in Ecommerce News, almost 87% of UK retail purchases were made online (Ecommerce News, 2018).

Purpose

The following are some of the distinctive features of online retail compared with traditional shopping in retail stores. It is easy for the business to track customers' behaviour and activities instantly. Transactional shopping data records the kinds of products a customer shows interest in and purchases. It also includes the billing address and payment address of each consumer. This type of data enables the business to design its promotions and campaign programmes accordingly.

The common questions that online retail businesses want to answer are:

Which products customers are interested in, and who the most valuable customers are. A business serves a whole set of customers, but those who purchase more or spend the most on its website are generally treated as the most valuable.
Which customers are most likely to churn, and to what extent the business can retain them.
Which customers are the most loyal and trustworthy.
How much each customer contributes to the company's revenue.
How likely a customer is to respond to promotions, and how sales and demand can be forecast.

Machine learning has gained wide acceptance in industry as a way to extract valuable insights about customers from business data. Yet, although many new retailers are entering the online market, many still have an incomplete understanding of machine learning and its possibilities for developing customer-centric marketing.

The main focus of this paper is to help businesses understand the predictive power of machine learning algorithms, especially in forecasting customer lifetime value, and to recommend how they can design and conduct customer-centric marketing with profitability in mind.

Literature Review

Customer lifetime value is a measure of the long-term relationship between a customer and the company. It provides insight into how important a customer is over a certain period of time. Several methods have been proposed and implemented, ranging from simple to complex models and drawn from disciplines such as economics, statistics and management science. With the advancement of machine learning, many researchers have attempted to forecast the value while placing less emphasis on business understanding.

A study by Daqing Chen, Sai Laing Sain and Kun Guo (2012) segmented customers with an unsupervised approach in order to understand customer lifetime value. Their modelling was based on the RFM (Recency, Frequency and Monetary) model. Before modelling, it is important to have a thorough understanding of the statistical behaviour of the data (Jean-leah Njoroge, 2017), and their work offered little statistical support. Moreover, their research was carried out without a target variable. Data without a target variable leads to uncertain decision-making in the business world (Dean Abbott, 2012), so it is highly recommended to define a target variable in order to support unbiased decision-making.

Supervised Techniques

Benjamin Paul Chamberlain et al. (2017) focused on predicting lifetime value with supervised techniques. Their paper used an ensemble approach and a neural network with regressors to forecast which customers are likely to churn. It is unclear why they chose only two machine learning models. Since there is a wide range of machine learning algorithms, it is important to choose models based on the problem at hand (Sanjukta Bhowmick et al., 2006).


Customer value for an organisation can be understood in three ways: customer profitability, customer equity and lifetime value. Typically, customer lifetime value helps the business segment customers using customer information and lifetime value components (Su-Yeon Kim, Tae-Soo Jung, Eui-Ho Suh et al., 2006). Their paper focused mainly on a theoretical management approach with little support from data. In practice, good decision-making requires data (Nan Maxwell et al., 2015). This paper addresses how to choose the right model for the problem and how to use machine learning to segment the data by profitability for the business.

Dataset Description

Data Source: The data is readily available from the UCI Machine Learning Repository. It contains all the customer transactions recorded between December 1, 2010 and December 9, 2011 by a United Kingdom-based online retail store. The company mainly sells unique all-occasion gifts, and most of its customers are wholesalers (UCI Machine Learning Repository).

Only one dataset is available in our case. To build an effective machine learning model, we need to train the model and then test it on unseen data.

Since there is only one dataset, a hold-out technique can be applied to divide it in a 70:30 ratio: 70% of the data is used to train the model and the remaining 30% to test and evaluate it.
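As a minimal sketch of the 70:30 hold-out split, assuming scikit-learn's `train_test_split` and a hypothetical, randomly generated feature matrix `X` and target `y` standing in for the engineered customer features and CLTV values:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 100 customers x 3 derived features, plus a CLTV-like target.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
y = rng.normal(loc=500, scale=100, size=100)

# Hold-out split: 70% of rows for training, 30% held back for testing and evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42
)
print(X_train.shape, X_test.shape)
```

Fixing `random_state` makes the split reproducible, so every candidate model is evaluated on the same 30% of customers.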

The dataset contains 541,909 records of customer transactions with 8 columns.

The transaction dataset includes the following fields:

Feature Name	Data Type	Description
Invoice	Nominal	A 6-digit integral number uniquely assigned to each transaction. If the code starts with the letter ‘c’, it indicates a cancellation.
Stock Code	Nominal	A 5-digit integral number uniquely assigned to each distinct product.
Description	Nominal	Product (item) name.
Quantity	Numeric	Quantity of each product (item) per transaction.
Unit Price	Numeric	Product price per unit, in sterling.
Invoice Date	Numeric	The date and time when each transaction was generated.
Customer ID	Nominal	A 5-digit integral number uniquely assigned to each customer.
Country	Nominal	Name of the country where each customer resides.

The dataset has 135,080 missing values, i.e. nearly 25% of the data is missing, which is a major drawback. There are also duplicate records, which need to be treated with special attention.

The main goal here is to predict customers' lifetime value using machine learning. Data analytics helps us understand the features relevant to the problem and build a machine learning model that predicts customer lifetime value with low error on unseen data. The results can then be interpreted with respect to the business problem, to understand customers' profitable relationship with the business. To proceed with these steps, it is important to have a thorough understanding of the business problem.

Research Design and Methods

This study initially focuses on forecasting customers' lifetime value. Since we do not have a target feature, it is essential to design one.

According to a study by Dwyer, F. R. (1997), customer lifetime value is calculated using the formula below:

CLTV = ((A.O x P.F)/C.R) x P.M

Where,

A.O = Average Value of the Order

P.F = Purchase Frequency

C.R = Churn Rate

P.M = Profit Margin

There is no target label in the dataset, so the target feature can be designed using the above formula.

To design the target feature, we need three basic values:

Avg. Value of the order
Purchase Frequency
Churn rate

The average order value can be calculated by dividing the money spent by each customer by the total number of transactions made by that customer.

Avg.Order value = Expenditure by each customer / Total Transactions

Money spent by each customer = sum of (Quantity X Unit Price) over the customer's transactions.

The total number of transactions made by each customer to date can be derived from the customer's transaction (invoice) records.

According to Courtney Reagan (2019), web-only retailers earn a 30% profit margin each year, while retailers offering online ordering with in-store pickup earn 20%.

In our case, let us assume a ten percent profit margin on each customer's transactions.

Profit Margin = Money spent by each customer X 0.10
Churn rate can be deduced from the number of transactions made during the given period:
C.R = 1 - R.R
where R.R = Repeat rate, the proportion of customers who made more than one purchase.

Purchase frequency is based on the repeat purchases made by customers: it is the total number of transactions divided by the number of customers, while the repeat rate counts only customers who purchase more than once. Part of these calculations follows the work of Avinash Navlani (2018).
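The calculations above can be sketched in pandas as follows. This is an illustrative sketch, not the study's exact code: it assumes the UCI column names (`InvoiceNo`, `CustomerID`, `Quantity`, `UnitPrice`), a tiny hypothetical transaction log, dataset-wide purchase frequency and churn rate in the style of Navlani (2018), and the assumed 10% profit margin.

```python
import pandas as pd

# Hypothetical mini transaction log using the UCI Online Retail column names.
tx = pd.DataFrame({
    "InvoiceNo":  ["536365", "536366", "536367", "536368", "536369", "536370"],
    "CustomerID": [17850, 17850, 13047, 13047, 13047, 12583],
    "Quantity":   [6, 2, 12, 4, 1, 24],
    "UnitPrice":  [2.55, 7.65, 1.25, 3.00, 10.00, 0.85],
})

# Money spent on each invoice line = Quantity X Unit Price.
tx["TotalPrice"] = tx["Quantity"] * tx["UnitPrice"]

# Per-customer totals: money spent and number of distinct transactions (invoices).
cust = tx.groupby("CustomerID").agg(
    spend=("TotalPrice", "sum"),
    n_transactions=("InvoiceNo", "nunique"),
)

# A.O: average order value per customer.
cust["avg_order_value"] = cust["spend"] / cust["n_transactions"]

# P.F: purchase frequency across the whole customer base.
purchase_frequency = cust["n_transactions"].sum() / len(cust)

# R.R: share of customers with more than one transaction; C.R = 1 - R.R.
repeat_rate = (cust["n_transactions"] > 1).mean()
churn_rate = 1 - repeat_rate

# P.M: assumed 10% profit margin on each customer's spend.
cust["profit_margin"] = cust["spend"] * 0.10

# CLTV = ((A.O x P.F) / C.R) x P.M, used as the target feature.
cust["CLTV"] = (cust["avg_order_value"] * purchase_frequency / churn_rate) * cust["profit_margin"]
print(cust.round(2))
```

Here two of the three customers buy more than once, so the repeat rate is 2/3 and the churn rate 1/3. If every customer in the data were a repeat buyer, the churn rate would be zero and the formula would divide by zero, so a real pipeline would need a guard for that case.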

Once the target feature has been obtained from the dataset using the above calculations, the models can be trained to predict it.

Next, we need to train candidate machine learning algorithms and evaluate them based on the Mean Squared Error, since Mean Squared Error is the most common and popular evaluation metric for regression algorithms.

The typical stages involved in the current data analytics project are shown in the chart above.

In our case, we want to predict customer lifetime value for a United Kingdom-based online retail company. Given the customers' transactional data, our objective is to predict the extent to which customers are going to use its service.

Data Gathering: The transactional data is readily available from the UCI Machine Learning Repository and contains all the consumer transactions between 2010 and 2011 (UCI Machine Learning Repository).
Data Cleaning: Once the data has been gathered, it is important to clean it. Around 80% of the work in a data analytics project goes into data cleaning (Karen Grace-Martin, 2015). Although it is a time-consuming process, the outcomes of the project depend heavily on the quality of the data, so data cleansing ensures data quality before the model is built. This paper focuses on United Kingdom-based customers only, but the data also contains customers from other countries such as Germany, Spain and Belgium. Around 98% of the records belong to UK customers and the remaining 2% to other countries; these need to be filtered out before the dataset is analysed.

The dataset has 135,080 missing values, meaning nearly 25% of the data is missing. These records need to be filtered out of the dataset.

Next, a brief statistical summary analysis needs to be performed.

Any rows with a negative quantity value (for example, cancelled orders) also need to be removed from the dataset.
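A minimal pandas sketch of the cleaning steps described above, assuming the UCI column names and a small hypothetical sample that contains one duplicate row, one cancellation, one missing customer ID and one non-UK customer. The function name `clean_transactions` is hypothetical:

```python
import pandas as pd

def clean_transactions(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper applying the cleaning steps to a UCI Online Retail-style frame."""
    out = df[df["Country"] == "United Kingdom"]   # keep UK customers only
    out = out.dropna(subset=["CustomerID"])        # drop rows without a customer ID
    out = out.drop_duplicates()                    # remove exact duplicate rows
    out = out[out["Quantity"] > 0]                 # drop non-positive quantities (e.g. cancellations)
    return out.reset_index(drop=True)

# Hypothetical sample covering each problem case.
raw = pd.DataFrame({
    "InvoiceNo":  ["536365", "536365", "C536379", "536380", "536381"],
    "Quantity":   [6, 6, -1, 3, 2],
    "UnitPrice":  [2.55, 2.55, 27.50, 1.95, 4.25],
    "CustomerID": [17850.0, 17850.0, 14527.0, None, 12583.0],
    "Country":    ["United Kingdom", "United Kingdom", "United Kingdom",
                   "United Kingdom", "France"],
})

print(raw["CustomerID"].isna().sum(), raw.duplicated().sum())  # missing IDs, duplicate rows
clean = clean_transactions(raw)
print(clean.describe())  # brief statistical summary of the cleaned data
```

The filter keeps only strictly positive quantities, so any zero-quantity rows are dropped along with the negative ones.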

Feature Engineering: This is the key step in obtaining the desired results for any machine learning problem. Many beginners, and even intermediate practitioners, assume that a good machine learning algorithm alone gives accurate results; that is only partly true, because feature engineering is the predominant step in getting accurate results. Typical feature engineering steps are standardising and normalising variables, and applying encoding and binning techniques to convert categorical variables to numerical form and numerical variables to categorical form. In our case, we need to create new features such as Average Order Value, which is derived from the money spent (Quantity X Unit Price) divided by the number of transactions. In addition, since there is no target feature, it has to be computed using the CLTV formula given above.
Model Building: This is the stage at which the future is predicted, and where machine learning comes into play. Typically, a machine learning model is a program or system that is trained on input data; the trained model then makes predictions on new data drawn from the same distribution. A set of regression models, namely Linear Regression, Decision Tree, Random Forest and Gradient Boosting, is applied to forecast the value. Ensemble models are expected to perform better than traditional models because they can better balance bias and variance.
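To illustrate the model-building stage, the sketch below fits the four regression models named above and compares their mean squared error on a 30% hold-out set. It uses synthetic data from scikit-learn's `make_regression` as a stand-in for the real customer features, so the numbers say nothing about the retail dataset:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the engineered customer features and the CLTV target.
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
}

# Fit each model on the training split and score it by MSE on the held-out 30%.
test_mse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    test_mse[name] = mean_squared_error(y_test, model.predict(X_test))

# Report from best (lowest MSE) to worst.
for name, mse in sorted(test_mse.items(), key=lambda kv: kv[1]):
    print(f"{name:<18} MSE = {mse:,.1f}")
```

Because `make_regression` generates a purely linear relationship, linear regression is likely to score best here. This is a reminder that ensemble models are not guaranteed to win, and that the ranking has to be checked on the real data.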

Model Evaluation

In a predictive data analytics project, the goal is to make predictions on unseen data. Once the model has been built, the next important phase is to evaluate it, since model evaluation gives a proper understanding of the model's performance. In our case we are predicting a continuous variable, customer lifetime value, which makes this a regression problem, and the evaluation technique used is Mean Squared Error (MSE).

In statistics, the Mean Squared Error measures the quality of an estimator or predictor: it is the average of the squared differences between the estimated values and the actual values being estimated. It corresponds to the expected value of the squared-error loss (Robert Tibshirani, Trevor Hastie, 2013). In general, models with lower error on unseen data are treated as better models.
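Written out, MSE = (1/n) Σ (yᵢ − ŷᵢ)², where yᵢ is the actual value and ŷᵢ the prediction for the i-th of n test records. The short sketch below, using hypothetical actual and predicted CLTV values, computes it by hand with NumPy and checks it against scikit-learn's `mean_squared_error`:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical actual vs. predicted CLTV values for four test customers.
y_true = np.array([250.0, 400.0, 120.0, 980.0])
y_pred = np.array([270.0, 380.0, 150.0, 900.0])

# MSE = (1/n) * sum((y_i - y_hat_i)^2)
mse_manual = np.mean((y_true - y_pred) ** 2)
mse_sklearn = mean_squared_error(y_true, y_pred)
print(mse_manual, mse_sklearn)  # 2025.0 2025.0
```

The errors here are -20, 20, -30 and 80, so the MSE is (400 + 400 + 900 + 6400) / 4 = 2025. Its square root (45) is in the same units as CLTV, which is often easier to explain to business users.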

Feedback loop

In typical machine learning problems, the feedback loop plays a crucial role. If the results are not up to expectations, we should focus on feature engineering rather than directly switching models. Predictions are first made on the given dataset; if the error is large (say, when using a random forest), parameters such as the number of trees and the sample size need to be tuned to get the desired output.
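One way to automate this tuning step for a random forest is a cross-validated grid search over the number of trees (`n_estimators`) and the bootstrap sample size (`max_samples`, available in recent scikit-learn versions). The sketch below assumes synthetic data and an arbitrary, deliberately small grid:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the customer feature matrix and CLTV target.
X, y = make_regression(n_samples=300, n_features=5, noise=15.0, random_state=1)

# Candidate values for the number of trees and the per-tree sample size.
param_grid = {
    "n_estimators": [50, 150],
    "max_samples": [0.5, 0.8, None],  # fraction of rows per bootstrap sample; None = all rows
}

search = GridSearchCV(
    RandomForestRegressor(random_state=1),
    param_grid,
    scoring="neg_mean_squared_error",  # scikit-learn maximises scores, so MSE is negated
    cv=5,
)
search.fit(X, y)
print(search.best_params_, -search.best_score_)
```

scikit-learn treats higher scores as better, so MSE is passed as `neg_mean_squared_error` and the sign is flipped back when reporting.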

The supervised algorithms described below can give accurate results when their parameters are properly tuned.

Machine Learning Terminologies and Techniques

There are two main types of machine learning algorithms: supervised and unsupervised.

Supervised algorithms: In supervised algorithms the data is labelled, i.e. we have a target variable. The model is trained on the input data with respect to the target variable and then makes predictions on unseen data. In this project, we use supervised algorithms.

Unsupervised algorithms: In the unsupervised approach there is no target variable; we train the model and want the algorithm to learn from the data itself.

Supervised algorithms are further classified into two types: regression and classification.

Regression is used to quantify the strength of the relationship between one variable and the variables thought to explain it.

In classification, the objective is to train the model to predict discrete values, i.e. categorical target values. For our problem, we initially study a set of machine learning algorithms suited to our case. Ours is a regression problem: we are trying to predict a continuous value.

Linear Regression

Linear regression is one of the most fundamental predictive algorithms. There are many types of regression techniques; for an intuitive understanding, consider simple linear regression, which models the relationship between two variables. More specifically, it shows how much of the variation in the dependent variable can be explained by changes in the independent variable. In a business context, the dependent variable (also called the response or target variable) might be the sales of a product, performance, pricing or risk. The independent variables, also called explanatory or predictor variables, explain the variation in the dependent variable. A simple linear regression model is linear because every term in the model is either a constant or a parameter multiplied by the independent variable. The core idea is to find the model which, when plotted, is the line of best fit for the data (Robert Tibshirani, Trevor Hastie, 2013).
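The line of best fit for simple linear regression has a closed-form least-squares solution: the slope is Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², and the intercept makes the line pass through the point of means. A minimal NumPy sketch with hypothetical marketing-spend and sales figures (the function name is hypothetical):

```python
import numpy as np

def fit_simple_linear_regression(x, y):
    """Ordinary least squares for y = b0 + b1 * x, using the closed-form solution."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)  # slope
    b0 = y_mean - b1 * x_mean                                              # intercept
    return b0, b1

# Hypothetical example: monthly marketing spend (x) vs. sales (y), lying exactly on y = 3 + 2x.
spend = [1, 2, 3, 4, 5]
sales = [5, 7, 9, 11, 13]
b0, b1 = fit_simple_linear_regression(spend, sales)
print(b0, b1)  # 3.0 2.0
```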

Decision Trees

Decision trees are non-parametric techniques that form a tree-like structure. For classification, the split at each node is commonly selected using an impurity measure such as the Gini index or entropy; for regression problems such as ours, splits are typically chosen to minimise the squared error (variance) within the resulting nodes.

Entropy measures the impurity of a node with respect to the target variable. At the root node, the tree picks the feature and split that most reduce entropy, i.e. the split with the highest information gain.

Information gain is the reduction in entropy achieved by a split. The Gini index is an alternative impurity measure that is cheaper to compute and usually leads to very similar splits.
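For a classification target, the impurity measures and information gain can be computed directly. The sketch below uses hypothetical high-value/low-value labels; for a regression target such as CLTV, scikit-learn's `DecisionTreeRegressor` instead uses squared error by default:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a set of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def gini(labels):
    """Gini impurity of a set of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))

def information_gain(parent, left, right):
    """Reduction in entropy achieved by splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(parent) - weighted

# Hypothetical labels: 1 = high-value customer, 0 = low-value customer.
parent = [1, 1, 1, 1, 0, 0, 0, 0]
left, right = [1, 1, 1, 0], [1, 0, 0, 0]      # one candidate split
print(entropy(parent), gini(parent))          # 1.0 0.5
print(round(information_gain(parent, left, right), 3))  # 0.189
```

The parent node is perfectly mixed (entropy 1 bit, Gini 0.5); the candidate split leaves each child 75% pure, which gives an information gain of about 0.19 bits.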

Artificial Neural Networks

Artificial neural networks are supervised algorithms that can solve both regression and classification problems. Neural network algorithms were inspired by the functioning of the human brain.

Fig. 1 shows a simple artificial neural network with an input layer, one hidden layer and an output layer. The number of neurons in the input layer equals the number of input features in the data, while the number of hidden layers and the number of neurons in each are tuning parameters of the model. Each neuron computes a weighted sum of its inputs and adds a bias; the result is passed through an activation function to produce the neuron's output (Tom Mitchell, 1997).
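A single forward pass through a network like the one in Fig. 1 can be sketched in NumPy: each hidden neuron takes a weighted sum of the inputs plus a bias and applies an activation function. The layer sizes, the ReLU activation and the random, untrained weights below are assumptions for illustration only:

```python
import numpy as np

def relu(z):
    """Rectified linear activation: max(0, z) element-wise."""
    return np.maximum(0.0, z)

def forward(x, W1, b1, W2, b2):
    """Forward pass through one hidden layer and a linear output layer."""
    hidden = relu(W1 @ x + b1)   # weighted sum of inputs plus bias, then activation
    return W2 @ hidden + b2      # linear output neuron, suitable for a regression target

# Hypothetical sizes: 3 input features -> 4 hidden neurons -> 1 output.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)

x = np.array([0.5, -1.2, 2.0])      # one customer's scaled feature vector
print(forward(x, W1, b1, W2, b2))   # untrained weights, so the value is arbitrary
```

Training (for example by backpropagation) would adjust `W1`, `b1`, `W2` and `b2` to reduce the prediction error; that step is omitted here.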

Random Forest

Random Forest is a popular ensemble technique. The major challenge in any machine learning problem is to keep both bias and variance low. Random Forest is a supervised machine learning algorithm that builds many decision trees, each trained on a random bootstrap sample of the rows and considering a random subset of features at each split, and combines their predictions (averaging them for regression, voting for classification).

The main tuning parameters of a Random Forest are n_estimators, the number of trees in the forest, and max_depth, how deep each tree may grow; the number of features considered at each split is set separately (max_features in scikit-learn). Very deep decision trees are prone to overfitting.

Gradient Boosting

Boosting is an ensemble technique that builds its predictors sequentially rather than independently, and it can solve both regression and classification problems. Boosting generally turns weak learners into a strong learner. The main idea is to focus on the residuals: each new model is trained to predict the residuals of the models before it. In the end, all the predictors are combined, with a weight given to each one.
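The residual-fitting idea can be shown with a small from-scratch sketch for squared-error loss, using shallow scikit-learn regression trees as the weak learners. The class name `SimpleBoostingRegressor` is hypothetical, and this is a teaching simplification rather than scikit-learn's own implementation:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

class SimpleBoostingRegressor:
    """Hypothetical teaching class: gradient boosting for squared-error loss."""

    def __init__(self, n_rounds=50, learning_rate=0.1, max_depth=2):
        self.n_rounds = n_rounds
        self.learning_rate = learning_rate
        self.max_depth = max_depth

    def fit(self, X, y):
        self.base_ = float(np.mean(y))            # start from a constant prediction
        pred = np.full(len(y), self.base_)
        self.trees_ = []
        for _ in range(self.n_rounds):
            residuals = y - pred                  # what the ensemble still gets wrong
            tree = DecisionTreeRegressor(max_depth=self.max_depth, random_state=0)
            tree.fit(X, residuals)                # weak learner fitted to the residuals
            pred = pred + self.learning_rate * tree.predict(X)  # shrunken correction
            self.trees_.append(tree)
        return self

    def predict(self, X):
        pred = np.full(len(X), self.base_)
        for tree in self.trees_:
            pred = pred + self.learning_rate * tree.predict(X)
        return pred

# Hypothetical non-linear target on a single feature.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 100 + 50 * np.sin(X[:, 0])

model = SimpleBoostingRegressor().fit(X, y)
mse_start = np.mean((y - model.base_) ** 2)      # error of the constant starting model
mse_end = np.mean((y - model.predict(X)) ** 2)   # training error after 50 rounds
print(round(mse_start, 1), round(mse_end, 1))
```

Each round fits a tree to what the ensemble still gets wrong and adds a correction scaled by the learning rate, which plays the role of the weighting step mentioned above; on the training data the error falls from round to round.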

Parameter tuning in the gradient boosting algorithm can be carried out at two levels, tree-based parameters and boosting parameters, alongside a few miscellaneous parameters:

Tree-based parameters:
- min_samples_split – minimum number of samples a node must contain before it can be split.
- min_samples_leaf – minimum number of samples required in a leaf node.
- max_depth – maximum depth of each tree.
- max_features – number of features to consider when searching for the best split.
Boosting parameters:
- learning_rate – determines the impact of each tree on the final output.
- n_estimators – number of sequential trees to be modelled.
- subsample – fraction of observations to be selected for each tree.
Miscellaneous parameters:
- random_state – random number seed.
- warm_start – reuses the previous fit of the model and adds more trees to it.
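The parameters listed above map directly onto scikit-learn's `GradientBoostingRegressor`. The values in this sketch are arbitrary starting points on synthetic data, not tuned settings for the retail dataset:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the customer features and CLTV target.
X, y = make_regression(n_samples=400, n_features=6, noise=20.0, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=7)

gbr = GradientBoostingRegressor(
    # Tree-based parameters
    min_samples_split=10,
    min_samples_leaf=5,
    max_depth=3,
    max_features="sqrt",
    # Boosting parameters
    learning_rate=0.05,
    n_estimators=300,
    subsample=0.8,       # values below 1.0 give stochastic gradient boosting
    # Miscellaneous parameters
    random_state=7,
    warm_start=True,     # allows more trees to be added to the existing fit
)
gbr.fit(X_train, y_train)
mse_300 = mean_squared_error(y_test, gbr.predict(X_test))

# With warm_start=True, raising n_estimators and refitting trains only the extra trees.
gbr.set_params(n_estimators=400)
gbr.fit(X_train, y_train)
mse_400 = mean_squared_error(y_test, gbr.predict(X_test))
print(len(gbr.estimators_), round(mse_300, 1), round(mse_400, 1))
```

Because `warm_start=True`, raising `n_estimators` from 300 to 400 and calling `fit` again trains only the 100 additional trees instead of starting over.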

Anticipated Results

This study demonstrates the practical working of machine learning models on transactional data. It shows how important domain knowledge is, especially when designing a target feature from the independent variables. As the study proceeds, it is expected to identify the right model on the basis of interpretability and accuracy. Ensemble models are expected to give more accurate predictions, while traditional models offer good potential in terms of interpretability. These techniques help the business develop campaign programmes based on the profitability of consumers to the firm.

Once forecasts have been made, the output will be sorted in descending order so that the business can see customer value from maximum to minimum. Segmentation can then be applied by setting thresholds, which can be derived from classification models such as logistic regression. Later, in-depth research can be carried out on the segmented data for the business.
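A minimal pandas sketch of ranking and segmenting the forecasts, assuming hypothetical predicted CLTV values and hypothetical fixed thresholds of 100 and 500; in the proposed study the thresholds would instead be derived from a classification model such as logistic regression:

```python
import pandas as pd

# Hypothetical predicted CLTV values for five customers.
pred = pd.DataFrame({
    "CustomerID": [17850, 13047, 12583, 14688, 15311],
    "predicted_cltv": [280.9, 1250.0, 45.2, 610.5, 95.0],
})

# Sort from most to least valuable so the business sees its top customers first.
ranked = pred.sort_values("predicted_cltv", ascending=False).reset_index(drop=True)

# Hypothetical thresholds: (-inf, 100] Low, (100, 500] Medium, (500, inf) High.
bins = [float("-inf"), 100, 500, float("inf")]
labels = ["Low", "Medium", "High"]
ranked["segment"] = pd.cut(ranked["predicted_cltv"], bins=bins, labels=labels)
print(ranked)
```

`pd.cut` uses right-closed intervals by default, so a value of exactly 100 would fall into the Low segment.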

Impact and Improvements

Previous studies have applied machine learning to forecast customer lifetime value in isolation; some rely on theoretical assumptions, and a few choose statistical or economic models without justification from the data. This study helps businesses pick the right model for the right data. It also shows how businesses can design marketing programmes for specific customers, informed by in-depth analysis, in order to prolong those customers' value to the firm.

Visualisation Tools

Initially, data exploration in this study requires visualisations such as bar charts, scatter plots and heat maps for univariate, bivariate and multivariate analysis. Apart from these, we expect to use time-series graphs in this study.

Discussions:

This study mainly aims to give businesses a good understanding of the possibilities of machine learning. Before applying any machine learning algorithm to a problem, it is important to know the upsides and downsides of these algorithms. We cannot simply apply a technique to a problem right away; it is very important to have a good understanding of the domain and the data we are going to work on.

In this study we try to predict customer lifetime value, but the data does not contain that target value. Working out how to compute the value itself shows the role of domain knowledge in tackling the problem; the importance of domain knowledge is our first lesson.

Secondly, we list an array of machine learning algorithms and frame our initial problem as a regression problem. However, algorithms such as neural networks may not work effectively here because they are data-hungry and need large amounts of data, which illustrates the importance of data over technique.

Third, since we are initially helping the business design marketing campaigns, this research focuses on the profitability aspect, based on customer lifetime value. Other elements are also needed, because profit is only one part of the game.

The main insights from this study are an understanding of what machine learning can achieve when working with limited data, and of how important domain knowledge is to our problem.

The common tools used are Microsoft Excel, Python, Pandas, NumPy, Scikit-learn and Google Colab. For visualisation, Tableau, Matplotlib and ggplot2 are advisable.

Key Challenges

Nearly one quarter of the data is missing. Although central tendency techniques such as the mean, median and mode can be used for imputation, the missing values are in the unique customer ID, which cannot be replaced using these techniques.
Very few features are available. Since deeper analysis is essential for this project, most of the features need to be obtained in derived form.

Recommendations & Future Study

In coming semesters, this project will focus on:

justifying why ensemble models perform well in predicting continuous values, which merits further study; and
how to segment customers in terms of profitability using machine learning.

As the data is limited by heavy missing values, we will apply techniques such as Monte Carlo simulation and other probabilistic simulations to overcome that problem. As the project currently relies on batch learning, online machine learning techniques can be applied to reduce the need for timely human intervention.

References

Su-Yeon Kim, Tae-Soo Jung, Eui-Ho Suh and Hyun-Seok Hwang (2006). Customer segmentation and strategy development based on customer lifetime value: A case study.
Lars Kotthoff, Ian P. Gent and Ian Miguel (2010). An Evaluation of Machine Learning in Algorithm Selection for Search Problems.
Daqing Chen, Sai Laing Sain and Kun Guo (2012). Data mining for the online retail industry: A case study of RFM model-based customer segmentation using data mining.
Sanjukta Bhowmick, Victor Eijkhout, Yoav Freund, Erika Fuentes and David Keyes (2006). Application of machine learning in selecting sparse linear solvers. Technical report, Columbia University.
Robert Tibshirani and Trevor Hastie (2013). An Introduction to Statistical Learning. Textbook.
Tom M. Mitchell (1997). Machine Learning. Textbook.
F. R. Dwyer (1997). Customer lifetime valuation to support marketing decision making. Journal of Interactive Marketing, 11(4), 6–13.
Benjamin Paul Chamberlain et al. (2017). Customer Lifetime Value Prediction Using Embeddings. Retrieved from https://arxiv.org/pdf/1703.02596.pdf
Jean-leah Njoroge (2017). Significance of Exploratory Data Analysis. Retrieved from https://www.jeannjoroge.com/significance-of-exploratory-data-anaysis/
Dean Abbott (2012). Data Mining and Predictive Analytics. Retrieved from https://abbottanalytics.blogspot.com/2012/04/why-defining-target-variable-in.html
Nan L. Maxwell et al. (2015). Data and Decision Making: Same Organizations, Different Perceptions. Retrieved from https://redf.org/app/uploads/2015/02/Data_Decision_Making_WP.pdf
Richard Koch (2008). The 80/20 Principle. Retrieved from https://richardkoch.net/2012/11/the-8020-principle-2/
Ecommerce News (2018). 87% of UK retail purchases made online. Retrieved from https://ecommercenews.eu/87-of-uk-retail-purchases-made-online/
UCI Machine Learning Repository. Online Retail. Retrieved from https://archive.ics.uci.edu/ml/datasets/online+retail
Avinash Navlani (2018). Customer Lifetime Value. Retrieved from https://www.datacamp.com/community/tutorials/customer-life-time-value
Karen Grace-Martin (2015). Preparing Data for Analysis is (more than) Half the Battle. Retrieved from https://www.theanalysisfactor.com/preparing-data-analysis/
Courtney Reagan (2019). Running retail stores more expensive than online. Retrieved from https://www.cnbc.com/2017/04/19/think-running-retail-stores-is-more-expensive-than-selling-online-think-again.html
