Prepared by Yegor Tkachenko

September 2018 - Columbia University

In this notebook we model how customers churn (stop engaging with the firm) to figure out what determines the probability of customer ‘exit’. We use logistic regression for this task. We also use the model of churn to calculate expected profits based on a version of a customer lifetime value model.

Definition

Recall that to ‘churn’ means to stop a paying relationship with a firm. So, for example, if customer had a contract and did not renew it, he churned. Also note that if \(c\) is probability of churn, then \(p=1-c\) is the probability that customer will keep transacting into the next period, that is, a retention rate or retention probability.

Data

The exercise is based on Telco customer churn data provided by IBM, which is publicly available here: https://www.ibm.com/communities/analytics/watson-analytics-blog/guide-to-sample-datasets/

However, work with the file telco_churn_data.csv that has been provided to you on edx site, as it has been modified compared to the original. We begin by reading the data from the file telco_churn_data.csv in our working directory.

options(scipen=999) # turn off scientific notation
set.seed(999) # set seed
data <- read.csv('telco_churn_data.csv', header=TRUE) # read data
data$Churn <- relevel(as.factor(data$Churn), ref = 'No') # making sure Churn == 'Yes' is coded as 1 and Churn == 'No' as zero, in binary logistic regression
head(data,5) # take a look at the data
##   gender SeniorCitizen Partner Dependents PhoneService       Contract
## 1 Female             0     Yes         No           No Month-to-month
## 2   Male             0      No         No          Yes       One year
## 3   Male             0      No         No          Yes Month-to-month
## 4   Male             0      No         No           No       One year
## 5 Female             0      No         No          Yes Month-to-month
##   PaperlessBilling MonthlyCharges Churn
## 1              Yes          29.85    No
## 2               No          56.95    No
## 3              Yes          53.85   Yes
## 4               No          42.30    No
## 5              Yes          70.70   Yes

Each row in the data corresponds to a different customer. The dependent variable here is Churn (0/1) – whether customer churned during last month, and we are trying to evaluate how different features in the data affect churn probability, including demographic information (whether the customer has a partner, is he a senior citizen) and other account information (contract type, paperless billing, monthly charges). Full variable list is as follows

colnames(data)
## [1] "gender"           "SeniorCitizen"    "Partner"          "Dependents"      
## [5] "PhoneService"     "Contract"         "PaperlessBilling" "MonthlyCharges"  
## [9] "Churn"

Logistic Regression

We now perform a logistic regression to summarize the data, regressing the churn event (0/1) on all other variables.

set.seed(999) # set seed
model <- glm(Churn ~ .,  
             data=data,  
             family=binomial(link = "logit"))
summary(model)
## 
## Call:
## glm(formula = Churn ~ ., family = binomial(link = "logit"), data = data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.6224  -0.7898  -0.3120   0.9703   3.0703  
## 
## Coefficients:
##                      Estimate Std. Error z value             Pr(>|z|)    
## (Intercept)         -1.247197   0.119876 -10.404 < 0.0000000000000002 ***
## genderMale          -0.024143   0.061339  -0.394              0.69388    
## SeniorCitizen        0.249085   0.078639   3.167              0.00154 ** 
## PartnerYes          -0.300183   0.071338  -4.208         0.0000257720 ***
## DependentsYes       -0.192841   0.085044  -2.268              0.02336 *  
## PhoneServiceYes     -0.499417   0.113088  -4.416         0.0000100467 ***
## ContractOne year    -1.735835   0.092376 -18.791 < 0.0000000000000002 ***
## ContractTwo year    -3.103087   0.153703 -20.189 < 0.0000000000000002 ***
## PaperlessBillingYes  0.389275   0.070051   5.557         0.0000000274 ***
## MonthlyCharges       0.018338   0.001391  13.183 < 0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 8143.4  on 7031  degrees of freedom
## Residual deviance: 6372.3  on 7022  degrees of freedom
## AIC: 6392.3
## 
## Number of Fisher Scoring iterations: 6

From these results we observe that, for example, increase in monthly charges is significantly positively correlated with increased probability of churn, whereas being on a contract rather than on month-to-month renewal significantly reduces churn probability.

Predictions

We can now predict the probability of churn for the data set.

data$churn_prob <- predict.glm(model, newdata=data, type="response")
hist(data$churn_prob)

library('caret') # may need to install this package first
## Loading required package: lattice
## Loading required package: ggplot2
confusionMatrix(table(1*(data$churn_prob>0.5),1*(data$Churn=='Yes')))
## Confusion Matrix and Statistics
## 
##    
##        0    1
##   0 4569 1105
##   1  594  764
##                                                
##                Accuracy : 0.7584               
##                  95% CI : (0.7482, 0.7684)     
##     No Information Rate : 0.7342               
##     P-Value [Acc > NIR] : 0.00000192           
##                                                
##                   Kappa : 0.3218               
##                                                
##  Mcnemar's Test P-Value : < 0.00000000000000022
##                                                
##             Sensitivity : 0.8850               
##             Specificity : 0.4088               
##          Pos Pred Value : 0.8053               
##          Neg Pred Value : 0.5626               
##              Prevalence : 0.7342               
##          Detection Rate : 0.6497               
##    Detection Prevalence : 0.8069               
##       Balanced Accuracy : 0.6469               
##                                                
##        'Positive' Class : 0                    
## 

Targeting

One thing we can do with this data is target individuals at high risk of churn in an email blast. This is how we would identify the individuals.

sum(data$churn_prob>0.6) # number of customers with probability of churn strictly greater than 60%
## [1] 358

Monetary value of a contract

Being on a 1-year contract significantly deters a person from churning, compared to month-to-month renewal schedule. Can we express this pressure not to churn in dollars? That is, how much extra would we need to charge a person on a 1-year contract to make him/her as likely to churn as a person on month-to-month renewal process?

-model$coefficients[7]/model$coefficients[10]
## ContractOne year 
##         94.65852

We would need to charge that person extra \(\$94.7\) per month. This amount gives us some idea about how much more valuable a customer on a contract is compared to a customer without a contract.

Increase in monthly charges and churn

After some conversations with the CEO, you feel there is a space for price increase. You decide on an immediate one-time hike in monthly charges. You want it to be the same for every customer in percentage terms relative to their current monthly charge.

In order to determine the profit-maximizing increase, you decide to perform analysis of future profits that incorporates effect of churn change due to increase in monthly charges. Note that in this type of analysis we require the critical assumption that the coefficient on the monthly charge is accurate for charge changes, rather than just charge differences between individuals. Assume that this is the case in this data set.

We first make some simplifying assumptions. We assume that customers make a decision to churn or not at the end of each discrete period. We notice that each customer in the data set that has not churned will have to pay fees for the first month with probability 1. At the end of the period the cutomer will again decide whether to churn or not, based on the experienced fee. We also assume that upon a possible one-time change in monthly charges, all other customer and account characteristics will remain fixed for lifetime for the purposes of this analysis.

Let \(c\) be the probability that customer churns, so that \(p = 1-c\) is the retention rate, i.e., the probability that customer will transact during subsequent period. Let \(\gamma\) be a discount factor (by how much money tomorrow is worth less than money today; if \(\gamma = 0.99\), then a dollar tomorrow is worth \(\$0.99\) today). Let \(m\) be customer’s monthly charge at time \(t\). Then the expected discounted profit from a customer (i.e., customer lifetime value, or CLV), based on formula for the sum of terms of a geometric progression https://en.wikipedia.org/wiki/Geometric_progression, is as follows

\[CLV = m + \gamma(1-c) m + \gamma^2(1-c)^2 m + \cdots = m + \gamma p m + \gamma^2 p^2 m + \cdots = \frac{m}{1-\gamma p}\] (Note, slight variations are possible here in when we start discounting, and whether retention rate matters for the first payment, but we will go with this formula).

Using this formula, we can calculate customer lifetime value – expected discounted profit up to the infinite time horizon – conditional on specific \(m\) (monthly charge) and customer-specific probability of retention \(p=1-c\), where we get churn probability \(c\) as a prediction from the logistic regression. Notice that \(p\) implicitly depends on \(m\) through logistic equation (because \(m\) is an input in the equation), and for \(m\) fixed, we assume \(p\) will remain constant throghout consumer lifetime.

pf<-function(incr, data)
{
  d <- data[data$Churn=="No",] # only keeping customers that have not churned yet
  d$MonthlyCharges <- d$MonthlyCharges*incr # possible increase in monthly charges (no increase is incr==1.0)
  g <- 0.99 # discount factor (money in the next period is worth 0.99 money in the period before that)
  p <- 1-predict.glm(model, newdata=d, type="response") # retention probability based on logistic regresion (we assume retention probability will remain constant for each consumer, conditional on fixed d$MonthlyCharges)
  clv <- d$MonthlyCharges/(1-p*g) # CLV formula
  return(sum(clv)) # sum of discounted profits across all individual consumers
}

Let us compute this discounted profit across the dataset under different scenarios.

pf(1.0, data) # no increase in price: 1 * charge_amount
## [1] 3587294
pf(1.5, data) # 50% increase: (1 + 0.5) * charge_amount
## [1] 3555557
pf(2, data) # 100% increase / doubling: 2 * charge_amount
## [1] 3354961
pf(3, data) # 200% increase / trippling: 3 * charge_amount
## [1] 3118843

Discounted profit optimization

Using the built-in optim function in R, we can directly optimize for discounted profit from this set of customers. We use a starting value of 1.0 (no change) for the optimal monthly charge increase, and store the optimization results in opt.

set.seed(999)
opt<-optim(1.0, pf, method="L-BFGS-B", control=list(fnscale=-1), data=data)

The optimal increase in monthly charges and the optimal profit can be extracted from the optimization result as below.

c(opt$par, opt$value)
## [1]       1.180934 3624046.159024

We find that the optimal hike in monthly charges is \(18.1\%=1.180934-1\) .

pf(opt$par, data) - pf(1.0, data)
## [1] 36752.09

We also see that the hike in charges yields extra \(\$36,752\) in customer lifetime value.