Customer Churn Prediction & Analysis for Telecom Data.


Description

The objective of the project is to build a model that predicts the probability of a customer churning from the platform, using telecom data.
Dataset source: an IBM sample dataset for customer churn, obtained from Kaggle.
Dataset dimensions (rows × columns): 7043 × 21.
The dataset includes information about:

  • Customers who left within the last month – the column is called Churn.
  • Services that each customer has signed up for – phone, multiple lines, internet, online security, online backup, device protection, tech support, and streaming TV and movies.
  • Customer account information – how long they’ve been a customer, contract, payment method, paperless billing, monthly charges, and total charges.
  • Demographic info about customers – gender, age range, and if they have partners and dependents.

Environment

  • RStudio, R (ggplot2, dplyr, randomForest, corrplot)

Tasks performed:

  • Data Cleaning
  • EDA
  • Feature Engineering and Optimization
  • Data Modeling
  • We implemented logistic regression and obtained an accuracy of 79.01%.
  • We then tried a decision tree classifier, which gave only a marginal improvement in accuracy (79.29%).
  • Lastly, we fit a random forest classifier and observed an accuracy of 80.07%.

Findings

[Figure: plots of gender, paperless billing, and churn proportions]

1. The data includes an almost equal proportion of males and females.
2. Almost 58% of customers are on paperless billing.
3. 26% of the customers have churned from the platform.

[Figure: plots of internet service and add-on service subscriptions]

1. Almost 40% of the customers have subscribed to the fiber optic internet service.
2. Almost 50% of the customers have no online security, and almost 45% have no online backup.
3. Almost 50% of the customers have no tech support, and 40% have no streaming TV service.
4. 45% of the customers have no device protection service.

[Figure: plots of payment method, senior citizen status, partners, dependents, and phone service]

1. Electronic check is the most common payment method.
2. Only a small share of the customers, approximately 20%, are senior citizens.
3. Customers with and without partners are about equal in number.
4. 65% of the customers have no dependents.
5. Almost 87% of the customers have phone service.

[Figures: churn by tenure and churn by monthly charges]

1. Most of the customers who churned from the platform had a tenure of 0-1 years.
2. Most of the churned customers had a monthly charge of more than $65.
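
The original analysis was done in R with plots that are not reproduced here. As a rough illustration of how a breakdown like the one above might be computed, here is a minimal Python sketch. The records, the `tenure_bucket` helper, and the `churn_rate_by` helper are all hypothetical and made up for this example; they are not rows or functions from the project.

```python
from collections import defaultdict

# Hypothetical toy records: (tenure in months, monthly charge in USD, churned?).
# These values are invented for illustration; they are not rows from the dataset.
customers = [
    (2, 70.5, True), (8, 85.0, True), (11, 45.0, False),
    (14, 90.2, True), (30, 55.0, False), (48, 99.0, False),
    (60, 20.0, False), (5, 75.3, True), (20, 66.0, False),
]


def tenure_bucket(months):
    """Map tenure in months to a yearly bucket label such as '0-1 years'."""
    start = months // 12
    return f"{start}-{start + 1} years"


def churn_rate_by(records, key):
    """Return {group: share of churned customers} for a grouping function."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for record in records:
        group = key(record)
        totals[group] += 1
        churned[group] += int(record[2])
    return {group: churned[group] / totals[group] for group in totals}


# Churn rate per tenure year and per monthly-charge band (split at $65).
by_tenure = churn_rate_by(customers, lambda r: tenure_bucket(r[0]))
by_charge = churn_rate_by(customers, lambda r: "> $65" if r[1] > 65 else "<= $65")

for group, rate in sorted(by_tenure.items()):
    print(f"{group}: {rate:.0%} churned")
```

The $65 split mirrors the threshold mentioned in the findings; any other cut-off could be passed in through the `key` function.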

[Figure: churn rate by gender, dependents, phone service, paperless billing, and payment method]

  • Churn rate is about the same for male and female customers.
  • Churn rate is higher among customers with no dependents.
  • Churn rate is higher among customers with phone service.
  • Churn rate is higher among customers with paperless billing.
  • Churn rate is higher among customers who pay by electronic check.

[Figure: churn rate by contract type, online security, tech support, and streaming TV]

  • Churn rate is higher among customers on a month-to-month contract.
  • Churn rate is higher among customers with no online security and no tech support.
  • Churn rate is almost equal among subscribers with and without streaming TV.

1. Logistic Regression

[Figure: logistic regression model output]

With the above logistic regression model we observed an accuracy of about 79% (0.7901 in the accuracy comparison further down).

[Figure: logistic regression evaluation output]

2. Decision Tree

[Figure: decision tree model output]

3. Random Forest

[Figure: random forest model output]

Accuracy Comparison for the three Models

Accuracy for Logistic Model 0.7901063
Accuracy for Decision Tree Model 0.7928803
Accuracy for Random Forest Model 0.8007397
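
To make the comparison concrete, the following Python sketch recomputes the gaps between the reported accuracies and shows how accuracy is derived from confusion-matrix counts. The three accuracy values are the ones reported above. The counts passed to `accuracy()` are hypothetical, since the original confusion matrices are not available here, and the project itself was written in R.

```python
def accuracy(tp, tn, fp, fn):
    """Share of all predictions that match the true label."""
    return (tp + tn) / (tp + tn + fp + fn)


# Accuracies as reported in the comparison above.
reported = {
    "Logistic Regression": 0.7901063,
    "Decision Tree": 0.7928803,
    "Random Forest": 0.8007397,
}

best_model = max(reported, key=reported.get)
baseline = reported["Logistic Regression"]

# Gain over logistic regression, in percentage points.
gains = {name: round((acc - baseline) * 100, 2) for name, acc in reported.items()}

# Hypothetical confusion-matrix counts, used only to exercise accuracy().
example = accuracy(tp=300, tn=1400, fp=150, fn=250)

print(best_model, gains)
```

On these numbers the decision tree gains about 0.28 percentage points over logistic regression and the random forest about 1.06, so the differences between the three models are small.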

ROC analysis for the three Models

[Figure: ROC curves for the three models]
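
For readers unfamiliar with ROC analysis, here is a minimal Python sketch of how an ROC curve and its area (AUC) can be computed from predicted churn probabilities. The labels and scores are hypothetical toy values, and the helpers `roc_points` and `auc` are illustrative names, not functions from the project, which used R.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points obtained by lowering the decision threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every observation tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve via the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical true labels (1 = churned) and predicted churn probabilities.
labels = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]

curve = roc_points(labels, scores)
area = auc(curve)
print(f"AUC = {area:.2f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why ROC curves are a useful complement to a single accuracy figure on a dataset where only about a quarter of customers churn.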

Business Cost Assumption

[Figure: business cost assumptions]
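
The specific cost assumptions were shown in a figure that is not reproduced here. The Python sketch below only illustrates the general idea of scoring a classifier by business cost instead of accuracy. Every number in it is hypothetical: the two cost constants, the confusion-matrix counts, and the model names `model_a` and `model_b` are placeholders, not values from the project.

```python
# Hypothetical per-customer costs (assumptions, not the article's figures).
COST_MISSED_CHURNER = 500.0   # false negative: a churner the model failed to flag
COST_RETENTION_OFFER = 100.0  # offer sent to every customer flagged as a churner


def business_cost(tp, fp, fn):
    """Total cost: offers for all flagged customers plus losses on missed churners."""
    return (tp + fp) * COST_RETENTION_OFFER + fn * COST_MISSED_CHURNER


# Hypothetical confusion-matrix counts for two placeholder models.
candidates = {
    "model_a": {"tp": 300, "fp": 150, "fn": 250},
    "model_b": {"tp": 260, "fp": 90, "fn": 290},
}

costs = {name: business_cost(**counts) for name, counts in candidates.items()}
cheapest = min(costs, key=costs.get)

for name, cost in costs.items():
    print(f"{name}: ${cost:,.0f}")
```

Under a cost structure like this one, false negatives dominate the total, so the ranking of models depends on how many churners each one misses as well as on its overall accuracy.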

Results

  • We implemented three classification models to predict which customers are likely to churn. Considering the results above and the aforementioned business cost assumptions, the Random Forest model can be considered the best fit of the three, with the highest accuracy (80.07%).
  • Based on the descriptive analysis, churned customers are concentrated among those with a tenure of 0-1 year, and the churn rate rises substantially as monthly charges increase.
  • Customers on month-to-month contracts are more susceptible to churn.
  • About 22% of all customers have no internet service and therefore do not have the internet-dependent services OnlineBackup, StreamingTV, OnlineSecurity, and DeviceProtection.
