Customer Churn Prediction & Analysis for Telecom Data.
Description
The objective of this project is to build a model that predicts the probability that a customer of a telecom provider will churn from the platform.
Dataset source: the IBM sample dataset for customer churn, obtained from Kaggle.
Dataset dimensions (rows × columns): 7043 × 21.
The dataset includes information about:
- Customers who left within the last month – the column is called Churn.
- Services that each customer has signed up for – phone, multiple lines, internet, online security, online backup, device protection, tech support, and streaming TV and movies.
- Customer account information – how long they’ve been a customer, contract, payment method, paperless billing, monthly charges, and total charges.
- Demographic information about customers – gender, age range, and whether they have partners and dependents.
Environment
- RStudio, R (ggplot2, dplyr, randomForest, corrplot)
Tasks performed:
- Data Cleaning
- EDA
- Feature Engineering and Optimization
- Data Modeling
  - We implemented Logistic Regression and obtained an accuracy of 79.01%.
  - We then implemented a Decision Tree classifier, which gave only a marginal improvement in accuracy (79.29%).
  - Lastly, we implemented a Random Forest classifier and observed an accuracy of 80.07%.
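The README does not describe how the data was cleaned. Below is a minimal sketch of one common approach, written in Python for illustration (the project itself used R). It makes the following assumptions:

- The CSV uses the column names `customerID`, `tenure`, `MonthlyCharges`, `TotalCharges` and `Churn`.
- `TotalCharges` is read as text and may be blank.
- Rows with a blank value are dropped rather than imputed.

The function name `clean_rows` is hypothetical.

```python
import csv
import io

# Assumed columns: customerID, tenure, MonthlyCharges, TotalCharges, Churn.
# Assumed encoding: TotalCharges is text (possibly blank), Churn is "Yes"/"No".
def clean_rows(rows):
    """Drop rows with a blank TotalCharges and convert columns to numeric types."""
    cleaned = []
    for row in rows:
        total = row["TotalCharges"].strip()
        if not total:  # blank value cannot be parsed, so this sketch skips the row
            continue
        cleaned.append({
            **row,
            "tenure": int(row["tenure"]),
            "MonthlyCharges": float(row["MonthlyCharges"]),
            "TotalCharges": float(total),
            "Churn": 1 if row["Churn"] == "Yes" else 0,  # encode target as 1/0
        })
    return cleaned

# Hypothetical sample rows, not taken from the real dataset.
sample = io.StringIO(
    "customerID,tenure,MonthlyCharges,TotalCharges,Churn\n"
    "A,1,29.85,29.85,No\n"
    "B,0,52.55, ,No\n"
    "C,2,53.85,108.15,Yes\n"
)
rows = clean_rows(csv.DictReader(sample))
print(len(rows), [r["Churn"] for r in rows])  # 2 [0, 1]
```

Dropping the row is the simplest choice. Imputing the value (for example as `tenure * MonthlyCharges`) would keep the row, at the cost of an extra assumption.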
Findings
1. The data includes an almost equal proportion of male and female customers.
2. Almost 58% of the customers use paperless billing.
3. 26% of the customers have churned from the platform.
1. Almost 40% of the customers subscribe to the fibre optic internet service.
2. Almost 50% of the customers have no online security, and almost 45% have no online backup.
3. Almost 50% of the customers have no tech support, and 40% do not have streaming TV.
4. 45% of the customers have no device protection.
1. Electronic check is the most common payment method.
2. Only about 20% of the customers are senior citizens.
3. There are equal numbers of customers with and without partners.
4. 65% of the customers have no dependents.
5. Almost 87% of the customers have phone service.
1. The largest group of churned customers had a tenure of 0-1 years.
2. Most churned customers have monthly charges above $65.
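The tenure finding groups customers into yearly buckets. Below is a minimal Python sketch of that grouping (the project used R). It assumes:

- Tenure is recorded in whole months.
- The bucket labels are hypothetical (`0-1 years`, `1-2 years`, and so on, with `5+ years` for 60 months or more).
- The toy data is invented for illustration.

```python
from collections import Counter

def tenure_group(months):
    """Map tenure in months to a yearly bucket label (hypothetical labels)."""
    if months < 0:
        raise ValueError("tenure cannot be negative")
    if months >= 60:
        return "5+ years"  # cap: long-tenure customers share one bucket
    years = months // 12   # 0-11 months -> 0, 12-23 -> 1, ...
    return f"{years}-{years + 1} years"

def churned_by_tenure(customers):
    """Count churned customers per tenure bucket from (months, churned) pairs."""
    return Counter(tenure_group(m) for m, churned in customers if churned)

# Hypothetical toy data: (tenure in months, churned flag).
toy = [(1, True), (5, True), (14, True), (30, False), (70, False), (3, False)]
counts = churned_by_tenure(toy)
print(counts.most_common(1))  # [('0-1 years', 2)]
```

This counts churners per bucket, which is what "largest group of churned customers" describes. It is not the churn rate within each bucket.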
* Churn rate is about the same for male and female customers.
* Churn rate is higher among customers with no dependents.
* Churn rate is higher among customers with phone service.
* Churn rate is higher among customers with paperless billing.
* Churn rate is higher among customers who pay by electronic check.
* Churn rate is higher among customers on month-to-month contracts.
* Churn rate is higher among customers with no online security and no tech support.
* Churn rate is almost the same for subscribers with and without streaming TV.
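The comparisons above use the churn rate within each group: churned customers in the group divided by all customers in the group. This differs from counting churners per group. Below is a minimal Python sketch (the project used R). It assumes `Churn` has already been encoded as 1/0. The function name `churn_rate_by` and the toy rows are hypothetical.

```python
from collections import defaultdict

def churn_rate_by(customers, column):
    """Return {category: churned / total} for one categorical column."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for row in customers:
        key = row[column]
        totals[key] += 1
        churned[key] += row["Churn"]  # assumes Churn is 1 (churned) or 0
    return {key: churned[key] / totals[key] for key in totals}

# Hypothetical toy rows, not real data.
toy = [
    {"Contract": "Month-to-month", "Churn": 1},
    {"Contract": "Month-to-month", "Churn": 1},
    {"Contract": "Month-to-month", "Churn": 0},
    {"Contract": "One year", "Churn": 0},
    {"Contract": "One year", "Churn": 1},
    {"Contract": "Two year", "Churn": 0},
]
rates = churn_rate_by(toy, "Contract")
print(max(rates, key=rates.get))  # Month-to-month
```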
**1. Logistic Regression**
With the logistic regression model, we obtained an accuracy of about 79% (0.7901).
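A logistic regression model turns a weighted sum of the features into a churn probability using the sigmoid function. A class label is then assigned by comparing that probability with a threshold. Below is a minimal Python sketch (the project used R). It makes the following assumptions:

- The coefficients and the two example features (a month-to-month flag and tenure in months) are hypothetical, not the fitted values from this project.
- The 0.5 threshold is an assumed default.

```python
import math

def churn_probability(features, coefficients, intercept):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-z))

def predict_churn(features, coefficients, intercept, threshold=0.5):
    """Classify as churn when the predicted probability reaches the threshold."""
    return churn_probability(features, coefficients, intercept) >= threshold

# Hypothetical coefficients: [month-to-month flag, tenure in months].
coef = [0.8, -0.05]
b0 = -1.0
p = churn_probability([1, 2], coef, b0)  # z = -1.0 + 0.8 - 0.1 = -0.3
print(round(p, 3), predict_churn([1, 2], coef, b0))  # 0.426 False
```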
**2. Decision Tree**
**3. Random Forest**
Accuracy comparison for the three models:

| Model | Accuracy |
|---|---|
| Logistic Regression | 0.7901063 |
| Decision Tree | 0.7928803 |
| Random Forest | 0.8007397 |
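Accuracy is the fraction of predictions that match the true labels. It helps to compare it against a majority-class baseline. With the 26% churn share reported in the Findings, a trivial model that always predicts "no churn" would score about 0.74 on data with the same class balance. The accuracies above should be read against that baseline. Below is a minimal Python sketch (the project used R). The function names and the 100-label example are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def majority_baseline(y_true):
    """Accuracy of always predicting the more frequent class (labels are 1/0)."""
    positives = sum(y_true)
    return max(positives, len(y_true) - positives) / len(y_true)

# Hypothetical labels with a 26% churn share.
labels = [1] * 26 + [0] * 74
print(majority_baseline(labels))  # 0.74
```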
ROC analysis for the three models
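An ROC curve plots the true-positive rate against the false-positive rate as the classification threshold varies. The area under it (AUC) equals the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner, with ties counted as half. Below is a minimal Python sketch of that pairwise definition (the project used R). The README does not say which package produced its ROC plots. The scores and labels in the example are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC via pairwise comparison of churner vs non-churner scores.

    O(n_pos * n_neg), which is fine for a sketch but slow on large data.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))

# Hypothetical predicted churn probabilities and true labels.
print(roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75
```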
BUSINESS COST ASSUMPTION:
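The concrete cost figures from the original analysis are not reproduced in this text. Below is a minimal Python sketch of how models can be compared on business cost rather than accuracy (the project used R). It assumes:

- Each missed churner (false negative) costs `cost_fn`.
- Each retention action aimed at a customer who would not have churned (false positive) costs `cost_fp`.
- Both values used here (500 and 100) are hypothetical placeholders, not the project's assumptions.

```python
def business_cost(y_true, y_pred, cost_fn, cost_fp):
    """Total cost of errors: missed churners plus unnecessary retention actions."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return fn * cost_fn + fp * cost_fp

# Hypothetical per-error costs, NOT the values used in this project.
COST_FN = 500.0  # revenue lost when a churner is not flagged
COST_FP = 100.0  # retention offer given to a customer who would have stayed

y_true = [1, 1, 0, 0, 1]
model_a = [1, 0, 0, 1, 1]  # one missed churner, one false alarm
model_b = [0, 0, 0, 0, 0]  # never predicts churn
print(business_cost(y_true, model_a, COST_FN, COST_FP))  # 600.0
print(business_cost(y_true, model_b, COST_FN, COST_FP))  # 1500.0
```

When a missed churner costs more than a false alarm, a model with slightly lower accuracy but fewer false negatives can still be the cheaper choice.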
Results
- We implemented three classification models to predict which customers are likely to churn. Comparing the results under the business cost assumptions above, the Random Forest model is the best fit of the three implemented models, and it also has the highest accuracy (80.07%).
- The descriptive analysis shows that the largest number of churned customers have a tenure between 0 and 1 year, and that the churn rate increases substantially as monthly charges increase.
- Customers on month-to-month contracts are more likely to churn.
- 22% of all customers have no internet service, so they do not use the internet-dependent services OnlineBackup, StreamingTV, OnlineSecurity, and DeviceProtection.