This project predicts customer churn using an artificial neural network (ANN) trained on a dataset with features such as geography, gender, age, balance, credit score, tenure, number of products, credit card ownership, active-member status, estimated salary, and exit status. After preprocessing, including feature scaling and categorical encoding, the ANN was trained to classify customers as likely to churn or likely to stay. The solution is deployed on Streamlit Cloud, which provides a user-friendly interface for real-time churn predictions so that businesses can act on retention before a customer leaves.
- Project Overview
- Generated project structure
- Dataset Information
- Project Workflow
- Installation
- Usage
- Model Development
- Evaluation
- Contributing
- License
Problem Statement: Predict whether a customer is likely to churn and estimate their salary.
Goal: By estimating customer salary, the bank can target potential customers for loans, increasing business opportunities. Predicting customer churn enables the bank to take proactive measures to retain customers and reduce churn rates.
Solution: Artificial Neural Network (ANN) models are trained to predict both churn likelihood (a classification task) and customer salary (a regression task).
project_name/
│
├── data/                          # Data-related files
│   ├── raw/                       # Raw, unprocessed data files
│   └── processed/                 # Processed data files (if applicable in the future)
│
├── notebooks/                     # Jupyter notebooks for EDA and experimentation
│   ├── experiments.ipynb          # Experimentation notebook
│   ├── prediction.ipynb           # Prediction notebook
│   └── salaryregression.ipynb     # Salary regression notebook
│
├── pickle/                        # Pickle files for preprocessing
│   ├── label_encoder_gender.pkl   # Label encoder for gender
│   ├── onehot_encoder_geo.pkl     # One-hot encoder for geographical data
│   └── scaler.pkl                 # Scaler for normalization
│
├── logs/                          # Classification-specific logs
│   ├── train/                     # Training logs
│   └── validation/                # Validation logs
│
├── regression_logs/               # Regression-specific logs
│   ├── train/                     # Training regression logs
│   └── validation/                # Validation regression logs
│
├── models/                        # Saved models
│   ├── models.h5                  # Classification model file
│   └── regression_model.h5        # Regression model file
│
├── reports/                       # Reports and visualizations
│   └── figures/                   # Plots and visualizations
│
├── app.py                         # Main application script
├── streamlit_regression.py        # Streamlit app for the regression model
├── requirements.txt               # Dependencies and libraries
├── README.md                      # Project overview
└── .gitignore                     # Git ignore file
Credit Score: The customer's credit score, which reflects their creditworthiness.
Geography: The geographic region where the customer resides (Spain, France, Germany).
Gender: The gender of the customer (Male or Female).
Age: The age of the customer.
Tenure: The length of time the customer has been with the bank.
Balance: The current bank balance of the customer.
Number of Products: The total number of products the customer uses with the bank.
Has Credit Card: Whether or not the customer holds a credit card with the bank.
Is Active Member: Whether or not the customer actively uses the bank's services.
Estimated Salary: The estimated annual salary of the customer.
Exited: Whether or not the customer has left the bank (the churn label the classifier learns to predict).
- Data Collection: The dataset was obtained from Kaggle.
- Data Preprocessing: The data was cleaned and transformed using the Pandas library in Python.
- Feature Engineering: Relevant features were selected and engineered to enhance model performance.
- Model Selection: An Artificial Neural Network (ANN) was chosen as the model type for predicting customer churn and estimating salary.
- Model Training and Optimization: The ANN was trained on the preprocessed data and then tuned to improve accuracy and performance.
- Prediction: The trained models predict the customer's estimated salary and whether the customer is likely to churn.
- Deployment: The model was deployed on Streamlit Cloud for interactive real-time predictions, and the code was checked into GitHub for version control and contribution tracking.
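The preprocessing step in the workflow above (encoding Gender and Geography, then scaling numeric features) can be illustrated with a minimal pure-Python sketch. It is not the project's code: the sample rows and the helper names `label_encode`, `one_hot_encode`, and `standardize` are hypothetical, and the project itself presumably uses library encoders and a scaler, as the files in `pickle/` suggest. The sketch only shows what each transformation does to the data.

```python
from statistics import mean, pstdev

# Hypothetical sample rows; column names follow the dataset description.
customers = [
    {"Geography": "France", "Gender": "Female", "Age": 42},
    {"Geography": "Spain", "Gender": "Male", "Age": 35},
    {"Geography": "Germany", "Gender": "Male", "Age": 58},
]


def label_encode(values):
    """Map each distinct category to an integer, in sorted category order."""
    classes = sorted(set(values))
    mapping = {category: index for index, category in enumerate(classes)}
    return [mapping[v] for v in values], mapping


def one_hot_encode(values):
    """Return one 0/1 column per category, in sorted category order."""
    classes = sorted(set(values))
    rows = [[1 if v == category else 0 for category in classes] for v in values]
    return rows, classes


def standardize(values):
    """Rescale to zero mean and unit variance (population standard deviation)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


gender_codes, gender_mapping = label_encode([c["Gender"] for c in customers])
geo_rows, geo_classes = one_hot_encode([c["Geography"] for c in customers])
age_scaled = standardize([c["Age"] for c in customers])

print(gender_mapping)  # {'Female': 0, 'Male': 1}
print(geo_classes)     # ['France', 'Germany', 'Spain']
print(geo_rows)        # [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
```

Whatever encoders and scaler are fitted during training must be reused unchanged at prediction time, which is why the project saves them alongside the models.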
Follow these steps to set up the environment:
git clone https://github.com/saichakka10/ANN-Churn-Prediction.git
cd ANN-Churn-Prediction
conda create -p venv python=3.11 -y
conda activate ./venv
pip install -r requirements.txt
conda deactivate
The experiments.ipynb file is used for data cleaning, preprocessing, splitting the dataset into training and testing sets, and implementing an Artificial Neural Network (ANN) for the classification task. The model predicts whether the customer is likely to churn or not.
The prediction.ipynb file is used to make predictions based on the trained classification model. It outputs the likelihood of customer churn.
The salaryregression.ipynb file is used for data cleaning, preprocessing, splitting the dataset into training and testing sets, and implementing an ANN model for regression. This model predicts the estimated salary of the customer.
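To make the difference between the classification and regression models concrete, here is a minimal pure-Python sketch of a forward pass through a tiny network. Everything in it is an assumption made for illustration: the layer sizes, the hand-picked weights, the ReLU hidden activation, the sigmoid output for churn, the linear output for salary, and the 0.5 decision threshold. The notebooks define the actual architectures, and the project trains two separate models rather than sharing weights as this sketch does.

```python
import math


def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output unit."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def relu(values):
    """Replace negative activations with zero."""
    return [max(0.0, v) for v in values]


def sigmoid(z):
    """Squash a real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights for a network with 3 inputs and 2 hidden units.
hidden_w = [[0.5, 0.25, 0.0], [-1.0, 0.5, 1.0]]
hidden_b = [0.0, 0.5]
out_w = [[2.0, 1.0]]
out_b = [-1.0]

features = [1.0, 2.0, -1.0]  # illustrative, already-scaled inputs

hidden = relu(dense(features, hidden_w, hidden_b))
raw_output = dense(hidden, out_w, out_b)[0]

# Classification: a sigmoid turns the raw output into a churn probability.
churn_probability = sigmoid(raw_output)
will_churn = churn_probability > 0.5  # assumed decision threshold

# Regression: the raw output is used directly as the predicted value.
salary_output = raw_output

print(round(churn_probability, 4), will_churn, salary_output)
```

The only structural difference between the two tasks in this sketch is the final activation: a sigmoid for a probability, and no activation for an unbounded numeric estimate.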
- Classification: Accuracy is used as the evaluation metric to measure how well the model classifies customer churn.
- Regression: The Mean Absolute Error (MAE) is used to evaluate the regression model's performance in predicting the customer's salary.
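Both metrics are simple to compute by hand. The sketch below defines them in plain Python so the definitions are explicit. The label and salary values are made up for illustration and are not results from this project.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative values only.
exited_true = [1, 0, 0, 1, 0]
exited_pred = [1, 0, 1, 1, 0]
salary_true = [50000.0, 62000.0, 48000.0]
salary_pred = [52000.0, 60000.0, 46000.0]

churn_accuracy = accuracy(exited_true, exited_pred)
salary_mae = mean_absolute_error(salary_true, salary_pred)

print(churn_accuracy)  # 0.8
print(salary_mae)      # 2000.0
```

MAE is expressed in the same units as the target, so a value of 2000.0 here means the salary estimates are off by 2000 on average.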
- Fork the repository.
- Create a new branch: git checkout -b feature-branch
- Commit your changes: git commit -m 'Add feature'
- Push to the branch: git push origin feature-branch
The project is deployed on Streamlit Cloud. The deployment uses the following files:
- app.py: The main application. It provides an interactive user interface for churn prediction, where users enter customer data and see the result in real time.
- streamlit_regression.py: The app for the regression model that predicts the customer’s estimated salary. Users can interact with the model and get immediate predictions.
Streamlit Cloud is a cloud-based platform that simplifies the process of deploying and sharing machine learning models as interactive web apps. Streamlit Cloud allows developers to deploy apps directly from a GitHub repository without the need to manage infrastructure. It automatically updates whenever the repository is updated, streamlining deployment and making it easy to share your projects with collaborators or stakeholders.
- Interactive Interface: Streamlit provides an easy-to-use interface to display data, plots, and real-time predictions.
- Quick Deployment: With minimal setup, you can deploy your app directly from GitHub, making it accessible to others instantly.
- Automatic Updates: When changes are made to the codebase or GitHub repository, the app is automatically updated without requiring manual intervention.
- Easy Sharing: You can share your app with others by simply providing a link, making it ideal for presentations and collaborations.
This project is licensed under the MIT License.