Credit Card Default Detection

license	title	sdk	emoji	colorFrom	colorTo	app_file	pinned
mit	Credit Card Defaults Prediction	streamlit	🦀	green	indigo	streamlit_app.py	false

Objective

This project builds a credit card default detection model using machine learning techniques. The model predicts default payments from demographic and credit-related features.

Dataset

This dataset contains information on default payments, demographic factors, credit data, history of payment, and bill statements of credit card clients in Taiwan from April 2005 to September 2005.

For more information on the dataset, please visit the UCI ML Repository at https://archive.ics.uci.edu/dataset/350/default+of+credit+card+clients or Kaggle at https://www.kaggle.com/code/selener/prediction-of-credit-card-default/input

Machine Learning Pipeline

Analyze Data: In this initial step, we examined the data to understand its available features: the shape of the data, the data type of each feature, and a statistical summary.
EDA: Exploratory Data Analysis (EDA) is the process of analyzing and understanding the data. The goal is to gain insights, identify patterns, and discover relationships and trends. It also helps to identify outliers, missing values, and other issues that may affect analysis and modeling.
Data Cleaning: Data cleaning is the process of identifying and correcting or removing inaccuracies and inconsistencies, and handling missing values, in a dataset. We first inspected the dataset for duplicate values, then detected and treated null values and outliers. We imputed null values using the mean, median, and mode, and handled outliers by clipping, which avoids discarding any records.
Feature Selection: At this step, we encoded the categorical features. We used the correlation coefficient, feature manipulation, and feature selection techniques to select the most relevant features. SMOTE was used to address the class imbalance in the target variable.
Feature Scaling: We scaled the features to bring all values into a similar range.
Model Selection and Implementation: We passed the features to the SVM, KNN, Decision Tree, Gradient Boosting, Logistic Regression, AdaBoost, Naive Bayes, and XGBoost classification algorithms. We also tuned hyperparameters using GridSearchCV.
Performance Evaluation: After training the classification models and calculating the metrics (accuracy, precision, recall, F1, and ROC-AUC), we chose the final model that makes the best predictions.
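
The scaling, model fitting, and GridSearchCV steps listed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual training code: the data is synthetic (generated with make_classification), StandardScaler and LogisticRegression stand in for whichever scaler and models the project uses, and the parameter grid is made up for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the real dataset (assumption).
X, y = make_classification(
    n_samples=600, n_features=10, weights=[0.78, 0.22], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Putting the scaler inside the pipeline means it is re-fitted on each
# cross-validation training fold, so no validation data leaks into scaling.
pipeline = Pipeline(
    [
        ("scaler", StandardScaler()),
        ("clf", LogisticRegression(max_iter=1000)),
    ]
)

# Hypothetical grid: only the regularization strength C is tuned here.
param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}

search = GridSearchCV(pipeline, param_grid, scoring="roc_auc", cv=5)
search.fit(X_train, y_train)

# With scoring="roc_auc", score() reports ROC-AUC on the held-out split.
test_auc = search.score(X_test, y_test)
print("best params:", search.best_params_)
print("held-out ROC-AUC:", round(test_auc, 3))
```

The same pattern extends to the other classifiers by swapping the "clf" step and its grid; comparing the held-out scores is one way to pick a final model.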

Artifacts

Dataset Source

MongoDB

Preprocessing steps

Handling Outliers
Scaling data
Handling imbalanced dataset
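
As a rough illustration of the first two preprocessing steps, the sketch below clips outliers and then rescales a single column in plain Python. The IQR-based bounds, the 1.5 multiplier, min-max scaling, and the sample values are all assumptions made for the example; the project itself only states that clipping and scaling were used, not which variants.

```python
from statistics import quantiles


def clip_outliers_iqr(values, k=1.5):
    """Clip values to [Q1 - k*IQR, Q3 + k*IQR] instead of dropping rows."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, lower), upper) for v in values]


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical feature column containing one extreme value.
column = [10, 20, 30, 40, 50, 60, 70, 80, 1000]

clipped = clip_outliers_iqr(column)  # the extreme value is pulled to the upper bound
scaled = min_max_scale(clipped)      # every value now lies in [0, 1]

print(clipped)
print([round(v, 3) for v in scaled])
```

Clipping keeps every row, which matches the stated goal of handling outliers without losing data. In a pandas workflow the same idea is usually expressed with the clip method on a Series or DataFrame.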

Algorithms used to find best model

LogisticRegression
SVC
RandomForestClassifier
GradientBoostingClassifier
KNeighborsClassifier
DecisionTreeClassifier
XGBoost

Final Result

The XGBClassifier model emerged as the most effective, with the following metric scores:

Accuracy: 83%
Recall: 80%
ROC-AUC: 91%
Precision: 84%
F1: 81%
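
For readers who want to relate these numbers to each other: F1 is the harmonic mean of precision and recall. The short check below uses the rounded precision and recall reported above; the function name is just a label chosen for this example.

```python
def f1_from_precision_recall(precision, recall):
    """Return F1, the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # conventional value when both are zero
    return 2 * precision * recall / (precision + recall)


# Rounded values taken from the list above.
f1 = f1_from_precision_recall(0.84, 0.80)
print(f"F1 = {f1:.3f}")
```

This gives roughly 0.82, which agrees with the reported 81% once you allow for the precision and recall figures themselves being rounded to whole percentages.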

Deployed URLs

STREAMLIT: https://credit-card-default-detection.streamlit.app/
HUGGINGFACE: https://huggingface.co/spaces/abhijitpaul/Credit-Card-Default-Detection
AWS: https://5hniewmhgh.us-east-1.awsapprunner.com/

Project Artifacts

High Level Design (HLD): HLD_CreditCardDefaultDetection.pdf
Low Level Design (LLD): LLD_CreditCardDefaultDetection.pdf
Architecture Design: Architechture_CreditCardDefaultDetection.pdf
Wireframe Document: Wireframe_CreditCardDefaultDetection.pdf
Detailed Project Report (DPR): DPR_CreditCardDefaultDetection.pdf
Project Demo Video: https://youtu.be/8lv5jNUXcKQ?si=Ff1MIxzQMp9MBvj1

Snapshot of HuggingFace Dashboard

MLOps

MLflow - To track experiments, model versioning, and reproducibility
DagsHub - Integrated and hosted tool for MLOps

Contributors

Abhijit Paul
Gouthami K

License

This project is licensed under the MIT License.

Uploading source dataset to MongoDB

# Connect to MongoDB
import pandas as pd
from pymongo.mongo_client import MongoClient

# Create a new client and connect to the server
uri = "<DB URI>"  # replace with your MongoDB connection string
client = MongoClient(uri)

# Send a ping to confirm a successful connection
try:
    client.admin.command('ping')
    print("Pinged your deployment. You successfully connected to MongoDB!")
except Exception as e:
    print(e)

# Get handles to the database and collection already created in MongoDB Atlas
db = client["credit_card_defaults"]
collection = db["data"]

# Insert the records into MongoDB
# (df is the source dataset, already loaded into a pandas DataFrame)
records = df.to_dict(orient='records')
collection.insert_many(records)

# Retrieve data from the collection
data = list(collection.find())

# Load data into a pandas DataFrame
df = pd.DataFrame(data)
df.sample(3)
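
One detail worth knowing when reading the data back: MongoDB adds an `_id` field to every stored document, so it comes back as an extra column in the DataFrame. A minimal sketch for dropping it before modeling is shown below; the helper name and the sample documents are hypothetical and only mimic the shape of what `collection.find()` returns.

```python
def strip_mongo_ids(documents):
    """Return copies of the documents without MongoDB's `_id` key."""
    return [{k: v for k, v in doc.items() if k != "_id"} for doc in documents]


# Hypothetical documents shaped like the results of collection.find().
docs = [
    {"_id": "a1", "LIMIT_BAL": 20000, "default": 1},
    {"_id": "a2", "LIMIT_BAL": 120000, "default": 0},
]

cleaned = strip_mongo_ids(docs)
print(cleaned)
```

Alternatively, passing a projection such as `collection.find({}, {"_id": 0})` asks the server to omit the field in the first place.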

Setting up the dev environment

# Create a conda environment in ./venv
conda create -p ./venv python=3.9 -y

# Activate the environment (Windows path shown; use ./venv on Linux/macOS)
conda activate .\venv

# Upgrade pip and install required packages
python -m pip install --upgrade pip
pip install -r requirements.txt

# Install the project as a package
# (pip install . is the currently recommended equivalent)
python setup.py install
