Risk Scoring for a Neobank Company

Introduction

The client for this project is a neobank that specializes in competitively priced loans. The company is concerned about the quality of the borrowers accessing its products and requires a robust system to support informed loan approval decisions based on applicants’ profiles.

The goal is to implement a risk-scoring model using artificial intelligence algorithms to identify ‘risky’ applicants and estimate their associated expected losses. This information will be used to manage the bank’s economic capital, portfolio, and risk assessment effectively.

See a technical explanation of the project here

Objectives

The main objective is to develop a risk-scoring model that uses machine learning algorithms to identify potentially risky borrowers. Based on the company’s historical data, the model estimates the expected financial loss for each new customer-loan pairing, giving the company a quantitative basis for its loan approval decisions.

Project results

Recommended actions from EDA

Several insights have been uncovered through the exploratory data analysis. The main actionable initiatives are summarized below.

- Credit scores appear to be effective in identifying high-quality borrowers. These profiles should be targeted for promotion, and a broader range of products, such as investment opportunities, stocks, and index funds, could be offered to them.
- The job title category needs improvement to provide more accurate information, which will be beneficial for the development of the machine learning algorithms.
- Since three main borrower profiles have been identified based on credit card usage, targeted campaigns can be developed for each group. Customized products or loans tailored to their specific needs could be offered to them.
- According to the company’s historical data, 30-month loans are performing better. These should be promoted, and additional products in this category could be considered.

Risk scoring model

In this project, we have developed a risk-scoring model to predict the Expected Loss (EL) associated with a new loan application. To achieve that, three key risk parameters are considered:

- Probability of Default (PD): The likelihood that a borrower will default, based on an internally assigned credit rating.
- Exposure at Default (EAD): The amount of outstanding debt at the time of default. In the EL formula it enters as a proportion of the loan principal.
- Loss Given Default (LGD): The percentage of the loan exposure that is not expected to be recovered if a default occurs.

To estimate these risk parameters, three predictive machine learning models were developed. The PD model uses logistic regression, because the financial sector requires high interpretability and auditability at this stage. For the EAD and LGD models, LightGBM was selected for its superior performance. The predictions from these models are then combined to calculate the EL for each loan transaction using the following formula:

$$ EL[\textdollar] = PD \cdot P[\textdollar] \cdot EAD \cdot LGD, $$

where P is the loan principal, i.e., the amount of money the borrower wishes to borrow.
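As a worked illustration of the formula, the following minimal Python sketch computes the EL for a single loan. It assumes that PD, EAD, and LGD are all expressed as proportions between 0 and 1, which is how the formula reads since only EL and P carry currency units. The names `RiskEstimate` and `expected_loss` are hypothetical and do not come from the project code, and the numbers are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskEstimate:
    """Risk parameters for one loan application (illustrative names)."""
    prob_default: float  # PD, assumed to lie in [0, 1]
    ead: float           # EAD, assumed to be a fraction of the principal
    lgd: float           # LGD, assumed to be a fraction of the exposure


def expected_loss(principal: float, risk: RiskEstimate) -> float:
    """Return EL = PD * P * EAD * LGD in the same currency as `principal`."""
    if principal < 0:
        raise ValueError("principal must be non-negative")
    for name in ("prob_default", "ead", "lgd"):
        value = getattr(risk, name)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return risk.prob_default * principal * risk.ead * risk.lgd


if __name__ == "__main__":
    # Made-up numbers, not results from the project.
    risk = RiskEstimate(prob_default=0.10, ead=0.80, lgd=0.50)
    print(f"EL = {expected_loss(10_000, risk):.2f}")
```

With a principal of 10,000, a PD of 10%, an EAD of 80%, and an LGD of 50%, the product is 0.10 · 10,000 · 0.80 · 0.50, i.e. an expected loss of about 400.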

Risk scoring analyzer web app

To maximize the value of the developed machine learning models, it is essential to seamlessly deploy them into production so that employees can start utilizing them to make informed, practical decisions.

To achieve this, a prototype web application has been designed. This web app gathers internal data from the company for each client, as well as information provided by the borrower through their loan application.

Launch Risk Scoring Analyzer Web App!

Project structure

📁 .streamlit
- config.toml: File containing some configuration parameters for the Risk Scoring Analyzer web app.
📁 00_Imagenes: Contains project images.
📁 01_Documentos: Contains basic project files:
- Diccionario.xlsx: Feature-level metadata.
- riesgos.yml: Project environment file.
- FaseDesarrollo_Transformaciones.xlsx: Support file for designing feature transformation processes.
- FaseProduccion_Procesos.xlsx: Support file for designing final production script.
- stop_words_english.txt: List of stop words (non-relevant words) used in the text data analysis (TF-IDF analysis).
📁 02_Datos
- 📁 01_Originales
  - prestamos.csv: Original dataset.
- 📁 02_Validacion
  - validacion.csv: Sample extracted from the original dataset at the beginning of the project, which is used to check the correct performance of the model once it is put into production.
- 📁 03_Trabajo
  - This folder contains the datasets resulting from each of the stages of the project (data quality, exploratory data analysis, variable transformation, ...).
📁 03_Notebooks
- 📁 02_Desarrollo
  - 01_Set Up.ipynb: Notebook used for the initial set up of the project.
  - 02_Calidad de Datos.ipynb: Notebook detailing and executing all data quality processes.
  - 03_EDA.ipynb: Notebook used for the execution of the exploratory data analysis.
  - 04_Transformacion de datos.ipynb: Notebook that details and executes the data transformation processes necessary to prepare the variables for the models.
  - 05_Modelizacion Clasificacion PD.ipynb: Notebook for modeling the predictive Probability of Default model. It contains the model selection, the hyperparameter tuning, and the evaluation of results.
  - 06_Modelizacion para Regresion EAD.ipynb: Notebook for modeling the predictive Exposure at Default model. It contains the model selection, the hyperparameter tuning, and the evaluation of results.
  - 07_Modelizacion para Regresion LGD.ipynb: Notebook for modeling the predictive Loss Given Default model. It contains the model selection, the hyperparameter tuning, and the evaluation of results.
  - 08_Preparacion del codigo de produccion.ipynb: Notebook that consolidates the data quality, transformation, and variable selection processes together with the final models. It produces the final retraining and execution pipes that condense all of these processes.
- 📁 03_Sistema
  - This folder contains the files (production script, models, functions ...) used in the model's deployment.
  - 📁 app_riesgos
    - This folder contains the app files necessary for the deployment of the web application Risk Scoring Analyzer.
📁 04_Modelos
- pipe_ejecucion_pd.pickle: Pipe that condenses the final PD trained model as well as all necessary prior data transformations.
- pipe_ejecucion_ead.pickle: Pipe that condenses the final EAD trained model as well as all necessary prior data transformations.
- pipe_ejecucion_lgd.pickle: Pipe that condenses the final LGD trained model as well as all necessary prior data transformations.
- pipe_entrenamiento_pd.pickle: Pipe that condenses the entire PD model training process. It can be used to retrain the model with new data when necessary.
- pipe_entrenamiento_ead.pickle: Pipe that condenses the entire EAD model training process. It can be used to retrain the model with new data when necessary.
- pipe_entrenamiento_lgd.pickle: Pipe that condenses the entire LGD model training process. It can be used to retrain the model with new data when necessary.
📁 05_Resultados
- Codigo de ejecucion.py: Python script to execute the model and obtain the results.
- Codigo de reentrenamiento.py: Python script to retrain the model with new data when necessary.
- Risk scoring analyzer web app link.md: File containing the link for the Risk Scoring Analyzer web app.
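
The three execution pipes listed under 04_Modelos can, in principle, be loaded and combined as in the sketch below. This is only an illustration under stated assumptions: it assumes the PD pipe exposes a scikit-learn style `predict_proba` and the EAD and LGD pipes expose `predict`, and it clamps the regression outputs to [0, 1], which is a choice made here and not something the project documents. The function names `load_pipe`, `clip01`, and `score_loans` are hypothetical, and the actual Codigo de ejecucion.py may be organised differently.

```python
import pickle
from pathlib import Path
from typing import Any, List, Sequence


def load_pipe(path: Path) -> Any:
    """Unpickle one execution pipe. Only unpickle files you trust."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def clip01(value: float) -> float:
    """Clamp a regression output to [0, 1] (an assumption of this sketch)."""
    return min(1.0, max(0.0, float(value)))


def score_loans(pd_pipe: Any, ead_pipe: Any, lgd_pipe: Any,
                features: Any, principals: Sequence[float]) -> List[float]:
    """Combine the three model outputs into one expected loss per loan."""
    # Assumed interface: the classifier returns one [P(no default), P(default)]
    # row per loan, and each regressor returns one value per loan.
    pd_hat = [row[1] for row in pd_pipe.predict_proba(features)]
    ead_hat = [clip01(v) for v in ead_pipe.predict(features)]
    lgd_hat = [clip01(v) for v in lgd_pipe.predict(features)]
    # EL = PD * P * EAD * LGD, evaluated loan by loan.
    return [p * amount * e * l
            for p, amount, e, l in zip(pd_hat, principals, ead_hat, lgd_hat)]
```

A call might look like `score_loans(load_pipe(Path("04_Modelos/pipe_ejecucion_pd.pickle")), ..., features, principals)`, where `features` is whatever input table the pipes were trained to accept. Unpickling requires the packages used to build the pipes to be installed, which is one reason to run the project inside its original environment.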

Instructions

The project should be run using the same environment in which it was created.

The project environment can be replicated with the riesgos.yml file, which was created during the setup phase of the project and is located in the folder 01_Documentos.
To replicate the environment, copy riesgos.yml to your working directory and run the following command in a terminal or Anaconda Prompt:
- conda env create --file riesgos.yml --name project_name

Also remember to update the project_path variable in the notebooks to the path where you have replicated the project.
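
Before running the notebooks, it can help to confirm that project_path points at a complete copy of the project. The sketch below is one possible check, not part of the repository: the helper `missing_folders` is hypothetical, and the folder names are taken from the project structure section. How the notebooks themselves consume project_path (for example, as a plain string) is not specified here.

```python
from pathlib import Path
from typing import List

# Top-level folders listed in the project structure section.
EXPECTED_FOLDERS = ("01_Documentos", "02_Datos", "03_Notebooks",
                    "04_Modelos", "05_Resultados")


def missing_folders(project_path: str) -> List[str]:
    """Return the expected top-level folders not found under project_path."""
    root = Path(project_path).expanduser()
    return [name for name in EXPECTED_FOLDERS if not (root / name).is_dir()]


if __name__ == "__main__":
    project_path = "."  # replace with the folder where you copied the project
    missing = missing_folders(project_path)
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("Project layout looks complete.")
```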

pabloelt/risk-scoring-for-a-neobank-company
