Credit Risk Modelling — Mortgage Default Prediction

Overview

Lenders need reliable estimates of borrower default risk to price loans, set limits, and manage portfolio exposure. This project builds supervised learning models to predict whether a mortgage will default, using borrower, loan, and performance attributes. We compare model families, handle class imbalance, and emphasize metrics aligned with risk decisions (e.g., precision at operating thresholds, recall on defaulters, and expected loss impact).

Objective: PD (Probability of Default)

Problem Type: Binary Classification

Focus: Recall & Calibration

Data

Features (examples)

Borrower: FICO, DTI, income, employment length
Loan: LTV, interest rate, term, purpose, property type
Behavioral: payment history, delinquency counts, utilization
Derived: buckets (FICO bands), interaction terms, winsorized ratios
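
The derived features listed above can be sketched with pandas. This is a minimal illustration, not the notebook's code: the column names (`fico`, `dti`, `ltv`), the band edges, and the 1st/99th percentile cut points are all assumptions chosen for the example.

```python
import pandas as pd

# Toy borrower/loan records; column names and values are illustrative only.
df = pd.DataFrame({
    "fico": [580, 640, 700, 760, 810],
    "dti":  [0.45, 0.38, 0.31, 0.22, 3.50],   # last value is a deliberate outlier
    "ltv":  [0.95, 0.90, 0.80, 0.70, 0.60],
})

# FICO bands: right=False makes each bin closed on the left, e.g. [620, 680).
fico_edges = [300, 620, 680, 740, 851]          # assumed band edges
fico_labels = ["<620", "620-679", "680-739", "740+"]
df["fico_band"] = pd.cut(df["fico"], bins=fico_edges, labels=fico_labels, right=False)

# Winsorize DTI by clipping to the 1st and 99th percentiles (assumed cut points).
lo, hi = df["dti"].quantile([0.01, 0.99])
df["dti_w"] = df["dti"].clip(lower=lo, upper=hi)

# Interaction term: leverage combined with debt burden.
df["ltv_x_dti"] = df["ltv"] * df["dti_w"]

print(df)
```

In a real pipeline the band edges and winsorization cut points would be computed on the training split only and then reused on validation and test data, so that no information from held-out rows leaks into the features.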

Target & Imbalance

Target: default indicator (1/0)
Imbalance: defaults << non-defaults → use stratified splits & threshold tuning
Leakage Controls: exclude any signal observed after the prediction date (e.g., post-origination performance when scoring at origination) from training features
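
A stratified split keeps the default rate the same in every partition, which matters when defaults are rare. The sketch below assumes scikit-learn and uses synthetic data with a 5% default rate; the 60/20/20 proportions are an illustrative choice, not taken from the notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data: 2,000 loans, 5% default rate.
n = 2000
X = rng.normal(size=(n, 4))
y = np.zeros(n, dtype=int)
y[:100] = 1
rng.shuffle(y)

# First split off a 20% test set, then carve validation out of the remainder.
X_tmp, X_test, y_tmp, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X_tmp, y_tmp, test_size=0.25, stratify=y_tmp, random_state=0
)

for name, part in [("train", y_train), ("val", y_val), ("test", y_test)]:
    print(f"{name}: n={len(part)}, default rate={part.mean():.3f}")
```

Without `stratify`, a small test set can end up with noticeably more or fewer defaulters than the training set by chance, which makes recall and PR-AUC estimates unstable.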

Replace or augment the above with the exact fields your notebook uses. The page stays static; the notebook hosts the full details.

Methodology

Preprocessing: missing value strategy (median/most-frequent), outlier handling, categorical encoding.
Feature Engineering: ratio features (DTI, LTV), binning (FICO bands), trend deltas if available.
Train/Validation/Test: stratified split; cross-validation for robust hyperparameter selection.
Class Imbalance: class_weight or decision-threshold tuning; compare with simple resampling.
Evaluation: ROC-AUC and PR-AUC for ranking quality, plus decision-oriented metrics (Precision@k, recall on defaulters, KS) and Brier score for probability accuracy.
Calibration: Platt scaling or isotonic regression when predicted probabilities feed expected-loss calculations.
Explainability: permutation importance / SHAP (global & local) for risk review.

Models Compared

Baseline

Logistic Regression (with regularization)
Decision Tree (depth-limited)

Tree-Based & Ensembles

Random Forest
Gradient Boosting / XGBoost / LightGBM
Calibrated probabilities for PD
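
One way to compare these model families on equal footing is stratified cross-validation with a single scorer. The sketch below is an assumption-laden illustration: the data come from `make_classification`, the hyperparameters are defaults or arbitrary, and scikit-learn's `GradientBoostingClassifier` stands in for XGBoost/LightGBM so the example needs no extra packages.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced data (about 8% positives) standing in for the loan table.
X, y = make_classification(n_samples=3000, n_features=10, n_informative=5,
                           weights=[0.92, 0.08], random_state=0)

models = {
    "logistic (L2)": LogisticRegression(C=1.0, max_iter=1000),
    "tree (depth 4)": DecisionTreeClassifier(max_depth=4, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}

# The same folds are reused for every model so the scores are comparable.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = {}
for name, model in models.items():
    # "average_precision" is scikit-learn's summary of the precision-recall curve.
    fold_scores = cross_val_score(model, X, y, cv=cv, scoring="average_precision")
    scores[name] = fold_scores.mean()
    print(f"{name:18s} PR-AUC = {fold_scores.mean():.3f} +/- {fold_scores.std():.3f}")
```

PR-AUC is used here because it is more sensitive than ROC-AUC to performance on the rare default class; the fold-to-fold standard deviation gives a rough sense of whether differences between models are meaningful.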

Use the notebook for full training code, CV strategy, and hyperparameters.

Results (Illustrative)

ROC-AUC: 0.86

PR-AUC (positive class): 0.42

KS: 0.46

Operating Point: threshold chosen to balance recall of defaulters with false-positive rate.
Business View: compare expected loss reduction vs. approval rate impact at candidate thresholds.
Explainability: top drivers often include DTI, LTV, delinquency counts, and interest rate.
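
The operating-point trade-off can be made concrete by tabulating approval rate, recall on defaulters, false-positive rate, and expected loss at a few candidate thresholds. The sketch below uses the standard decomposition expected loss = PD × LGD × EAD; the simulated portfolio, the 35% LGD, the flat 200,000 exposure, and the candidate thresholds are all hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Hypothetical scored portfolio: predicted PDs and simulated default flags.
pd_hat = rng.beta(1.0, 12.0, size=n)        # skewed toward low PDs
defaulted = rng.uniform(size=n) < pd_hat    # outcomes consistent with the PDs
ead = np.full(n, 200_000.0)                 # assumed exposure at default per loan
lgd = 0.35                                  # assumed loss given default


def evaluate(threshold):
    """Approve loans with PD below the threshold and summarize the outcome."""
    approved = pd_hat < threshold
    flagged = ~approved
    recall = (flagged & defaulted).sum() / max(defaulted.sum(), 1)
    fpr = (flagged & ~defaulted).sum() / max((~defaulted).sum(), 1)
    # Expected loss on the approved book: sum of PD * LGD * EAD.
    expected_loss = float(np.sum(pd_hat[approved] * lgd * ead[approved]))
    return approved.mean(), recall, fpr, expected_loss


baseline_el = float(np.sum(pd_hat * lgd * ead))  # expected loss if everyone is approved
for t in (0.05, 0.10, 0.20, 0.30):
    approval, recall, fpr, el = evaluate(t)
    print(f"t={t:.2f} approval={approval:.1%} recall={recall:.1%} "
          f"FPR={fpr:.1%} EL reduction={1 - el / baseline_el:.1%}")
```

Lower thresholds catch more defaulters and cut expected loss further, but they also decline more good borrowers. This calculation is only as reliable as the PDs are calibrated, which is why calibration is checked before choosing a threshold.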

Replace the placeholder metrics with your notebook’s actual numbers (or remove this box and keep the narrative).

👉 See full training, plots, and tables in the notebook.

How to Reproduce

This site is static; to run the code locally:

# 1) Clone
git clone https://github.com/ishujaswani/Credit-Risk-Modelling-Using-Machine-learning
cd Credit-Risk-Modelling-Using-Machine-learning

# 2) (Optional) Create env & install deps
# python -m venv .venv && source .venv/bin/activate
# pip install -r requirements.txt

# 3) Open the notebook
jupyter notebook Hwk1.ipynb

# 4) Export HTML (if you re-render)
jupyter nbconvert --to html --template lab Hwk1.ipynb --output Hwk1.html

If you add more notebooks (EDA, feature store, scorecard), link them here and keep Hwk1.html as the main CTA above.