Overview
Lenders need reliable estimates of borrower default risk to price loans, set limits, and manage portfolio exposure. This project builds supervised learning models to predict whether a mortgage will default, using borrower, loan, and performance attributes. We compare model families, handle class imbalance, and emphasize metrics aligned with risk decisions (e.g., precision at operating thresholds, recall on defaulters, and expected loss impact).
Data
Features (examples)
- Borrower: FICO, DTI, income, employment length
- Loan: LTV, interest rate, term, purpose, property type
- Behavioral: payment history, delinquency counts, utilization
- Derived: buckets (FICO bands), interaction terms, winsorized ratios
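The derived features listed above can be sketched with pandas. The column names, units (DTI and LTV as fractions), FICO cut points (the commonly cited FICO ranges), 1st/99th-percentile clipping limits, and the `winsorize` helper are all illustrative assumptions. They do not come from the notebook.

```python
import pandas as pd

# Hypothetical toy records; column names and units are illustrative, not the notebook's schema.
loans = pd.DataFrame({
    "fico": [580, 640, 690, 730, 790],
    "dti": [0.15, 0.32, 0.45, 0.28, 0.95],  # debt-to-income as a fraction
    "ltv": [0.60, 0.80, 0.95, 0.70, 1.20],  # loan-to-value as a fraction
})

# FICO bands using commonly cited FICO ranges; left-closed so 670 lands in "670-739".
FICO_BINS = [300, 580, 670, 740, 800, 851]
FICO_LABELS = ["<580", "580-669", "670-739", "740-799", "800+"]
loans["fico_band"] = pd.cut(loans["fico"], bins=FICO_BINS, labels=FICO_LABELS, right=False)


def winsorize(s: pd.Series, lower: float = 0.01, upper: float = 0.99) -> pd.Series:
    """Clip a series at its own lower/upper quantiles (hypothetical helper).

    In a real pipeline, compute the quantiles on the training split only and
    reuse them on validation/test data to avoid leakage.
    """
    lo, hi = s.quantile([lower, upper])
    return s.clip(lower=lo, upper=hi)


loans["dti_w"] = winsorize(loans["dti"])
loans["ltv_w"] = winsorize(loans["ltv"])
# Interaction term: high leverage combined with a high payment burden.
loans["dti_x_ltv"] = loans["dti_w"] * loans["ltv_w"]
print(loans)
```

Clipping caps extreme ratios, such as the 0.95 DTI row, without dropping the loan. Banding lets linear models pick up non-linear FICO effects.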
Target & Imbalance
- Target: default indicator (1/0)
- Imbalance: defaults << non-defaults → use stratified splits & threshold tuning
- Leakage Controls: exclude post-origination signals from training features
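Below is a minimal sketch of the stratified split plus a leakage filter, using scikit-learn's `train_test_split(..., stratify=y)`. The data are synthetic with an assumed ~5% default rate. The column `months_delinquent_after_origination` and the list `POST_ORIGINATION_COLS` are hypothetical stand-ins for whatever post-origination fields the real dataset contains.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1_000
df = pd.DataFrame({
    "fico": rng.integers(550, 820, n),
    "dti": rng.uniform(0.05, 0.6, n),
    "ltv": rng.uniform(0.4, 1.0, n),
    # Hypothetical post-origination field: it would leak the outcome if used as a feature.
    "months_delinquent_after_origination": rng.integers(0, 6, n),
    "default": (rng.random(n) < 0.05).astype(int),  # ~5% synthetic default rate
})

# Leakage control: drop every column observed after the decision date.
POST_ORIGINATION_COLS = ["months_delinquent_after_origination"]  # hypothetical list
X = df.drop(columns=["default", *POST_ORIGINATION_COLS])
y = df["default"]

# Stratify on the target so train and test keep (almost) the same default rate.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
print(f"default rate: full={y.mean():.3f} train={y_train.mean():.3f} test={y_test.mean():.3f}")
```

Without `stratify=y`, a rare class can end up noticeably over- or under-represented in a small test split. That makes PR-AUC and recall estimates noisy.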
The fields above are representative examples; the notebook documents the exact features used.
Methodology
- Preprocessing: missing-value imputation (median for numeric, most-frequent for categorical), outlier handling, categorical encoding.
- Feature Engineering: ratio features (DTI, LTV), binning (FICO bands), trend deltas if available.
- Train/Validation/Test: stratified split; cross-validation on the training data for robust hyperparameter selection.
- Class Imbalance: class_weight, focal loss, or decision-threshold tuning; compare against simple resampling.
- Evaluation: ROC-AUC and PR-AUC for ranking quality, KS statistic for class separation, Brier score for calibration, plus decision metrics (Precision@k, recall on defaulters).
- Calibration: Platt scaling or isotonic regression when predicted probabilities feed expected-loss estimates (EL = PD × LGD × EAD).
- Explainability: permutation importance / SHAP (global & local) for risk review.
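The methodology steps above can be sketched end to end with scikit-learn, under stated assumptions. The data are synthetic, and the risk formula generating defaults is invented for illustration. `ks_statistic` and `precision_at_k` are hypothetical helpers, and `average_precision_score` is used as the PR-AUC summary. The model is a logistic regression with `class_weight="balanced"`. Because balancing inflates raw probabilities, the Brier score here illustrates why the calibration step matters.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, brier_score_loss, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(1)
n = 5_000
df = pd.DataFrame({
    "fico": rng.normal(700, 50, n),
    "dti": rng.uniform(0.05, 0.6, n),
    "ltv": rng.uniform(0.4, 1.0, n),
    "purpose": rng.choice(["purchase", "refi", "cashout"], n),
})
# Synthetic default process (assumption): higher DTI/LTV and lower FICO raise risk, ~7% defaults.
logit = -6.0 + 4.0 * df["dti"] + 2.0 * df["ltv"] - 0.02 * (df["fico"] - 700)
df["default"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)
# Inject missing values so the imputers have work to do.
df.loc[rng.random(n) < 0.05, "dti"] = np.nan
df.loc[rng.random(n) < 0.05, "purpose"] = np.nan

preprocess = ColumnTransformer([
    ("num", make_pipeline(SimpleImputer(strategy="median"), StandardScaler()), ["fico", "dti", "ltv"]),
    ("cat", make_pipeline(SimpleImputer(strategy="most_frequent"),
                          OneHotEncoder(handle_unknown="ignore")), ["purpose"]),
])
model = Pipeline([
    ("prep", preprocess),
    # class_weight="balanced" up-weights defaulters; it also inflates raw PDs.
    ("clf", LogisticRegression(C=1.0, class_weight="balanced", max_iter=1000)),
])

X, y = df.drop(columns="default"), df["default"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)
model.fit(X_train, y_train)
p = model.predict_proba(X_test)[:, 1]


def ks_statistic(y_true, scores) -> float:
    # KS = max gap between the score CDFs of defaulters and non-defaulters = max(TPR - FPR).
    fpr, tpr, _ = roc_curve(y_true, scores)
    return float(np.max(tpr - fpr))


def precision_at_k(y_true, scores, k: int) -> float:
    # Share of actual defaulters among the k highest-scored loans.
    top_k = np.argsort(scores)[::-1][:k]
    return float(np.asarray(y_true)[top_k].mean())


metrics = {
    "roc_auc": roc_auc_score(y_test, p),
    "pr_auc": average_precision_score(y_test, p),
    "ks": ks_statistic(y_test, p),
    "brier": brier_score_loss(y_test, p),
    "precision@100": precision_at_k(y_test, p, 100),
}
print({k: round(v, 3) for k, v in metrics.items()})

# Global explainability: drop in ROC-AUC when each raw input column is shuffled.
imp = permutation_importance(model, X_test, y_test, scoring="roc_auc", n_repeats=5, random_state=0)
importance = pd.Series(imp.importances_mean, index=X_test.columns).sort_values(ascending=False)
print(importance)
```

Wrapping imputation and encoding inside the `Pipeline` means they are refit on each training fold. This keeps test-set statistics out of preprocessing.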
Models Compared
Baseline
- Logistic Regression (with regularization)
- Decision Tree (depth-limited)
Tree-Based & Ensembles
- Random Forest
- Gradient Boosting / XGBoost / LightGBM
- Calibrated probabilities of default (PD)
See the notebook for the full training code, CV strategy, and hyperparameters.
Results (Illustrative)
- Operating Point: threshold chosen to balance recall of defaulters with false-positive rate.
- Business View: compare expected loss reduction vs. approval rate impact at candidate thresholds.
- Explainability: top drivers often include DTI, LTV, delinquency counts, and interest rate.
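The operating-point and business views above can be sketched as a threshold sweep. Every input here is a hypothetical assumption: PDs are drawn from a Beta(1, 15) distribution and treated as calibrated, exposures are uniform, and `LGD` is a flat 35%. The 20% false-positive cap (`FPR_CAP`) is likewise invented. Loans with PD at or above the threshold are declined, and expected loss uses EL = PD × LGD × EAD.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
n = 20_000
pd_hat = rng.beta(1.0, 15.0, n)           # hypothetical calibrated PDs (mean ~6%)
defaulted = rng.random(n) < pd_hat        # outcomes drawn from those PDs
ead = rng.uniform(100_000, 500_000, n)    # hypothetical exposure at default
LGD = 0.35                                # assumed flat loss given default

expected_loss = pd_hat * LGD * ead        # EL = PD x LGD x EAD, per loan

rows = []
for t in np.linspace(0.02, 0.30, 15):
    declined = pd_hat >= t
    rows.append({
        "threshold": t,
        "approval_rate": 1 - declined.mean(),
        "recall_defaulters": (declined & defaulted).sum() / defaulted.sum(),
        "fpr": (declined & ~defaulted).sum() / (~defaulted).sum(),
        # Share of portfolio expected loss avoided by declining these loans.
        "el_reduction": expected_loss[declined].sum() / expected_loss.sum(),
    })
table = pd.DataFrame(rows)

# Operating point: highest defaulter recall whose FPR stays within a hypothetical policy cap.
FPR_CAP = 0.20
feasible = table[table["fpr"] <= FPR_CAP]
operating_point = feasible.loc[feasible["recall_defaulters"].idxmax()]
print(table.round(3).to_string(index=False))
print(operating_point.round(3))
```

Reading `approval_rate` and `el_reduction` side by side shows the trade-off: each step down in threshold buys expected-loss reduction at the cost of turning away good borrowers.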
👉 See full training, plots, and tables in the notebook.
How to Reproduce
This site is static; to run the code locally:
# 1) Clone
git clone https://github.com/ishujaswani/Credit-Risk-Modelling-Using-Machine-learning
cd Credit-Risk-Modelling-Using-Machine-learning
# 2) (Optional) Create env & install deps
# python -m venv .venv && source .venv/bin/activate
# pip install -r requirements.txt
# 3) Open the notebook
jupyter notebook Hwk1.ipynb
# 4) Export HTML (if you re-render)
jupyter nbconvert --to html --template lab Hwk1.ipynb --output Hwk1.html