A Python/Jupyter notebook that detects and mitigates age-based bias in bank loan credit quality predictions using the Fairlearn library.
The notebook trains a Decision Tree classifier to predict the credit quality (`Credit_Mix`) of bank loans, labelled as Bad or Good/Standard, and then investigates whether the model treats borrowers of different age groups fairly. It applies two established bias mitigation strategies and compares their results, as shown in the following chart:
An Excel file (~4,300 records) containing individual loan applicant data with fields such as Age, Annual_Income, Outstanding_Debt, Num_of_Delayed_Payment, Interest_Rate, and Credit_Mix (the target variable).
The Excel file is a cleaned subset of the Kaggle dataset "Credit score classification" (https://www.kaggle.com/datasets/parisrohan/credit-score-classification), which contains individuals' credit-related information.
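Loading the file and checking that the fields listed above are present could look roughly like the sketch below. It is not the notebook's code: the helper name `validate_and_coerce`, the filename `loan_data.xlsx`, and the choice to drop rows that fail numeric conversion are all hypothetical. Only `pandas.to_numeric(..., errors="coerce")` and `pandas.read_excel` (which needs `openpyxl` for `.xlsx`) are real pandas calls.

```python
import pandas as pd

# Fields named in the dataset description; the notebook's own list may differ (assumption).
REQUIRED_COLUMNS = [
    "Age", "Annual_Income", "Outstanding_Debt",
    "Num_of_Delayed_Payment", "Interest_Rate", "Credit_Mix",
]
NUMERIC_COLUMNS = [c for c in REQUIRED_COLUMNS if c != "Credit_Mix"]


def validate_and_coerce(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: fail fast on missing columns, then coerce numeric fields."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    out = df.copy()
    for col in NUMERIC_COLUMNS:
        # Unparseable entries become NaN instead of raising an error.
        out[col] = pd.to_numeric(out[col], errors="coerce")
    # Assumption: rows that fail conversion are dropped rather than imputed.
    return out.dropna(subset=REQUIRED_COLUMNS).reset_index(drop=True)


# With the real file (hypothetical filename; .xlsx reading requires openpyxl):
# df = validate_and_coerce(pd.read_excel("loan_data.xlsx"))
demo = pd.DataFrame({
    "Age": ["23", "41", "n/a"],
    "Annual_Income": ["52000", "71000", "64000"],
    "Outstanding_Debt": ["1200.5", "300", "950"],
    "Num_of_Delayed_Payment": ["3", "0", "7"],
    "Interest_Rate": ["12", "6", "15"],
    "Credit_Mix": ["Standard", "Good", "Bad"],
})
clean = validate_and_coerce(demo)
print(clean.dtypes)
print(len(clean))  # 2: the row with Age "n/a" is dropped
```

Failing fast on missing columns keeps a schema mismatch from surfacing later as a confusing error inside the model or plotting code.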
- Validates required columns and coerces numeric types
- Builds a binary target: Bad (1) vs Good/Standard (0)
- Bins `Age` into five groups: ≤25, 26–35, 36–45, 46–60, 60+
- Visualises age distribution, credit mix per age group, bad-loan rates, and sample representation, revealing that older borrowers are underrepresented and show near-zero bad-loan rates, which introduces a bias against younger age groups.
- Trains a Decision Tree and evaluates it using Fairlearn's `MetricFrame`
- Measures Demographic Parity (selection rate equality) and Equal Opportunity (true positive rate equality) across age groups
| Method | Type | Mechanism |
|---|---|---|
| `ExponentiatedGradient` | In-processing | Re-weights training data to satisfy `EqualizedOdds` constraints |
| `ThresholdOptimizer` | Post-processing | Applies group-specific classification thresholds |
- Summary table of Accuracy, DPD, DPR, EOD, EOR across all models
- Bar charts, radar/spider chart, and heatmaps for visual comparison
fairlearn · scikit-learn · pandas · numpy · matplotlib · seaborn
jupyter notebook bias_analysis.ipynb