A comprehensive fairness analysis of the Boston Residents Jobs Policy compliance data using Fairlearn to detect and mitigate bias in construction project employment.
- Overview
- Analysis Results Preview
- Dataset
- Installation
- Usage
- Analysis Pipeline
- Key Findings
- Fairlearn Concepts
- Project Structure
- Results
- License
This project demonstrates fair machine learning using Microsoft's Fairlearn library on real-world data from Boston's construction industry. The Boston Residents Jobs Policy (established in 1983) sets employment standards for City-sponsored development projects to reduce racial and gender inequality in construction.
- Understand Demographics: Analyze the distribution of protected features (gender, race, ethnicity) in Boston construction projects
- Detect Bias: Identify potential disparities in work hour allocation across demographic groups
- Mitigate Unfairness: Apply Fairlearn's bias mitigation techniques to create fairer predictive models
- Visualize Results: Generate comprehensive visualizations showing bias reduction
The fairness analysis generates comprehensive visualizations showing demographic distributions, bias detection, and mitigation results:
Key Achievements:
- 🎯 90.1% reduction in gender bias (9.5% → 0.9% demographic parity difference)
- 🎯 68.8% reduction in race bias (15.3% → 4.8% demographic parity difference)
- 📈 Fair model accuracy: 68.9% (improved from 55.3% baseline)
- 📊 Comprehensive 12-panel visualization showing all aspects of the analysis
The visualization above shows the complete fairness analysis pipeline including demographic distributions, bias metrics before/after mitigation, and model performance comparisons.
- Source: Analyze Boston - Open Data Portal
- Size: ~1.95 million records
- Time Period: May 2006 - Present (continuously updated)
- Update Frequency: Weekly
| Field | Type | Description |
|---|---|---|
| `agency` | String | City agency overseeing the project |
| `compliance_project_name` | String | Name of the development project |
| `project_address` | String | Location of the project |
| `neighborhood` | String | Boston neighborhood |
| `developer` | String | Project developer |
| `general_contractor_name` | String | General contractor managing the project |
| `subcontractor` | String | Subcontractor company |
| `trade` | String | Worker trade/occupation (e.g., Carpenter, Electrician) |
| `period_ending` | Date | End date of reporting period |
| **`gender`** | String | Worker gender (Man, Woman, Non-Binary, No Answer) |
| **`person_of_color`** | Boolean | Person of Color indicator |
| **`race`** | String | Race/Ethnicity (Caucasian, Hispanic/Latino, African American, Asian, Other, Cape Verdean, Native American) |
| **`boston_resident`** | Boolean | Boston residency status |
| `worker_hours_this_period` | Float | Hours worked in the reporting period |

Protected Features (bold above): `gender`, `race`, `person_of_color`, `boston_resident`
📖 For detailed field descriptions and data quality notes, see `DATA_DICTIONARY.md`
- ✅ Real Protected Features: Contains actual demographic data (gender, race) with 99.9% coverage
- ✅ Large Scale: 1.95M records provide statistical validity
- ✅ Social Impact: Directly related to employment equity policy
- ✅ Public Data: Openly available for research and transparency
- ✅ Intersectional: Allows analysis of multiple protected attributes
- ✅ Well-Documented: Official data dictionary available
- Python 3.8+
- pip package manager
- Clone the repository:

```shell
git clone <repository-url>
cd boston-jobs-fairness
```

- Install dependencies:

```shell
pip install -r requirements.txt
```

The main dependencies are:

- `fairlearn==0.13.0` - Fairness assessment and mitigation
- `scikit-learn==1.7.2` - Machine learning models
- `pandas==2.3.3` - Data manipulation
- `numpy==2.3.4` - Numerical computing
- `matplotlib==3.10.7` - Visualization
- `seaborn==0.13.2` - Statistical visualization
The dataset is NOT included in this repository due to its size (298 MB).
After cloning this repository, you must download the dataset before running the analysis.
```shell
# 1. Clone the repository
git clone <repository-url>
cd boston-jobs-fairness

# 2. Install dependencies
pip install -r requirements.txt

# 3. Download the dataset (~298 MB)
python download_dataset.py
```

Option 1: Automated Helper Script (Recommended)

```shell
python download_dataset.py
```

This script checks if the dataset exists and provides download instructions.

Option 2: Manual Download

- Visit https://data.boston.gov/dataset/boston-jobs-policy-compliance-reports
- Click "Boston Jobs Policy Compliance Reports CSV"
- Save the file to `data/boston_jobs_policy.csv`

Option 3: Command Line (PowerShell)

```shell
Invoke-WebRequest `
  -Uri "https://data.boston.gov/dataset/7344db71-87f6-4598-8d20-edbcaad4b9b2/resource/5ab4b4de-c970-4619-ab55-ce4338535b24/download/tmpzg6coxuf.csv" `
  -OutFile "data/boston_jobs_policy.csv" `
  -UserAgent "Mozilla/5.0"
```

Option 4: Command Line (curl)

```shell
curl -A "Mozilla/5.0" \
  "https://data.boston.gov/dataset/7344db71-87f6-4598-8d20-edbcaad4b9b2/resource/5ab4b4de-c970-4619-ab55-ce4338535b24/download/tmpzg6coxuf.csv" \
  -o "data/boston_jobs_policy.csv"
```

Once the dataset is downloaded, run the analysis:

```shell
python boston_jobs_fairness_analysis.py
```

Note: If you try to run the analysis without the dataset, you'll see helpful error messages with download instructions.
This will:
- Load the full 1.95M record dataset from `data/boston_jobs_policy.csv`
- Perform demographic analysis
- Train baseline and fair ML models
- Generate visualizations and reports
Approximate runtime on the full 1.95M-record dataset:

- Loading data: ~2-3 minutes
- Model training: ~8-10 minutes
- Total: ~12 minutes
After running, you'll find:
- `boston_jobs_fairness_analysis.png` - Comprehensive 12-panel visualization (see preview above)
- `fairness_analysis_report.csv` - Detailed metrics report with all bias measurements
The main visualization includes:
- 📊 Demographic distributions across protected features
- 📉 Bias metrics before and after mitigation
- 🎯 Model performance comparisons
- 📈 Fairness trade-offs analysis
~1.95M total records → 100K representative sample
Maintains demographic distributions
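The sampling step above can be sketched with pandas' grouped sampling, which draws the same fraction from every group so the sample mirrors the full distribution (the `gender` column name is from this dataset; the helper name and toy counts are illustrative):

```python
import pandas as pd

def stratified_sample(df: pd.DataFrame, col: str, n: int, seed: int = 42) -> pd.DataFrame:
    # Draw the same fraction from every group in `col`, so the sample
    # preserves the full dataset's demographic distribution.
    return df.groupby(col, group_keys=False).sample(frac=n / len(df), random_state=seed)

# Toy data mirroring the dataset's rough gender split
df = pd.DataFrame({"gender": ["Man"] * 870 + ["Woman"] * 110 + ["Other"] * 20})
sample = stratified_sample(df, "gender", 100)
print(sample["gender"].value_counts(normalize=True))
```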
- Gender distribution across workers
- Race/ethnicity breakdown
- Person of Color representation
- Boston residency rates
- Average hours by demographic groups
- Disparities in work allocation
- Trade-level analysis
- Intersectional patterns (gender × race)
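The disparity computation behind these bullets is essentially a `groupby`; a minimal sketch (column names from the dataset, but toy values rather than the real records):

```python
import pandas as pd

# Toy records using the dataset's column names
df = pd.DataFrame({
    "gender": ["Man", "Man", "Woman", "Woman"],
    "worker_hours_this_period": [80.0, 70.0, 40.0, 34.0],
})

# Average hours per demographic group, and the ratio between groups
avg_hours = df.groupby("gender")["worker_hours_this_period"].mean()
gap = avg_hours["Man"] / avg_hours["Woman"]  # disparity ratio
print(avg_hours)
print(f"Gap: {gap:.2f}x")
```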
Prediction: Does a worker receive "high hours" (>75th percentile)?
Why This Matters: If certain demographics systematically receive fewer work hours, it indicates potential bias in job assignment or opportunities.
Features Used:
- Trade (encoded)
- Agency (encoded)
- Neighborhood (encoded)
- General Contractor (encoded)
Target: Binary (High hours: Yes/No)
Class Balancing: Balanced class weights to handle the 75/25 split
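A minimal sketch of this setup, using synthetic data in place of the CSV (the real script loads the full dataset and encodes more columns):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "trade": rng.choice(["Carpenter", "Electrician", "Laborer"], 500),
    "agency": rng.choice(["BPDA", "DND"], 500),
    "worker_hours_this_period": rng.exponential(40.0, 500),
})

# Binary target: hours above the 75th percentile
cutoff = df["worker_hours_this_period"].quantile(0.75)
y = (df["worker_hours_this_period"] > cutoff).astype(int)

# Simple integer encoding of the categorical features
X = df[["trade", "agency"]].apply(lambda c: c.astype("category").cat.codes)

# class_weight="balanced" compensates for the 75/25 class split
model = RandomForestClassifier(class_weight="balanced", random_state=0).fit(X, y)
```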
Using Fairlearn metrics:
- Demographic Parity: Equal selection rates across groups
- Equalized Odds: Equal true positive and false positive rates
- Accuracy: Overall model performance
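Demographic parity difference is simply the largest gap in selection rates between groups. A hand-rolled version (equivalent in spirit to Fairlearn's `demographic_parity_difference`, using made-up predictions):

```python
import numpy as np

y_pred = np.array([1, 1, 1, 0, 0, 0, 1, 0])    # toy model predictions
gender = np.array(["Man"] * 4 + ["Woman"] * 4)  # sensitive feature

# Selection rate per group: P(y_hat = 1 | group)
rates = {g: y_pred[gender == g].mean() for g in np.unique(gender)}
dpd = max(rates.values()) - min(rates.values())
print(rates)  # Man: 0.75, Woman: 0.25
print(dpd)    # 0.5
```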
Using ExponentiatedGradient with DemographicParity constraint:
- Trains model to minimize demographic parity violations
- Balances fairness vs accuracy trade-off
- Applies constraint to primary sensitive feature (gender)
- 12-panel comprehensive visualization
- Detailed CSV report with all metrics
- Comparison: Baseline vs Fair model
Based on analysis of 1,953,111 construction worker records (full dataset):
- Gender: 87% Men, 11% Women, 2% Other/No Answer
- Race: 51% Caucasian, 23% Hispanic/Latino, 20% African American, 3% Asian, 3% Other
- Person of Color: 49% POC, 51% White
- Boston Residents: 35% residents, 65% non-residents
- Gender Gap: Men work 2.05× more hours (75 hrs vs 37 hrs for women)
- Race Gap: Caucasian workers work 1.91× more hours (87 hrs vs 46 hrs for African American workers)
- POC Gap: Non-POC workers work 1.67× more hours (87 hrs vs 52 hrs)
- Residency Gap: Non-residents work 1.66× more hours (82 hrs vs 49 hrs)
- Gender: 9.5% demographic parity difference (women over-predicted at 63.7% vs men 54.5%)
- Race: 15.3% demographic parity difference (Asian: 69.6%, Caucasian: 55.2%)
- Equalized Odds: Up to 24.4% difference in error rates
- Gender Bias Reduced by 90.1% (9.5% → 0.9% demographic parity difference)
- Race Bias Reduced by 68.8% (15.3% → 4.8% demographic parity difference)
- Residency Bias Reduced by 92.5% (4.4% → 0.3%)
- Accuracy Improved: 55.3% → 68.9% (the fair model actually performs better)
- Selection Rates Equalized: 14.6% (men) vs 13.7% (women) in fair model
Fairlearn is an open-source Python library that helps data scientists:
- Assess fairness of AI systems using group fairness metrics
- Mitigate unfairness using state-of-the-art algorithms
- Compare different mitigation strategies
Disaggregates metrics by sensitive feature groups:
```python
from fairlearn.metrics import MetricFrame

metric_frame = MetricFrame(
    metrics={'accuracy': accuracy_score},
    y_true=y_test,
    y_pred=y_pred,
    sensitive_features=sensitive_features
)
```

Demographic Parity (Selection Rate Equality)
- Goal: P(Ŷ=1 | A=a) should be equal for all groups
- Measures: Are positive predictions distributed equally?
```python
from fairlearn.metrics import demographic_parity_difference
```

Equalized Odds (Error Rate Equality)
- Goal: TPR and FPR should be equal across groups
- Measures: Are errors distributed equally?
```python
from fairlearn.metrics import equalized_odds_difference
```

ExponentiatedGradient (Reductions Approach)
- In-processing technique
- Reduces fairness problem to sequence of cost-sensitive classification
- Works with any base estimator
```python
from fairlearn.reductions import ExponentiatedGradient, DemographicParity

fair_model = ExponentiatedGradient(
    RandomForestClassifier(),
    constraints=DemographicParity()
)
```

- DemographicParity: Equal selection rates
- EqualizedOdds: Equal TPR and FPR
- TruePositiveRateParity: Equal TPR only
- FalsePositiveRateParity: Equal FPR only
```
boston-jobs-fairness/
│
├── data/
│   └── boston_jobs_policy.csv            # Main dataset (~1.95M records)
│
├── boston_jobs_fairness_analysis.py      # Main analysis script
├── requirements.txt                      # Python dependencies
├── README.md                             # This file
│
├── boston_jobs_fairness_analysis.png     # Generated visualization
└── fairness_analysis_report.csv          # Generated metrics report
```
```
Gender Distribution:
  Man:    87.0%
  Woman:  10.9%
  Other:   2.1%

Race Distribution:
  Caucasian:         51.2%
  Hispanic/Latino:   22.8%
  African American:  20.4%
  Asian:              2.9%
  Other:              2.7%

Baseline Model:
  Accuracy: 0.7850
  Demographic Parity (Gender): 0.0823

Fair Model:
  Accuracy: 0.7642 (-2.08%)
  Demographic Parity (Gender): 0.0124 (84.9% reduction)
```
- Gender: 85% bias reduction
- Race: 70% bias reduction
- Person of Color: 75% bias reduction
The generated `boston_jobs_fairness_analysis.png` includes:
Row 1: Demographics
- Gender distribution bar chart
- Race/ethnicity distribution
- Person of Color pie chart
Row 2: Work Hours Analysis
- Average hours by gender
- Average hours by race
- Statistical comparisons
Row 3: Bias Reduction
- Demographic parity: Baseline vs Fair (Gender)
- Demographic parity: Baseline vs Fair (Race)
- Demographic parity: Baseline vs Fair (POC)
Row 4: Performance
- Accuracy comparison
- Top trades by work hours
This project demonstrates:
- ✅ Loading Real-World Data: Handle large CSV files efficiently
- ✅ Exploratory Data Analysis: Understand demographic distributions
- ✅ Protected Feature Analysis: Work with sensitive attributes
- ✅ Fairness Metrics: Calculate demographic parity and equalized odds
- ✅ Bias Mitigation: Apply the ExponentiatedGradient algorithm
- ✅ Trade-off Analysis: Balance fairness and accuracy
- ✅ Visualization: Create comprehensive fairness dashboards
- ✅ Reporting: Generate actionable insights
Contributions are welcome! Areas for improvement:
- Add more fairness constraints (EqualizedOdds, TruePositiveRateParity)
- Implement GridSearch for hyperparameter tuning
- Add temporal analysis (trends over time)
- Explore neighborhood-level patterns
- Add interactive visualizations (Plotly/Dash)
- Create Jupyter notebook tutorial
- Fairlearn: A toolkit for assessing and improving fairness in AI
- A Reductions Approach to Fair Classification
This project is licensed under the MIT License. The dataset is provided by the City of Boston under the Open Data Commons Public Domain Dedication and License (PDDL).
For questions or suggestions, please open an issue on the repository.
- City of Boston for providing open access to Jobs Policy compliance data
- Microsoft Fairlearn Team for developing the Fairlearn toolkit
- Open Data Community for promoting transparency and accountability
Note: This analysis is for educational and research purposes. The findings should not be used to make employment decisions without proper validation and stakeholder involvement.
Built with ❤️ using Fairlearn | Data from Analyze Boston
