A reference implementation of enterprise IAM connection governance — 75 applications, 255 connections, 14 departments, 96 governance gaps detected using synthetic production-representative data.
In large organisations, information about which applications connect to which IAM products is scattered across multiple systems — application inventories, IAM registries, connection logs, and manual spreadsheets. This data is incomplete, inconsistently structured, and not maintained to any defined standard.
The result is a governance gap: nobody has a clear, current picture of which applications have the right IAM coverage, which connections are stale or undocumented, and where the highest risk exposures are.
This pipeline addresses that directly — ingesting from three scattered data sources, consolidating into a unified risk-classified governance map, detecting violations, and producing structured outputs for audit systems and management stakeholders.
3 Scattered Data Sources
│
▼
┌─────────────────────────────────────────┐
│ pipeline.py │
│ │
│ [1] Ingest from multiple repositories │
│ [2] Clean & standardise │
│ [3] Build unified connection map │
│ [4] Classify risk & detect gaps │
│ [5] Expose data & maintain │
└─────────────────────────────────────────┘
│
▼
6 Structured Outputs
iam-connection-governance-pipeline/
│
├── 📁 data/ # Three scattered source repositories
│ ├── application_inventory.csv # Source 1 — 75 enterprise applications
│ ├── iam_product_registry.json # Source 2 — 5 IAM products and requirements
│ └── connection_log.csv # Source 3 — 255 connection records
│
├── 📁 src/ # Pipeline modules
│ ├── risk_classifier.py # Risk classification & gap detection logic
│ ├── report_generator.py # HTML dashboard & CSV export
│ ├── scorecard.py # Department scorecard & remediation queue
│ └── __init__.py
│
├── 📁 output/ # Generated on every pipeline run
│ ├── iam_connection_master_map.csv # Unified governance map for audit systems
│ ├── governance_gaps.csv # Gap report for application owners
│ ├── department_risk_scorecard.csv # Department-level risk ratings
│ ├── remediation_priority_queue.csv # Prioritised actions with deadlines
│ ├── iam_governance_report.html # Management dashboard
│ └── pipeline_run_log.txt # Audit trail
│
├── feeding_rules.yaml # Authoritative sources, conflict resolution & SLAs
├── pipeline.py # Main entry point
├── requirements.txt
└── README.md
| Source | Format | Records | Description |
|---|---|---|---|
application_inventory.csv |
CSV | 75 apps | Application metadata — owner, department, sensitivity, environment, region |
iam_product_registry.json |
JSON | 5 products | Azure AD, PAM, RBAC, MFA, IGA — coverage requirements per sensitivity level |
connection_log.csv |
CSV | 255 records | Current connections with realistic gaps — stale reviews, missing entries, unknown reviewers |
Applications span: Finance · HR · Operations · Logistics · Compliance · Legal · IT · Air Cargo · Sustainability · Executive · Strategy
Regions covered: Global · EMEA · APAC · Americas
Loads all three source files. Simulates the real-world problem: three separate repositories with inconsistencies, missing fields, and varying formats.
- Fills missing application owners
- Standardises sensitivity and status fields
- Flags connections not reviewed within 180 days as stale
- Handles inconsistent date formats across sources
Merges all three sources into a single structured data model.
Feeding rules (defined in feeding_rules.yaml):
| Field | Authoritative Source | Update Mode |
|---|---|---|
| App metadata | application_inventory | Manual |
| IAM product details | iam_product_registry | Scheduled (180 days) |
| Connection status | connection_log | Manual (post-review) |
| Risk range | Pipeline (calculated) | Automatic |
Risk Classification:
| Risk | Criteria |
|---|---|
| 🔴 HIGH | HIGH sensitivity + business critical + Production |
| 🟠 MEDIUM | MEDIUM sensitivity OR stale review OR pending/undocumented |
| 🟢 LOW | LOW sensitivity, Development, non-critical |
Gap Types Detected:
| Gap Type | Description |
|---|---|
MISSING_CONNECTION |
Required IAM product not connected for app's sensitivity level |
STALE_REVIEW |
Connection not reviewed within defined cycle |
UNDOCUMENTED_CONNECTION |
Connection exists with no documented status |
UNKNOWN_OWNER |
Application has no registered owner |
MISSING_REVIEW_DATE |
Active connection with no review date recorded |
Generates all outputs and logs the run for audit traceability.
| Output | Format | Audience | Channel |
|---|---|---|---|
iam_connection_master_map.csv |
CSV | GRC / Audit systems | Automated integration |
governance_gaps.csv |
CSV | Application owners | Weekly email distribution |
department_risk_scorecard.csv |
CSV | IAM Manager | Internal reporting |
remediation_priority_queue.csv |
CSV | Application owners + IAM team | ServiceNow tickets |
iam_governance_report.html |
HTML | Control Tower / Management | Intranet SharePoint |
pipeline_run_log.txt |
TXT | Audit trail | Retained in output directory |
| Gap Type | HIGH Risk | MEDIUM Risk | LOW Risk |
|---|---|---|---|
| Missing Connection | 7 days 🔴 | 30 days 🟠 | 90 days 🟢 |
| Stale Review | 3 days 🔴 | 14 days 🟠 | 30 days 🟢 |
| Undocumented Connection | 24 hours 🔴 | 7 days 🟠 | 30 days 🟢 |
| Unknown Owner | 48 hours 🔴 | 7 days 🟠 | 30 days 🟢 |
| Missing Review Date | 3 days 🔴 | 14 days 🟠 | 30 days 🟢 |
The feeding_rules.yaml file defines the governance rules for the entire pipeline:
- Authoritative source per field — which system owns the data
- Conflict resolution — what happens when sources disagree (reject, flag, escalate)
- Update modes — automatic, manual, or scheduled
- Remediation SLAs — deadlines per gap type and risk range
- Communication channels — how each output reaches its audience
- Refreshes connection map from all source systems
- Flags stale reviews, missing connections, undocumented entries
- Rebuilds department scorecard and remediation queue
- Regenerates all output files
- Appends timestamped audit log entry
- Application owners update
connection_log.csvafter each access review - Data Protection Officer confirms sensitivity classification quarterly
- IAM team updates
iam_product_registry.jsonwhen products change - Unknown owners resolved within 7 days of gap detection
# Install dependencies
pip install -r requirements.txt
# Run the pipeline
python pipeline.pyOpen output/iam_governance_report.html in a browser to view the governance dashboard.
============================================================
IAM CONNECTION GOVERNANCE PIPELINE
Run date: 2026-03-09
============================================================
[1/5] Loading data sources...
✓ Application inventory: 75 records loaded
✓ IAM product registry: 5 products loaded
✓ Connection log: 255 connection records loaded
[2/5] Cleaning and standardising data...
✓ Missing owners filled: 4 apps
✓ Undocumented connections flagged: 10
✓ Stale connections flagged: 44
[3/5] Building unified connection map...
✓ Unified map built: 255 connection records across 75 applications
[4/5] Classifying risk and detecting gaps...
✓ HIGH: 180 MEDIUM: 69 LOW: 6
✓ Governance gaps detected: 96
[5/5] Generating outputs...
✓ Master map, gap report, scorecard, remediation queue, HTML dashboard
✓ 69 CRITICAL items requiring immediate action
============================================================
PIPELINE COMPLETE — 6 outputs written to ./output/
============================================================
- Python — pipeline orchestration and data processing
- pandas — ingestion, cleaning, merging, classification
- PyYAML — feeding rules and remediation config engine
- HTML/CSS — stakeholder dashboard (no dependencies, opens in any browser)
Pavan Kumar Naganaboina MSc Data Management & AI — ECE Paris linkedin.com/in/pavankumarn01 · github.com/pavansri8886
Built to demonstrate enterprise IAM governance methodology — data consolidation, risk classification, gap detection, feeding rules design, and stakeholder reporting at scale.