Grade any Git repository on its engineering safeguards for safe AI-assisted development.
Not "is this repo ready for agents?" but rather: "will the code agents produce here be reliable?"
Based on research from DORA 2025, OpenAI's Harness Engineering, SlopCodeBench, and Kent Beck's testing principles. See the full analysis: The Engineering Leader's Uncomfortable Truth About AI-Assisted Development.
```shell
# Install
pip install ai-harness-scorecard

# Assess current repo
ai-harness-scorecard assess .

# Assess another repo
ai-harness-scorecard assess /path/to/repo

# Markdown report
ai-harness-scorecard assess . --format markdown -o report.md

# JSON for CI integration
ai-harness-scorecard assess . --format json
```

Five categories, 31 checks, each grounded in published research:
- **Architectural Docs**: architecture docs, agent instructions, ADRs, module boundary constraints, API documentation.
- **Mechanical Constraints**: CI pipeline, linter/formatter enforcement, type safety, dependency auditing, conventional commits, unsafe code policies.
- **Testing & Stability**: test suite in CI, feature matrix testing, code coverage, mutation testing, property-based testing, fuzz testing, contract tests, blocking test jobs.
- **Review & Drift**: code review enforcement, scheduled CI, stale doc detection, PR/MR templates, automated review bots, doc sync checks.
- **AI-Specific Safeguards**: AI usage norms, small batch enforcement, design-before-code culture, error handling policies, security-critical path marking.
| Grade | Score | Meaning |
|---|---|---|
| A | 85-100 | Strong harness. AI-generated code has robust mechanical safeguards |
| B | 70-84 | Good foundation. Some gaps in enforcement or feedback loops |
| C | 55-69 | Basic practices present but insufficient for safe AI scaling |
| D | 40-54 | Significant gaps. AI code likely accumulating undetected debt |
| F | 0-39 | No meaningful harness. AI output is essentially unaudited |
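Given these cut-offs, the score-to-grade mapping is straightforward. An illustrative Python sketch (the package's actual implementation may differ):

```python
def grade_for(score: float) -> str:
    """Map a 0-100 harness score to a letter grade per the table above."""
    if score >= 85:
        return "A"
    if score >= 70:
        return "B"
    if score >= 55:
        return "C"
    if score >= 40:
        return "D"
    return "F"

print(grade_for(74.2))  # B, matching the sample report below
```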
Example output:

```text
Grade: B (74.2/100)
Good foundation. Some gaps in enforcement or feedback loops.

Category Scores
┌──────────────────────────┬────────┬───────┬────────┐
│ Category                 │ Weight │ Score │ Checks │
├──────────────────────────┼────────┼───────┼────────┤
│ Architectural Docs       │    20% │   60% │    3/5 │
│ Mechanical Constraints   │    25% │   91% │    6/7 │
│ Testing & Stability      │    25% │   72% │    5/8 │
│ Review & Drift           │    15% │   60% │    3/6 │
│ AI-Specific Safeguards   │    15% │   67% │    3/5 │
└──────────────────────────┴────────┴───────┴────────┘
```
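The weights above suggest the overall score blends category scores by weight. A quick recomputation from the displayed figures (illustrative only; the recomputed total differs from the reported 74.2, so the tool evidently derives the overall score from finer-grained check data rather than the rounded category percentages):

```python
# Weights and display-rounded category scores from the sample report above.
categories = {
    "Architectural Docs":     (0.20, 60),
    "Mechanical Constraints": (0.25, 91),
    "Testing & Stability":    (0.25, 72),
    "Review & Drift":         (0.15, 60),
    "AI-Specific Safeguards": (0.15, 67),
}

total = sum(weight * score for weight, score in categories.values())
print(round(total, 1))  # 71.8 from rounded inputs; the reported total is 74.2
```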
Display your score as a badge in your README, powered by shields.io:
Setup:
- Add the GitHub Action to your repo (see below) with badge generation enabled (on by default).
- The action creates a `scorecard-badge.json` in your repo; commit it.
- Add the badge to your README:
```markdown
[![AI Harness Scorecard](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/OWNER/REPO/main/scorecard-badge.json)](scorecard-report.md)
```

Replace OWNER and REPO with your GitHub username and repository name. The badge links to `scorecard-report.md` in the same repo, which contains the full score breakdown and recommendations.
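The badge file follows shields.io's endpoint schema (`schemaVersion`, `label`, `message`, `color`). A hedged Python sketch of producing such a file; the label text, message format, and grade-to-color mapping here are assumptions, not the tool's documented output:

```python
import json

# Color per grade: an assumed mapping, not taken from the tool's docs.
GRADE_COLORS = {"A": "brightgreen", "B": "green", "C": "yellow", "D": "orange", "F": "red"}

def badge_json(grade: str, score: float) -> str:
    """Build shields.io endpoint-schema JSON for a scorecard badge."""
    return json.dumps({
        "schemaVersion": 1,  # required by the shields.io endpoint API
        "label": "AI harness",
        "message": f"{grade} ({score:.1f})",
        "color": GRADE_COLORS.get(grade, "lightgrey"),
    })

print(badge_json("B", 74.2))
```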
CLI:

```shell
ai-harness-scorecard assess . --badge scorecard-badge.json
```

Add the scorecard to any repository's CI with a one-liner:
```yaml
name: AI Harness Scorecard
on:
  push:
    branches: [main]
  schedule:
    - cron: '0 6 * * 1'  # weekly
jobs:
  scorecard:
    runs-on: ubuntu-latest
    permissions:
      contents: write
    steps:
      - uses: actions/checkout@v4
      - uses: markmishaev76/ai-harness-scorecard@v1
        id: scorecard
      - name: Commit badge and report
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add scorecard-badge.json scorecard-report.md
          git diff --cached --quiet || git commit -m "chore: update scorecard badge and report"
          git push
```

The action generates `scorecard-badge.json` by default. The workflow above commits it so shields.io can read it from raw.githubusercontent.com.
Inputs:

| Input | Default | Description |
|---|---|---|
| `path` | `.` | Path to the repository to assess |
| `format` | `markdown` | Output format: `markdown`, `json`, or `terminal` |
| `output-file` | `scorecard-report.md` | File path for the report |
| `badge-file` | `scorecard-badge.json` | File path for the shields.io badge JSON (set empty to skip) |
Outputs:

| Output | Description |
|---|---|
| `grade` | Letter grade (A/B/C/D/F) |
| `score` | Numeric score (0-100) |
| `report-path` | Path to the generated report file |
| `badge-path` | Path to the generated badge JSON file |
Works on any cloned Git repository (GitHub, GitLab, Bitbucket, self-hosted). Most checks are file-based and platform-independent.
For platform-specific checks (branch protection, required reviewers), future versions will support:
```shell
# GitHub
ai-harness-scorecard assess github:owner/repo

# GitLab
ai-harness-scorecard assess gitlab:group/project
```

- Deterministic. No LLM dependency. Two runs on the same repo produce the same score.
- Language-aware. Checks adapt to Rust, Python, TypeScript, Go, Java, etc.
- Additive scoring. Each check contributes points. Missing an inapplicable check doesn't penalize.
- Research-grounded. Every check maps back to a specific study or published best practice. See resources/ for the full knowledge base.
| Doc | Purpose |
|---|---|
| ARCHITECTURE.md | Module layout, dependency rules, data flow |
| CONTRIBUTING.md | Dev setup, how to add checks, PR guidelines |
| AGENTS.md | AI agent instructions and naming conventions |
| resources/ | Research, blog posts, and references behind each check |
```shell
# Clone
git clone https://github.com/markmishaev76/ai-harness-scorecard
cd ai-harness-scorecard

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Lint
ruff check src/ tests/
```

See CONTRIBUTING.md for the full development guide.
MIT