# AI Harness Scorecard

Grade any Git repository on its engineering safeguards for safe AI-assisted development.

Not "is this repo ready for agents?" but rather: "will the code agents produce here be reliable?"

Based on research from DORA 2025, OpenAI's Harness Engineering, SlopCodeBench, and Kent Beck's testing principles. See the full analysis: The Engineering Leader's Uncomfortable Truth About AI-Assisted Development.

## Quick Start

```bash
# Install
pip install ai-harness-scorecard

# Assess the current repo
ai-harness-scorecard assess .

# Assess another repo
ai-harness-scorecard assess /path/to/repo

# Markdown report
ai-harness-scorecard assess . --format markdown -o report.md

# JSON for CI integration
ai-harness-scorecard assess . --format json
```
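The `--format json` mode lends itself to CI gating. Here is a minimal sketch of a threshold check, assuming the report exposes `score` and `grade` fields; the actual JSON schema may differ, so adjust the field names to match the tool's real output:

```python
import json

MIN_SCORE = 70  # example threshold; tune per team

def check_report(report_json: str, min_score: float = MIN_SCORE) -> bool:
    """Return True when the harness score meets the threshold.

    The "score" and "grade" field names are assumptions about the
    shape of the `--format json` report.
    """
    report = json.loads(report_json)
    print(f"Grade {report['grade']} ({report['score']}/100)")
    return report["score"] >= min_score

# Hypothetical CI wiring:
#   ai-harness-scorecard assess . --format json > report.json
#   then fail the job when check_report(report.json contents) is False
```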

## What It Checks

Five categories, 31 checks, each grounded in published research:

### 1. Architectural Documentation (20%)

Architecture docs, agent instructions, ADRs, module boundary constraints, API documentation.

### 2. Mechanical Constraints (25%)

CI pipeline, linter/formatter enforcement, type safety, dependency auditing, conventional commits, unsafe code policies.

### 3. Testing & Stability (25%)

Test suite in CI, feature matrix testing, code coverage, mutation testing, property-based testing, fuzz testing, contract tests, blocking test jobs.

### 4. Review & Drift Prevention (15%)

Code review enforcement, scheduled CI, stale doc detection, PR/MR templates, automated review bots, doc sync checks.

### 5. AI-Specific Safeguards (15%)

AI usage norms, small batch enforcement, design-before-code culture, error handling policies, security-critical path marking.

## Grading

| Grade | Score | Meaning |
|-------|-------|---------|
| A | 85-100 | Strong harness. AI-generated code has robust mechanical safeguards |
| B | 70-84 | Good foundation. Some gaps in enforcement or feedback loops |
| C | 55-69 | Basic practices present but insufficient for safe AI scaling |
| D | 40-54 | Significant gaps. AI code likely accumulating undetected debt |
| F | 0-39 | No meaningful harness. AI output is essentially unaudited |
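The grade bands above can be expressed directly as a small mapping. This is an illustrative sketch mirroring the table, not the tool's actual implementation:

```python
def letter_grade(score: float) -> str:
    """Map a 0-100 score to a letter grade using the documented bands."""
    if score >= 85:
        return "A"
    if score >= 70:
        return "B"
    if score >= 55:
        return "C"
    if score >= 40:
        return "D"
    return "F"
```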

## Example Output

```text
Grade: B  (74.2/100)
Good foundation. Some gaps in enforcement or feedback loops.

         Category Scores
┌──────────────────────────┬────────┬───────┬────────┐
│ Category                 │ Weight │ Score │ Checks │
├──────────────────────────┼────────┼───────┼────────┤
│ Architectural Docs       │    20% │   60% │    3/5 │
│ Mechanical Constraints   │    25% │   91% │    6/7 │
│ Testing & Stability      │    25% │   72% │    5/8 │
│ Review & Drift           │    15% │   60% │    3/6 │
│ AI-Specific Safeguards   │    15% │   67% │    3/5 │
└──────────────────────────┴────────┴───────┴────────┘
```

## Badge

Display your score as a badge in your README, powered by shields.io:


Setup:

1. Add the GitHub Action to your repo (see below) with badge generation enabled (it is on by default).
2. The action creates a `scorecard-badge.json` in your repo; commit it.
3. Add the badge to your README:

```markdown
[![AI Harness Scorecard](https://img.shields.io/endpoint?url=https%3A%2F%2Fraw.githubusercontent.com%2FOWNER%2FREPO%2Fmain%2Fscorecard-badge.json)](scorecard-report.md)
```

Replace `OWNER` and `REPO` with your GitHub username and repository name. The badge links to `scorecard-report.md` in the same repo, which contains the full score breakdown and recommendations.

CLI:

```bash
ai-harness-scorecard assess . --badge scorecard-badge.json
```
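For reference, a shields.io "endpoint" badge file is a small JSON document. The sketch below produces one: the `schemaVersion`, `label`, `message`, and `color` keys are shields.io's real endpoint contract, but the label text and the grade-to-color mapping are assumptions for illustration, and the tool's own badge output may differ:

```python
import json

# Hypothetical grade-to-color mapping for illustration.
GRADE_COLORS = {"A": "brightgreen", "B": "green", "C": "yellow",
                "D": "orange", "F": "red"}

def badge_json(grade: str, score: float) -> str:
    """Return shields.io endpoint JSON for a scorecard result."""
    payload = {
        "schemaVersion": 1,                    # required by shields.io
        "label": "AI Harness Scorecard",       # left-hand badge text (assumed)
        "message": f"{grade} ({score:.1f})",   # right-hand badge text (assumed)
        "color": GRADE_COLORS.get(grade, "lightgrey"),
    }
    return json.dumps(payload, indent=2)
```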

## Use as GitHub Action

Add the scorecard to any repository's CI with a short workflow:

```yaml
name: AI Harness Scorecard
on:
  push:
    branches: [main]
  schedule:
    - cron: '0 6 * * 1'  # weekly, Monday 06:00 UTC

jobs:
  scorecard:
    runs-on: ubuntu-latest
    permissions:
      contents: write
    steps:
      - uses: actions/checkout@v4
      - uses: markmishaev76/ai-harness-scorecard@v1
        id: scorecard
      - name: Commit badge and report
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add scorecard-badge.json scorecard-report.md
          git diff --cached --quiet || git commit -m "chore: update scorecard badge and report"
          git push
```

The action generates `scorecard-badge.json` by default. The workflow above commits it so shields.io can read it from raw.githubusercontent.com.

Inputs:

| Input | Default | Description |
|-------|---------|-------------|
| `path` | `.` | Path to the repository to assess |
| `format` | `markdown` | Output format: `markdown`, `json`, or `terminal` |
| `output-file` | `scorecard-report.md` | File path for the report |
| `badge-file` | `scorecard-badge.json` | File path for the shields.io badge JSON (set empty to skip) |

Outputs:

| Output | Description |
|--------|-------------|
| `grade` | Letter grade (A/B/C/D/F) |
| `score` | Numeric score (0-100) |
| `report-path` | Path to the generated report file |
| `badge-path` | Path to the generated badge JSON file |

## Platform Support

Works on any cloned Git repository (GitHub, GitLab, Bitbucket, self-hosted). Most checks are file-based and platform-independent.

For platform-specific checks (branch protection, required reviewers), future versions will support:

```bash
# GitHub
ai-harness-scorecard assess github:owner/repo

# GitLab
ai-harness-scorecard assess gitlab:group/project
```

## Design Principles

1. **Deterministic.** No LLM dependency: two runs on the same repo produce the same score.
2. **Language-aware.** Checks adapt to Rust, Python, TypeScript, Go, Java, etc.
3. **Additive scoring.** Each check contributes points; a check that doesn't apply to your repo is excluded rather than counted against you.
4. **Research-grounded.** Every check maps back to a specific study or published best practice. See `resources/` for the full knowledge base.
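Additive, applicability-aware scoring (principle 3) can be sketched as follows. The data shapes here are illustrative, not the tool's internal model:

```python
def category_score(checks: list[dict]) -> float:
    """Score a category 0-100 from its checks.

    Each check is a dict like {"passed": bool, "applicable": bool}
    (a hypothetical shape). Inapplicable checks are excluded from the
    denominator, so missing them does not penalize the score.
    """
    applicable = [c for c in checks if c["applicable"]]
    if not applicable:
        return 100.0  # nothing to check, nothing to penalize
    passed = sum(1 for c in applicable if c["passed"])
    return 100.0 * passed / len(applicable)
```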

## Documentation

| Doc | Purpose |
|-----|---------|
| ARCHITECTURE.md | Module layout, dependency rules, data flow |
| CONTRIBUTING.md | Dev setup, how to add checks, PR guidelines |
| AGENTS.md | AI agent instructions and naming conventions |
| resources/ | Research, blog posts, and references behind each check |

## Development

```bash
# Clone
git clone https://github.com/markmishaev76/ai-harness-scorecard
cd ai-harness-scorecard

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Lint
ruff check src/ tests/
```

See CONTRIBUTING.md for the full development guide.

## License

MIT
