# AI Harness Scorecard

Grade any Git repository on its engineering safeguards for safe AI-assisted development.

Not "is this repo ready for agents?" but rather: "will the code agents produce here be reliable?"

Based on research from DORA 2025, OpenAI's Harness Engineering, SlopCodeBench, and Kent Beck's testing principles. See the full analysis: The Engineering Leader's Uncomfortable Truth About AI-Assisted Development.

## Quick Start

```bash
# Install
pip install ai-harness-scorecard

# Assess the current repo
ai-harness-scorecard assess .

# Assess another repo
ai-harness-scorecard assess /path/to/repo

# Markdown report
ai-harness-scorecard assess . --format markdown -o report.md

# JSON for CI integration
ai-harness-scorecard assess . --format json
```
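The `--format json` mode lends itself to CI gating. Here is a minimal sketch of a threshold check, assuming the report exposes `score` and `grade` fields; the actual JSON schema may differ, so adjust the field names to match the tool's real output:

```python
import json

MIN_SCORE = 70  # example threshold; tune per team

def check_report(report_json: str, min_score: float = MIN_SCORE) -> bool:
    """Return True when the harness score meets the threshold.

    The "score" and "grade" field names are assumptions about the
    shape of the `--format json` report.
    """
    report = json.loads(report_json)
    print(f"Grade {report['grade']} ({report['score']}/100)")
    return report["score"] >= min_score

# Hypothetical CI wiring:
#   ai-harness-scorecard assess . --format json > report.json
#   then fail the job when check_report(report.json contents) is False
```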

## What It Checks

Five categories, 31 checks, each grounded in published research:

### 1. Architectural Documentation (20%)

Architecture docs, agent instructions, ADRs, module boundary constraints, API documentation.

### 2. Mechanical Constraints (25%)

CI pipeline, linter/formatter enforcement, type safety, dependency auditing, conventional commits, unsafe code policies.

### 3. Testing & Stability (25%)

Test suite in CI, feature matrix testing, code coverage, mutation testing, property-based testing, fuzz testing, contract tests, blocking test jobs.

### 4. Review & Drift Prevention (15%)

Code review enforcement, scheduled CI, stale doc detection, PR/MR templates, automated review bots, doc sync checks.

### 5. AI-Specific Safeguards (15%)

AI usage norms, small batch enforcement, design-before-code culture, error handling policies, security-critical path marking.

## Grading

| Grade | Score | Meaning |
|-------|-------|---------|
| A | 85-100 | Strong harness. AI-generated code has robust mechanical safeguards |
| B | 70-84 | Good foundation. Some gaps in enforcement or feedback loops |
| C | 55-69 | Basic practices present but insufficient for safe AI scaling |
| D | 40-54 | Significant gaps. AI code likely accumulating undetected debt |
| F | 0-39 | No meaningful harness. AI output is essentially unaudited |
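The grade bands above can be expressed directly as a small mapping. This is an illustrative sketch mirroring the table, not the tool's actual implementation:

```python
def letter_grade(score: float) -> str:
    """Map a 0-100 score to a letter grade using the documented bands."""
    if score >= 85:
        return "A"
    if score >= 70:
        return "B"
    if score >= 55:
        return "C"
    if score >= 40:
        return "D"
    return "F"
```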

## Example Output

```text
Grade: B  (74.2/100)
Good foundation. Some gaps in enforcement or feedback loops.

         Category Scores
┌──────────────────────────┬────────┬───────┬────────┐
│ Category                 │ Weight │ Score │ Checks │
├──────────────────────────┼────────┼───────┼────────┤
│ Architectural Docs       │    20% │   60% │    3/5 │
│ Mechanical Constraints   │    25% │   91% │    6/7 │
│ Testing & Stability      │    25% │   72% │    5/8 │
│ Review & Drift           │    15% │   60% │    3/6 │
│ AI-Specific Safeguards   │    15% │   67% │    3/5 │
└──────────────────────────┴────────┴───────┴────────┘
```

## Badge

Display your score as a badge in your README, powered by shields.io:


Setup:

1. Add the GitHub Action to your repo (see below) with badge generation enabled (it is on by default).
2. The action creates a `scorecard-badge.json` in your repo; commit it.
3. Add the badge to your README:

```markdown
[![AI Harness Scorecard](https://img.shields.io/endpoint?url=https%3A%2F%2Fraw.githubusercontent.com%2FOWNER%2FREPO%2Fmain%2Fscorecard-badge.json)](scorecard-report.md)
```

Replace `OWNER` and `REPO` with your GitHub username and repository name. The badge links to `scorecard-report.md` in the same repo, which contains the full score breakdown and recommendations.

CLI:

```bash
ai-harness-scorecard assess . --badge scorecard-badge.json
```
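For reference, a shields.io "endpoint" badge file is a small JSON document. The sketch below produces one: the `schemaVersion`, `label`, `message`, and `color` keys are shields.io's real endpoint contract, but the label text and the grade-to-color mapping are assumptions for illustration, and the tool's own badge output may differ:

```python
import json

# Hypothetical grade-to-color mapping for illustration.
GRADE_COLORS = {"A": "brightgreen", "B": "green", "C": "yellow",
                "D": "orange", "F": "red"}

def badge_json(grade: str, score: float) -> str:
    """Return shields.io endpoint JSON for a scorecard result."""
    payload = {
        "schemaVersion": 1,                    # required by shields.io
        "label": "AI Harness Scorecard",       # left-hand badge text (assumed)
        "message": f"{grade} ({score:.1f})",   # right-hand badge text (assumed)
        "color": GRADE_COLORS.get(grade, "lightgrey"),
    }
    return json.dumps(payload, indent=2)
```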

## Use as GitHub Action

Add the scorecard to any repository's CI with a short workflow:

```yaml
name: AI Harness Scorecard
on:
  push:
    branches: [main]
  schedule:
    - cron: '0 6 * * 1'  # weekly, Monday 06:00 UTC

jobs:
  scorecard:
    runs-on: ubuntu-latest
    permissions:
      contents: write
    steps:
      - uses: actions/checkout@v4
      - uses: markmishaev76/ai-harness-scorecard@v1
        id: scorecard
      - name: Commit badge and report
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add scorecard-badge.json scorecard-report.md
          git diff --cached --quiet || git commit -m "chore: update scorecard badge and report"
          git push
```

The action generates `scorecard-badge.json` by default. The workflow above commits it so shields.io can read it from raw.githubusercontent.com.

Inputs:

| Input | Default | Description |
|-------|---------|-------------|
| `path` | `.` | Path to the repository to assess |
| `format` | `markdown` | Output format: `markdown`, `json`, or `terminal` |
| `output-file` | `scorecard-report.md` | File path for the report |
| `badge-file` | `scorecard-badge.json` | File path for the shields.io badge JSON (set empty to skip) |

Outputs:

| Output | Description |
|--------|-------------|
| `grade` | Letter grade (A/B/C/D/F) |
| `score` | Numeric score (0-100) |
| `report-path` | Path to the generated report file |
| `badge-path` | Path to the generated badge JSON file |

## Platform Support

Works on any cloned Git repository (GitHub, GitLab, Bitbucket, self-hosted). Most checks are file-based and platform-independent.

For platform-specific checks (branch protection, required reviewers), future versions will support:

```bash
# GitHub
ai-harness-scorecard assess github:owner/repo

# GitLab
ai-harness-scorecard assess gitlab:group/project
```

## Design Principles

1. **Deterministic.** No LLM dependency: two runs on the same repo produce the same score.
2. **Language-aware.** Checks adapt to Rust, Python, TypeScript, Go, Java, etc.
3. **Additive scoring.** Each check contributes points; a check that doesn't apply to your repo is excluded rather than counted against you.
4. **Research-grounded.** Every check maps back to a specific study or published best practice. See `resources/` for the full knowledge base.
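Additive, applicability-aware scoring (principle 3) can be sketched as follows. The data shapes here are illustrative, not the tool's internal model:

```python
def category_score(checks: list[dict]) -> float:
    """Score a category 0-100 from its checks.

    Each check is a dict like {"passed": bool, "applicable": bool}
    (a hypothetical shape). Inapplicable checks are excluded from the
    denominator, so missing them does not penalize the score.
    """
    applicable = [c for c in checks if c["applicable"]]
    if not applicable:
        return 100.0  # nothing to check, nothing to penalize
    passed = sum(1 for c in applicable if c["passed"])
    return 100.0 * passed / len(applicable)
```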

## Documentation

| Doc | Purpose |
|-----|---------|
| ARCHITECTURE.md | Module layout, dependency rules, data flow |
| CONTRIBUTING.md | Dev setup, how to add checks, PR guidelines |
| AGENTS.md | AI agent instructions and naming conventions |
| resources/ | Research, blog posts, and references behind each check |

## Development

```bash
# Clone
git clone https://github.com/markmishaev76/ai-harness-scorecard
cd ai-harness-scorecard

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Lint
ruff check src/ tests/
```

See CONTRIBUTING.md for the full development guide.

## License

MIT
