You have access to the epistemic spine: a set of composable, deterministic Rust CLI tools that establish what can be known, compared, and trusted about data artifacts. These tools produce truth. You decide what to run — they decide what is true.
Every tool is MIT-licensed, open source, and runs locally with no network dependency.
The current stack centers on composable stream tools, report tools, lifecycle tools, and an expanding validation / scoring / decision chain.
vacuum → hashbytes → fingerprint → lock
| Tool | What it does | Key flags |
|---|---|---|
| vacuum | Scans directories, emits sorted JSONL manifest (path, size, mtime, mime) | vacuum <DIR>... --include --exclude |
| hashbytes | Adds SHA-256 or BLAKE3 byte identity to each record | --algorithm blake3 |
| fingerprint | Tests each artifact against template definitions, produces content hashes | --fp <ID> (repeatable, first match wins) --list --diagnose. YAML definitions in ~/.fingerprint/definitions/ run directly without compilation (override with FINGERPRINT_DEFINITIONS). |
| lock | Pins the stream into a self-hashed, tamper-evident lockfile | --dataset-id --as-of --note |
Stream tools read JSONL from stdin, enrich each record, emit to stdout. Pipe them:
vacuum /data/delivery/ | hashbytes | fingerprint --fp csv.v0 | lock --dataset-id "q4" > q4.lock.json| Tool | What it does | Key flags |
|---|---|---|
| shape | Structural comparability gate — can two CSVs be compared? | shape old.csv new.csv --key loan_id --json |
| rvl | Material change detector — which cells explain the difference? | rvl old.csv new.csv --key loan_id --json |
| canon | Entity resolution against versioned registries | canon data.csv --registry reg/ --column id |
Report tools default to human-readable output. Add --json for structured output.
| Tool | What it does | Key flags |
|---|---|---|
| profile | Column-scoping configs for report tools (draft → frozen lifecycle) | profile draft init data.csv profile freeze profile suggest-key |
| pack | Evidence sealing — bundles lockfiles and reports into content-addressed packs | pack seal *.json --note "..." --output dir/ |
| Tool | What it does | Key flags |
|---|---|---|
| verify | Constraint checks against authored or compiled rules | verify data.csv --rules rules.yaml --json verify run constraints.verify.json --bind name=path |
| benchmark | Gold-set scoring against expected facts | benchmark normalized.csv --assertions gold_set.jsonl --key comp_id --json |
| assess | Deterministic PROCEED / PROCEED_WITH_RISK / ESCALATE / BLOCK classification over evidence artifacts | assess shape.json rvl.json verify.json --policy policy.yaml --json |
compare remains the deferred exhaustive diff tool in the broader spine roadmap.
| Code | Meaning |
|---|---|
0 |
Positive / clean (no change, compatible, pass, all matched) |
1 |
Domain-negative (change detected, incompatible, fail, partial) |
2 |
Refusal — tool cannot operate, with structured reason |
A refusal means the tool is being honest: it cannot produce a confident answer. Every refusal includes:
{
"code": "E_NEED_KEY",
"message": "Row counts differ; key alignment required",
"detail": { "suggested_key": "loan_id" },
"next_command": "rvl old.csv new.csv --key loan_id --json"
}code— stable machine identifier (E_UPPERCASE)detail— structured context you can inspectnext_command— literal shell command to retry (when mechanical recovery is possible)
Mechanical recovery: If next_command is present, execute it directly. No reasoning needed.
Judgment recovery: If next_command is null, the refusal requires you to interpret detail and decide. Example: E_MISSINGNESS on a column — you decide whether to exclude it via a profile edit.
<tool> --describe # Prints operator.json — full contract: args, options, exit codes, refusals
<tool> --schema # Prints JSON Schema for structured output
<tool> --version # Prints "<tool> <semver>"Run --describe on any tool to learn its interface without reading docs. The refusals section tells you every failure mode and what to do about each one.
Every tool appends a ~300-byte record to ~/.epistemic/witness.jsonl — content-addressed, hash-chained. This happens automatically. You can query it:
<tool> witness query --since 2026-01-01 --json
<tool> witness last --jsonSuppress with --no-witness when benchmarking.
vacuum /data/2026-02/ | hashbytes | lock --dataset-id "feb-delivery" --as-of "2026-02-28" > feb.lock.jsonYou now have a tamper-evident snapshot of every file — path, size, hash, tool versions. Verify later with lock verify feb.lock.json.
vacuum /dataroom/ | hashbytes | fingerprint --fp argus-model.v1 --fp csv.v0 --fp xlsx.v0 > fp.jsonlEach record now has fingerprint.matched, fingerprint.assertions, and fingerprint.content_hash. Filter matched vs unmatched:
jq 'select(.fingerprint.matched == true)' fp.jsonl # Recognized
jq 'select(.fingerprint.matched == false)' fp.jsonl # Unknown# Learn a fingerprint from example files
fingerprint infer ./cbre-appraisals/ --format markdown --id cbre-appraisal.v1 --out cbre.fp.yaml
# Install the YAML definition for direct runtime evaluation
mkdir -p ~/.fingerprint/definitions
cp cbre.fp.yaml ~/.fingerprint/definitions/
# Test it immediately — no compile step needed
vacuum /dataroom/ | hashbytes | fingerprint --fp cbre-appraisal.v1 --diagnose
# Iterate: edit the YAML, re-copy, re-test
# When stable, compile for production performance:
fingerprint compile cbre.fp.yaml --out fingerprint-cbre-v1/# Gate: are these structurally comparable?
shape nov.csv dec.csv --key loan_id --json > shape.json
# Explain: what actually changed?
rvl nov.csv dec.csv --key loan_id --json > rvl.jsonIf shape returns exit 1 (INCOMPATIBLE), do not run rvl — the datasets are not comparable. Read shape.json for why.
If rvl returns exit 2 (REFUSAL), read the refusal. Common case: E_NEED_KEY → retry with --key. Another: E_MISSINGNESS → a column is mostly null, consider excluding it with a profile.
# Generate a draft profile from a real dataset
profile draft init tape.csv --out tape.draft.yaml
# Suggest a key column
profile suggest-key tape.csv --json
# Edit the draft: set key, remove noisy columns
# (you edit the YAML)
# Lint against the real data
profile lint tape.draft.yaml --against tape.csv
# Run rvl with the draft to see if the answer is useful
rvl old.csv new.csv --key loan_id --profile tape.draft.yaml --json
# If useful, freeze for production
profile freeze tape.draft.yaml --family csv.loan_tape.core --version 0 \
--out profiles/csv.loan_tape.core.v0.yamlDrafts are mutable scratchpads. Frozen profiles are immutable, hashed, recorded in outputs. Explore with drafts, commit with frozen.
Today rvl is profile-aware. Check shape / compare operator surfaces before assuming equivalent --profile behavior in the current release line.
pack seal nov.lock.json dec.lock.json shape.json rvl.json \
--note "Nov→Dec reconciliation" --output evidence/2026-02/The pack is a self-hashed directory. Every member is content-addressed. Verify with pack verify evidence/2026-02/.
canon positions.csv --registry registries/cusip-isin/ --column cusip --jsonExit 0 = all resolved. Exit 1 = partial (check .unresolved[]). Every mapping records rule_id and registry.version for audit.
| Refusal | Tool | What to do |
|---|---|---|
E_NEED_KEY |
rvl, shape | Retry with --key <suggested_key> from detail |
E_DIFFUSE |
rvl | Top contributors don't reach threshold — lower --threshold or scope with --profile |
E_MISSINGNESS |
rvl | Column is mostly null — edit profile to exclude it |
E_NO_NUMERIC |
rvl | No numeric columns found — check data or scope with --profile |
E_UNKNOWN_FP |
fingerprint | Fingerprint ID not installed — fingerprint --list to see available |
E_BAD_INPUT |
hashbytes, fingerprint, lock | Upstream tool didn't run — check pipeline order |
E_COLUMN_NOT_FOUND |
canon, profile | Column name doesn't exist in input — check spelling |
E_INPUT_NOT_LOCKED |
shape, rvl | --lock was provided but input file isn't in the lockfile |
E_TOO_LARGE |
shape, rvl | Input exceeds --max-rows/--max-bytes — increase limit or split |
E_AMBIGUOUS_PROFILE |
shape, rvl | Both --profile and --profile-id given — use one |
When next_command is present in the refusal envelope, execute it. When it's null, apply judgment.
Every tool carries its own contract. You never need external documentation.
# Full contract for any tool
rvl --describe | jq .
# What refusals can rvl produce and what to do about each?
rvl --describe | jq '.refusals[] | {code, action, message}'
# What does rvl's output look like?
rvl --schema | jq .
# What tools does fingerprint expect upstream?
fingerprint --describe | jq '.pipeline'
# What fingerprints are available?
fingerprint --listEvery structured output has a version field. Parse it to know the shape:
| Version | Tool | Output mode |
|---|---|---|
vacuum.v0 |
vacuum | JSONL stream |
hash.v0 |
hashbytes | JSONL stream |
fingerprint.v0 |
fingerprint | JSONL stream |
lock.v0 |
lock | JSON artifact |
shape.v0 |
shape | JSON report (--json) |
rvl.v0 |
rvl | JSON report (--json) |
canon.v0 |
canon | JSON artifact |
pack.v0 |
pack | JSON manifest |
The spine is the data layer. It establishes provenance, identity, comparability, and change. It does not:
- Orchestrate workflows — you decide what to run and in what order
- Store data — tools are stateless; they read files and emit structured output
- Extract or parse documents — fingerprint recognizes templates, it doesn't OCR or transform data
- Make decisions — tools produce deterministic facts; you (or
assess) interpret them - Require a network — everything runs locally, no APIs, no cloud
You bring the orchestration. The spine brings the truth.
# Core provenance / comparison set via Homebrew
brew install cmdrvl/tap/vacuum
brew install cmdrvl/tap/hash # installs the `hashbytes` binary
brew install cmdrvl/tap/fingerprint
brew install cmdrvl/tap/profile
brew install cmdrvl/tap/lock
brew install cmdrvl/tap/shape
brew install cmdrvl/tap/rvl
brew install cmdrvl/tap/canon
brew install cmdrvl/tap/packVerify:
vacuum --version && hashbytes --version && fingerprint --version && \
profile --version && lock --version && shape --version && \
rvl --version && canon --version && pack --version- github.com/cmdrvl — all repos, all MIT
- Each tool:
<tool> --describefor the full machine-readable contract