Summary
Introduce a formal, versioned JSON Schema–based validation system (tokenlist.schema.json) plus a lightweight Rust (or Python) CLI/CI validator to ensure every token list in data/ and associated logo in logos/ meets required structural, semantic, and referential integrity guarantees (e.g., required fields, address formatting, chain identifiers, logo existence, checksum normalization). This adds automated quality gates, reduces review burden, and prevents downstream consumer breakage.
Problem Statement
Currently the repository contains raw data assets (data/ and logos/) but (based on the root file inventory) lacks:
- A canonical schema file (no schema.json, tokenlist.schema.json, or similar present at root).
- Any explicit validation script targets (the justfile exists, but no evidence of a validation recipe can be cited without a schema).
- Enforced linkage between token metadata entries and logo assets (no manifest cross-check).
This creates friction for contributors (unsure of required fields), increases maintainer review time (manual structural & semantic checks), and risks silent data regressions (typos, inconsistent casing, missing logos, duplicate symbols, chain ID drift).
Proposed Solution
Add a first-class validation layer with these components:
- Specification & Schema
  - Create tokenlist.schema.json (draft-2020-12) defining:
    - Top-level: name, timestamp (ISO 8601), version (object: major/minor/patch), tokens (array).
    - Token object: chainId (uint), address (hex or bech32 pattern depending on ecosystem), symbol, name, decimals (int 0–36), logoURI (relative or absolute), optional extensions object.
    - Constraints: unique (chainId, address) pair; unique symbol within same chainId; logoURI must exist under logos/ if relative. JSON Schema cannot express these cross-item and filesystem rules, so the validator enforces them as custom checks.
  - Include semantic versioning rules (major = breaking schema change, minor = token additions, patch = metadata corrections only).
  - Document normalization (e.g., lowercase addresses where chain spec dictates).
- Validator CLI
  - Implement a Rust binary (matching the repository's dominant language) under validator/ (e.g., validator/src/main.rs) that:
    - Loads the schema (embedded via include_str! for reproducibility).
    - Iterates over every JSON file under data/ (pattern).
    - Performs JSON Schema validation (crate: jsonschema), plus custom checks: duplicates, logo existence, symbol conflicts, semantic version increment correctness (compare git diff vs HEAD~1).
    - Outputs a machine-readable report (JSON) and a human summary (table), with a non-zero exit on failure.
- Just & Nix Integration
  - Add a just validate recipe calling the binary.
  - Extend flake.nix to expose packages.validator & a dev shell with required crates.
  - Add CI job (e.g., .github/workflows/validate.yml) running on pull_request & push to main: nix develop -c just validate (or cargo run -p validator).
  - The job fails the PR if any structural or semantic rule is violated.
- Optional Phase 2 (not in MVP scope but structurally prepared):
  - Caching & diff-based partial validation.
  - Auto-fix mode for address casing & sorting.
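The custom duplicate checks listed under the Validator CLI could look roughly like the sketch below. It uses only the standard library; the Token struct and the find_duplicates function are hypothetical names, and the case-insensitive address comparison is an assumption that fits EVM hex addresses but may not fit other ecosystems.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory token record; fields mirror the proposed schema.
#[derive(Debug, Clone)]
pub struct Token {
    pub chain_id: u64,
    pub address: String,
    pub symbol: String,
}

/// Returns one message per duplicate (chainId, address) pair and per
/// duplicate symbol within a single chainId.
pub fn find_duplicates(tokens: &[Token]) -> Vec<String> {
    let mut seen_addr: HashMap<(u64, String), usize> = HashMap::new();
    let mut seen_sym: HashMap<(u64, String), usize> = HashMap::new();
    let mut problems = Vec::new();

    for (idx, t) in tokens.iter().enumerate() {
        // Assumption: addresses are compared case-insensitively.
        let addr_key = (t.chain_id, t.address.to_lowercase());
        if let Some(prev) = seen_addr.insert(addr_key, idx) {
            problems.push(format!(
                "duplicate address (chainId={} address={}) at tokens[{}] and tokens[{}]",
                t.chain_id, t.address, prev, idx
            ));
        }
        // Symbols are compared exactly as written.
        let sym_key = (t.chain_id, t.symbol.clone());
        if let Some(prev) = seen_sym.insert(sym_key, idx) {
            problems.push(format!(
                "duplicate symbol (chainId={} symbol={}) at tokens[{}] and tokens[{}]",
                t.chain_id, t.symbol, prev, idx
            ));
        }
    }
    problems
}
```

Both checks run in a single pass over the token array, which keeps the validator linear in the number of tokens.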
Alternatives Considered
- Manual review only: High ongoing maintainer time; inconsistent standards.
- Ad-hoc Python script (no schema): Improves some checks but still lacks explicit contract discoverability for external integrators.
- Rely on downstream consumers’ validation: Shifts cost outward and delays feedback; errors surface late.
Use Cases / Benefits
- Maintainers: Fast automated gate → reduced PR review minutes; confidence in consistency.
- Contributors: Immediate local feedback (just validate) clarifies required fields.
- Integrators: Public, versioned schema enables programmatic ingestion & tooling generation (TypeScript types, etc.).
- Reliability: Eliminates silent data drift (duplicate symbols; missing logos) lowering production incident risk.
Potential Risks / Trade-offs
- Slight upfront implementation cost and CI runtime (expected minimal).
- Schema evolution complexity (mitigated via semantic version rules).
- Risk of over-constraining (mitigate by allowing extensions object for future growth).
Additional Context / References
- Popular ecosystems (e.g., Uniswap token lists) leverage JSON Schema for tooling & safety.
- Present repository root files (no existing schema) → opportunity for structured improvement: .envrc, flake.nix, justfile, data/, logos/.
Alternatives Considered (explicit)
- Single monolithic JSON file vs multiple: Keep existing file layout; validator aggregates.
- Embedding schema vs runtime fetch: Embed for reproducibility & offline development.
Use Cases / Benefits (Expanded)
- As a contributor, I get deterministic local validation that prevents avoidable CI failures.
- As a maintainer, I can quickly triage metadata-only vs structural PRs (version bump logic).
- As a downstream consumer, I can auto-generate strongly-typed clients from the schema.
Potential Risks / Trade-offs (Expanded)
- False negatives if schema too lax → iterative tightening allowed.
- Need for deterministic ordering (optional future enhancement).
Additional Context / References
- Root inventory confirms absence of schema.json; adding one is non-breaking.
Expanded Details
Schema Sketch (Illustrative)
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "TokenList",
  "type": "object",
  "required": ["name", "timestamp", "version", "tokens"],
  "properties": {
    "name": { "type": "string", "minLength": 1 },
    "timestamp": { "type": "string", "format": "date-time" },
    "version": {
      "type": "object",
      "required": ["major", "minor", "patch"],
      "properties": {
        "major": { "type": "integer", "minimum": 0 },
        "minor": { "type": "integer", "minimum": 0 },
        "patch": { "type": "integer", "minimum": 0 }
      },
      "additionalProperties": false
    },
    "tokens": {
      "type": "array",
      "items": {
        "type": "object",
        "required": ["chainId", "address", "symbol", "name", "decimals", "logoURI"],
        "properties": {
          "chainId": { "type": "integer", "minimum": 1 },
          "address": { "type": "string", "pattern": "^(0x[a-fA-F0-9]{40}|[a-z0-9]{1,})$" },
          "symbol": { "type": "string", "minLength": 1 },
          "name": { "type": "string", "minLength": 1 },
          "decimals": { "type": "integer", "minimum": 0, "maximum": 36 },
          "logoURI": { "type": "string", "minLength": 1 },
          "extensions": { "type": "object", "additionalProperties": true }
        },
        "additionalProperties": false
      },
      "uniqueItems": false
    }
  },
  "additionalProperties": false
}
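To see what the illustrative address pattern accepts in practice, the sketch below restates it as plain Rust without a regex dependency. The function names are hypothetical. The second alternative of the pattern matches any non-empty run of lowercase letters and digits, so a truncated lowercase hex string also passes; a real schema would probably want a stricter bech32 alternative.

```rust
/// Mirrors the illustrative pattern `^(0x[a-fA-F0-9]{40}|[a-z0-9]{1,})$`.
pub fn address_matches_sketch(addr: &str) -> bool {
    is_evm_hex(addr) || is_lower_alnum(addr)
}

/// First alternative: "0x" followed by exactly 40 hex digits of either case.
fn is_evm_hex(addr: &str) -> bool {
    match addr.strip_prefix("0x") {
        Some(hex) => hex.len() == 40 && hex.bytes().all(|b| b.is_ascii_hexdigit()),
        None => false,
    }
}

/// Second alternative: one or more lowercase ASCII letters or digits.
fn is_lower_alnum(addr: &str) -> bool {
    !addr.is_empty()
        && addr
            .bytes()
            .all(|b| b.is_ascii_lowercase() || b.is_ascii_digit())
}
```

For example, a mixed-case 40-digit hex address passes through the first branch, while the short string 0xa0b8 passes through the second branch even though it is not a valid address in either ecosystem.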
CLI Behavior Outline
- Command: token-validate [--schema tokenlist.schema.json] [--diff-base origin/main]
- Exit codes: 0 success, 1 structural error, 2 semantic (duplicates/version), 3 IO error.
- Output example:
  ✔ schema pass (data/mainnet.json)
  ✖ duplicate symbol (chainId=1 symbol=ABC) in data/mainnet.json
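The exit codes in the outline could be modelled as a small enum, as in the sketch below. The Failure type and run_exit_code function are hypothetical names. The outline does not say which code wins when a run hits more than one failure class; the precedence used here (IO, then structural, then semantic) is an assumption.

```rust
/// Failure classes from the CLI outline, each with its proposed exit code.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Failure {
    Structural, // schema violation
    Semantic,   // duplicates or version rule
    Io,         // file could not be read
}

impl Failure {
    pub fn exit_code(self) -> i32 {
        match self {
            Failure::Structural => 1,
            Failure::Semantic => 2,
            Failure::Io => 3,
        }
    }
}

/// Picks the process exit code for a whole run; 0 means no failures.
/// Assumption: IO errors outrank structural errors, which outrank
/// semantic ones, because later checks depend on the earlier ones.
pub fn run_exit_code(failures: &[Failure]) -> i32 {
    for class in [Failure::Io, Failure::Structural, Failure::Semantic] {
        if failures.contains(&class) {
            return class.exit_code();
        }
    }
    0
}
```

A main function would collect every failure across the files in data/, print the report, and pass the result of run_exit_code to std::process::exit.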
Version Consistency Rule
- Adding tokens → requires at least a minor increment when major is unchanged, with patch reset to 0.
- Metadata correction (no additions/removals) → patch increment only.
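A minimal sketch of the version rule, assuming that "patch reset" means patch returns to 0 on a minor bump and that a major bump always satisfies the token-addition rule. Version, Change, and bump_is_valid are hypothetical names; token removals are not covered because the rule above does not mention them.

```rust
/// Version triple as stored in a token list.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Version {
    pub major: u32,
    pub minor: u32,
    pub patch: u32,
}

/// The kind of change detected between the base and the new list.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Change {
    TokensAdded,
    MetadataOnly,
}

/// Returns true when `new` is an acceptable successor of `old` for `change`.
pub fn bump_is_valid(old: Version, new: Version, change: Change) -> bool {
    match change {
        Change::TokensAdded => {
            // Assumption: any major bump is sufficient for additions.
            if new.major != old.major {
                return new.major > old.major;
            }
            // Same major: minor must grow and patch must reset to 0.
            new.minor > old.minor && new.patch == 0
        }
        // Metadata corrections may only move the patch number forward.
        Change::MetadataOnly => {
            new.major == old.major && new.minor == old.minor && new.patch > old.patch
        }
    }
}
```

Under these assumptions, going from 1.2.3 to 1.3.0 is valid for added tokens, while 1.2.4 is valid only for a metadata correction.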
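Acceptance Criteria
- tokenlist.schema.json present at repository root with documented required fields.
- just validate runs the validator and fails on structural or semantic issues.
- … data/ or logos/.
- … (chainId, address), duplicate symbol per chain, missing logo file, invalid version increment.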
Prioritization
- RICE: Reach=40 (contributors + integrators), Impact=3 (prevents class of data defects), Confidence=80%, Effort=1.5 person-weeks → Score = (40 * 3 * 0.8) / 1.5 ≈ 64
- MoSCoW: Must — Foundational quality gate enabling safe scaling of token entries.
Feasibility & Integration Points
- flake.nix: Add package/output for validator; minimal modifications.
- justfile: Add validate recipe invoking cargo run -p validator or nix run .#validator.
- Directory: new validator/ crate (Rust aligns with repo majority language).
- CI: new .github/workflows/validate.yml (no existing file conflict indicated).
- No breaking change: existing data remains valid if it meets schema; else PRs update formatting.
Quality Considerations
- Security: Low risk; purely static validation.
- Performance: Fast (JSON parse + simple checks); O(N) in tokens.
- Reliability: Deterministic validation reduces runtime consumer failures.
- Accessibility: N/A (CLI textual output; ensure color-safe).
- i18n: Minimal; token metadata likely English; allow Unicode names.
- Observability: Clear exit codes & structured JSON report for future telemetry.
- Maintainability: Schema centralizes rules; modular validator functions facilitate unit tests.
Validation Plan
- Unit tests: duplicate detection, version increment logic, logo existence fallback.
- Integration test: run just validate against a fixture repo state (GitHub Action).
- Manual: Introduce a deliberate error in a branch PR → CI fails with actionable message.
- Rollout:
- Phase 1: Introduce schema & “warn” mode (non-blocking) for 1–2 days.
- Phase 2: Switch to blocking on PR after initial fixes.
- Rollback: Disable CI job or pin to previous commit; low complexity.
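The logo existence check named in the unit-test list could be factored so that its path logic is testable without touching the filesystem. The sketch below is one way to do that with the standard library only. LogoRef, classify_logo_uri, and logo_exists are hypothetical names, and two things are assumptions: that a URI containing "://" counts as absolute, and that relative paths are resolved against the repository root.

```rust
use std::path::{Component, Path, PathBuf};

/// How the validator should treat a logoURI value.
#[derive(Debug, PartialEq, Eq)]
pub enum LogoRef {
    /// Absolute URI: not checked locally.
    Remote,
    /// Relative path under logos/ that must exist on disk.
    Local(PathBuf),
    /// Relative path outside logos/ or containing "." / ".." segments.
    Invalid,
}

pub fn classify_logo_uri(uri: &str) -> LogoRef {
    // Assumption: "://" marks an absolute URI such as https://...
    if uri.contains("://") {
        return LogoRef::Remote;
    }
    let path = Path::new(uri);
    let plain = path.components().all(|c| matches!(c, Component::Normal(_)));
    if plain && path.starts_with("logos") {
        LogoRef::Local(path.to_path_buf())
    } else {
        LogoRef::Invalid
    }
}

/// Ok(true) if the logo is remote or present, Ok(false) if it is missing,
/// Err if the relative path does not point into logos/.
pub fn logo_exists(repo_root: &Path, uri: &str) -> Result<bool, String> {
    match classify_logo_uri(uri) {
        LogoRef::Remote => Ok(true),
        LogoRef::Local(rel) => Ok(repo_root.join(rel).is_file()),
        LogoRef::Invalid => Err(format!("logoURI {:?} is not a path under logos/", uri)),
    }
}
```

Keeping classification separate from the disk lookup lets unit tests cover path handling with plain strings, while a fixture directory covers the existence check.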
Docs & DX Updates
- README: Add “Validation & Contribution” section with command examples.
- Add docs/SCHEMA.md describing fields, versioning rules, examples.
- Changelog: Entry “Added schema-driven automated validation”.
- Contributor guide (if added later): reference just validate.
Related Issues/PRs
(No related issues identified in provided data set; this appears novel based on absence of existing schema/validation artifacts.)
Risks & Mitigations
- Risk: Overly strict schema rejects legitimate tokens → Mitigation: allow extensions & iterate constraints.
- Risk: Contributors ignore local validation → Mitigation: CI gate enforces compliance.
- Risk: Schema drift vs validator logic → Mitigation: derive validator rules from schema programmatically.
- Risk: Performance regression with very large lists → Mitigation: early exits & linear complexity; optional future diff mode.