schema: Add pattern constraints for SPDXID and checksumValue #1382

Open
nmanthey wants to merge 1 commit into spdx:support/2.3.1 from nmanthey:fix/spdx-v2.3-schema-patterns

Conversation

nmanthey commented Apr 9, 2026

The SPDX v2.3 specification defines the idstring grammar for element identifiers as letters, numbers, '.' and '-' (Annex D), and requires checksumValue to be a lowercase hexadecimal string (Section 8.4.1). However, the JSON schema does not enforce either constraint, allowing invalid documents to pass schema validation.

Add pattern constraints:
- SPDXID: '^SPDXRef-[a-zA-Z0-9.-]+$' on Package, File and Snippet elements (the Document SPDXID has no such restriction per Section 6.3)
- checksumValue: '^[0-9a-f]+$' on all three checksum definitions

These patterns match what the spdx-tools Python validator already enforces programmatically.
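As a rough illustration of what the two patterns accept and reject, here is a minimal Python sketch using only the standard `re` module. The helper names `is_valid_spdxid` and `is_valid_checksum_value` and the sample values are hypothetical and are not taken from the schema or from spdx-tools. The sketch uses `re.fullmatch` instead of the `^...$` anchors because Python's `$` also matches just before a trailing newline, which is not what the schema pattern intends.

```python
import re

# Patterns from the PR description, written without the ^...$ anchors because
# re.fullmatch() already requires the whole string to match.
SPDXID_RE = re.compile(r"SPDXRef-[a-zA-Z0-9.-]+")
CHECKSUM_RE = re.compile(r"[0-9a-f]+")


def is_valid_spdxid(value: str) -> bool:
    """Return True if value is 'SPDXRef-' followed by letters, digits, '.' or '-'."""
    return SPDXID_RE.fullmatch(value) is not None


def is_valid_checksum_value(value: str) -> bool:
    """Return True if value is a non-empty lowercase hexadecimal string."""
    return CHECKSUM_RE.fullmatch(value) is not None


if __name__ == "__main__":
    # Hypothetical sample values, chosen only to exercise both patterns.
    for candidate in ("SPDXRef-Package-1.0", "SPDXRef-my_pkg", "SPDXRef-"):
        print(candidate, is_valid_spdxid(candidate))
    for candidate in ("d6a770ba38583ed4", "D6A770BA38583ED4", ""):
        print(repr(candidate), is_valid_checksum_value(candidate))
```

Note that neither pattern constrains length beyond requiring at least one character, so a one-character checksum value would still pass.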

References:
- SPDXID restrictions for Packages: https://spdx.github.io/spdx-spec/v2.3/package-information/#7.2
- SPDXID restrictions for Files: https://spdx.github.io/spdx-spec/v2.3/file-information/#8.2
- SPDXID restrictions for Snippets: https://spdx.github.io/spdx-spec/v2.3/snippet-information/
- No SPDXID restrictions for Document: https://spdx.github.io/spdx-spec/v2.3/document-creation-information/#6.3
- checksumValue hex format: https://spdx.github.io/spdx-spec/v2.3/file-information/#841-description

Testing Done

Verification:

  • The existing examples/SPDXJSONExample-v2.3.spdx.json passes validation with the updated schema
  • Invalid values (e.g., SPDXID with underscores, uppercase hex in checksumValue) are now correctly rejected
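The kind of rejection described above can be approximated without a schema validator. The following Python sketch walks a parsed SPDX JSON document and reports values that would fail the new patterns. It is an assumption-laden stand-in, not the verification procedure used for this PR: the function name `find_violations` and the sample document are hypothetical, and only the `packages`, `files` and `snippets` arrays with their `SPDXID` and `checksums[].checksumValue` fields are inspected.

```python
import re

SPDXID_RE = re.compile(r"SPDXRef-[a-zA-Z0-9.-]+")
CHECKSUM_RE = re.compile(r"[0-9a-f]+")


def find_violations(doc: dict) -> list:
    """Collect SPDXID and checksumValue strings that fail the proposed patterns.

    Missing keys are ignored here; in a JSON schema that is the job of
    'required', not of 'pattern'.
    """
    problems = []
    for section in ("packages", "files", "snippets"):
        for index, element in enumerate(doc.get(section, [])):
            where = f"{section}[{index}]"
            spdxid = element.get("SPDXID")
            if isinstance(spdxid, str) and not SPDXID_RE.fullmatch(spdxid):
                problems.append(f"{where}.SPDXID: {spdxid!r}")
            for checksum in element.get("checksums", []):
                value = checksum.get("checksumValue")
                if isinstance(value, str) and not CHECKSUM_RE.fullmatch(value):
                    problems.append(f"{where}.checksumValue: {value!r}")
    return problems


if __name__ == "__main__":
    # Hypothetical document fragment with one bad package and one good file.
    sample = {
        "SPDXID": "SPDXRef-DOCUMENT",  # the Document SPDXID is not checked
        "packages": [
            {
                "SPDXID": "SPDXRef-my_pkg",
                "checksums": [{"algorithm": "SHA1", "checksumValue": "ABCDEF01"}],
            }
        ],
        "files": [
            {
                "SPDXID": "SPDXRef-File-1",
                "checksums": [{"algorithm": "SHA1", "checksumValue": "abcdef01"}],
            }
        ],
    }
    for problem in find_violations(sample):
        print(problem)
```

A real check would load the example file with `json.load` and run it through a JSON Schema validator against the updated schema; this sketch only mirrors the two pattern constraints.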

Signed-off-by: Norbert Manthey <nmanthey@amazon.de>
Collaborator

bact commented Apr 9, 2026

@goneall does this match the Java tools? Will it create any issues?

Member

goneall commented Apr 9, 2026

> @goneall does this match the Java tools? Will it create any issues?

The checks for the element IDs match the Java implementation. The checksum checks are a little different: the Java tools are more relaxed about case for the hex letters a-f and allow uppercase. This was done intentionally quite some time ago. If I recall, there were just too many cases where uppercase letters were being used, even though the spec specifies lowercase. Perhaps on the next implementers call, we can find out how strict other SPDX tools are. If everyone else allows uppercase, perhaps we should change the spec to allow it; otherwise, we can change the Java tools to be more strict.

Another difference is that the Java tools check the length of the checksum value string, which is a bit stricter than the schema. I think that is fine (it would be really complex to implement this in a JSON schema).
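To make the two differences concrete, here is a minimal Python sketch of a checksum check with optional uppercase tolerance and a per-algorithm length check. It is not the logic from the Java tools: the function name `check_checksum`, the `allow_uppercase` flag and the choice of algorithms in the table are assumptions for illustration. The expected lengths follow from each algorithm's digest size in bits divided by four.

```python
import re

# Number of hex digits per algorithm (digest size in bits / 4).
# Only a handful of algorithms are listed in this sketch.
EXPECTED_HEX_LENGTH = {"MD5": 32, "SHA1": 40, "SHA256": 64, "SHA512": 128}


def check_checksum(algorithm: str, value: str, allow_uppercase: bool = False) -> bool:
    """Check hex characters and, for known algorithms, the string length."""
    hex_pattern = r"[0-9a-fA-F]+" if allow_uppercase else r"[0-9a-f]+"
    if re.fullmatch(hex_pattern, value) is None:
        return False
    expected = EXPECTED_HEX_LENGTH.get(algorithm)
    # Algorithms missing from the table are only checked for hex characters.
    return expected is None or len(value) == expected


if __name__ == "__main__":
    print(check_checksum("SHA1", "a" * 40))                        # strict, correct length
    print(check_checksum("SHA1", "A" * 40))                        # strict, uppercase
    print(check_checksum("SHA1", "A" * 40, allow_uppercase=True))  # relaxed, uppercase
    print(check_checksum("SHA1", "a" * 39))                        # wrong length
```

Expressing the length rule in a JSON schema would need a conditional per algorithm (for example an `if`/`then` pair keyed on `algorithm`), which is the complexity the comment above alludes to.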

BTW - most of the verification methods are in the SpdxVerificationHelper.java file. For a JSON serialization, we also check against the schema.

bact added this to the 2.3.1 milestone on Apr 10, 2026