
Add more entries to reachability metadata. #74

Open
igankevich wants to merge 1 commit into yobix-ai:main from igankevich:fix-reachability-metadata


@igankevich

This PR adds more entries to `reachability-metadata.json`. I parsed a large number of PDF files and a couple of DOCX files. It looks like we need to add all `.xsb` files to make sure any DOCX file can be parsed, and there are more than 1,000 of them.
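Since listing 1,000+ `.xsb` files by hand is error-prone, the entries could be generated instead. The sketch below is a minimal, hypothetical helper, not part of this PR: it assumes the XMLBeans schema files are packaged as `.xsb` entries inside the POI jars, and that the unified GraalVM `reachability-metadata.json` accepts a `resources` array of `{"glob": ...}` objects. The function names `xsb_resource_entries`, `build_metadata` and `to_json` are illustrative only.

```python
import json
import zipfile


def xsb_resource_entries(jar):
    """Return one {"glob": ...} entry per .xsb file inside a jar.

    `jar` may be a path or a binary file-like object. The entry shape is an
    assumption about the unified GraalVM reachability-metadata format.
    """
    with zipfile.ZipFile(jar) as archive:
        names = sorted(n for n in archive.namelist() if n.endswith(".xsb"))
    return [{"glob": name} for name in names]


def build_metadata(jars):
    """Collect .xsb entries from several jars, dropping duplicate paths."""
    entries = []
    seen = set()
    for jar in jars:
        for entry in xsb_resource_entries(jar):
            if entry["glob"] not in seen:
                seen.add(entry["glob"])
                entries.append(entry)
    return {"resources": entries}


def to_json(metadata):
    """Serialize the metadata dict the way a checked-in JSON file might look."""
    return json.dumps(metadata, indent=2)
```

For example, `print(to_json(build_metadata(["poi-ooxml-lite.jar"])))` would emit the list for one jar (the jar name here is an assumption). Listing every file explicitly mirrors the approach taken in this PR. A single wildcard glob may be a shorter alternative, but whether it behaves the same way should be checked against the GraalVM documentation.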

a12591771 pushed a commit to a12591771/extractous that referenced this pull request Feb 2, 2026
This PR adds reachability metadata for missing Apache POI XMLBeans schemas,
specifically fixing the SchemaTypeLoaderException when parsing DOCX files
with tables.

Key additions:
- cttblprex863ftype.xsb (table prefix properties) - CRITICAL FIX
- ctobject47c9type.xsb (embedded objects)
- ctsdtblock/run (structured document tags)
- ctffdata (form fields)
- ctomath/omathpara (math equations)
- cttrackchange (revision tracking)

Fixes: SchemaTypeLoaderException for DOCX files containing tables

Source: yobix-ai#74
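When adding individual schemas such as those listed above to an existing metadata file, duplicates and unrelated sections need care. The helper below is a hypothetical sketch, not code from this PR or commit. It assumes the same `{"glob": ...}` resource shape, leaves other top-level keys (for example `reflection`) untouched, and uses a placeholder directory prefix because the full in-jar paths of these schemas are not given here.

```python
import json

# Hypothetical placeholder: replace with the real directory inside the jar.
SCHEMA_DIR = "path/to/schemas/"


def merge_resources(metadata, new_globs):
    """Append {"glob": ...} entries not already present; return how many were added."""
    resources = metadata.setdefault("resources", [])
    existing = {e.get("glob") for e in resources if isinstance(e, dict)}
    added = 0
    for glob in new_globs:
        if glob not in existing:
            resources.append({"glob": glob})
            existing.add(glob)
            added += 1
    return added


def update_metadata_file(path, new_globs):
    """Load a reachability-metadata.json file, merge entries, and write it back."""
    with open(path, encoding="utf-8") as f:
        metadata = json.load(f)
    added = merge_resources(metadata, new_globs)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(metadata, f, indent=2)
        f.write("\n")
    return added


# Example: the two schemas named with full file names in the list above.
KEY_ADDITIONS = [
    SCHEMA_DIR + "cttblprex863ftype.xsb",  # table prefix properties
    SCHEMA_DIR + "ctobject47c9type.xsb",   # embedded objects
]
```

Calling `update_metadata_file("reachability-metadata.json", KEY_ADDITIONS)` twice should add the entries only once, which keeps repeated regeneration runs idempotent.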
a12591771 pushed a commit to a12591771/extractous that referenced this pull request Feb 2, 2026
Major version upgrade with comprehensive improvements:

## Version Upgrades
- Apache Tika: 2.9.2 (EOL) → 3.2.3 (latest stable)
- GraalVM: 23 → 25 (latest with optimizations)
- Gradle plugin: 0.10.3 → 0.10.4
- SLF4J: 2.0.11 → 2.0.16
- log4j-to-slf4j: 3.0.0-beta2 → 2.24.2 (stable)

## New Features
- Jakarta Mail dependencies for email parsing
- Additional parser modules: CAD, code files, advanced media
- GraalVM 25 optimization flags for better performance
- Smaller binary size with RemoveUnusedSymbols

## Technical Details
- GraalVM 25 optimizations: UnlockExperimentalVMOptions, RemoveUnusedSymbols
- Better error reporting with ReportExceptionStackTraces
- Maintained compatibility mode for cross-platform deployment

## Breaking Changes
- Requires GraalVM 25 installation for compilation
- Native Image metadata regeneration needed

Source: yobix-ai#69
Combined with PR yobix-ai#74 for complete XMLBeans schema coverage