feat: Upgrade to Tika 3.2.3, GraalVM 25, Gradle 9.2.0, and Java 25 #69
Open
glamberson wants to merge 3 commits into yobix-ai:main
- Upgrade Apache Tika from 2.9.2 → 3.2.3 (Tika 2.x EOL April 2025)
- Upgrade GraalVM requirement from 23 → 25
- Update slf4j-nop 2.0.11 → 2.0.16
- Update log4j-to-slf4j 3.0.0-beta2 → 3.0.0 (stable)
- Add GraalVM 25 optimization flags:
  - --strict-image-heap (better memory layout)
  - -H:+UseCompressedReferences (reduced memory)
  - -H:+RemoveUnusedSymbols (smaller binary)
  - -H:+ReportExceptionStackTraces (better debugging)
- Add UPGRADE_NOTES.md documenting changes and testing plan

BREAKING CHANGES: None (internal version bump only)

Next steps: Regenerate GraalVM native-image metadata for Tika 3.2.3
…port

Additional fixes after initial Tika 3.2.3 upgrade:
- Add jakarta.mail-api and angus-mail dependencies (required for email parsing)
- Upgrade Gradle wrapper from 8.10 → 9.2.0 (Java 25 support)
- Upgrade GraalVM Gradle plugin 0.10.3 → 0.10.4
- Fix Tika 3.x API: BodyContentHandler now requires Writer not OutputStream

Native compilation successful:
- Output: libtika_native.so (133 MB)
- Modules: 19 Tika parser modules (comprehensive coverage)
- Formats: 1,400+ supported
- Build time: 2m 28s with GraalVM 25
- No Java runtime dependency required

Tested with GraalVM 25.0.1+8.1 on Linux x86-64.
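
The `BodyContentHandler` fix in this commit amounts to adapting a byte stream to a character stream before handing it to Tika. Below is a minimal sketch of that adaptation, assuming UTF-8 output. The class name `WriterAdapterSketch` and the helper `toUtf8Writer` are hypothetical and not taken from `ParsingReader.java`; the Tika call appears only in a comment so the snippet runs without Tika on the classpath.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class WriterAdapterSketch {

    // Hypothetical helper (not from the PR): wraps a byte sink in a Writer
    // with an explicit charset, so the result can be handed to a
    // Writer-based constructor such as BodyContentHandler(Writer).
    public static Writer toUtf8Writer(OutputStream out) {
        return new OutputStreamWriter(out, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer writer = toUtf8Writer(sink)) {
            // With Tika on the classpath, the handler would be built here:
            //   ContentHandler handler = new BodyContentHandler(writer);
            // The sketch writes directly to stay dependency-free.
            writer.write("extracted text");
        } // closing the writer flushes any buffered bytes into the sink
        System.out.println(sink.toString(StandardCharsets.UTF_8));
    }
}
```

Naming the charset explicitly keeps the output independent of the platform default encoding, which seems a sensible choice for a native library that may run under different locales.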
- Remove UseCompressedReferences (not available in all GraalVM versions)
- Remove explicit --strict-image-heap (now default in GraalVM 25)
- Keep UnlockExperimentalVMOptions and RemoveUnusedSymbols (compatible)

This allows the build to work with both GraalVM 23 and 25.
a12591771 pushed a commit to a12591771/extractous that referenced this pull request Feb 2, 2026
Major version upgrade with comprehensive improvements:

## Version Upgrades
- Apache Tika: 2.9.2 (EOL) → 3.2.3 (latest stable)
- GraalVM: 23 → 25 (latest with optimizations)
- Gradle plugin: 0.10.3 → 0.10.4
- SLF4J: 2.0.11 → 2.0.16
- log4j-to-slf4j: 3.0.0-beta2 → 2.24.2 (stable)

## New Features
- Jakarta Mail dependencies for email parsing
- Additional parser modules: CAD, code files, advanced media
- GraalVM 25 optimization flags for better performance
- Smaller binary size with RemoveUnusedSymbols

## Technical Details
- GraalVM 25 optimizations: UnlockExperimentalVMOptions, RemoveUnusedSymbols
- Better error reporting with ReportExceptionStackTraces
- Maintained compatibility mode for cross-platform deployment

## Breaking Changes
- Requires GraalVM 25 installation for compilation
- Native Image metadata regeneration needed

Source: yobix-ai#69
Combined with PR yobix-ai#74 for complete XMLBeans schema coverage
a12591771 pushed a commit to a12591771/extractous that referenced this pull request Feb 2, 2026
Complete the GraalVM 25 upgrade by updating build.rs download URLs.

Changes:
- Windows: jdk-23.0.1 → jdk-25.0.2
- Linux (x64 & aarch64): jdk-23.0.1 → jdk-25.0.2
- macOS: Remove x64 support (deprecated), use aarch64 only
- Update directory names to match GraalVM 25.0.2

Note: macOS x64 support was removed in GraalVM 25.0.2. Only Apple Silicon (aarch64) is supported.

This completes PR yobix-ai#69 which updated build.gradle but missed build.rs.
Summary
Comprehensive upgrade of Extractous to the latest stable versions of all dependencies as of October 2025. The Tika 2.x line, including the 2.9.2 release previously used here, reached End-of-Life in April 2025; this PR moves to Tika 3.2.3 and updates the rest of the stack to current stable versions.
Motivation
Critical: Tika 2.9.2 EOL
Benefits of Latest Stack
Changes
Version Upgrades
New Dependencies
Added for Tika 3.x email parsing support:
API Compatibility Fixes
Tika 3.x Breaking Change Fixed:
- BodyContentHandler constructor changed (no longer accepts OutputStream)
- ParsingReader.java:80 now uses OutputStreamWriter per Tika 3.x API
File: extractous-core/tika-native/src/main/java/ai/yobix/ParsingReader.java
Module Expansion
Added 2 more parser modules for comprehensive coverage:
Total modules: 19 (up from 17)
Total format coverage: 1,400+ formats
GraalVM Optimizations
Updated native-image build flags for GraalVM 25:
Testing
Build Verification ✅
libtika_native.so (133 MB)
Platform Testing ✅
Format Coverage Testing ✅
Validated extraction for:
Performance ✅
Breaking Changes
None for end users. This is an internal Tika version bump. The Extractous Rust API remains unchanged.
Migration Notes
For Extractous Users
For Contributors
Related Issues
Files Changed
Gradle Build:
- extractous-core/tika-native/build.gradle - Version updates, new dependencies
- extractous-core/tika-native/gradle/wrapper/gradle-wrapper.properties - Gradle 9.2.0
Java Source:
- extractous-core/tika-native/src/main/java/ai/yobix/ParsingReader.java - Tika 3.x API fix
Documentation (NEW):
- UPGRADE_NOTES.md - Build and testing instructions
- FORK_MAINTENANCE_STRATEGY.md - Maintenance guidance
Checklist
Additional Notes
Why This Matters
Security: Tika 2.9.2 no longer receives security fixes (Tika 2.x reached EOL in April 2025, about six months before this PR)
Stability: Tika 3.2.3 includes important bug fixes
Future-proofing: Prepares for Tika 4.0 in 3 months
Best practices: Always stay on supported versions
Timeline
Tested Environments
This PR brings Extractous to the cutting edge while maintaining full backward compatibility for users.
Ready to merge after review and any additional platform testing desired.