100% Safe Rust - The core library uses #![deny(unsafe_code)] with no exceptions:
- All SIMD uses
archmage 0.5with#[arcane]macro for safe target_feature dispatch - Memory operations use
safe_unaligned_simd 0.2.4for unaligned loads/stores - Zero performance loss vs unsafe intrinsics (archmage token caching eliminates dispatch overhead)
- Only FFI modules (compat.rs, test_encoder.rs) opt-in to unsafe for C mozjpeg interop
BANNED SOLUTIONS:
- Disabling features in
max_compression()to hide parity gaps - Changing defaults to avoid broken code paths
- Commenting out features that don't work correctly
REQUIRED APPROACH:
- If a feature produces wrong output, FIX THE FEATURE
- Document gaps honestly in README, but keep features enabled
If tests fail, find and fix the bug. Never:
- Increase allowed difference thresholds
- Skip failing tests
- Mark tests as
#[ignore]to make CI green
AC refinement correction bits tracking (FIXED Jan 2025):
- Root cause:
count_ac_refine()didn't track correction bits accumulation, causing DHT to miss intermediate EOB symbols when encoder flushed early due to buffer overflow - Fix: Added
correction_bits_countfield toProgressiveSymbolCounterand matching flush threshold (937 bytes) to count_ac_refine() - Result: ProgressiveBalanced Q90+ now decodes correctly with all decoders (GitHub #2)
AC refinement encoding (FIXED Dec 2024):
- Root cause: ZRL loop only ran for newly-nonzero coefficients, not previously-coded ones
- Fix: Restructured encode_ac_refine() to match C mozjpeg's loop structure exactly
- Result: All progressive modes with successive approximation now work correctly
optimize_scans producing larger files (FIXED Dec 2024):
- Root cause: Trial encoder used per-scan Huffman tables, actual encoder used global tables
- Fix: Actual encoder now uses per-scan AC Huffman tables when optimize_scans=true
- Result: optimize_scans now produces smaller files at all quality levels
Scan optimizer algorithm bugs (FIXED Feb 2026):
- Root cause 1: DC successive approximation was applied but C mozjpeg JCP_MAX_COMPRESSION doesn't use it
- Root cause 2: Frequency split decisions made at Al=0 were incorrectly applied at Al>0
- Root cause 3: Frequency split cost wasn't compared against SA cost before selecting SA level
- Fix: Removed DC SA, apply freq split only at Al=0, compare freq split vs SA costs for both luma and chroma
- Result: Max Compression mode now within ±0.7% average, some images smaller than C (better optimization)
Rust port of Mozilla's mozjpeg JPEG encoder, following the jpegli-rs methodology.
Scope: Encoder only + mozjpeg extensions (trellis, progressive scans, deringing) API: Idiomatic Rust (not C-compatible) Validation: FFI dual-execution against C mozjpeg
Tests passing: 164 unit + 8 codec comparison + 5 FFI validation
Kodak corpus (24 images), 4:2:0, exact color match (default). 6 configs × 6 quality levels.
Reproduce: cargo test --release --test parity_benchmark -- --nocapture
| Config | Q | Delta | Max Dev |
|---|---|---|---|
| Baseline | 55 | +0.00% | 0.00% |
| Baseline | 65 | +0.00% | 0.00% |
| Baseline | 75 | +0.00% | 0.00% |
| Baseline | 85 | +0.00% | 0.00% |
| Baseline | 90 | +0.00% | 0.00% |
| Baseline | 95 | +0.00% | 0.00% |
| Baseline + Trellis | 55 | -0.76% | 1.35% |
| Baseline + Trellis | 65 | -0.59% | 1.22% |
| Baseline + Trellis | 75 | -0.47% | 1.26% |
| Baseline + Trellis | 85 | -0.22% | 0.74% |
| Baseline + Trellis | 90 | -0.12% | 0.75% |
| Baseline + Trellis | 95 | -0.05% | 0.64% |
| Full Baseline | 55 | -0.71% | 1.25% |
| Full Baseline | 65 | -0.56% | 1.18% |
| Full Baseline | 75 | -0.47% | 1.24% |
| Full Baseline | 85 | -0.22% | 0.73% |
| Full Baseline | 90 | -0.13% | 0.73% |
| Full Baseline | 95 | -0.07% | 0.64% |
| Progressive | 55 | +0.00% | 0.00% |
| Progressive | 65 | +0.00% | 0.00% |
| Progressive | 75 | +0.00% | 0.00% |
| Progressive | 85 | +0.00% | 0.00% |
| Progressive | 90 | +0.00% | 0.00% |
| Progressive | 95 | +0.00% | 0.00% |
| Progressive + Trellis | 55 | -0.80% | 1.33% |
| Progressive + Trellis | 65 | -0.58% | 1.23% |
| Progressive + Trellis | 75 | -0.41% | 1.10% |
| Progressive + Trellis | 85 | -0.21% | 0.76% |
| Progressive + Trellis | 90 | -0.15% | 0.48% |
| Progressive + Trellis | 95 | -0.08% | 0.61% |
| Full Progressive | 55 | -0.78% | 1.35% |
| Full Progressive | 65 | -0.58% | 1.21% |
| Full Progressive | 75 | -0.40% | 1.07% |
| Full Progressive | 85 | -0.21% | 0.72% |
| Full Progressive | 90 | -0.14% | 0.51% |
| Full Progressive | 95 | -0.08% | 0.63% |
| Max Compression | 55 | -0.72% | 1.70% |
| Max Compression | 65 | -0.45% | 0.86% |
| Max Compression | 75 | +0.01% | 0.96% |
| Max Compression | 85 | +0.15% | 1.24% |
| Max Compression | 90 | +0.21% | 0.85% |
| Max Compression | 95 | +0.17% | 1.15% |
Configs: Baseline = huffman opt only. +Trellis = AC trellis. Full = AC trellis + DC trellis + deringing. Max Compression = Full + optimize_scans: true. All others use optimize_scans: false. All use force_baseline: true.
Key findings:
- Byte-exact parity for Baseline and Progressive modes (0.00% delta) with default color conversion
- With trellis, Rust produces smaller files than C (-0.05% to -0.80%)
- Max Compression within ±0.72% average, max deviation 1.70%
- Visual quality equivalent (SSIMULACRA2 and Butteraugli verified)
Mode explanations:
- Baseline (
progressive(false)): Sequential DCT - Progressive (
progressive(true), optimize_scans(false)): 9-scan JCP_MAX_COMPRESSION script with successive approximation - Max Compression (
Encoder::max_compression()): Progressive +optimize_scans=truewith per-scan Huffman tables
2048x2048 image (30 iterations):
Reproduce: cargo test --release --test bench_2k -- --nocapture
| Configuration | Rust (ms) | C (ms) | Ratio | Notes |
|---|---|---|---|---|
| Baseline (no opts) | 47.88 | 9.65 | 4.96x slower | Entropy encoding bottleneck |
| Trellis AC+DC | 202.84 | 217.29 | 0.93x faster | Rust 7% faster! |
Key findings:
- Trellis mode: Rust is 7% faster than C mozjpeg (at 2048x2048)
- Baseline gap is ~5x - entropy encoding is the remaining bottleneck
- Color conversion: c_compat default (~3.6 Gpix/s AVX2), yuv crate opt-in (~5.2 Gpix/s)
* Progressive mode note: Both Rust and C support optimize_scans which tries multiple
scan configurations to find the smallest output. Progressive encoding now works correctly
for all image sizes including non-MCU-aligned dimensions with subsampling.
- Layer 0: Constants, types, error handling
- Layer 1: Quantization tables, Huffman table construction
- Layer 2: Forward DCT, color conversion, chroma subsampling
- Layer 3: Bitstream writer with byte stuffing
- Layer 4: Entropy encoder, trellis quantization (AC + DC)
- Layer 5: Progressive scan generation (integrated)
- Layer 6: Marker emission, Full encoder pipeline
- FFI Validation: Granular comparison tests against C mozjpeg
The encoder produces valid JPEG files with mozjpeg-quality compression:
use mozjpeg_rs::{Encoder, Subsampling, TrellisConfig};
// Default encoding (trellis + Huffman optimization enabled)
let encoder = Encoder::new().quality(85);
let jpeg_data = encoder.encode_rgb(&pixels, width, height)?;
// Maximum compression (progressive + trellis + optimized Huffman)
let encoder = Encoder::max_compression();
let jpeg_data = encoder.encode_rgb(&pixels, width, height)?;
// Fastest encoding (no optimizations)
let encoder = Encoder::fastest().quality(85);
let jpeg_data = encoder.encode_rgb(&pixels, width, height)?;
// Custom configuration
let encoder = Encoder::new()
.quality(75)
.progressive(true)
.subsampling(Subsampling::S420)
.trellis(TrellisConfig::default())
.optimize_huffman(true);
let jpeg_data = encoder.encode_rgb(&pixels, width, height)?;
// Faster color conversion (opt into yuv crate)
let encoder = Encoder::baseline_optimized()
.quality(85)
.fast_color(true); // ~40% faster, ±1 rounding vs C mozjpeg
let jpeg_data = encoder.encode_rgb(&pixels, width, height)?;- Baseline JPEG encoding - Standard sequential DCT
- Progressive JPEG encoding - Multi-scan with DC first, then AC bands (RGB and grayscale)
- Trellis quantization - Rate-distortion optimized AC + DC quantization (mozjpeg core feature)
- Trellis speed optimization - Adaptive search limiting for high-entropy blocks (Q80-100)
- DC trellis optimization - Dynamic programming across blocks for optimal DC encoding
- Huffman table optimization - 2-pass encoding for optimal tables
- Chroma subsampling - 4:4:4, 4:2:2, 4:2:0 modes
- Quality presets -
max_compression()andfastest() - Overshoot deringing - Reduce ringing artifacts at sharp edges (see below)
- Optimize scans - Try multiple scan configurations for progressive mode, pick smallest
- Grayscale progressive - Full progressive JPEG support for grayscale images
- Smoothing filter - Noise reduction for dithered images (
.smoothing(30)) - C-compat color conversion - Exact C mozjpeg parity (default, AVX2-accelerated)
- SIMD optimizations - AVX2 (x86_64) and NEON (aarch64) with archmage safe intrinsics
- Baseline entropy encoding - ~4.7x slower than C (trellis mode is 10% faster than C)
- Arithmetic coding (optional, rarely used)
freq_split(default 8) — Field exists, only used in progressive scan generation, not in the trellis DP loop. Could split AC trellis into low/high frequency passes.trellis_quantize_block_with_eob_info()already acceptsss/separameters.q_opt(default false) — Field exists but not implemented. Would optimize quant tables within trellis loop. Not implemented in C mozjpeg either (experimental/placeholder).
delta_dc_weight(default 0.0) — Wired up indc_trellis_optimize_indexed(). When > 0.0, blends vertical DC gradient error into the distortion cost, encouraging smoother DC transitions between rows. Matches C mozjpegtrellis_delta_dc_weight(jcdctmgr.c:1069-1084). Default 0.0 matches C defaults (disabled).use_lambda_weight_tbl— Not used. C mozjpeg hardcodes flat (1/q^2) weights regardless of this flag. Rust matches C behavior.num_loops— Not used. Always runs one trellis pass.
- Multipass trellis (
use_scans_in_trellis) - C mozjpeg benchmarks show +0.52% larger files, imperceptible quality improvement (-0.05 butteraugli), 20% slower encoding. Not worth implementing.
- EOB cross-block optimization (
TrellisConfig::eob_optimization(true)) - Experimental cross-block EOBRUN optimization. Disabled by default due to aggressive coefficient zeroing in some cases. Enable withTrellisConfig::default().eob_optimization(true)if needed.
- AC refinement correction bits tracking (Jan 2025): Fixed bug where
count_ac_refine()didn't track correction bits accumulation, causing DHT to miss intermediate EOB symbols. The encoder flushes EOBRUN when correction bits exceed 937 bytes, but the counter wasn't tracking this threshold. This caused ProgressiveBalanced Q90+ to fail with strict decoders. - AC refinement ZRL encoding (Dec 2024): Fixed bug where ZRL (zero-run-length) symbols weren't emitted for previously-coded coefficients, causing decoder errors on 8/24 images. Now uses 9-scan JCP_MAX_COMPRESSION script with full successive approximation.
- Progressive AC scan block count (Dec 2024): Fixed bug where non-MCU-aligned images
with subsampling produced corrupted progressive JPEGs. AC scans now correctly encode
ceil(width/8) × ceil(height/8)blocks instead of MCU-padded block count.
Both baseline and progressive modes work correctly! With trellis + Huffman optimization, Rust produces files with quality matching C mozjpeg across all image sizes and subsampling modes.
File size comparison (24 images, 9-scan SA script, trellis disabled):
- Total: C = 1,939,046 bytes, Rust = 1,939,068 bytes (+22 bytes, +0.00%)
- Per-image range: -30 to +30 bytes (±0.05%)
- Not a fixed offset - varies per image due to coefficient rounding
Segment comparison:
- DQT (quant tables): ✅ Identical
- SOF2 (frame header): ✅ Identical
- DHT (Huffman tables): ✅ Identical
- SOS headers: ✅ Identical
- Entropy data: First 257 bytes match, then diverges
Decoded pixel comparison:
- R channel: 0 differences (perfect match)
- G channel: 53 pixels differ by ≤1
- B channel: 122 pixels differ by ≤4
The entropy divergence is caused by minor coefficient rounding differences that cascade through DC differential encoding. Both produce visually identical images.
- Run image delta diffs with C mozjpeg: Detect if variance in compression is due to a few bad blocks or distributed across all blocks. This will help identify if the size gap is from systematic coefficient differences or localized issues.
Original symptom: Rust produced ~1-4% larger files with optimize_scans at low
quality levels (Q10-Q50), with kodim23 showing +3.37% at Q40.
Root cause: encode_rust() in test_encoder.rs was not passing optimize_scans
to the Encoder builder chain. The Rust scan optimizer was never called — the encoder
always used the fixed 9-scan script regardless of the optimize_scans flag. The C
encoder correctly used its scan search to find simpler scripts (4-5 scans, no SA) at
low quality.
Fix: Added .optimize_scans(config.optimize_scans) to encode_rust() builder chain
in src/test_encoder.rs.
Result: Max Compression within ±0.4% average at all quality levels. At low quality (Q10-Q50) Rust is now smaller than C. Per-image max deviation ~1.6% (from different local optima in scan search, not a bug).
Previous fixes (Dec 2025): ScanTrialEncoder sequential encoding + per-scan Huffman. Previous note (Feb 2025): C test harness optimize_scans control.
Symptom: 8/24 Kodak images failed to decode with "failed to decode huffman code" when using progressive mode with successive approximation refinement scans.
Root Cause: In encode_ac_refine() and count_ac_refine(), the ZRL (zero-run-length)
loop was only inside the "temp == 1" (newly nonzero) branch, but it needs to run for BOTH
temp > 1 (previously coded) AND temp == 1 cases. When a long run of zeros preceded a
previously coded coefficient, ZRL symbols weren't emitted, corrupting the bitstream.
Fix: Restructured the encoding loop to match C mozjpeg's jcphuff.c exactly:
- Pre-pass to compute
absvalues[]and findeob(last newly-nonzero position) - ZRL loop now runs for all non-zero coefficients before the temp > 1 vs temp == 1 check
- Added
k <= eobcondition to prevent ZRL emission after the last newly-nonzero
Status: All 24 Kodak images decode successfully with 9-scan JCP_MAX_COMPRESSION script. Successive approximation (al_max_luma=3, al_max_chroma=2) is now fully enabled.
Previously reported "max diff ~11" was due to comparing different encoding modes:
- C mozjpeg defaults to
JCP_MAX_COMPRESSIONprofile which enables progressive mode - Rust was using baseline mode in comparisons
Investigation findings (Dec 2024):
| Component | Match Status | Notes |
|---|---|---|
| DCT | ✅ Exact | FFI test passes |
| Quantization | ✅ Exact | FFI test passes |
| Trellis | ✅ Exact | FFI test passes (new Dec 2024) |
| Color conversion | ✅ ±1 | Rounding variance |
| Downsampling | ✅ Exact | FFI test passes |
| Quant tables | ✅ Identical | Verified in JPEG output |
| Huffman tables | ✅ Identical | With optimize_huffman=true |
| Full encoder | ✅ 0 diff | When comparing same mode |
Key findings:
- With truly identical settings (baseline + Huffman opt), 0 pixel difference
- Trellis quantization produces identical coefficient decisions
- DC clamping to 1023 now matches C behavior
- File size gap at high quality due to progressive scan structure (see above)
When testing decoder round-trips with very small images using chroma subsampling, we observed large pixel differences (up to 41) between decoders. Investigation revealed this is expected decoder behavior, not an encoder bug.
The exact boundary: chroma_width == 2 (luma width 3-4 with horizontal subsampling)
| Luma Width | Chroma Width | Rust Decoders vs mozjpeg | Notes |
|---|---|---|---|
| 1-2 | 1 | ≤4 diff | Single sample, all agree |
| 3-4 | 2 | 24-41 diff | Triangle vs simple upsampling |
| 5+ | 3+ | 0 diff | Enough samples, all agree |
Root cause: Different chroma upsampling algorithms:
- jpeg-decoder & zune-jpeg: Simple replication/linear interpolation
- mozjpeg: "Fancy" triangle filter (optimized for perceptual quality)
Surprisingly, Rust decoders are closer to the original image at the boundary:
| Case | Rust mean error | mozjpeg mean error | Winner |
|---|---|---|---|
| 3×4 4:2:2 (chroma 2×4) | 11.6 | 12.1 | Rust by 0.4 |
| 4×4 4:2:0 (chroma 2×2) | 12.3 | 18.0 | Rust by 5.7 |
| 5×4 4:2:2 (chroma 3×4) | 4.7 | 4.7 | Tie (identical) |
The mozjpeg triangle filter is designed for normal images where smooth interpolation improves perceptual quality. But with only 2 chroma samples, the filter introduces more deviation from the original than simple replication.
Test coverage: decoder_roundtrip.rs tests 2496 combinations (26 dimensions ×
4 presets × 3 subsamplings × 8 qualities) with appropriate tolerance for this boundary.
Analysis tool: examples/decoder_chroma_analysis.rs shows pixel-by-pixel comparison.
- Commit when new tests pass - After fixing/completing a module
- Commit when new tests are added - Even if they're failing (documents expected behavior)
- Write descriptive commit messages explaining what was ported
- Validate equivalence layer by layer, not just end-to-end
- Use
mozjpeg-sysfrom crates.io for basic FFI validation - Use
sys-local(incrates/) for granular internal function testing- Builds from local
../mozjpegC source with test exports - C code has been instrumented with
mozjpeg_test_*functions - Tests in
tests/ffi_comparison.rscompare Rust vs C implementations
- Builds from local
NEVER delete tests, FFI comparisons, or instrumentation code. These are essential for:
- Validating correctness against C mozjpeg
- Catching regressions during development
- Documenting expected behavior
- Ensuring byte-exact parity with C implementation
If a test seems obsolete, comment it out with explanation rather than deleting.
NEVER relax test tolerances or skip failing tests to make CI green. When tests fail:
- Debug the root cause - Don't paper over bugs with looser thresholds
- Use DSSIM, not PSNR - PSNR is unreliable for perceptual quality comparison
- Validate byte/coefficient differences - Compare actual encoded values between Rust and C
- Track "off-by-N" errors - Small systematic differences indicate encoder bugs
If a test is failing, the encoder has a bug. Find it and fix it.
- Default quant tables: mozjpeg uses ImageMagick tables (index 3), not JPEG Annex K (index 0)
- Quality scaling: Q50 = 100% scale factor (use tables as-is)
- DCT scaling: Output is scaled by factor of 8 (sqrt(8) per dimension)
- Huffman pseudo-symbol: Symbol 256 ensures no real symbol gets all-ones code
The trellis algorithm requires raw DCT coefficients (scaled by 8):
- Lambda calculation:
lambda = 2^scale1 / (2^scale2 + block_norm)withlambda_base = 1.0 - Per-coefficient weights:
weight[i] = 1/quantval[i]^2 - Quantization divisor:
q = 8 * quantval[i](includes DCT scaling) - Distortion:
(candidate * q - original)^2 * lambda * weight - Cost:
rate + distortionwhere rate is Huffman code size + value bits
Reduces visible ringing artifacts near hard edges, especially on white backgrounds.
Source: jcdctmgr.c:416-550 (~130 lines). Enabled by default with Encoder::new().
Core Insight:
- JPEG can encode values outside the displayable range (0-255)
- Decoders clamp results to 0-255
- To encode white (255), any encoded value ≥ 255 works after clamping
- Hard edges create "square wave" patterns that compress poorly
- By allowing values to "overshoot" above 255, we get smoother waveforms
Algorithm (applied BEFORE DCT, after level shift):
- Scan block for pixels at max value (127 after level shift = 255 original)
- If no max pixels or all max pixels: return unchanged
- Calculate safe overshoot limit:
maxovershoot = maxsample + min(31, 2*DC_quant, (maxsample*64 - sum)/count) - For each run of max-value pixels (in zigzag order):
- Calculate slopes from neighboring pixels
- Apply Catmull-Rom spline interpolation
- Clamp peaks to maxovershoot
Before (hard edge):
Values: 50, 80, 120, 127, 127, 127, 100, 60
After (smooth curve with overshoot):
Values: 50, 80, 120, 135, 140, 138, 100, 60
↑ ↑ ↑
These overshoot 127 but clamp to 255 when decoded
When it helps:
- Images with white backgrounds
- Text and graphics with hard edges
- Any image with saturated regions (pixels at 0 or 255)
API:
// Enabled by default
let encoder = Encoder::new();
// Explicitly enable/disable
let encoder = Encoder::new().overshoot_deringing(true);
let encoder = Encoder::fastest().overshoot_deringing(false); // fastest disables itDeringing + 16-bit SIMD DCT Overflow (mozilla/mozjpeg#453)
Bug: Overshoot deringing pushes level-shifted samples to ±158. The ISLOW forward
DCT's 16-bit SIMD path (_mm256_add_epi16) overflows in the column-pass final butterfly:
8 × 5056 = 40,448 > i16::MAX (32,767). Wrapping causes sign flips that invert entire
8×8 blocks.
mozjpeg-rs status: Production paths use i32 intermediates — immune.
The experimental SimdOps::avx2_i16() path (src/dct.rs:1326) reproduces the bug.
Use Encoder::simd_ops() to select it for testing.
Trigger conditions:
- Deringing enabled + i16 SIMD DCT path
- Vertical half-black/half-white edge within an 8×8 block
- Quality ≤ Q57 (DC quant value ≥ 14)
Horizontal splits do NOT trigger because each row is uniform (DC-only, zero AC energy).
Test patterns: imazen/codec-corpus at imageflow/test_inputs/dct_overflow_patterns/
left_black_right_white.png— 64×64, triggers overflowleft_white_right_black.png— 64×64, triggers overflowsingle_8x8_half.png— 8×8 minimal reproducertop_black_bottom_white.png— 64×64, does NOT trigger (control)checkerboard_8x8.png— 64×64, does NOT trigger (control)
Regression tests: tests/encode_tests.rs:1328 (test_issue444_deringing_overflow_pattern)
and tests/encode_tests.rs:1387 (test_issue444_across_quality_range).
Reference: mozilla/mozjpeg#453
- Huffman tree construction: Use sentinel values carefully to avoid overflow
FREQ_INITIAL_MAX = 1_000_000_000for comparisonFREQ_MERGED = 1_000_000_001for merged nodes
- Bitstream stuffing: 0xFF bytes ALWAYS require 0x00 stuffing in entropy data
- Bit buffer: Use 64-bit buffer, flush when full, pad with 1-bits at end
- Progressive encoding:
- DC scans can be interleaved (multiple components)
- AC scans must be single-component
- Successive approximation uses Ah/Al for bit refinement
- AVX2/SIMD intrinsic element ordering (CRITICAL):
_mm256_set_epi16(e15, e14, ..., e1, e0)takes arguments in REVERSE order (highest element first)- NASM
times 4 dw A, Bcreates[A,B,A,B,...]with A at even indices - To match NASM layout, use
_mm256_set_epi16(..., B, A, B, A)(swap pairs) - For
vpmaddwd:result[i] = a[2i] * b[2i] + a[2i+1] * b[2i+1] - If data is
[R, G]and you wantR*coef_R + G*coef_G, coefficients must be[coef_R, coef_G]in memory - See
dct.rs:1384for documented example
- Use
#[cfg(test)]modules within each source file - FFI validation tests in
tests/ffi_validation.rs - Test both positive cases and error conditions
mozjpeg-rs/ # Repository root IS the main crate
├── src/
│ ├── lib.rs # Module exports, public API
│ ├── consts.rs # Layer 0: Constants, tables, markers
│ ├── types.rs # Layer 0: ColorSpace, ScanInfo, etc.
│ ├── error.rs # Error types
│ ├── quant.rs # Layer 1: Quantization tables
│ ├── huffman.rs # Layer 1: Huffman table construction
│ ├── dct.rs # Layer 2: Forward DCT (Loeffler)
│ ├── color.rs # Layer 2: RGB→YCbCr conversion
│ ├── sample.rs # Layer 2: Chroma subsampling
│ ├── deringing.rs # Layer 2: Overshoot deringing (pre-DCT)
│ ├── bitstream.rs # Layer 3: Bit-level I/O
│ ├── entropy.rs # Layer 4: Huffman encoding
│ ├── trellis.rs # Layer 4: Trellis quantization
│ ├── progressive.rs # Layer 5: Progressive scans
│ ├── marker.rs # Layer 6: JPEG markers
│ ├── encode.rs # Layer 6: Encoder pipeline
│ └── test_encoder.rs # Unified test API for Rust vs C comparison
├── tests/
│ ├── ffi_validation.rs # crates.io mozjpeg-sys tests
│ └── ffi_comparison.rs # Local FFI granular comparison
├── examples/
│ └── pareto_benchmark.rs # Benchmark vs C mozjpeg
├── crates/
│ └── sys-local/ # Local FFI bindings (builds from ../mozjpeg)
│ ├── build.rs # CMake integration
│ └── src/lib.rs # FFI declarations + test exports
├── benchmark/ # Reproducible benchmark infrastructure
│ ├── Dockerfile
│ └── plot_pareto.py
└── ../mozjpeg/ # Instrumented C mozjpeg fork (external)
├── mozjpeg_test_exports.c # Test export implementations
└── mozjpeg_test_exports.h # Test export declarations
For comparing Rust vs C implementations with guaranteed parameter parity:
use mozjpeg_rs::test_encoder::{TestEncoderConfig, encode_rust};
// Configuration shared between both implementations
let config = TestEncoderConfig {
quality: 85,
subsampling: Subsampling::S420,
progressive: false,
optimize_huffman: false,
trellis_quant: false,
trellis_dc: false,
overshoot_deringing: false,
};
// Encode with Rust
let rust_jpeg = encode_rust(&rgb, width, height, &config);
// Encode with C (in examples, implement encode_c using mozjpeg_sys)
let c_jpeg = encode_c(&rgb, width, height, &config);This ensures apples-to-apples comparison by using identical settings.
cargo test # Run all tests
cargo test huffman # Run specific module tests
cargo test --test ffi_validation # Run crates.io FFI tests
cargo test --test ffi_comparison # Run local FFI comparison tests
cargo test -p sys-local # Run sys-local testsFor extended testing with real images:
# Fetch Kodak test images (~15MB)
./scripts/fetch-corpus.sh
# Fetch full corpus including CLIC (~100MB)
./scripts/fetch-corpus.sh --full
# Or set environment variable to use existing corpus
export CODEC_CORPUS_DIR=/path/to/codec-corpusThe corpus utilities in mozjpeg_rs::corpus handle path resolution:
- Checks
MOZJPEG_CORPUS_DIRandCODEC_CORPUS_DIRenvironment variables - Falls back to
./corpus/in project root - Bundled test images always available at
tests/images/
GitHub Actions workflow runs on push/PR:
- Tests on Linux, macOS, Windows
- Unit tests, codec comparison tests, FFI validation tests
- Excludes
ffi_comparisontests (require local mozjpeg C source)
mozjpeg-sys = "2.2"(dev) - FFI validation against C mozjpegsys-local(incrates/) - Local FFI with granular test exportsbytemuck = "1.14"- Safe transmutes (for future SIMD)dssim,ssimulacra2(dev) - Perceptual quality metricscodec-eval(dev) - From https://github.com/imazen/codec-comparison
-
fast-yuv(default feature) - Enables theyuvcrate for optional faster color conversion via.fast_color(true). The yuv crate supports AVX-512, AVX2, SSE, NEON, and WASM SIMD but has ±1 rounding differences from C mozjpeg.Note: The encoder defaults to exact C parity (
fast_color(false)). Use.fast_color(true)for ~40% faster color conversion when exact parity isn't needed. -
mozjpeg-sys-config- Encode using C mozjpeg with RustEncodersettings. AddsEncoder::to_c_mozjpeg()which returns aCMozjpegencoder. Usesmozjpeg-sysfrom crates.io.let jpeg = encoder.to_c_mozjpeg().encode_rgb(&pixels, w, h)?;
-
simd-intrinsics- Hand-written AVX2/NEON intrinsics for ~15% better DCT performance. Without this, usesmultiversionautovectorization (safe, ~87% of intrinsics perf). -
png- Enable PNG loading in corpus module.
_instrument-c-mozjpeg-internals- Enables granular FFI tests against internal C mozjpeg functions (DCT, quantization, color conversion, etc.). Requires imazen/mozjpeg fork with test exports built locally. Usescripts/setup-instrumented-mozjpeg.shto set up.
mozjpeg-rs is an ENCODER ONLY. For decoding JPEG files, use one of:
- jpeg-decoder - Pure Rust, widely used
- zune-jpeg - Pure Rust, fast, SIMD-optimized
- mozjpeg-sys - C mozjpeg bindings (encode + decode)
- jpegli-rs - Rust port of Google's jpegli (WIP)
The decoder_roundtrip test validates that all major JPEG decoders can decode mozjpeg-rs output
and produce consistent results within expected quality bounds.
Test matrix:
- Decoders: jpeg-decoder, zune-jpeg, mozjpeg-sys (C decoder)
- Encoder configs: all 4 Presets × subsampling modes
- Quality levels: Q50, Q60, Q70, Q75, Q80, Q85, Q90, Q95
- Images: 20 test images from corpus
Validation criteria:
- All decoders must successfully decode the JPEG
- Decoded pixels must match between decoders (allowing ±1 for rounding)
- Butteraugli distance must be within expected bounds for each Q level
Expected Butteraugli bounds by quality:
| Quality | Max Butteraugli |
|---|---|
| Q50 | 3.0 |
| Q60 | 2.5 |
| Q70 | 2.0 |
| Q75 | 1.5 |
| Q80 | 1.2 |
| Q85 | 1.0 |
| Q90 | 0.7 |
| Q95 | 0.4 |
Tests are organized to handle symbol conflicts between different mozjpeg bindings:
# Core tests (no FFI)
cargo test -p mozjpeg-rs --lib
# Integration tests using mozjpeg-sys from crates.io
cargo test --test ffi_validation
cargo test --test preset_parity
cargo test --features mozjpeg-sys-config c_mozjpeg
# Decoder round-trip tests
cargo test --test decoder_roundtrip
# Instrumented C mozjpeg tests (local development only)
cargo test --test ffi_comparison --features _instrument-c-mozjpeg-internals