mozjpeg-rs Development Guide

Safety & Performance

100% Safe Rust - The core library uses #![deny(unsafe_code)] with no exceptions:

All SIMD uses archmage 0.5 with #[arcane] macro for safe target_feature dispatch
Memory operations use safe_unaligned_simd 0.2.4 for unaligned loads/stores
Zero performance loss vs unsafe intrinsics (archmage token caching eliminates dispatch overhead)
Only FFI modules (compat.rs, test_encoder.rs) opt-in to unsafe for C mozjpeg interop

Development Guidelines

Never disable features to achieve parity

BANNED SOLUTIONS:

Disabling features in max_compression() to hide parity gaps
Changing defaults to avoid broken code paths
Commenting out features that don't work correctly

REQUIRED APPROACH:

If a feature produces wrong output, FIX THE FEATURE
Document gaps honestly in README, but keep features enabled

Never relax test tolerances

If tests fail, find and fix the bug. Never:

Increase allowed difference thresholds
Skip failing tests
Mark tests as #[ignore] to make CI green

Resolved Issues

AC refinement correction bits tracking (FIXED Jan 2025):

Root cause: count_ac_refine() didn't track correction bits accumulation, causing DHT to miss intermediate EOB symbols when encoder flushed early due to buffer overflow
Fix: Added correction_bits_count field to ProgressiveSymbolCounter and matching flush threshold (937 bytes) to count_ac_refine()
Result: ProgressiveBalanced Q90+ now decodes correctly with all decoders (GitHub #2)

AC refinement encoding (FIXED Dec 2024):

Root cause: ZRL loop only ran for newly-nonzero coefficients, not previously-coded ones
Fix: Restructured encode_ac_refine() to match C mozjpeg's loop structure exactly
Result: All progressive modes with successive approximation now work correctly

optimize_scans producing larger files (FIXED Dec 2024):

Root cause: Trial encoder used per-scan Huffman tables, actual encoder used global tables
Fix: Actual encoder now uses per-scan AC Huffman tables when optimize_scans=true
Result: optimize_scans now produces smaller files at all quality levels

Scan optimizer algorithm bugs (FIXED Feb 2026):

Root cause 1: DC successive approximation was applied but C mozjpeg JCP_MAX_COMPRESSION doesn't use it
Root cause 2: Frequency split decisions made at Al=0 were incorrectly applied at Al>0
Root cause 3: Frequency split cost wasn't compared against SA cost before selecting SA level
Fix: Removed DC SA, apply freq split only at Al=0, compare freq split vs SA costs for both luma and chroma
Result: Max Compression mode now within ±0.7% average, some images smaller than C (better optimization)

Project Overview

Rust port of Mozilla's mozjpeg JPEG encoder, following the jpegli-rs methodology.

Scope: Encoder only + mozjpeg extensions (trellis, progressive scans, deringing) API: Idiomatic Rust (not C-compatible) Validation: FFI dual-execution against C mozjpeg

Current Status

Tests passing: 164 unit + 8 codec comparison + 5 FFI validation

Compression Results vs C mozjpeg

Kodak corpus (24 images), 4:2:0, exact color match (default). 6 configs × 6 quality levels. Reproduce: cargo test --release --test parity_benchmark -- --nocapture

Config	Q	Delta	Max Dev
Baseline	55	+0.00%	0.00%
Baseline	65	+0.00%	0.00%
Baseline	75	+0.00%	0.00%
Baseline	85	+0.00%	0.00%
Baseline	90	+0.00%	0.00%
Baseline	95	+0.00%	0.00%
Baseline + Trellis	55	-0.76%	1.35%
Baseline + Trellis	65	-0.59%	1.22%
Baseline + Trellis	75	-0.47%	1.26%
Baseline + Trellis	85	-0.22%	0.74%
Baseline + Trellis	90	-0.12%	0.75%
Baseline + Trellis	95	-0.05%	0.64%
Full Baseline	55	-0.71%	1.25%
Full Baseline	65	-0.56%	1.18%
Full Baseline	75	-0.47%	1.24%
Full Baseline	85	-0.22%	0.73%
Full Baseline	90	-0.13%	0.73%
Full Baseline	95	-0.07%	0.64%
Progressive	55	+0.00%	0.00%
Progressive	65	+0.00%	0.00%
Progressive	75	+0.00%	0.00%
Progressive	85	+0.00%	0.00%
Progressive	90	+0.00%	0.00%
Progressive	95	+0.00%	0.00%
Progressive + Trellis	55	-0.80%	1.33%
Progressive + Trellis	65	-0.58%	1.23%
Progressive + Trellis	75	-0.41%	1.10%
Progressive + Trellis	85	-0.21%	0.76%
Progressive + Trellis	90	-0.15%	0.48%
Progressive + Trellis	95	-0.08%	0.61%
Full Progressive	55	-0.78%	1.35%
Full Progressive	65	-0.58%	1.21%
Full Progressive	75	-0.40%	1.07%
Full Progressive	85	-0.21%	0.72%
Full Progressive	90	-0.14%	0.51%
Full Progressive	95	-0.08%	0.63%
Max Compression	55	-0.72%	1.70%
Max Compression	65	-0.45%	0.86%
Max Compression	75	+0.01%	0.96%
Max Compression	85	+0.15%	1.24%
Max Compression	90	+0.21%	0.85%
Max Compression	95	+0.17%	1.15%

Configs: Baseline = huffman opt only. +Trellis = AC trellis. Full = AC trellis + DC trellis + deringing. Max Compression = Full + optimize_scans: true. All others use optimize_scans: false. All use force_baseline: true.

Key findings:

Byte-exact parity for Baseline and Progressive modes (0.00% delta) with default color conversion
With trellis, Rust produces smaller files than C (-0.05% to -0.80%)
Max Compression within ±0.72% average, max deviation 1.70%
Visual quality equivalent (SSIMULACRA2 and Butteraugli verified)

Mode explanations:

Baseline (progressive(false)): Sequential DCT
Progressive (progressive(true), optimize_scans(false)): 9-scan JCP_MAX_COMPRESSION script with successive approximation
Max Compression (Encoder::max_compression()): Progressive + optimize_scans=true with per-scan Huffman tables

Performance vs C mozjpeg (release mode, AVX2 enabled)

2048x2048 image (30 iterations): Reproduce: cargo test --release --test bench_2k -- --nocapture

Configuration	Rust (ms)	C (ms)	Ratio	Notes
Baseline (no opts)	47.88	9.65	4.96x slower	Entropy encoding bottleneck
Trellis AC+DC	202.84	217.29	0.93x faster	Rust 7% faster!

Key findings:

Trellis mode: Rust is 7% faster than C mozjpeg (at 2048x2048)
Baseline gap is ~5x - entropy encoding is the remaining bottleneck
Color conversion: c_compat default (~3.6 Gpix/s AVX2), yuv crate opt-in (~5.2 Gpix/s)

* Progressive mode note: Both Rust and C support optimize_scans which tries multiple scan configurations to find the smallest output. Progressive encoding now works correctly for all image sizes including non-MCU-aligned dimensions with subsampling.

Completed Layers

Layer 0: Constants, types, error handling
Layer 1: Quantization tables, Huffman table construction
Layer 2: Forward DCT, color conversion, chroma subsampling
Layer 3: Bitstream writer with byte stuffing
Layer 4: Entropy encoder, trellis quantization (AC + DC)
Layer 5: Progressive scan generation (integrated)
Layer 6: Marker emission, Full encoder pipeline
FFI Validation: Granular comparison tests against C mozjpeg

Working Encoder

The encoder produces valid JPEG files with mozjpeg-quality compression:

use mozjpeg_rs::{Encoder, Subsampling, TrellisConfig};

// Default encoding (trellis + Huffman optimization enabled)
let encoder = Encoder::new().quality(85);
let jpeg_data = encoder.encode_rgb(&pixels, width, height)?;

// Maximum compression (progressive + trellis + optimized Huffman)
let encoder = Encoder::max_compression();
let jpeg_data = encoder.encode_rgb(&pixels, width, height)?;

// Fastest encoding (no optimizations)
let encoder = Encoder::fastest().quality(85);
let jpeg_data = encoder.encode_rgb(&pixels, width, height)?;

// Custom configuration
let encoder = Encoder::new()
    .quality(75)
    .progressive(true)
    .subsampling(Subsampling::S420)
    .trellis(TrellisConfig::default())
    .optimize_huffman(true);
let jpeg_data = encoder.encode_rgb(&pixels, width, height)?;

// Faster color conversion (opt into yuv crate)
let encoder = Encoder::baseline_optimized()
    .quality(85)
    .fast_color(true);  // ~40% faster, ±1 rounding vs C mozjpeg
let jpeg_data = encoder.encode_rgb(&pixels, width, height)?;

Implemented Features

Baseline JPEG encoding - Standard sequential DCT
Progressive JPEG encoding - Multi-scan with DC first, then AC bands (RGB and grayscale)
Trellis quantization - Rate-distortion optimized AC + DC quantization (mozjpeg core feature)
Trellis speed optimization - Adaptive search limiting for high-entropy blocks (Q80-100)
DC trellis optimization - Dynamic programming across blocks for optimal DC encoding
Huffman table optimization - 2-pass encoding for optimal tables
Chroma subsampling - 4:4:4, 4:2:2, 4:2:0 modes
Quality presets - max_compression() and fastest()
Overshoot deringing - Reduce ringing artifacts at sharp edges (see below)
Optimize scans - Try multiple scan configurations for progressive mode, pick smallest
Grayscale progressive - Full progressive JPEG support for grayscale images
Smoothing filter - Noise reduction for dithered images (.smoothing(30))
C-compat color conversion - Exact C mozjpeg parity (default, AVX2-accelerated)
SIMD optimizations - AVX2 (x86_64) and NEON (aarch64) with archmage safe intrinsics

Remaining Work

Baseline entropy encoding - ~4.7x slower than C (trellis mode is 10% faster than C)
Arithmetic coding (optional, rarely used)

TrellisConfig Fields Not Yet Wired Up

freq_split (default 8) — Field exists, only used in progressive scan generation, not in the trellis DP loop. Could split AC trellis into low/high frequency passes. trellis_quantize_block_with_eob_info() already accepts ss/se parameters.
q_opt (default false) — Field exists but not implemented. Would optimize quant tables within trellis loop. Not implemented in C mozjpeg either (experimental/placeholder).

TrellisConfig Fields — Implemented but No-Op by Default

delta_dc_weight (default 0.0) — Wired up in dc_trellis_optimize_indexed(). When > 0.0, blends vertical DC gradient error into the distortion cost, encouraging smoother DC transitions between rows. Matches C mozjpeg trellis_delta_dc_weight (jcdctmgr.c:1069-1084). Default 0.0 matches C defaults (disabled).
use_lambda_weight_tbl — Not used. C mozjpeg hardcodes flat (1/q^2) weights regardless of this flag. Rust matches C behavior.
num_loops — Not used. Always runs one trellis pass.

Not Implemented (Poor Tradeoff)

Multipass trellis (use_scans_in_trellis) - C mozjpeg benchmarks show +0.52% larger files, imperceptible quality improvement (-0.05 butteraugli), 20% slower encoding. Not worth implementing.

Optional Features (Disabled by Default)

EOB cross-block optimization (TrellisConfig::eob_optimization(true)) - Experimental cross-block EOBRUN optimization. Disabled by default due to aggressive coefficient zeroing in some cases. Enable with TrellisConfig::default().eob_optimization(true) if needed.

Recent Fixes

AC refinement correction bits tracking (Jan 2025): Fixed bug where count_ac_refine() didn't track correction bits accumulation, causing DHT to miss intermediate EOB symbols. The encoder flushes EOBRUN when correction bits exceed 937 bytes, but the counter wasn't tracking this threshold. This caused ProgressiveBalanced Q90+ to fail with strict decoders.
AC refinement ZRL encoding (Dec 2024): Fixed bug where ZRL (zero-run-length) symbols weren't emitted for previously-coded coefficients, causing decoder errors on 8/24 images. Now uses 9-scan JCP_MAX_COMPRESSION script with full successive approximation.
Progressive AC scan block count (Dec 2024): Fixed bug where non-MCU-aligned images with subsampling produced corrupted progressive JPEGs. AC scans now correctly encode ceil(width/8) × ceil(height/8) blocks instead of MCU-padded block count.

Both baseline and progressive modes work correctly! With trellis + Huffman optimization, Rust produces files with quality matching C mozjpeg across all image sizes and subsampling modes.

Bytewise Parity Analysis (Progressive Q85, Kodak corpus)

File size comparison (24 images, 9-scan SA script, trellis disabled):

Total: C = 1,939,046 bytes, Rust = 1,939,068 bytes (+22 bytes, +0.00%)
Per-image range: -30 to +30 bytes (±0.05%)
Not a fixed offset - varies per image due to coefficient rounding

Segment comparison:

DQT (quant tables): ✅ Identical
SOF2 (frame header): ✅ Identical
DHT (Huffman tables): ✅ Identical
SOS headers: ✅ Identical
Entropy data: First 257 bytes match, then diverges

Decoded pixel comparison:

R channel: 0 differences (perfect match)
G channel: 53 pixels differ by ≤1
B channel: 122 pixels differ by ≤4

The entropy divergence is caused by minor coefficient rounding differences that cascade through DC differential encoding. Both produce visually identical images.

TODO / Planned Investigations

Run image delta diffs with C mozjpeg: Detect if variance in compression is due to a few bad blocks or distributed across all blocks. This will help identify if the size gap is from systematic coefficient differences or localized issues.

Known Issues / Active Investigations

File Size Gap with optimize_scans - FIXED ✅ (Feb 2026)

Original symptom: Rust produced ~1-4% larger files with optimize_scans at low quality levels (Q10-Q50), with kodim23 showing +3.37% at Q40.

Root cause: encode_rust() in test_encoder.rs was not passing optimize_scans to the Encoder builder chain. The Rust scan optimizer was never called — the encoder always used the fixed 9-scan script regardless of the optimize_scans flag. The C encoder correctly used its scan search to find simpler scripts (4-5 scans, no SA) at low quality.

Fix: Added .optimize_scans(config.optimize_scans) to encode_rust() builder chain in src/test_encoder.rs.

Result: Max Compression within ±0.4% average at all quality levels. At low quality (Q10-Q50) Rust is now smaller than C. Per-image max deviation ~1.6% (from different local optima in scan search, not a bug).

Previous fixes (Dec 2025): ScanTrialEncoder sequential encoding + per-scan Huffman. Previous note (Feb 2025): C test harness optimize_scans control.

AC Refinement Decoder Errors - FIXED ✅ (Dec 2024)

Symptom: 8/24 Kodak images failed to decode with "failed to decode huffman code" when using progressive mode with successive approximation refinement scans.

Root Cause: In encode_ac_refine() and count_ac_refine(), the ZRL (zero-run-length) loop was only inside the "temp == 1" (newly nonzero) branch, but it needs to run for BOTH temp > 1 (previously coded) AND temp == 1 cases. When a long run of zeros preceded a previously coded coefficient, ZRL symbols weren't emitted, corrupting the bitstream.

Fix: Restructured the encoding loop to match C mozjpeg's jcphuff.c exactly:

Pre-pass to compute absvalues[] and find eob (last newly-nonzero position)
ZRL loop now runs for all non-zero coefficients before the temp > 1 vs temp == 1 check
Added k <= eob condition to prevent ZRL emission after the last newly-nonzero

Status: All 24 Kodak images decode successfully with 9-scan JCP_MAX_COMPRESSION script. Successive approximation (al_max_luma=3, al_max_chroma=2) is now fully enabled.

Rust vs C Pixel Difference - RESOLVED ✅

Previously reported "max diff ~11" was due to comparing different encoding modes:

C mozjpeg defaults to JCP_MAX_COMPRESSION profile which enables progressive mode
Rust was using baseline mode in comparisons

Investigation findings (Dec 2024):

Component	Match Status	Notes
DCT	✅ Exact	FFI test passes
Quantization	✅ Exact	FFI test passes
Trellis	✅ Exact	FFI test passes (new Dec 2024)
Color conversion	✅ ±1	Rounding variance
Downsampling	✅ Exact	FFI test passes
Quant tables	✅ Identical	Verified in JPEG output
Huffman tables	✅ Identical	With `optimize_huffman=true`
Full encoder	✅ 0 diff	When comparing same mode

Key findings:

With truly identical settings (baseline + Huffman opt), 0 pixel difference
Trellis quantization produces identical coefficient decisions
DC clamping to 1023 now matches C behavior
File size gap at high quality due to progressive scan structure (see above)

Decoder Chroma Upsampling Variance - DOCUMENTED ✅

When testing decoder round-trips with very small images using chroma subsampling, we observed large pixel differences (up to 41) between decoders. Investigation revealed this is expected decoder behavior, not an encoder bug.

The exact boundary: chroma_width == 2 (luma width 3-4 with horizontal subsampling)

Luma Width	Chroma Width	Rust Decoders vs mozjpeg	Notes
1-2	1	≤4 diff	Single sample, all agree
3-4	2	24-41 diff	Triangle vs simple upsampling
5+	3+	0 diff	Enough samples, all agree

Root cause: Different chroma upsampling algorithms:

jpeg-decoder & zune-jpeg: Simple replication/linear interpolation
mozjpeg: "Fancy" triangle filter (optimized for perceptual quality)

Surprisingly, Rust decoders are closer to the original image at the boundary:

Case	Rust mean error	mozjpeg mean error	Winner
3×4 4:2:2 (chroma 2×4)	11.6	12.1	Rust by 0.4
4×4 4:2:0 (chroma 2×2)	12.3	18.0	Rust by 5.7
5×4 4:2:2 (chroma 3×4)	4.7	4.7	Tie (identical)

The mozjpeg triangle filter is designed for normal images where smooth interpolation improves perceptual quality. But with only 2 chroma samples, the filter introduces more deviation from the original than simple replication.

Test coverage: decoder_roundtrip.rs tests 2496 combinations (26 dimensions × 4 presets × 3 subsamplings × 8 qualities) with appropriate tolerance for this boundary.

Analysis tool: examples/decoder_chroma_analysis.rs shows pixel-by-pixel comparison.

Workflow Rules

Commit Strategy

Commit when new tests pass - After fixing/completing a module
Commit when new tests are added - Even if they're failing (documents expected behavior)
Write descriptive commit messages explaining what was ported

Validation Approach

Validate equivalence layer by layer, not just end-to-end
Use mozjpeg-sys from crates.io for basic FFI validation
Use sys-local (in crates/) for granular internal function testing
- Builds from local ../mozjpeg C source with test exports
- C code has been instrumented with mozjpeg_test_* functions
- Tests in tests/ffi_comparison.rs compare Rust vs C implementations

Golden Rule: Never Delete Instrumentation

NEVER delete tests, FFI comparisons, or instrumentation code. These are essential for:

Validating correctness against C mozjpeg
Catching regressions during development
Documenting expected behavior
Ensuring byte-exact parity with C implementation

If a test seems obsolete, comment it out with explanation rather than deleting.

Golden Rule: Never Relax Test Tolerances to Avoid Debugging

NEVER relax test tolerances or skip failing tests to make CI green. When tests fail:

Debug the root cause - Don't paper over bugs with looser thresholds
Use DSSIM, not PSNR - PSNR is unreliable for perceptual quality comparison
Validate byte/coefficient differences - Compare actual encoded values between Rust and C
Track "off-by-N" errors - Small systematic differences indicate encoder bugs

If a test is failing, the encoder has a bug. Find it and fix it.

Key Learnings

mozjpeg Specifics

Default quant tables: mozjpeg uses ImageMagick tables (index 3), not JPEG Annex K (index 0)
Quality scaling: Q50 = 100% scale factor (use tables as-is)
DCT scaling: Output is scaled by factor of 8 (sqrt(8) per dimension)
Huffman pseudo-symbol: Symbol 256 ensures no real symbol gets all-ones code

Trellis Quantization (Critical!)

The trellis algorithm requires raw DCT coefficients (scaled by 8):

Lambda calculation: lambda = 2^scale1 / (2^scale2 + block_norm) with lambda_base = 1.0
Per-coefficient weights: weight[i] = 1/quantval[i]^2
Quantization divisor: q = 8 * quantval[i] (includes DCT scaling)
Distortion: (candidate * q - original)^2 * lambda * weight
Cost: rate + distortion where rate is Huffman code size + value bits

Overshoot Deringing

Reduces visible ringing artifacts near hard edges, especially on white backgrounds. Source: jcdctmgr.c:416-550 (~130 lines). Enabled by default with Encoder::new().

Core Insight:

JPEG can encode values outside the displayable range (0-255)
Decoders clamp results to 0-255
To encode white (255), any encoded value ≥ 255 works after clamping
Hard edges create "square wave" patterns that compress poorly
By allowing values to "overshoot" above 255, we get smoother waveforms

Algorithm (applied BEFORE DCT, after level shift):

Scan block for pixels at max value (127 after level shift = 255 original)
If no max pixels or all max pixels: return unchanged

Calculate safe overshoot limit:

maxovershoot = maxsample + min(31, 2*DC_quant, (maxsample*64 - sum)/count)

For each run of max-value pixels (in zigzag order):
- Calculate slopes from neighboring pixels
- Apply Catmull-Rom spline interpolation
- Clamp peaks to maxovershoot

Before (hard edge):

Values: 50, 80, 120, 127, 127, 127, 100, 60

After (smooth curve with overshoot):

Values: 50, 80, 120, 135, 140, 138, 100, 60
                      ↑    ↑    ↑
              These overshoot 127 but clamp to 255 when decoded

When it helps:

Images with white backgrounds
Text and graphics with hard edges
Any image with saturated regions (pixels at 0 or 255)

API:

// Enabled by default
let encoder = Encoder::new();

// Explicitly enable/disable
let encoder = Encoder::new().overshoot_deringing(true);
let encoder = Encoder::fastest().overshoot_deringing(false); // fastest disables it

Deringing + 16-bit SIMD DCT Overflow (mozilla/mozjpeg#453)

Bug: Overshoot deringing pushes level-shifted samples to ±158. The ISLOW forward DCT's 16-bit SIMD path (_mm256_add_epi16) overflows in the column-pass final butterfly: 8 × 5056 = 40,448 > i16::MAX (32,767). Wrapping causes sign flips that invert entire 8×8 blocks.

mozjpeg-rs status: Production paths use i32 intermediates — immune. The experimental SimdOps::avx2_i16() path (src/dct.rs:1326) reproduces the bug. Use Encoder::simd_ops() to select it for testing.

Trigger conditions:

Deringing enabled + i16 SIMD DCT path
Vertical half-black/half-white edge within an 8×8 block
Quality ≤ Q57 (DC quant value ≥ 14)

Horizontal splits do NOT trigger because each row is uniform (DC-only, zero AC energy).

Test patterns: imazen/codec-corpus at imageflow/test_inputs/dct_overflow_patterns/

left_black_right_white.png — 64×64, triggers overflow
left_white_right_black.png — 64×64, triggers overflow
single_8x8_half.png — 8×8 minimal reproducer
top_black_bottom_white.png — 64×64, does NOT trigger (control)
checkerboard_8x8.png — 64×64, does NOT trigger (control)

Regression tests: tests/encode_tests.rs:1328 (test_issue444_deringing_overflow_pattern) and tests/encode_tests.rs:1387 (test_issue444_across_quality_range).

Reference: mozilla/mozjpeg#453

Implementation Notes

Huffman tree construction: Use sentinel values carefully to avoid overflow
- FREQ_INITIAL_MAX = 1_000_000_000 for comparison
- FREQ_MERGED = 1_000_000_001 for merged nodes
Bitstream stuffing: 0xFF bytes ALWAYS require 0x00 stuffing in entropy data
Bit buffer: Use 64-bit buffer, flush when full, pad with 1-bits at end
Progressive encoding:
- DC scans can be interleaved (multiple components)
- AC scans must be single-component
- Successive approximation uses Ah/Al for bit refinement
AVX2/SIMD intrinsic element ordering (CRITICAL):
- _mm256_set_epi16(e15, e14, ..., e1, e0) takes arguments in REVERSE order (highest element first)
- NASM times 4 dw A, B creates [A,B,A,B,...] with A at even indices
- To match NASM layout, use _mm256_set_epi16(..., B, A, B, A) (swap pairs)
- For vpmaddwd: result[i] = a[2i] * b[2i] + a[2i+1] * b[2i+1]
- If data is [R, G] and you want R*coef_R + G*coef_G, coefficients must be [coef_R, coef_G] in memory
- See dct.rs:1384 for documented example

Testing Patterns

Use #[cfg(test)] modules within each source file
FFI validation tests in tests/ffi_validation.rs
Test both positive cases and error conditions

Architecture

mozjpeg-rs/                  # Repository root IS the main crate
├── src/
│   ├── lib.rs                  # Module exports, public API
│   ├── consts.rs               # Layer 0: Constants, tables, markers
│   ├── types.rs                # Layer 0: ColorSpace, ScanInfo, etc.
│   ├── error.rs                # Error types
│   ├── quant.rs                # Layer 1: Quantization tables
│   ├── huffman.rs              # Layer 1: Huffman table construction
│   ├── dct.rs                  # Layer 2: Forward DCT (Loeffler)
│   ├── color.rs                # Layer 2: RGB→YCbCr conversion
│   ├── sample.rs               # Layer 2: Chroma subsampling
│   ├── deringing.rs            # Layer 2: Overshoot deringing (pre-DCT)
│   ├── bitstream.rs            # Layer 3: Bit-level I/O
│   ├── entropy.rs              # Layer 4: Huffman encoding
│   ├── trellis.rs              # Layer 4: Trellis quantization
│   ├── progressive.rs          # Layer 5: Progressive scans
│   ├── marker.rs               # Layer 6: JPEG markers
│   ├── encode.rs               # Layer 6: Encoder pipeline
│   └── test_encoder.rs         # Unified test API for Rust vs C comparison
├── tests/
│   ├── ffi_validation.rs       # crates.io mozjpeg-sys tests
│   └── ffi_comparison.rs       # Local FFI granular comparison
├── examples/
│   └── pareto_benchmark.rs     # Benchmark vs C mozjpeg
├── crates/
│   └── sys-local/              # Local FFI bindings (builds from ../mozjpeg)
│       ├── build.rs            # CMake integration
│       └── src/lib.rs          # FFI declarations + test exports
├── benchmark/                  # Reproducible benchmark infrastructure
│   ├── Dockerfile
│   └── plot_pareto.py
└── ../mozjpeg/                 # Instrumented C mozjpeg fork (external)
    ├── mozjpeg_test_exports.c  # Test export implementations
    └── mozjpeg_test_exports.h  # Test export declarations

Unified Test API (`test_encoder.rs`)

For comparing Rust vs C implementations with guaranteed parameter parity:

use mozjpeg_rs::test_encoder::{TestEncoderConfig, encode_rust};

// Configuration shared between both implementations
let config = TestEncoderConfig {
    quality: 85,
    subsampling: Subsampling::S420,
    progressive: false,
    optimize_huffman: false,
    trellis_quant: false,
    trellis_dc: false,
    overshoot_deringing: false,
};

// Encode with Rust
let rust_jpeg = encode_rust(&rgb, width, height, &config);

// Encode with C (in examples, implement encode_c using mozjpeg_sys)
let c_jpeg = encode_c(&rgb, width, height, &config);

This ensures apples-to-apples comparison by using identical settings.

Build & Test

cargo test                           # Run all tests
cargo test huffman                   # Run specific module tests
cargo test --test ffi_validation    # Run crates.io FFI tests
cargo test --test ffi_comparison    # Run local FFI comparison tests
cargo test -p sys-local             # Run sys-local tests

Test Corpus

For extended testing with real images:

# Fetch Kodak test images (~15MB)
./scripts/fetch-corpus.sh

# Fetch full corpus including CLIC (~100MB)
./scripts/fetch-corpus.sh --full

# Or set environment variable to use existing corpus
export CODEC_CORPUS_DIR=/path/to/codec-corpus

The corpus utilities in mozjpeg_rs::corpus handle path resolution:

Checks MOZJPEG_CORPUS_DIR and CODEC_CORPUS_DIR environment variables
Falls back to ./corpus/ in project root
Bundled test images always available at tests/images/

CI/CD

GitHub Actions workflow runs on push/PR:

Tests on Linux, macOS, Windows
Unit tests, codec comparison tests, FFI validation tests
Excludes ffi_comparison tests (require local mozjpeg C source)

Dependencies

mozjpeg-sys = "2.2" (dev) - FFI validation against C mozjpeg
sys-local (in crates/) - Local FFI with granular test exports
bytemuck = "1.14" - Safe transmutes (for future SIMD)
dssim, ssimulacra2 (dev) - Perceptual quality metrics
codec-eval (dev) - From https://github.com/imazen/codec-comparison

Feature Flags

Published Features (safe to use)

fast-yuv (default feature) - Enables the yuv crate for optional faster color conversion via .fast_color(true). The yuv crate supports AVX-512, AVX2, SSE, NEON, and WASM SIMD but has ±1 rounding differences from C mozjpeg.

Note: The encoder defaults to exact C parity (fast_color(false)). Use .fast_color(true) for ~40% faster color conversion when exact parity isn't needed.
mozjpeg-sys-config - Encode using C mozjpeg with Rust Encoder settings. Adds Encoder::to_c_mozjpeg() which returns a CMozjpeg encoder. Uses mozjpeg-sys from crates.io.
```
let jpeg = encoder.to_c_mozjpeg().encode_rgb(&pixels, w, h)?;
```
simd-intrinsics - Hand-written AVX2/NEON intrinsics for ~15% better DCT performance. Without this, uses multiversion autovectorization (safe, ~87% of intrinsics perf).
png - Enable PNG loading in corpus module.

Development-Only Features (not published)

_instrument-c-mozjpeg-internals - Enables granular FFI tests against internal C mozjpeg functions (DCT, quantization, color conversion, etc.). Requires imazen/mozjpeg fork with test exports built locally. Use scripts/setup-instrumented-mozjpeg.sh to set up.

Decoder Alternatives

mozjpeg-rs is an ENCODER ONLY. For decoding JPEG files, use one of:

jpeg-decoder - Pure Rust, widely used
zune-jpeg - Pure Rust, fast, SIMD-optimized
mozjpeg-sys - C mozjpeg bindings (encode + decode)
jpegli-rs - Rust port of Google's jpegli (WIP)

Round-Trip Decoder Validation

The decoder_roundtrip test validates that all major JPEG decoders can decode mozjpeg-rs output and produce consistent results within expected quality bounds.

Test matrix:

Decoders: jpeg-decoder, zune-jpeg, mozjpeg-sys (C decoder)
Encoder configs: all 4 Presets × subsampling modes
Quality levels: Q50, Q60, Q70, Q75, Q80, Q85, Q90, Q95
Images: 20 test images from corpus

Validation criteria:

All decoders must successfully decode the JPEG
Decoded pixels must match between decoders (allowing ±1 for rounding)
Butteraugli distance must be within expected bounds for each Q level

Expected Butteraugli bounds by quality:

Quality	Max Butteraugli
Q50	3.0
Q60	2.5
Q70	2.0
Q75	1.5
Q80	1.2
Q85	1.0
Q90	0.7
Q95	0.4

CI Test Organization

Tests are organized to handle symbol conflicts between different mozjpeg bindings:

# Core tests (no FFI)
cargo test -p mozjpeg-rs --lib

# Integration tests using mozjpeg-sys from crates.io
cargo test --test ffi_validation
cargo test --test preset_parity
cargo test --features mozjpeg-sys-config c_mozjpeg

# Decoder round-trip tests
cargo test --test decoder_roundtrip

# Instrumented C mozjpeg tests (local development only)
cargo test --test ffi_comparison --features _instrument-c-mozjpeg-internals

FilesExpand file tree

CLAUDE.md

Latest commit

History

CLAUDE.md

File metadata and controls

mozjpeg-rs Development Guide

Safety & Performance

Development Guidelines

Never disable features to achieve parity

Never relax test tolerances

Resolved Issues

Project Overview

Current Status

Compression Results vs C mozjpeg

Performance vs C mozjpeg (release mode, AVX2 enabled)

Completed Layers

Working Encoder

Implemented Features

Remaining Work

TrellisConfig Fields Not Yet Wired Up

TrellisConfig Fields — Implemented but No-Op by Default

Not Implemented (Poor Tradeoff)

Optional Features (Disabled by Default)

Recent Fixes

Bytewise Parity Analysis (Progressive Q85, Kodak corpus)

TODO / Planned Investigations

Known Issues / Active Investigations

File Size Gap with optimize_scans - FIXED ✅ (Feb 2026)

AC Refinement Decoder Errors - FIXED ✅ (Dec 2024)

Rust vs C Pixel Difference - RESOLVED ✅

Decoder Chroma Upsampling Variance - DOCUMENTED ✅

Workflow Rules

Commit Strategy

Validation Approach

Golden Rule: Never Delete Instrumentation

Golden Rule: Never Relax Test Tolerances to Avoid Debugging

Key Learnings

mozjpeg Specifics

Trellis Quantization (Critical!)

Overshoot Deringing

Deringing + 16-bit SIMD DCT Overflow (mozilla/mozjpeg#453)

Implementation Notes

Testing Patterns

Architecture

Unified Test API (test_encoder.rs)

Build & Test

Test Corpus

CI/CD

Dependencies

Feature Flags

Published Features (safe to use)

Development-Only Features (not published)

Decoder Alternatives

Round-Trip Decoder Validation

CI Test Organization

Unified Test API (`test_encoder.rs`)