Summary
resolve_indexed_palette() currently derives base_fmt by calling color_space_to_pixel_format(&base_cs), which maps several non-device color spaces (Lab, CalRGB, CalGray, and many ICCBased variants) to PixelFormat::RGB. expand_indexed_to_rgb() then reinterprets the raw palette bytes as already-RGB without running any colorimetric conversion, producing an output image whose colors are wrong in perceptually-uniform spaces but look "roughly right" because the byte layout happens to line up.
This is a real enhancement surfaced by Copilot review on #312. The common case (DeviceRGB / DeviceGray / DeviceCMYK base, which is what real-world PDFs actually use) works correctly — the #311 fix is strictly better than v0.3.23 on every file in the test corpus. This ticket tracks the follow-up work to handle the other base color spaces correctly.
Affected base color spaces
Per PDF 32000-1:2008 §8.6.6.3, the base of an Indexed color space can be any of:
| base | palette encoding | status after #311 |
| --- | --- | --- |
| DeviceGray | 1 byte / palette entry | ✅ correct |
| DeviceRGB | 3 bytes / palette entry | ✅ correct |
| DeviceCMYK | 4 bytes / palette entry | ✅ correct (via cmyk_to_rgb converter) |
| CalGray | 1 byte; A component + gamma | ❌ treated as DeviceGray, no gamma correction |
| CalRGB | 3 bytes; A/B/C components + gamma/matrix | ❌ treated as DeviceRGB, no calibration |
| Lab | 3 bytes; L*, a*, b* components | ❌ treated as RGB, wildly wrong colors |
| ICCBased | N bytes matching ICC profile | ❌ treated by its declared /Alternate channel count; no profile application |
| DeviceN / Separation | 1+ bytes; requires tint transform | ❌ falls through to whatever color_space_to_pixel_format returns |
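As a rough illustration of the table above, the sketch below models the base spaces and their bytes-per-entry count, then splits an Indexed lookup table into entries. `BaseSpace`, `components()`, `is_device()` and `palette_entries()` are hypothetical names invented for this example, not the crate's actual types; the only spec fact relied on is that the lookup table holds m × (hival + 1) bytes for an m-component base.

```rust
/// Hypothetical stand-in for the crate's color space type (illustrative only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BaseSpace {
    DeviceGray,
    DeviceRGB,
    DeviceCMYK,
    CalGray,
    CalRGB,
    Lab,
    /// ICCBased, carrying its /N component count.
    IccBased { n: usize },
    Separation,
    /// DeviceN, carrying the number of colorant names.
    DeviceN { colorants: usize },
}

impl BaseSpace {
    /// Bytes per palette entry, i.e. the number of components in the base space.
    pub fn components(self) -> usize {
        match self {
            BaseSpace::DeviceGray | BaseSpace::CalGray | BaseSpace::Separation => 1,
            BaseSpace::DeviceRGB | BaseSpace::CalRGB | BaseSpace::Lab => 3,
            BaseSpace::DeviceCMYK => 4,
            BaseSpace::IccBased { n } => n,
            BaseSpace::DeviceN { colorants } => colorants,
        }
    }

    /// True only for the three bases that need no colorimetric conversion
    /// or tint transform before the existing expander can use them.
    pub fn is_device(self) -> bool {
        matches!(
            self,
            BaseSpace::DeviceGray | BaseSpace::DeviceRGB | BaseSpace::DeviceCMYK
        )
    }
}

/// Split the lookup table into per-index entries.
/// Returns None when the table is shorter than components * (hival + 1).
pub fn palette_entries(base: BaseSpace, hival: u8, lookup: &[u8]) -> Option<Vec<&[u8]>> {
    let m = base.components();
    let needed = m.checked_mul(hival as usize + 1)?;
    if m == 0 || lookup.len() < needed {
        return None;
    }
    Some(lookup[..needed].chunks_exact(m).collect())
}
```

A dispatcher could check `is_device()` first and only take the existing fast path when it returns true.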
Proposed fix
Three phases, depending on effort budget:
Phase 1: correctness on the Cal* / Lab paths
For CalGray, CalRGB, and Lab, the palette bytes are component values in a defined range. The spec gives exact formulas for converting them to CIE XYZ (§8.6.5.2, §8.6.5.3, §8.6.5.4); a standard XYZ → sRGB step then yields the output RGB. Implement them in new cal_palette_to_rgb() / lab_palette_to_rgb() helpers and dispatch from resolve_indexed_palette() based on base_cs. Colors become correct without touching the expander.
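A minimal sketch of what the Lab and CalGray helpers might look like. The function names are illustrative, and several simplifications are assumed: the white point is hard-coded to D50 (real code would read /WhitePoint), /BlackPoint is ignored, and XYZ is mapped to sRGB with the commonly published Bradford-adapted D50 matrix. The Lab formulas follow the spec's definition (the `g(x)` helper and the L/M/N intermediates).

```rust
/// CIE D50 white point. ASSUMPTION: real code must read /WhitePoint instead.
pub const D50: [f32; 3] = [0.96422, 1.0, 0.82521];

/// Map palette bytes to L*, a*, b*. `range` is the Lab /Range array
/// [a_min, a_max, b_min, b_max]; the spec default is [-100, 100, -100, 100].
pub fn decode_lab_entry(entry: [u8; 3], range: [f32; 4]) -> [f32; 3] {
    let t = |b: u8| f32::from(b) / 255.0;
    [
        100.0 * t(entry[0]),
        range[0] + (range[1] - range[0]) * t(entry[1]),
        range[2] + (range[3] - range[2]) * t(entry[2]),
    ]
}

/// The g(x) helper from the spec's Lab definition.
fn g(x: f32) -> f32 {
    if x >= 6.0 / 29.0 {
        x * x * x
    } else {
        (108.0 / 841.0) * (x - 4.0 / 29.0)
    }
}

/// L*a*b* to CIE XYZ, relative to the given white point.
pub fn lab_to_xyz(lab: [f32; 3], white: [f32; 3]) -> [f32; 3] {
    let m = (lab[0] + 16.0) / 116.0;
    let l = m + lab[1] / 500.0;
    let n = m - lab[2] / 200.0;
    [white[0] * g(l), white[1] * g(m), white[2] * g(n)]
}

/// sRGB transfer function: clamped linear value to an 8-bit code.
fn srgb_encode(c: f32) -> u8 {
    let c = c.clamp(0.0, 1.0);
    let v = if c <= 0.0031308 {
        12.92 * c
    } else {
        1.055 * c.powf(1.0 / 2.4) - 0.055
    };
    (v * 255.0).round() as u8
}

/// D50-relative XYZ to 8-bit sRGB using a Bradford-adapted matrix.
pub fn xyz_d50_to_srgb(xyz: [f32; 3]) -> [u8; 3] {
    const M: [[f32; 3]; 3] = [
        [3.1338561, -1.6168667, -0.4906146],
        [-0.9787684, 1.9161415, 0.0334540],
        [0.0719453, -0.2289914, 1.4052427],
    ];
    let mut out = [0u8; 3];
    for (o, row) in out.iter_mut().zip(M.iter()) {
        *o = srgb_encode(row[0] * xyz[0] + row[1] * xyz[1] + row[2] * xyz[2]);
    }
    out
}

/// CalGray: XYZ is the white point scaled by A^gamma, so a neutral output
/// only needs that scale factor. ASSUMPTION: /BlackPoint is ignored.
pub fn cal_gray_to_srgb(a: u8, gamma: f32) -> u8 {
    srgb_encode((f32::from(a) / 255.0).powf(gamma))
}
```

For example, the Lab palette entry `[255, 128, 128]` decodes to roughly L* = 100 with a* and b* near zero, which is close to white, whereas reading the same bytes as RGB yields a pink.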
Phase 2: ICCBased
ICCBased is harder: a proper implementation applies the embedded ICC profile via lcms2-rs or qcms. Phase 2a is "fall back to the /Alternate color space" (§8.6.5.5), which typically gives DeviceRGB or DeviceCMYK (already correct after #311) or a Cal* / Lab space (covered by Phase 1). When /Alternate is absent, the spec implies DeviceGray, DeviceRGB, or DeviceCMYK for N = 1, 3, or 4. Phase 2b is full profile application.
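The Phase 2a fallback could be sketched roughly as follows. `DeviceSpace`, `IccBasedInfo` and `icc_fallback_space()` are hypothetical names for this example, and the sketch assumes the alternate is a Device space; an alternate that is itself CalRGB or Lab would need the Phase 1 path instead.

```rust
/// Device spaces the existing expander already handles (illustrative type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DeviceSpace {
    Gray,
    Rgb,
    Cmyk,
}

impl DeviceSpace {
    /// Number of color components in this space.
    pub fn components(self) -> u8 {
        match self {
            DeviceSpace::Gray => 1,
            DeviceSpace::Rgb => 3,
            DeviceSpace::Cmyk => 4,
        }
    }
}

/// Hypothetical view of the fields of an ICCBased stream dictionary
/// that matter for the fallback.
pub struct IccBasedInfo {
    /// The /N entry: number of components in the profile's color space.
    pub n: u8,
    /// The /Alternate entry, if present and a Device space.
    pub alternate: Option<DeviceSpace>,
}

/// Pick the space to use when the ICC profile itself is not applied.
/// A declared alternate wins only if its component count matches /N;
/// otherwise the space implied by /N (1, 3 or 4) is used.
pub fn icc_fallback_space(info: &IccBasedInfo) -> Option<DeviceSpace> {
    if let Some(alt) = info.alternate {
        if alt.components() == info.n {
            return Some(alt);
        }
    }
    match info.n {
        1 => Some(DeviceSpace::Gray),
        3 => Some(DeviceSpace::Rgb),
        4 => Some(DeviceSpace::Cmyk),
        _ => None,
    }
}
```

Returning `None` for an unexpected /N leaves the caller free to report an error rather than guess a layout.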
Phase 3: DeviceN / Separation
Tint transforms are PDF function objects (commonly Type 4 PostScript calculator functions) embedded in the PDF. Evaluating them requires a Function object parser. Pragmatic fallback: treat DeviceN / Separation as its alternate color space.
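To give a feel for what evaluating a tint transform involves, here is a sketch for the simplest function type, Type 2 (exponential interpolation), where each output is C0 + x^N × (C1 − C0). `ExponentialFn` and `separation_palette_to_alternate()` are hypothetical names; a full implementation would also need sampled, stitching and PostScript calculator functions, and the sketch assumes the default [0, 1] domain.

```rust
/// Hypothetical holder for a Type 2 (exponential interpolation) function.
pub struct ExponentialFn {
    /// Output at x = 0, one value per alternate-space component.
    pub c0: Vec<f32>,
    /// Output at x = 1, one value per alternate-space component.
    pub c1: Vec<f32>,
    /// Interpolation exponent (the function's /N entry).
    pub n: f32,
}

impl ExponentialFn {
    /// Evaluate y_j = C0_j + x^N * (C1_j - C0_j) for every output component.
    /// ASSUMPTION: the domain is the default [0, 1], so x is clamped to it.
    pub fn eval(&self, x: f32) -> Vec<f32> {
        let xn = x.clamp(0.0, 1.0).powf(self.n);
        self.c0
            .iter()
            .zip(&self.c1)
            .map(|(c0, c1)| c0 + xn * (c1 - c0))
            .collect()
    }
}

/// Expand 1-byte Separation palette entries into alternate-space
/// components in 0.0..=1.0 by running each tint through the function.
pub fn separation_palette_to_alternate(palette: &[u8], tint: &ExponentialFn) -> Vec<Vec<f32>> {
    palette
        .iter()
        .map(|&b| tint.eval(f32::from(b) / 255.0))
        .collect()
}
```

The resulting components are in the alternate space (for example DeviceCMYK) and would still go through the existing conversion for that space.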
Acceptance criteria
- resolve_indexed_palette() no longer blindly routes through color_space_to_pixel_format() for non-Device bases.
- […] /Alternate rather than guessing.
Related