Summary
When resolve_indexed_palette() returns Ok(None), a recoverable "can't parse this Indexed color space" signal, the else-branch in extract_image_from_xobject() falls through to the pre-v0.3.25 code path that maps ColorSpace::Indexed → PixelFormat::RGB. The raw index stream (1 byte/px) is then reinterpreted as RGB (3 bytes/px), so the "Invalid RGB image dimensions" failure that #311 was meant to eliminate still occurs, now on a narrower set of Indexed shapes.
This is an edge case of the #311 fix, discovered via Copilot review of #312. It is not a regression against v0.3.23: the pre-v0.3.25 code failed on every Indexed image, whereas v0.3.25 fails only on Indexed color spaces whose lookup component isn't a String or Stream. For those, however, the error message is misleading and the fallback produces garbage pixels.
Reproducer
Any PDF with an Indexed color space where the lookup element (arr[3]) is neither Object::String nor Object::Stream (e.g. Object::Array of hex bytes, an indirect reference resolved to something else, or a malformed dict). Testable synthetically via a minimal PDF generator that emits [/Indexed /DeviceRGB 255 <... array of bytes ...>].
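A minimal sketch of the color-space fragment such a generator could emit, assuming the lookup is written as a flat PDF array of integers. The function name indexed_with_array_lookup is hypothetical and not part of the crate; embedding the fragment into a complete PDF (objects, xref, image XObject) is left out.

```rust
/// Hypothetical helper: renders an Indexed color space array whose lookup
/// table is a PDF array of integers rather than a string or stream.
/// `palette` holds RGB triples; hival is the highest valid palette index.
fn indexed_with_array_lookup(palette: &[[u8; 3]]) -> String {
    assert!(!palette.is_empty() && palette.len() <= 256);
    let hival = palette.len() - 1;
    // Flatten the RGB triples into one list of decimal component values.
    let components: Vec<String> = palette
        .iter()
        .flatten()
        .map(|b| b.to_string())
        .collect();
    format!("[/Indexed /DeviceRGB {} [{}]]", hival, components.join(" "))
}
```

Calling it with a two-entry black/white palette yields a color space whose arr[3] is an array, which is the shape that takes the Ok(None) path.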
Root cause
src/extractors/images.rs:
// resolve_indexed_palette (~line 757)
let mut palette_bytes = match &lookup_obj {
    Object::String(s) => s.clone(),
    Object::Stream { .. } => lookup_obj.decode_stream_data()?,
    _ => return Ok(None), // <-- this path
};
Callers treat Ok(None) as "not an Indexed color space" and fall through:
// extract_image_from_xobject (~line 658)
if let Some((base_fmt, palette)) = indexed_palette.as_ref() {
    // fast path
} else {
    let pixel_format = color_space_to_pixel_format(&color_space);
    ImageData::Raw { pixels: decoded_data, format: pixel_format }
    //                                             ^^^^^^^^^^^^
    // color_space is still ColorSpace::Indexed → maps to RGB
    // but `decoded_data` is 1 byte/px (palette indices) not 3 bytes/px
}
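The size mismatch can be illustrated with a small self-contained sketch. It assumes 8 bits per component and no row padding; expected_len and check_rgb are hypothetical stand-ins, not the crate's actual validation code, and the error text beyond "Invalid RGB image dimensions" is invented for illustration.

```rust
/// Bytes a raw buffer must hold for the given layout
/// (assumes 8 bits per component and no row padding).
fn expected_len(width: usize, height: usize, bytes_per_pixel: usize) -> usize {
    width * height * bytes_per_pixel
}

/// Hypothetical stand-in for an RGB dimension check: the buffer must hold
/// exactly 3 bytes per pixel.
fn check_rgb(width: usize, height: usize, data: &[u8]) -> Result<(), String> {
    let want = expected_len(width, height, 3);
    if data.len() == want {
        Ok(())
    } else {
        Err(format!(
            "Invalid RGB image dimensions: expected {want} bytes, got {}",
            data.len()
        ))
    }
}
```

For a 4x4 image the index stream holds 16 bytes while an RGB interpretation needs 48, so the check fails even though the stream is a valid Indexed payload.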
Proposed fix
Stop tunneling two different failure modes through Ok(None). Either:
A. Tighter contract on the helper. Keep Ok(None) only for "not an Array / array length < 4 / not Indexed" (genuine "not my problem") and return Err(Error::Image("Indexed palette unresolved: <reason>")) for every shape that is an Indexed array but can't be parsed (lookup isn't String/Stream, palette empty, base color space parse failed). The caller's else-branch is then unreachable for Indexed inputs.
B. Defensive caller. Keep the helper's contract but add an explicit check before the fall-through:
if matches!(color_space, ColorSpace::Indexed) && indexed_palette.is_none() {
    return Err(Error::Image(format!(
        "Indexed color space present but palette could not be resolved \
         (raw stream = {} bytes). PDF may be malformed or use an \
         unsupported lookup encoding.",
        decoded_data.len()
    )));
}
Option A is cleaner long-term (no in-band signaling). Option B is a small, localized fix if we want to minimize #311 churn.
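A rough sketch of the Option A contract on a simplified model, to show how the two outcomes separate. The Obj enum and resolve_palette are hypothetical stand-ins for the crate's Object type and resolve_indexed_palette; stream lookups and base color space parsing are omitted, and errors are plain strings rather than Error::Image.

```rust
/// Simplified, hypothetical stand-in for the PDF object type.
#[derive(Debug, Clone, PartialEq)]
enum Obj {
    Name(String),
    Int(i64),
    Str(Vec<u8>),
    Array(Vec<Obj>),
}

/// Ok(None): the input is not an Indexed color space at all.
/// Err(..):  the input is Indexed but its palette cannot be resolved.
fn resolve_palette(cs: &Obj) -> Result<Option<Vec<u8>>, String> {
    // Not an array, or too short: genuinely "not my problem".
    let arr = match cs {
        Obj::Array(a) if a.len() >= 4 => a,
        _ => return Ok(None),
    };
    if arr[0] != Obj::Name("Indexed".to_string()) {
        return Ok(None);
    }
    // From here on the input is Indexed, so every failure is an error.
    match &arr[3] {
        Obj::Str(s) if !s.is_empty() => Ok(Some(s.clone())),
        Obj::Str(_) => Err("Indexed palette unresolved: empty lookup".to_string()),
        other => Err(format!(
            "Indexed palette unresolved: unsupported lookup {other:?}"
        )),
    }
}
```

With this split, a caller that sees Ok(None) can safely assume a non-Indexed color space, and the array-lookup shape surfaces as an explicit error instead of reaching the RGB fallback.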
Acceptance criteria
- extract_image_from_xobject on an Indexed color space whose lookup isn't a String/Stream returns Error::Image with a message that identifies the palette-resolution failure, not "Invalid RGB image dimensions".
- A test covers the [/DeviceRGB 255 [[0, 0, 0], [255, 255, 255]]] shape (lookup as an Array) that currently falls through, pinning the new error path.
- Charltsing/report.pdf still extracts all 218 images cleanly.
Related