BUG: Cascaded filters in image objects #1913
Codecov Report
Patch coverage:
@@ Coverage Diff @@
## main #1913 +/- ##
==========================================
- Coverage 92.85% 92.83% -0.03%
==========================================
Files 34 34
Lines 7056 7058 +2
Branches 1389 1389
==========================================
Hits 6552 6552
- Misses 358 359 +1
- Partials 146 147 +1
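For reference, the project figures above can be reproduced from the Hits/Misses/Partials rows, assuming Codecov's ratio is hits / (hits + misses + partials). The two lines this PR adds end up as one miss and one partial, which is where the -0.03 comes from. Minimal Python sketch; `coverage_percent` is just an illustrative name:

```python
def coverage_percent(hits: int, misses: int, partials: int) -> float:
    """Coverage as a percentage of all tracked lines (assumed Codecov ratio)."""
    return 100.0 * hits / (hits + misses + partials)


# Numbers taken from the Coverage Diff table.
base = coverage_percent(hits=6552, misses=358, partials=146)  # main
head = coverage_percent(hits=6552, misses=359, partials=147)  # #1913

print(f"main:  {base:.3f}%")
print(f"#1913: {head:.3f}%")
print(f"delta: {head - base:+.2f}")
```

The sketch gives about 92.857% for main while the table shows 92.85%, so Codecov appears to truncate rather than round; that is an inference from these numbers only.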
I do not have test data at the moment to improve coverage.
pypdf/filters.py
except KeyError:  # pragma: no cover
filters = x_object_obj.get(SA.FILTER, [None])
lfilters = filters[-1] if isinstance(filters, list) else filters
if lfilters == FT.FLATE_DECODE:
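The `lfilters` line works because a PDF /Filter entry is either a single name or an array of names, and with an array the decoders run in array order. The last entry is therefore the innermost encoding, i.e. the one that says what kind of image data is left once the outer filters are undone. A standalone sketch of that normalisation, using plain strings in place of pypdf's name objects; `last_filter` is a hypothetical helper, and unlike `filters[-1]` it also tolerates an empty array:

```python
from typing import List, Optional, Union

# A /Filter value: missing, a single name, or an array of names.
FilterEntry = Union[None, str, List[Optional[str]]]


def last_filter(filters: FilterEntry) -> Optional[str]:
    """Return the filter applied last when decoding (hypothetical helper)."""
    if isinstance(filters, list):
        # Array: decoders run first-to-last, so the last one decides the format.
        return filters[-1] if filters else None
    # Single name (or None when there is no /Filter at all).
    return filters


print(last_filter("/FlateDecode"))
print(last_filter(["/ASCII85Decode", "/FlateDecode"]))
print(last_filter([None]))
```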
I think the code might be more readable if we had some extra functions. Something like this:
if lfilters == FT.FLATE_DECODE:
    img, image_format, extension = _handle_flate(size, data, mode, color_space)
elif lfilters in (FT.LZW_DECODE, FT.ASCII_85_DECODE, FT.CCITT_FAX_DECODE):
    ...  # Code block: maybe leave that as-is
elif lfilters == FT.JPX_DECODE:
    img, image_format, extension = _handle_jpx(size, data, mode, color_space)
elif lfilters == FT.CCITT_FAX_DECODE:
    # Never reached: CCITT_FAX_DECODE is already matched by the tuple above
    ...  # Code block: maybe leave that as-is too
elif lfilters is None:
    ...  # Code block: maybe leave that as-is too
I will have another look at it tomorrow (I want to try the extraction and inspect the files manually). But I'm very confident that I'll merge this into pypdf tomorrow and release it :-)
I can confirm that the extraction to jp2 worked. I needed to follow https://askubuntu.com/a/1151781 to convert it to something I can view though xD
imagemagick-ASCII85Decode / imagemagick-images.pdf / imagemagick-lzw.pdf
However, for these files I cannot find a version of pypdf where the extraction worked.
closes #1912
The issue is that some images are encoded with two cascaded filters.
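To make "cascaded" concrete: with `/Filter [/ASCII85Decode /FlateDecode]` the filters are undone in array order, so the writer applied them in reverse (Flate first, then ASCII85). The sketch below uses only the standard library (`base64.a85decode`, `zlib.decompress`) and is not pypdf's implementation; `decode_cascade` and `DECODERS` are illustrative names, and only these two filters are handled.

```python
import base64
import zlib
from typing import Callable, Dict, List


def _ascii85_decode(data: bytes) -> bytes:
    # PDF ASCII85 streams end with "~>"; strip it and an optional "<~".
    data = data.strip()
    if data.startswith(b"<~"):
        data = data[2:]
    if data.endswith(b"~>"):
        data = data[:-2]
    return base64.a85decode(data)


DECODERS: Dict[str, Callable[[bytes], bytes]] = {
    "/ASCII85Decode": _ascii85_decode,
    "/FlateDecode": zlib.decompress,
}


def decode_cascade(data: bytes, filters: List[str]) -> bytes:
    """Undo each filter in the order it is listed in the /Filter array."""
    for name in filters:
        data = DECODERS[name](data)
    return data


# Build a stream the way a writer would: encode in reverse array order.
raw = b"\x00\xff" * 8  # stand-in pixel bytes
encoded = base64.a85encode(zlib.compress(raw)) + b"~>"
print(decode_cascade(encoded, ["/ASCII85Decode", "/FlateDecode"]) == raw)  # prints True
```

Applying the filters in the wrong order fails immediately, because `zlib.decompress` does not recognise ASCII85 text as a zlib header.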