
BUG: Cascaded filters in image objects #1913

Merged
MartinThoma merged 5 commits into py-pdf:main on Jun 25, 2023


pubpub-zz (Collaborator)

Closes #1912.
The issue is that some images are encoded with two cascaded filters.

codecov bot commented Jun 24, 2023

Codecov Report

Patch coverage: 88.52% and project coverage change: -0.03 ⚠️

Comparison is base (34a9abf) 92.85% compared to head (f155d94) 92.83%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1913      +/-   ##
==========================================
- Coverage   92.85%   92.83%   -0.03%     
==========================================
  Files          34       34              
  Lines        7056     7058       +2     
  Branches     1389     1389              
==========================================
  Hits         6552     6552              
- Misses        358      359       +1     
- Partials      146      147       +1     
| Impacted Files   | Coverage Δ                 |
|------------------|----------------------------|
| pypdf/filters.py | 93.42% <88.52%> (-0.45%) ⬇️ |
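The headline percentages can be reproduced from the diff table: Codecov computes coverage as hits / (hits + misses + partials), so partially covered lines count against coverage. This sketch assumes Codecov's default of rounding down to two decimals; whether this repository overrides that setting is not stated here.

```python
import math
from fractions import Fraction


def coverage_pct(hits: int, misses: int, partials: int) -> Fraction:
    # Exact percentage; partial lines are tracked but not counted as hits.
    return Fraction(100 * hits, hits + misses + partials)


def round_down(pct: Fraction, digits: int = 2) -> float:
    # Truncate toward negative infinity at the given number of decimals.
    scale = 10 ** digits
    return math.floor(pct * scale) / scale


base = coverage_pct(6552, 358, 146)  # 7056 tracked lines on main
head = coverage_pct(6552, 359, 147)  # 7058 tracked lines on the PR head

print(round_down(base), round_down(head))  # 92.85 92.83
print(round(float(head - base), 2))        # -0.03
```

The reported -0.03 matches the difference of the unrounded ratios; subtracting the already-rounded 92.85 and 92.83 would give -0.02 instead.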


pubpub-zz (Collaborator, Author)

I don't have test data at the moment to improve coverage.
@MartinThoma
All yours :)

pypdf/filters.py (outdated)
except KeyError: # pragma: no cover
filters = x_object_obj.get(SA.FILTER, [None])
lfilters = filters[-1] if isinstance(filters, list) else filters
if lfilters == FT.FLATE_DECODE:
Member

I think the code might be more readable if we had some extra functions. Something like this:

if lfilters == FT.FLATE_DECODE:
    img, image_format, extension = _handle_flate(size, data, mode, color_space)
elif lfilters in (FT.LZW_DECODE, FT.ASCII_85_DECODE, FT.CCITT_FAX_DECODE):
    ...  # Code block: Maybe leave that as-is
elif lfilters == FT.JPX_DECODE:
    img, image_format, extension = _handle_jpx(size, data, mode, color_space)
elif lfilters == FT.CCITT_FAX_DECODE:
    # Unreachable as written: CCITT_FAX_DECODE is already matched by the tuple above
    ...  # Code block: Maybe leave that also as is
elif lfilters is None:
    ...  # Code block: Maybe leave that also as is

MartinThoma added the soon label (PRs that are almost ready to be merged, issues that get solved pretty soon) on Jun 24, 2023

MartinThoma (Member)

I will have another look at it tomorrow (I want to try the extraction and inspect the files manually). But I'm very certain that I'll merge this into pypdf tomorrow and release it :-)

MartinThoma (Member) commented Jun 25, 2023

I can confirm that the extraction to JP2 worked, though I needed to follow https://askubuntu.com/a/1151781 to convert it to something I could view xD

  • ✔️ imagemagick-CCITTFaxDecode.pdf
  • ✔️ 019-grayscale-image

imagemagick-ASCII85Decode / imagemagick-images.pdf / imagemagick-lzw.pdf

Extracting 007-imagemagick-images/imagemagick-ASCII85Decode.pdf and 007-imagemagick-images/imagemagick-images.pdf failed:

UnidentifiedImageError: cannot identify image file

However, I cannot find a version of pypdf where it worked.
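Pillow raises `UnidentifiedImageError` when none of its format plugins recognises the bytes passed to `Image.open`. One plausible explanation, not confirmed in this thread, is that the extracted data has no file header at all (for example raw samples). A quick check is to look at the leading signature bytes; the helper below is a hypothetical standard-library sketch covering a few common formats.

```python
from typing import Optional

# Leading signature bytes of a few image containers / codestreams.
SIGNATURES = (
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"\x00\x00\x00\x0cjP  \r\n\x87\n", "JPEG 2000 (JP2 file)"),
    (b"\xff\x4f\xff\x51", "JPEG 2000 (raw codestream)"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
)


def sniff_image_format(data: bytes) -> Optional[str]:
    """Return a format name if data starts with a known signature, else None."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return None  # e.g. raw pixel samples with no header


print(sniff_image_format(b"\xff\xd8\xff\xe0rest-of-jpeg"))  # JPEG
print(sniff_image_format(bytes(16)))                         # None
```

If the bytes carry no recognisable header, `Image.open` has nothing to identify; such data would instead have to be wrapped with Pillow's `Image.frombytes(mode, size, data)`, using the mode and size from the image dictionary.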

MartinThoma merged commit bd82a56 into py-pdf:main on Jun 25, 2023
MartinThoma changed the title from "BUG : cascaded filters in image objects" to "BUG: Cascaded filters in image objects" on Jul 2, 2023
pubpub-zz deleted the casc_filters_img branch on September 2, 2023, 09:50

Successfully merging this pull request may close these issues:

UnboundLocalError: cannot access local variable 'img' where it is not associated with a value