BUG: Cascaded filters in image objects #1913

pubpub-zz · 2023-06-24T08:37:26Z

Closes #1912.
The issue concerns images that are encoded with two cascaded filters.
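
For illustration, a minimal sketch of what "cascaded filters" means for a stream, using only the standard library. The `/Filter` array lists filters in decoding order, so the last entry describes the innermost encoding. The names `DECODERS` and `decode_cascade` are hypothetical and not part of pypdf, and `base64.a85decode` is used here without the Adobe `~>` framing as a simplification.

```python
import base64
import zlib

raw = bytes(range(16)) * 4  # stand-in for raw pixel data

# Encoding applies the filters in reverse order of the /Filter array:
# compress first, then wrap the result in ASCII85.
stream = base64.a85encode(zlib.compress(raw))

# Hypothetical /Filter entry for this stream, listed in decoding order.
filters = ["/ASCII85Decode", "/FlateDecode"]

# Assumption: a tiny decoder table, standing in for a real filter registry.
DECODERS = {
    "/ASCII85Decode": base64.a85decode,
    "/FlateDecode": zlib.decompress,
}


def decode_cascade(data, filter_names):
    """Apply each filter in the order given by the /Filter array."""
    for name in filter_names:
        data = DECODERS[name](data)
    return data


decoded = decode_cascade(stream, filters)
```

Only the last filter (`/FlateDecode` here) says anything about how the pixel data itself is stored, which is why the image code needs to look at the final entry rather than the whole value.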

closes py-pdf#1912

codecov · 2023-06-24T08:49:56Z

Codecov Report

Patch coverage: 88.52% and project coverage change: -0.03 ⚠️

Comparison is base (34a9abf) 92.85% compared to head (f155d94) 92.83%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1913      +/-   ##
==========================================
- Coverage   92.85%   92.83%   -0.03%     
==========================================
  Files          34       34              
  Lines        7056     7058       +2     
  Branches     1389     1389              
==========================================
  Hits         6552     6552              
- Misses        358      359       +1     
- Partials      146      147       +1

Impacted Files	Coverage Δ
pypdf/filters.py	`93.42% <88.52%> (-0.45%)`	⬇️


pubpub-zz · 2023-06-24T10:14:54Z

I do not have test data at the moment to improve coverage.
@MartinThoma
All yours :)

MartinThoma · 2023-06-24T21:16:21Z

pypdf/filters.py

-                    except KeyError:  # pragma: no cover
+    filters = x_object_obj.get(SA.FILTER, [None])
+    lfilters = filters[-1] if isinstance(filters, list) else filters
+    if lfilters == FT.FLATE_DECODE:


I think the code might be more readable if we had some extra functions. Something like this:

if lfilters == FT.FLATE_DECODE:
    img, image_format, extension = _handle_flate(size, data, mode, color_space)
elif lfilters in (FT.LZW_DECODE, FT.ASCII_85_DECODE, FT.CCITT_FAX_DECODE):
    # Code block: Maybe leave that as-is
elif lfilters == FT.JPX_DECODE:
    img, image_format, extension = _handle_flate(size, data, mode, color_space)
elif lfilters == FT.CCITT_FAX_DECODE:
    # Code block: Maybe leave that also as is
elif lfilters is None:
    # Code block: Maybe leave that also as is
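
A self-contained sketch of the same idea: normalize the `/Filter` entry to its last filter, then dispatch to one small handler per filter. This is not the pypdf implementation. The handler names, their return values, and the plain-string filter names are all assumptions made so the example runs without pypdf.

```python
def _handle_flate(data):
    # Hypothetical handler: a real one would build an image from the pixels.
    return ("PNG", ".png")


def _handle_jpx(data):
    # Hypothetical handler: JPEG 2000 data can be written out unchanged.
    return ("JPEG2000", ".jp2")


def _handle_raw(data):
    # Hypothetical handler for streams that have no filter at all.
    return ("PNG", ".png")


def last_filter(filters):
    """Return the final filter of a /Filter entry (single name or list)."""
    if isinstance(filters, list):
        return filters[-1] if filters else None
    return filters


# Assumption: plain strings stand in for pypdf's filter-name constants.
HANDLERS = {
    "/FlateDecode": _handle_flate,
    "/JPXDecode": _handle_jpx,
    None: _handle_raw,
}


def pick_format(x_object, data):
    """Choose an image format and extension from the last filter."""
    lfilter = last_filter(x_object.get("/Filter", [None]))
    if lfilter not in HANDLERS:
        raise NotImplementedError(f"unsupported filter: {lfilter!r}")
    return HANDLERS[lfilter](data)
```

For example, `pick_format({"/Filter": ["/ASCII85Decode", "/JPXDecode"]}, b"")` selects the JPX handler, because only the last filter in the cascade is consulted.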

MartinThoma · 2023-06-24T21:17:24Z

I will have another look at it tomorrow (I want to try the extraction and inspect the files manually). But I'm very certain that I will take this into pypdf tomorrow and release it :-)

MartinThoma · 2023-06-25T08:00:58Z

I can confirm that the extraction to jp2 worked. I needed to follow https://askubuntu.com/a/1151781 to convert it to something I can view though xD

✔️ imagemagick-CCITTFaxDecode.pdf
✔️ 019-grayscale-image

imagemagick-ASCII85Decode / imagemagick-images.pdf / imagemagick-lzw.pdf

Extracting 007-imagemagick-images/imagemagick-ASCII85Decode.pdf and 007-imagemagick-images/imagemagick-images.pdf failed:

UnidentifiedImageError: cannot identify image file

However, I cannot find a version of pypdf where it worked.

BUG : cascaded filters in image objects

3231fca

closes py-pdf#1912

rodgermoore mentioned this pull request Jun 24, 2023

UnboundLocalError: cannot access local variable 'img' where it is not associated with a value #1912

Closed

MartinThoma reviewed Jun 24, 2023

MartinThoma added the soon PRs that are almost ready to be merged, issues that get solved pretty soon label Jun 24, 2023

pubpub-zz and others added 4 commits June 25, 2023 13:38

style

c403740

style2

7ec38cd

style 3

6d3d83f

Merge branch 'main' into casc_filters_img

f155d94

MartinThoma merged commit bd82a56 into py-pdf:main Jun 25, 2023

MartinThoma changed the title ~~BUG : cascaded filters in image objects~~ BUG: Cascaded filters in image objects Jul 2, 2023

pubpub-zz deleted the casc_filters_img branch September 2, 2023 09:50
