pypdf.errors.PdfReadError: Image data is not rectangular #1787

Closed
BriskyGates opened this issue Apr 13, 2023 · 1 comment · Fixed by #1834
Labels
is-bug From a users perspective, this is a bug - a violation of the expected behavior with a compliant PDF

Comments

BriskyGates commented Apr 13, 2023

Hi guys, how are you? First of all, thank you so much for making and maintaining this amazing library!
I want to extract all the images in the PDF, but it fails.
I cannot confirm whether the image is an inline image or not. I tried to compress the PDF and then read the original instruction sequences, but I couldn't find the 'BI' tag.

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
Windows-10-10.0.19041-SP0


$ python -c "import pypdf;print(pypdf.__version__)"
3.6.0

Code + PDF

This is a minimal, complete example that shows the issue:

from pypdf import PdfReader

reader = PdfReader("pdf_font_garbled.pdf")
a = reader.pages[1].images  # raises PdfReadError: Image data is not rectangular

Share here the PDF file(s) that cause the issue. The smaller they are, the
better. Let us know if we may add them to our tests!
pdf_font_garbled.pdf

Traceback

This is the complete Traceback I see:

Traceback (most recent call last):
  File "D:\code_project\try_pdf\experi\pypdf_image.py", line 15, in <module>
    a=reader.pages[1].images
  File "D:\code_project\venv_all\try_pdf_latest\lib\site-packages\pypdf\_page.py", line 444, in images
    extension, byte_stream = _xobj_to_image(x_object[obj])
  File "D:\code_project\venv_all\try_pdf_latest\lib\site-packages\pypdf\filters.py", line 600, in _xobj_to_image
    data = x_object_obj.get_data()  # type: ignore
  File "D:\code_project\venv_all\try_pdf_latest\lib\site-packages\pypdf\generic\_data_structures.py", line 882, in get_data
    decoded._data = decode_stream_data(self)
  File "D:\code_project\venv_all\try_pdf_latest\lib\site-packages\pypdf\filters.py", line 545, in decode_stream_data
    data = FlateDecode.decode(data, stream.get(SA.DECODE_PARMS))
  File "D:\code_project\venv_all\try_pdf_latest\lib\site-packages\pypdf\filters.py", line 145, in decode
    str_data = FlateDecode._decode_png_prediction(str_data, columns, rowlength)  # type: ignore
  File "D:\code_project\venv_all\try_pdf_latest\lib\site-packages\pypdf\filters.py", line 156, in _decode_png_prediction
    raise PdfReadError("Image data is not rectangular")
pypdf.errors.PdfReadError: Image data is not rectangular

BriskyGates (Author) commented

Hints:
After comparing with PyMuPDF, I found that PyMuPDF fills non-rectangular images with a black background.
E.g., if a portrait in a PDF has an oval shape, PyMuPDF fills the area around it with black.

[Image: image_p4_i17]

MartinThoma added the is-bug label Apr 14, 2023
pubpub-zz added a commit to pubpub-zz/pypdf that referenced this issue May 6, 2023
Number of colors were not taken into account to process PNG Images

also properly process mask to transparency

closes py-pdf#1787
MartinThoma pushed a commit that referenced this issue Jun 18, 2023
Take the number of colors into account for PNG images

Properly process the mask for transparency:

Fixes #1787
Fixes #1599

Adds support for inline images extraction:

Fixes #1368
Fixes #1863

Additional changes:

* Process TIFF predictor 2
* Upgrades Pillow requirement to version 9.5 for Python 3.11
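
For readers following the fix, the "number of colors" item concerns how the expected row length of PNG-predicted Flate data is computed. The sketch below is only an illustration and is not pypdf's actual code: `png_row_length` and `is_rectangular` are hypothetical helper names, and the formula is an assumption based on the `Columns`, `Colors` and `BitsPerComponent` decode parameters of the PDF specification (defaults 1, 1 and 8). It shows why a check that ignores `Colors` can reject valid RGB data as "not rectangular".

```python
import math


def png_row_length(columns, colors=1, bits_per_component=8):
    """Bytes per PNG-predicted row: 1 filter-type byte plus the packed samples."""
    return 1 + math.ceil(columns * colors * bits_per_component / 8)


def is_rectangular(data, columns, colors=1, bits_per_component=8):
    """True if the decompressed data splits into whole rows of the expected length."""
    return len(data) % png_row_length(columns, colors, bits_per_component) == 0


# Hypothetical 4x3 RGB image: each row is 1 filter byte + 4 * 3 sample bytes.
width, height = 4, 3
data = bytes((1 + width * 3) * height)

print(is_rectangular(data, width, colors=3))  # Colors taken into account
print(is_rectangular(data, width))            # Colors ignored (treated as 1)
```

With `colors=3` the 39 bytes divide evenly into three 13-byte rows. With the default of one color the expected row length is 5 bytes, which does not divide 39, so a check of this shape would report the data as not rectangular.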