pypdf.errors.PdfReadError: Image data is not rectangular #1787

BriskyGates · 2023-04-13T06:52:39Z

Hi guys, how are you? First of all, thank you so much for making and maintaining this amazing library!
I want to extract all images from a PDF, but the extraction fails.
I cannot confirm whether the image is an inline image. I tried to compress the PDF and then read the original instruction sequences, but I couldn't find the 'BI' tag.

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
Windows-10-10.0.19041-SP0


$ python -c "import pypdf;print(pypdf.__version__)"
3.6.0

Code + PDF

This is a minimal, complete example that shows the issue:

from pypdf import PdfReader

reader = PdfReader("pdf_font_garbled.pdf")
a=reader.pages[1].images

Share here the PDF file(s) that cause the issue. The smaller they are, the
better. Let us know if we may add them to our tests!
pdf_font_garbled.pdf

Traceback

This is the complete Traceback I see:

Traceback (most recent call last):
  File "D:\code_project\try_pdf\experi\pypdf_image.py", line 15, in <module>
    a=reader.pages[1].images
  File "D:\code_project\venv_all\try_pdf_latest\lib\site-packages\pypdf\_page.py", line 444, in images
    extension, byte_stream = _xobj_to_image(x_object[obj])
  File "D:\code_project\venv_all\try_pdf_latest\lib\site-packages\pypdf\filters.py", line 600, in _xobj_to_image
    data = x_object_obj.get_data()  # type: ignore
  File "D:\code_project\venv_all\try_pdf_latest\lib\site-packages\pypdf\generic\_data_structures.py", line 882, in get_data
    decoded._data = decode_stream_data(self)
  File "D:\code_project\venv_all\try_pdf_latest\lib\site-packages\pypdf\filters.py", line 545, in decode_stream_data
    data = FlateDecode.decode(data, stream.get(SA.DECODE_PARMS))
  File "D:\code_project\venv_all\try_pdf_latest\lib\site-packages\pypdf\filters.py", line 145, in decode
    str_data = FlateDecode._decode_png_prediction(str_data, columns, rowlength)  # type: ignore
  File "D:\code_project\venv_all\try_pdf_latest\lib\site-packages\pypdf\filters.py", line 156, in _decode_png_prediction
    raise PdfReadError("Image data is not rectangular")
pypdf.errors.PdfReadError: Image data is not rectangular


BriskyGates · 2023-04-13T07:37:48Z

Hints:
After comparing with PyMuPDF, I found that PyMuPDF fills non-rectangular images with a black background.
E.g., if a portrait in a PDF has an oval shape, PyMuPDF fills the area around it with a black background.


Take the number of colors into account for PNG images
Properly process the mask for transparency:
Fixes #1787
Fixes #1599
Adds support for inline image extraction:
Fixes #1368
Fixes #1863
Additional changes:
* Process TIFF predictor 2
* Upgrades Pillow requirement to version 9.5 for Python 3.11
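To illustrate why ignoring the number of colors can produce "Image data is not rectangular": with a PNG predictor, each row of decoded data is one filter-type byte followed by the pixel bytes, and the pixel bytes depend on Columns, Colors and BitsPerComponent from the stream's decode parameters. The sketch below is not pypdf's actual code; `png_row_length` and `is_rectangular` are hypothetical helper names, and the check is assumed to be a simple "data length divisible by row length" test.

```python
def png_row_length(columns: int, colors: int = 1, bits_per_component: int = 8) -> int:
    """Bytes per row of PNG-predicted data, including the 1-byte filter tag.

    Defaults follow the PDF defaults for Colors (1) and BitsPerComponent (8).
    """
    pixel_bytes = (columns * colors * bits_per_component + 7) // 8  # round up to whole bytes
    return pixel_bytes + 1  # +1 for the per-row filter-type byte


def is_rectangular(data_length: int, columns: int, colors: int = 1,
                   bits_per_component: int = 8) -> bool:
    """True if the data splits into a whole number of rows."""
    return data_length % png_row_length(columns, colors, bits_per_component) == 0


# Hypothetical 4x2 RGB image, 8 bits per component:
# each row is 1 filter byte + 4 pixels * 3 bytes = 13 bytes, so 26 bytes in total.
width, height = 4, 2
data_length = height * (width * 3 + 1)

print(is_rectangular(data_length, width, colors=1))  # False: row length wrongly taken as 5
print(is_rectangular(data_length, width, colors=3))  # True: row length is 13
```

With the color count left at 1, the row length comes out too short for an RGB image, the data no longer divides evenly into rows, and a check of this kind rejects an image that is in fact valid.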

MartinThoma added the is-bug label (From a user's perspective, this is a bug: a violation of the expected behavior with a compliant PDF) on Apr 14, 2023

pubpub-zz added a commit to pubpub-zz/pypdf that referenced this issue May 6, 2023

BUG : fix RGB FlateEncode Images(PNG) and transparency

ca44aec

The number of colors was not taken into account when processing PNG images. Also properly process the mask for transparency. Closes py-pdf#1787

pubpub-zz mentioned this issue May 6, 2023

BUG: Fix RGB FlateEncode Images(PNG) and transparency #1834

Merged

MartinThoma closed this as completed in #1834 Jun 18, 2023
