Hi guys, how are you? First of all, thank you so much for making and maintaining this amazing library!
I want to extract all images in the PDF, but it fails.
I cannot confirm whether the image is an inline image or not. I tried to compress the PDF and then read the original instruction sequences, but I couldn't find the 'BI' tag.
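For anyone wanting to repeat the 'BI' check, here is a minimal sketch of a scan for inline images in an already decoded content stream. Inline images are delimited by the `BI`, `ID` and `EI` operators, while image XObjects are painted with `Do`. The helper name `find_inline_images` and the sample stream are made up for illustration, and the regular expression is deliberately naive: binary image data can itself contain the bytes `EI`, so this is a quick diagnostic, not a parser. The traceback in this report goes through `_xobj_to_image`, which suggests the failing image is an XObject rather than an inline image.

```python
import re
import zlib

# Naive pattern: BI <dictionary> ID <data> EI, each operator delimited by
# whitespace. Good enough for a quick look, not for real parsing.
INLINE_IMAGE = re.compile(rb"(?<!\S)BI\s(.*?)\sID\s(.*?)\sEI(?!\S)", re.DOTALL)


def find_inline_images(content: bytes) -> list[tuple[bytes, bytes]]:
    """Return (dictionary, data) pairs for each BI ... ID ... EI run."""
    return [(m.group(1).strip(), m.group(2)) for m in INLINE_IMAGE.finditer(content)]


# Hypothetical content stream holding one 2x2 grayscale inline image.
sample = b"q 10 0 0 10 0 0 cm\nBI /W 2 /H 2 /CS /G /BPC 8 ID \x00\xff\xff\x00 EI\nQ\n"

# Content streams are usually Flate-compressed, so decode before scanning.
decoded = zlib.decompress(zlib.compress(sample))

print(len(find_inline_images(decoded)))         # 1
print(find_inline_images(b"q /Im0 Do Q"))       # [] (XObject painted with Do)
```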
Environment
Which environment were you using when you encountered the problem?
Share here the PDF file(s) that cause the issue. The smaller they are, the better. Let us know if we may add them to our tests!
pdf_font_garbled.pdf
Traceback
This is the complete Traceback I see:
Traceback (most recent call last):
  File "D:\code_project\try_pdf\experi\pypdf_image.py", line 15, in <module>
    a=reader.pages[1].images
  File "D:\code_project\venv_all\try_pdf_latest\lib\site-packages\pypdf\_page.py", line 444, in images
    extension, byte_stream = _xobj_to_image(x_object[obj])
  File "D:\code_project\venv_all\try_pdf_latest\lib\site-packages\pypdf\filters.py", line 600, in _xobj_to_image
    data = x_object_obj.get_data() # type: ignore
  File "D:\code_project\venv_all\try_pdf_latest\lib\site-packages\pypdf\generic\_data_structures.py", line 882, in get_data
    decoded._data = decode_stream_data(self)
  File "D:\code_project\venv_all\try_pdf_latest\lib\site-packages\pypdf\filters.py", line 545, in decode_stream_data
    data = FlateDecode.decode(data, stream.get(SA.DECODE_PARMS))
  File "D:\code_project\venv_all\try_pdf_latest\lib\site-packages\pypdf\filters.py", line 145, in decode
    str_data = FlateDecode._decode_png_prediction(str_data, columns, rowlength) # type: ignore
  File "D:\code_project\venv_all\try_pdf_latest\lib\site-packages\pypdf\filters.py", line 156, in _decode_png_prediction
    raise PdfReadError("Image data is not rectangular")
pypdf.errors.PdfReadError: Image data is not rectangular
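To illustrate how an error like this can arise, here is a rough sketch that assumes the check is a simple divisibility test of the decoded length against the row length. It is not pypdf's actual code, and the function names are invented for the example. With PNG prediction, each row carries one filter-type byte followed by the samples, so the row length depends on `/Columns`, `/Colors` and `/BitsPerComponent`. If the color count is ignored for an RGB image, the data no longer splits into whole rows.

```python
import math
import zlib


def png_row_length(columns: int, colors: int = 1, bits_per_component: int = 8) -> int:
    """Bytes per row in PNG-predicted data: one filter-type byte plus the samples."""
    return 1 + math.ceil(columns * colors * bits_per_component / 8)


def is_rectangular(data: bytes, row_length: int) -> bool:
    """True when the data splits into whole rows of the given length."""
    return len(data) % row_length == 0


# A 4x3 RGB image with PNG filter type 0 ("None") on every row.
columns, rows, colors = 4, 3, 3
row = bytes([0]) + bytes(range(columns * colors))
decoded = zlib.decompress(zlib.compress(row * rows))  # 39 bytes

gray_guess = png_row_length(columns)          # 5: ignores the color count
rgb_length = png_row_length(columns, colors)  # 13: 1 + 4 * 3

print(is_rectangular(decoded, gray_guess))  # False
print(is_rectangular(decoded, rgb_length))  # True
```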
Hints:
After comparing with PyMuPDF, I found that PyMuPDF fills non-rectangular images with a black background.
E.g., if a portrait in a PDF has an oval shape, PyMuPDF fills the area around the oval with black.
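One way to picture the difference, assuming the oval shape comes from an 8-bit mask stored alongside a rectangular RGB image (an assumption, since the thread does not say how the file encodes it): the mask can either be flattened against black or kept as an alpha channel. The sketch below is pure Python with made-up helper names and is not taken from PyMuPDF or pypdf.

```python
def _check_sizes(rgb: bytes, mask: bytes) -> None:
    if len(rgb) != 3 * len(mask):
        raise ValueError("expected one mask byte per RGB pixel")


def composite_on_black(rgb: bytes, mask: bytes) -> bytes:
    """Blend 8-bit RGB pixels over black using an 8-bit mask (255 = opaque)."""
    _check_sizes(rgb, mask)
    out = bytearray()
    for i, alpha in enumerate(mask):
        r, g, b = rgb[3 * i : 3 * i + 3]
        out += bytes((r * alpha // 255, g * alpha // 255, b * alpha // 255))
    return bytes(out)


def to_rgba(rgb: bytes, mask: bytes) -> bytes:
    """Keep the mask as an alpha channel instead of flattening it."""
    _check_sizes(rgb, mask)
    out = bytearray()
    for i, alpha in enumerate(mask):
        out += rgb[3 * i : 3 * i + 3] + bytes([alpha])
    return bytes(out)


# Two identical pixels: the first inside the oval, the second outside it.
pixels = bytes([200, 100, 50, 200, 100, 50])
mask = bytes([255, 0])

print(list(composite_on_black(pixels, mask)))  # [200, 100, 50, 0, 0, 0]
print(list(to_rgba(pixels, mask)))             # [200, 100, 50, 255, 200, 100, 50, 0]
```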
MartinThoma added the is-bug label (from a user's perspective, this is a bug: a violation of the expected behavior with a compliant PDF) on Apr 14, 2023.
pubpub-zz added a commit to pubpub-zz/pypdf that referenced this issue on May 6, 2023:
Take the number of colors into account for PNG images
Properly process the mask for transparency:
Fixes #1787
Fixes #1599
Adds support for inline image extraction:
Fixes #1368
Fixes #1863
Additional changes:
* Process TIFF predictor 2
* Upgrades Pillow requirement to version 9.5 for Python 3.11
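As background for the "TIFF predictor 2" item, here is a minimal sketch of undoing horizontal differencing for 8-bit samples. With this predictor, the first pixel of each row is stored as-is and every later sample is stored as the difference from the same color component one pixel to the left. The function name is made up for illustration, and this is not the code from the referenced commit.

```python
def undo_tiff_predictor2(data: bytes, columns: int, colors: int = 1) -> bytes:
    """Reverse horizontal differencing for 8-bit samples."""
    row_length = columns * colors
    if row_length <= 0 or len(data) % row_length != 0:
        raise ValueError("data does not split into whole rows")
    out = bytearray(data)
    for start in range(0, len(out), row_length):
        # Skip the first pixel of the row, then add the already restored
        # sample of the same component from the pixel to the left.
        for i in range(start + colors, start + row_length):
            out[i] = (out[i] + out[i - colors]) & 0xFF
    return bytes(out)


# Two grayscale rows of three pixels each; sums wrap around modulo 256.
encoded = bytes([10, 5, 5, 250, 10, 1])
print(list(undo_tiff_predictor2(encoded, columns=3)))  # [10, 15, 20, 250, 4, 5]
```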
Environment reported with the issue:
$ python -m platform
Windows-10-10.0.19041-SP0
$ python -c "import pypdf;print(pypdf.__version__)"
3.6.0