OSError when accessing images #1801
Comments
The PDF contained an image in PA mode:
* P: 8-bit pixels, mapped to any other mode using a color palette
* PA: P with alpha
See #1801
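To make the two modes concrete, here is a minimal pure-Python sketch of what "P with alpha" means: each pixel stores a palette index plus an alpha value, and expanding it yields RGBA. The function name `expand_pa` and the sample palette are made up for illustration; this is not how Pillow or pypdf implement it internally.

```python
def expand_pa(pixels, palette):
    """Expand (palette_index, alpha) pairs into RGBA tuples.

    `palette` is a list of (R, G, B) tuples, as in P mode.
    `pixels` is a list of (index, alpha) pairs, as in PA mode.
    Both are hypothetical sample data, not read from a real PDF.
    """
    return [palette[index] + (alpha,) for index, alpha in pixels]


# A three-entry palette: black, white, red.
palette = [(0, 0, 0), (255, 255, 255), (255, 0, 0)]
# Three pixels: opaque white, half-transparent red, fully transparent black.
pixels = [(1, 255), (2, 128), (0, 0)]

rgba = expand_pa(pixels, palette)
```

A writer that only understands P (index only) would drop or misread the alpha channel, which is one plausible way to end up with a distorted image.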
Thank you for adding such a good issue description :-) I think I found & fixed the problem in #1802. Would you mind checking whether that solves the issue for you?
I'm a bit confused, as the extracted image does not look like the one in the PDF. So my fix is likely flawed. Does somebody have an idea what the issue is?
Glad you liked my issue description - it's the first issue I've opened :) Yes, when I install the issue-1801 branch version, I don't get the error anymore. But yes, in the output image the text "Test" is written with some strange horizontal lines.
I think I made it work :-) Here is the image: Is it ok if I add test_img.pdf to https://github.com/py-pdf/sample-files ?
Awesome - it works perfectly now! Thank you for your help. Sure! :)
The fix was just merged to
Thank you for your help! If you want, I can add you as a contributor: https://pypdf.readthedocs.io/en/latest/meta/CONTRIBUTORS.html
Great - looking forward to using the new version. Thanks for the offer, but no thanks. :)
I am trying to extract text from a PDF. I first try to extract the text using extract_text(). If that fails, I want to get the image(s) so that I can extract the text using OCR.
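The text-first, images-as-fallback flow described here could be sketched roughly as follows. The helper name `text_or_images` and the `min_chars` threshold are hypothetical; the function only relies on a page object having `extract_text()` and an `images` list whose items expose `name` and `data`, which is how pypdf page objects behave. The stand-in page built with `SimpleNamespace` is there only so the sketch runs without a PDF.

```python
from types import SimpleNamespace


def text_or_images(page, min_chars=1):
    """Return ("text", str) if the page has extractable text,
    otherwise ("images", [(name, raw_bytes), ...]) for later OCR.

    `min_chars` is an assumed threshold for "extraction failed".
    """
    text = (page.extract_text() or "").strip()
    if len(text) >= min_chars:
        return "text", text
    return "images", [(img.name, img.data) for img in page.images]


# Stand-in for a scanned page: no text layer, one embedded image.
scanned = SimpleNamespace(
    extract_text=lambda: "",
    images=[SimpleNamespace(name="Im0.png", data=b"\x89PNG")],
)
result = text_or_images(scanned)
```

With pypdf this would be called once per page, e.g. for each `page` in `PdfReader("test_img.pdf").pages`, and the returned image bytes handed to whichever OCR tool is in use.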
Environment
Which environment were you using when you encountered the problem?
$ python -m platform
macOS-13.2.1-arm64-arm-64bit
$ python -c "import pypdf;print(pypdf.__version__)"
3.8.0
I have also installed Pillow==9.4.0.
Code + PDF
This is a minimal, complete example that shows the issue:
The pdf used in the example is included below:
test_img.pdf
Traceback
This is the complete Traceback I see:
Is there any way the format can be determined in pypdf so we don't get the error from PIL? Or is there a way around the error?
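As for a way around the error until a fixed version is available, one option is to catch it where the images are read. The sketch below assumes the OSError surfaces when `page.images` is accessed; the helper `safe_images` and the `_BrokenPage` stand-in are hypothetical and only simulate that failure.

```python
def safe_images(page):
    """Return (images, error): the page's images, or an empty list
    plus the caught OSError if reading them fails."""
    try:
        return list(page.images), None
    except OSError as exc:
        return [], exc


class _BrokenPage:
    """Stand-in page whose `images` property fails, for demonstration only."""

    @property
    def images(self):
        raise OSError("simulated image decoding failure")


images, error = safe_images(_BrokenPage())
```

This only skips the problematic page; it does not recover the image itself, so it is a stopgap rather than a fix.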