BUG: Fix Parsing of Inline Images #332
…whitespace before and after it, not just after it.
#331 also implements protection against incorrect images, and makes parsing of inline images a lot faster.
The current solution is not compatible with the recent BytesIO implementation. Would you mind adjusting your PR?
I fixed the merge conflict, but I'm not sure what you're referring to re …
@speedplane We made some pretty heavy changes to PyPDF2 recently. If you search for … Do you have an example PDF where this adjustment is necessary? Does it close one of the open issues?
It would help me a lot if we had an image that shows the described issue.
Sorry, this is all I have. I can't remember what this fixed or how it fixes it.
@speedplane The issue you addressed was fixed via #1327. May I add you to https://pypdf2.readthedocs.io/en/latest/meta/CONTRIBUTORS.html ? Your PR was not merged, but you did make a valuable contribution with this PR. It was just me not being able to understand it at the time.
The inline image parser does not look for whitespace before the EI keyword as it should. Thus, if you have a content stream as follows, the parser would crash:

Notice that the sequence of EI on one line and Q on the following line occurs in two places. To check properly, we need to make sure the EI is preceded by whitespace.

Also, added protection against infinite loops in case the PDF is corrupt and the inline image never ends.
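A minimal sketch of the check described above, for illustration only. It is not the code from this PR or from PyPDF2; the function name `find_inline_image_end`, the constant `PDF_WHITESPACE`, and the sample stream are all hypothetical. It assumes the image data is already in memory as `bytes` and treats EI as the terminator only when it has whitespace on both sides, raising an error instead of scanning forever when no terminator exists.

```python
# PDF whitespace characters: NUL, tab, line feed, form feed, carriage return, space.
PDF_WHITESPACE = b"\x00\t\n\x0c\r "


def find_inline_image_end(data: bytes, start: int = 0) -> int:
    """Return the offset of the 'E' in the EI operator that ends an inline image.

    `start` is the offset of the first image byte (just after ID and its
    trailing whitespace). Hypothetical helper, not a PyPDF2 API.
    """
    pos = start
    while True:
        pos = data.find(b"EI", pos)
        if pos == -1:
            # Corrupt stream: fail loudly instead of looping forever.
            raise ValueError("inline image is never terminated by EI")
        before_ok = pos > 0 and data[pos - 1] in PDF_WHITESPACE
        after = pos + 2
        after_ok = after == len(data) or data[after] in PDF_WHITESPACE
        if before_ok and after_ok:
            return pos
        # This "EI" is part of the binary image data; keep scanning.
        pos += 1


# Hypothetical content stream whose image bytes happen to contain "EI "
# (followed by whitespace, but not preceded by it).
stream = b"BI /W 5 /H 1 /BPC 8 /CS /G ID AEI B\nEI\nQ\n"
start = stream.index(b"ID ") + 3
end = find_inline_image_end(stream, start)
image_bytes = stream[start:end - 1]  # drop the whitespace before EI
```

A parser that only checks for whitespace after EI would stop at the first `EI ` inside the image bytes. Requiring whitespace on both sides skips that false match in this example, although binary image data can in principle still contain a whitespace-delimited EI.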