File causes an endless loop of calls between functions extract_xform_text and _extract_text #966
Comments
@VBobCat
I think https://corpora.tika.apache.org/base/docs/govdocs1/998/998167.pdf might be affected by the same issue.
https://corpora.tika.apache.org/base/docs/govdocs1/998/998167.pdf in this PR, as the test file needs the other fixes (though it is not linked to the loop issue).
This should be closed by #969 (https://github.com/py-pdf/PyPDF2/releases/tag/2.2.0).
I'm rather certain that the issue was solved. Please let us know if that is not the case!
While reading a certain file, my program exits without any exception being raised.
I investigated the issue, and it seems the cause is that the functions extract_xform_text and _extract_text in _page.py call each other in a never-ending loop.
Environment
Which environment were you using when you encountered the problem?
Code
This is a minimal, complete example that shows the issue:
My code (that uses PyPDF2) is this:
I put a breakpoint in extract_xform_text, and it receives these three parameters (self, xform, space_width):
PDF
I am sorry, but I'm unable to share the PDF file itself because it contains sensitive information.
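The report does not show which part of the file triggers the loop. One plausible cause of mutual recursion like this is a form XObject that draws itself, either directly or through another XObject. The sketch below is a hypothetical toy model (plain dicts instead of PyPDF2 objects, and not the fix merged in #969). It shows how tracking the XObjects currently being expanded breaks such a cycle. The function names only mirror the ones in the report.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy document: each form XObject id maps to its own text and
# the ids of the form XObjects it draws. Not PyPDF2's data model.
Xforms = Dict[str, dict]


def extract_text(xforms: Xforms, xform_id: str,
                 active: Optional[Set[str]] = None) -> List[str]:
    """Collect text from an xform and the xforms it draws, skipping cycles."""
    if active is None:
        active = set()
    if xform_id in active:
        # Already being expanded further up the call path: this is a cycle,
        # so stop here instead of recursing forever.
        return []
    active.add(xform_id)
    try:
        node = xforms[xform_id]
        parts = [node["text"]]
        for child in node["children"]:
            parts.extend(extract_xform_text(xforms, child, active))
        return parts
    finally:
        # Leave the call path so a non-cyclic reuse elsewhere is still expanded.
        active.discard(xform_id)


def extract_xform_text(xforms: Xforms, xform_id: str,
                       active: Set[str]) -> List[str]:
    """Mutual-recursion partner, mirroring the call pattern in the report."""
    return extract_text(xforms, xform_id, active)


if __name__ == "__main__":
    doc = {
        "A": {"text": "alpha", "children": ["B"]},
        "B": {"text": "beta", "children": ["A"]},  # B draws A again: a cycle
    }
    print(extract_text(doc, "A"))  # ['alpha', 'beta']
```

The sketch tracks only the XObjects on the current call path rather than every XObject ever seen. That way an XObject that is legitimately drawn twice still contributes its text each time, and only true cycles are cut off.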