Infinite Recursion With PyPDF 4.1.0 in xml2rfc #2508

Closed
kitterma opened this issue Mar 8, 2024 · 4 comments
Labels
generic The generic submodule is affected is-question Rather a question than an issue. Should usually be a Discussion instead

Comments

@kitterma
Contributor

kitterma commented Mar 8, 2024


Environment

$ python -m platform
Linux-6.1.0-18-amd64-x86_64-with-glibc2.37

$ python -c "import pypdf;print(pypdf._debug_versions)"
4.1.0

Code + PDF

I don't have one. It's in the xml2rfc tests. It's not clear if this is an xml2rfc issue that was highlighted by a change in 4.1.0 or a pypdf regression. If needed, I can try to be more specific.

Traceback

This is an extract from the traceback I see (the line 21, line 21, line 29 pattern repeats hundreds of times):

115s autopkgtest [09:16:38]: test run-pytest: [-----------------------
116s Testing with python3.12:
135s ..E..............................................
135s ======================================================================
135s ERROR: setUpClass (__main__.PdfWriterTests)
135s ----------------------------------------------------------------------
135s Traceback (most recent call last):
135s   File "/tmp/autopkgtest-lxc.gvxbgwgc/downtmp/build.QrX/src/xxx/test.py", line 496, in setUpClass
135s     cls.elements_pdfxml = xmldoc(None, bytes=elements_pdfdoc)
135s                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
135s   File "/usr/lib/python3/dist-packages/xml2rfc/walkpdf.py", line 96, in xmldoc
135s     text = xmltext(filename=filename, bytes=bytes)
135s            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
135s   File "/usr/lib/python3/dist-packages/xml2rfc/walkpdf.py", line 89, in xmltext
135s     obj = pyobj(filename=filename, bytes=bytes)
135s           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
135s   File "/usr/lib/python3/dist-packages/xml2rfc/walkpdf.py", line 77, in pyobj
135s     d, i = walk(obj, seen)
135s            ^^^^^^^^^^^^^^^
135s   File "/usr/lib/python3/dist-packages/xml2rfc/walkpdf.py", line 21, in walk
135s     d, i = walk(obj[key], seen)
135s            ^^^^^^^^^^^^^^^^^^^^
135s   File "/usr/lib/python3/dist-packages/xml2rfc/walkpdf.py", line 21, in walk
135s     d, i = walk(obj[key], seen)
135s            ^^^^^^^^^^^^^^^^^^^^
135s   File "/usr/lib/python3/dist-packages/xml2rfc/walkpdf.py", line 29, in walk
The last three lines repeat until you get to the recursion limit:
135s   File "/usr/lib/python3/dist-packages/pypdf/generic/_base.py", line 301, in __getattr__
135s     return getattr(self._get_object_with_check(), name)
135s                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
135s   File "/usr/lib/python3/dist-packages/pypdf/generic/_base.py", line 290, in _get_object_with_check
135s     o = self.get_object()
135s         ^^^^^^^^^^^^^^^^^
135s RecursionError: maximum recursion depth exceeded
135s 
135s ----------------------------------------------------------------------
135s Ran 48 tests in 18.098s

Related xml2rfc issue:

ietf-tools/xml2rfc#1111

@stefan6419846
Collaborator

This is most likely related to #2464, although I am not sure why. The actual recursion issue seems to be on the xml2rfc side; PDF generation is done by WeasyPrint.

For future reference, I have attached a corresponding PDF file: file.pdf

@stefan6419846 stefan6419846 added the generic The generic submodule is affected label Mar 8, 2024
@pubpub-zz
Collaborator

The xml2rfc code uses a "bad trick" to identify a DictionaryObject, namely hasattr(obj, 'keys'). With #2464, the functions of the object referenced by an IndirectObject are directly available on the IndirectObject itself, so that check now also matches indirect references.
Proposed fix in walkpdf.py:

def walk(obj, seen):
    dobj = {}                            # Direct objects
    iobj = []                            # Indirect objects
    if isinstance(obj, pypdf.generic.DictionaryObject):    #<-----
    (...)

This is not an issue in pypdf
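
To illustrate why the hasattr(obj, 'keys') check stops working once an indirect reference forwards attribute access, here is a minimal sketch that does not use pypdf at all. FakeDictionary, FakeIndirect, walk, duck_check and strict_check are hypothetical names invented for this illustration; they are not the pypdf or xml2rfc implementations, and the forwarding of item access in FakeIndirect is an assumption made so the toy walker can run.

```python
class FakeDictionary(dict):
    """Hypothetical stand-in for a PDF dictionary object."""


class FakeIndirect:
    """Hypothetical stand-in for an indirect reference that forwards to its target."""

    def __init__(self, target):
        self._target = target

    def __getattr__(self, name):
        # Only called when normal lookup fails, so `_target` itself is unaffected.
        return getattr(self._target, name)

    def __getitem__(self, key):
        # Assumption of this sketch: item access is forwarded as well.
        return self._target[key]


def walk(obj, seen, is_dict):
    """Toy walker: recurse into dictionaries, follow each reference only once."""
    if is_dict(obj):
        return {key: walk(obj[key], seen, is_dict) for key in obj.keys()}
    if isinstance(obj, FakeIndirect):
        if id(obj._target) in seen:
            return "<seen>"
        seen.add(id(obj._target))
        return walk(obj._target, seen, is_dict)
    return obj


def duck_check(obj):
    # The "bad trick": also true for FakeIndirect because of __getattr__.
    return hasattr(obj, "keys")


def strict_check(obj):
    # The proposed style of check: only true for real dictionaries.
    return isinstance(obj, FakeDictionary)


# Two dictionaries that reference each other, like a page and its parent.
pages = FakeDictionary()
page = FakeDictionary()
pages["Kid"] = FakeIndirect(page)
page["Parent"] = FakeIndirect(pages)

# With the isinstance check, the cycle is cut by the `seen` set.
strict_result = walk(pages, set(), strict_check)

# With the hasattr check, references are mistaken for dictionaries,
# the `seen` branch is never reached, and the cycle recurses forever.
try:
    walk(pages, set(), duck_check)
    duck_overflowed = False
except RecursionError:
    duck_overflowed = True
```

In this sketch the duck-typed check sends every reference down the dictionary branch, so the cycle guard is bypassed. That is consistent with the repeating walk frames in the traceback above, although the real walkpdf.py may differ in its details.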

@kitterma
Contributor Author

kitterma commented Mar 8, 2024 via email

@kitterma
Contributor Author

kitterma commented Mar 9, 2024

Now tested and this does resolve the issue.

@stefan6419846 stefan6419846 added the is-question Rather a question than an issue. Should usually be a Discussion instead label Mar 9, 2024
kesara added a commit to kesara/xml2rfc that referenced this issue Mar 11, 2024
This fixes the recursion issue in walkpdf introduced by PyPDF==4.1.0.

This fix is based on @pubpub-zz's suggestion in
py-pdf/pypdf#2508 (comment)

Fixes ietf-tools#1111
kesara added a commit to ietf-tools/xml2rfc that referenced this issue Mar 11, 2024
This fixes the recursion issue in walkpdf introduced by PyPDF==4.1.0.

This fix is based on @pubpub-zz's suggestion in
py-pdf/pypdf#2508 (comment)

Fixes #1111