Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bug] sphinx.config.is_serializable is not safe against circular references #11752

Closed
munozejm opened this issue Nov 14, 2023 · 2 comments · Fixed by #12196
Closed

[bug] sphinx.config.is_serializable is not safe against circular references #11752

munozejm opened this issue Nov 14, 2023 · 2 comments · Fixed by #12196

Comments

@munozejm
Copy link

munozejm commented Nov 14, 2023

Describe the bug

Sphinx Version: 7.1.2

We are running into a RecursionError that impedes Sphinx's ability to process documentation content and generate output. That error is encountered during execution of the is_serializable function defined in sphinx/config.py.

Given that our documentation suite is very large and that each of our documents can be hundreds of pages long, we require flexibility in defining our documents' contents. We employ a combination of .yml (key:value pairs) and .j2 (content templates) files to build the content that is fed to Sphinx for pickling and generating output.

Since we support several different system configurations, we must vary the number of and data for each of the Jinja contexts we define for building the content that is fed to Sphinx. In rudimentary terms, a main context is composed of its core block of data plus two lists: a) contexts_list which contains the data from sibling contexts, and b) additional_contexts_list which contains the data from children contexts. We came up with that scheme to allow all contexts to be visible at the same time and be able to collect data from all of them when building a document. We have been using this approach for the past 6 years while using older versions of Sphinx.

It is when encountering either our contexts_list or additional_contexts_list that the RecursionError is produced. Forcing Sphinx 7.1.2 to bypass the is_serializable function or running the exact same data with an older Sphhinx version that does not have that check/function allows Sphinx to successfully generate the .tex and .pdf files we are after.

Each sibling or child context block of data can be estimated at below 10MB and above 1MB.

Can is_serializable be modified to handle the approach from above? ALTERNATIVELY, does Sphinx provide a way to access data from another context when working within one context?

How to Reproduce

I cannot copy-paste the details or amount of technical data that makes up each environment within either the contexts_list or the additional_contexts_list in here. The proprietary nature and volume of the data tie my hands.

As questions come up, I can work with someone to answer them.

The high-level command we issue is "sphinx-build -b latex -d build/doctrees source build/latex".

Environment Information

We are running Gitlab with a runner built using the "sphinxdoc/sphinx-latexpdf" image from Docker Hub.  When the StopIteration issue is encountered and Sphinx dies, Gitlab automatically performs cleanup and removes the container (i.e., the runner).  Attempting to execute "sphinx-build --bug-report" does not pan out in that situation.

Below is an abbreviated stack trace of the error.

pickling environment... failed
[app] emitting event: 'build-finished'(RecursionError('maximum recursion depth exceeded in __instancecheck__'),)
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/sphinx/cmd/build.py", line 290, in build_main
    app.build(args.force_all, args.filenames)
  File "/usr/local/lib/python3.11/site-packages/sphinx/application.py", line 351, in build
    self.builder.build_update()
  File "/usr/local/lib/python3.11/site-packages/sphinx/builders/__init__.py", line 287, in build_update
    self.build(['__all__'], to_build)
  File "/usr/local/lib/python3.11/site-packages/sphinx/builders/__init__.py", line 327, in build
    pickle.dump(self.env, f, pickle.HIGHEST_PROTOCOL)
  File "/usr/local/lib/python3.11/site-packages/sphinx/config.py", line 323, in __getstate__
    if key.startswith('_') or not is_serializable(value):
                                  ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sphinx/config.py", line 47, in is_serializable
    if not is_serializable(key) or not is_serializable(value):
                                       ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sphinx/config.py", line 47, in is_serializable
    if not is_serializable(key) or not is_serializable(value):
                                       ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sphinx/config.py", line 50, in is_serializable
    return all(is_serializable(i) for i in obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sphinx/config.py", line 50, in <genexpr>
    return all(is_serializable(i) for i in obj)
               ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sphinx/config.py", line 47, in is_serializable
    if not is_serializable(key) or not is_serializable(value):
                                       ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sphinx/config.py", line 47, in is_serializable
    if not is_serializable(key) or not is_serializable(value):
                                       ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sphinx/config.py", line 50, in is_serializable
    return all(is_serializable(i) for i in obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sphinx/config.py", line 50, in <genexpr>
    return all(is_serializable(i) for i in obj)
               ^^^^^^^^^^^^^^^^^^


< ...line 47 and line 50 continue to be called a very large number of times... >


  File "/usr/local/lib/python3.11/site-packages/sphinx/config.py", line 50, in <genexpr>
    return all(is_serializable(i) for i in obj)
               ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sphinx/config.py", line 47, in is_serializable
    if not is_serializable(key) or not is_serializable(value):
                                       ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sphinx/config.py", line 47, in is_serializable
    if not is_serializable(key) or not is_serializable(value):
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sphinx/config.py", line 43, in is_serializable
    if isinstance(obj, UNSERIALIZABLE_TYPES):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RecursionError: maximum recursion depth exceeded in __instancecheck__
Recursion error:
maximum recursion depth exceeded in __instancecheck__
This can happen with very large or deeply nested source files. You can carefully increase the default Python recursion limit of 1000 in conf.py with e.g.:
    import sys; sys.setrecursionlimit(1500)

Sphinx extensions

We are running Gitlab with a runner built using the "sphinxdoc/sphinx-latexpdf" image from Docker Hub.  When the StopIteration issue is encountered and Sphinx dies, Gitlab automatically performs cleanup and removes the container (i.e., the runner).  Attempting to execute "sphinx-build --bug-report" does not pan out in that situation.


On the "sphinxdoc/sphinx-latexpdf" image's Debian OS, we perform an update, an upgrade, and add packages.

  - apt-get update
  - apt-get upgrade -y
  - apt-get install -y git
  - apt-get install -y unzip
  - apt-get install -y wget
  - apt-get install -y texlive
  - apt-get install -y texlive-bibtex-extra
  - apt-get install -y texlive-font-utils
  - apt-get install -y texlive-lang-english
  - kpsewhich -var-value=TEXMFLOCAL
  - kpsewhich -var-value=TEXMFDIST
  - unzip -qo acrotex.zip
  - cd acrotex
  - |+
    for acrofile in $(ls -1 *.ins | egrep -v 'exerquiz|acrotex')
    do
      latex $acrofile
    done
    mkdir -p /usr/share/texlive/texmf-dist/tex/latex/acrotex
    cp *.sty /usr/share/texlive/texmf-dist/tex/latex/acrotex
    cp *.cfg /usr/share/texlive/texmf-dist/tex/latex/acrotex
    cp *.def /usr/share/texlive/texmf-dist/tex/latex/acrotex
  - cd ..
  - mktexlsr /usr/share/texlive/texmf-dist
  - rm -rf acrotex*

We also need to pip install a number of Python packages that are required for our documentation to be generated.

  - python -m pip install sphinx-autobuild
  - python -m pip install sphinx-git
  - python -m pip install sphinxcontrib-actdiag
  - python -m pip install sphinxcontrib-ansibleautodoc
  - python -m pip install sphinxcontrib-autoprogram
  - python -m pip install sphinxcontrib-blockdiag
  - python -m pip install sphinxcontrib-confluencebuilder
  - python -m pip install sphinxcontrib-jsonschema
  - python -m pip install sphinxcontrib-jupyter
  - python -m pip install sphinxcontrib-nwdiag
  - python -m pip install sphinxcontrib-plantuml
  - python -m pip install sphinxcontrib-seqdiag
  - python -m pip install sphinxcontrib-websupport
  - python -m pip install ciscoconfparse
  - python -m pip install decorator
  - python -m pip install enum34
  - python -m pip install funcparserlib
  - python -m pip install gitdb
  - python -m pip install Jinja2==3.0.3
  - python -m pip install jupyter-core
  - python -m pip install netaddr
  - python -m pip install plantuml
  - python -m pip install python-dateutil
  - python -m pip install pyyaml
  - python -m pip install sets
  - python -m pip install tablib

Additional context

No response

@picnixz
Copy link
Member

picnixz commented Dec 24, 2023

The issue is likely because of circular references since you are sharing everything with everyone. And you likely have references to things that should not.

I have two ideas:

  • verify your references and only include whatever needs to be included; don't share more what should be needed. I think your environment has too much circular references. So, before implementing a solution on our side, check that your objects do not have circular references, and if they do have, I would suggest coming up with an alternative.

  • we protect is_serializable against self-recursions by detecting whether we are comparing things that were already being compared. However, we cannot protect against this:

    d = {'a': {'b': 2}}
    d['a'] = d

    With that structure, there is no way we can actually check 'b' (and actually you'll never be able to get the value of 'b' itself so it's not really an issue).

For now, I'd say it's better you stick with an older Sphinx version but you should carefully decide whether you need circular references or not (but since it worked for the past years you'll likely not change). I can come up with a fix for handling non-pathological cases because it's just using the same idea as reprlib.recursive_repr.

@picnixz picnixz changed the title is_serializable function from sphinx/config.py cannot handle large volume of embedded list/dictionary data [bug] sphinx.config.is_serializable is not safe against circular references Dec 24, 2023
@picnixz
Copy link
Member

picnixz commented Mar 25, 2024

While #12196 does not directly fix it, I'm closing the issue since I don't know whether the recursion error occurred inside the __instancecheck__ (in which case, I cannot do anything and it's mostly an issue with a custom type, and I need MWE for that) or not.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 25, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants