Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

3.2.1 breaks when defusedxml is loaded #950

Closed
rcarmo opened this issue Apr 28, 2020 · 17 comments
Closed

3.2.1 breaks when defusedxml is loaded #950

rcarmo opened this issue Apr 28, 2020 · 17 comments
Labels
wontfix The issue will not be fixed for the stated reasons.

Comments

@rcarmo
Copy link

rcarmo commented Apr 28, 2020

I was upgrading one of my projects from markdown==3.1 to markdown==3.2.1, and had a very weird behaviour that was quite hard to track down.

In my test case, I had this in my imports:

from markdown import Markdown
from nbconvert import HTMLExporter

def render_markdown(raw):
    return markdown_renderer.reset().convert(raw)

def render_ipynb(raw):
    exporter = HTMLExporter()
    exporter.template_file="basic"
    return smartypants(exporter.from_notebook_node(reads(renumber_notebook(raw), 4))[0]

When I called only render_markdown, I got this pretty obscure exception:

Traceback (most recent call last):
  File "markdown_test.py", line 81, in <module>
    print(render_markdown(test_block))
  File "markdown_test.py", line 79, in render_markdown
    return markdown_renderer.reset().convert(raw)
  File "/Users/rcarmo/.pyenv/versions/3.6.9/lib/python3.6/site-packages/markdown/core.py", line 269, in convert
    newRoot = treeprocessor.run(root)
  File "/Users/rcarmo/.pyenv/versions/3.6.9/lib/python3.6/site-packages/markdown/extensions/footnotes.py", line 388, in run
    footnotesDiv = self.footnotes.makeFootnotesDiv(root)
  File "/Users/rcarmo/.pyenv/versions/3.6.9/lib/python3.6/site-packages/markdown/extensions/footnotes.py", line 181, in makeFootnotesDiv
    self.parser.parseChunk(surrogate_parent, self.footnotes[id])
  File "/Users/rcarmo/.pyenv/versions/3.6.9/lib/python3.6/site-packages/markdown/blockparser.py", line 105, in parseChunk
    self.parseBlocks(parent, text.split('\n\n'))
  File "/Users/rcarmo/.pyenv/versions/3.6.9/lib/python3.6/site-packages/markdown/blockparser.py", line 123, in parseBlocks
    if processor.run(parent, blocks) is not False:
  File "/Users/rcarmo/.pyenv/versions/3.6.9/lib/python3.6/site-packages/markdown/blockprocessors.py", line 591, in run
    p = etree.SubElement(parent, 'p')
TypeError: SubElement() argument 1 must be xml.etree.ElementTree.Element, not Element
make: *** [smoke-test] Error 1

Reverting back to markdown==3.1 "fixed" it, and it took me a long time to figure out what caused it in the first place.

As it turns out, it went away when I removed the nbconvert import or moved it inside render_ipynb (which is bad form, but turned out to be the quickest workaround). I assume that nbconvert does some kind of conflicting initialization to markdown that does not go away with reset()...

But since it wasn't happening in markdown==3.1 and happens in both still versions of nbconvert I've tried (nbconvert==5.5.0 and nbconvert==5.6.1), I'm assuming something changed in markdown to cause this issue.

@facelessuser
Copy link
Collaborator

You can checkout the diff: 3.1...3.2.1.

Right now I'd be inclined to say it is an nbconvert issue, but I know nothing about that package and what it does. As far as Python Markdown is concerned it works and passes all tests. You have a unique environment issue. If you are able to track down the issue, depending on what it is, maybe it could be addressed here, but we'd need more info.

@rcarmo
Copy link
Author

rcarmo commented Apr 28, 2020

nbconvert renders Jupyter notebooks, and as such relies itself upon Markdown - hence my suspicion that loading it causes some form of initialisation that isn't "re-entrant" in a way.

@waylan
Copy link
Member

waylan commented Apr 29, 2020

A few things immediately jump out at me when looking at your test case:

  1. It appears that markdown_renderer, smartypants, reads, and renumber_notebook are not defined or imported. I'm not familiar with nbconvert so I don't know if that library is responsible for those or if they are coming from somewhere else. In other words, I couldn't replicate your test if I wanted to.
  2. The Traceback suggests that the error is encountered when Markdown is parsing the contents of a footnote. Specifically, when attempting to create a <p> element it encounter's a TypeError. I'm curious if the error occurs if no footnotes exist in the document or if the footnotes extension is disabled. In other words, is this specific to footnotes or not?
  3. The TypeError seems to revolve around the xml.etree lib in the Python standard library and suggests that an Element instance is not an instance or subclass of xml.etree.ElementTree.Element. We changed the way we import xml.etree in Simplify xml.etree.ElementTree loading #902. My guess is that this is where the issue is coming from. Could it be that nbconvert is monkey-patching xml.etree or something?

@rcarmo
Copy link
Author

rcarmo commented Apr 29, 2020

Hi there.

Let me take these by number:

  1. I had removed the imports for those by the time I reached this point, since I had already established that imports alone were enough to break render_markdown.

So this is still a “valid” test case, as long as you only invoke render_markdown (we’re splitting hairs here, but you get the gist of things).

  1. the document did have footnotes, and the extension is in fact enabled. I had fiddled with the various configuration options and markup formatting to make sure of that well before I got to this point.

  2. I do not know what nbconvert does besides also importing markdown. All I can add is that I use lxml in the same project, and that is (naturally) not affected (I might have noticed something else in my XML processing otherwise)

@waylan
Copy link
Member

waylan commented Apr 29, 2020

  1. I had removed the imports for those by the time I reached this point, since I had already established that imports alone were enough to break render_markdown.

So this is still a “valid” test case, as long as you only invoke render_markdown (we’re splitting hairs here, but you get the gist of things).

As render_markdown only calls methods of an undefined object markdown_renderer, this is not a valid test. I have no idea if markdown_renderer is an instance of markdown.Markdown or some custom subclass (I can only assume it is one of the two based on the calls to reset().convert()). I also have no way of knowing what arguments have been used to create the Markdown instance. So, no, I am not able to replicate the test or the error.

2 the document did have footnotes, and the extension is in fact enabled. I had fiddled with the various configuration options and markup formatting to make sure of that well before I got to this point.

This doesn't answered my question: Does the error occur if no footnotes exist in the document? If so, what is the traceback (which obviously wouldn't come from within a call to the footnote parser). I'm guessing footnotes are a red herring here, but I can't be sure with the information provided so far.

  1. I do not know what nbconvert does besides also importing markdown. All I can add is that I use lxml in the same project, and that is (naturally) not affected (I might have noticed something else in my XML processing otherwise)

Given the lack of information above, I have no way of knowing if this is relevant or not.

@rcarmo
Copy link
Author

rcarmo commented Apr 29, 2020 via email

@waylan
Copy link
Member

waylan commented Apr 29, 2020

Sorry but I still have no idea what markdown_renderer is. It is not anything that comes from our library. My guess is that it is a wrapper around our library, but until I know what it does or where it comes from, I can't help you.

@rcarmo
Copy link
Author

rcarmo commented Apr 30, 2020

markdown_renderer = Markdown(extensions=[
        "markdown.extensions.extra",
        "markdown.extensions.toc",
        "markdown.extensions.smarty",
        "markdown.extensions.codehilite",
        "markdown.extensions.meta",
        "markdown.extensions.sane_lists"
    ], extension_configs = {
        "codehilite": {
            "css_class": "highlight"
        }
    },
    output_format="html5"
)

@waylan
Copy link
Member

waylan commented Apr 30, 2020

I can confirm I am seeing the same behavior.

I also searched through nbconvert's code and I discovered that they are using tiran/defusedxml in nbconvert/filters/strings.py

I was able to trim a failing example down to this most simple form:

import markdown
from defusedxml import ElementTree

src = '''
Markdown paragraph with a footnote.[^1]

[^1]: Single line footnote.
'''

markdown.markdown(src, extensions=['footnotes'])

I tried other variations and nothing else seems to make a difference one way or the other. The error always occurs so long as all three of the following are true:

  1. defusedxml is imported (or another imported library imports it).
  2. The document contains at least one footnote.
  3. Markdown is called with the footnotes extension.

Change any one of those and the error no longer occurs. I am stumped as to the cause of the error. Below is the relevant portion of a pdb session:

> /Users/waylan/Code/md/markdown/blockprocessors.py(592)run()
-> p = etree.SubElement(parent, 'p')
(Pdb) block
'Single line footnote.'
(Pdb) parent
<Element 'div' at 0x10809b3c8>
(Pdb) parent.__class__
<class 'xml.etree.ElementTree.Element'>
(Pdb) parent.__class__.__mro__
(<class 'xml.etree.ElementTree.Element'>, <class 'object'>)
(Pdb) c
Traceback (most recent call last):
  File "mdtest.py", line 12, in <module>
    html = markdown.markdown(src, extensions=['footnotes'])
  File "/Users/waylan/Code/md/markdown/core.py", line 388, in markdown
    return md.convert(text)
  File "/Users/waylan/Code/md/markdown/core.py", line 269, in convert
    newRoot = treeprocessor.run(root)
  File "/Users/waylan/Code/md/markdown/extensions/footnotes.py", line 388, in run
    footnotesDiv = self.footnotes.makeFootnotesDiv(root)
  File "/Users/waylan/Code/md/markdown/extensions/footnotes.py", line 181, in makeFootnotesDiv
    self.parser.parseChunk(surrogate_parent, self.footnotes[id])
  File "/Users/waylan/Code/md/markdown/blockparser.py", line 105, in parseChunk
    self.parseBlocks(parent, text.split('\n\n'))
  File "/Users/waylan/Code/md/markdown/blockparser.py", line 123, in parseBlocks
    if processor.run(parent, blocks) is not False:
  File "/Users/waylan/Code/md/markdown/blockprocessors.py", line 592, in run
    p = etree.SubElement(parent, 'p')
TypeError: SubElement() argument 1 must be xml.etree.ElementTree.Element, not Element

I confirmed that the block is the contents of the footnote, the parent is the surrogate div and parent.__class__ is <class 'xml.etree.ElementTree.Element'>. And yet, we are getting an error saying it is not. I even checked the ancestor tree with the __mro__ attribute. I can't see any difference. That error is being generated from an internal check in xml.etree.ElementTree.

I also look through the disfusedxml code and they don't even import the Element class. They only provide alternate parsers.

By the way, with the example document, this is the second time this line of code is called. The first time was for the paragraph in the body of the document. No error was raised there. The only different was that the parent was the root div created here rather than the surrogate_parent from the footnotes extension. However, those two elements are created in the same way, so I don't understand what is going on here. Why does this only happen for footnotes?

@waylan
Copy link
Member

waylan commented May 7, 2020

@facelessuser @mitya57 could you take a look at this? I broke the issue down in my last comment and I am stumped. Thanks.

@facelessuser
Copy link
Collaborator

I may see if I can take a look sometime soon, but I just moved, so things are a little chaotic for me right now 🙂 .

@mitya57
Copy link
Collaborator

mitya57 commented May 7, 2020

Looks like it is related to this code: https://github.com/tiran/defusedxml/blob/v0.6.0/defusedxml/ElementTree.py#L44-L53.

After that code is executed, the Element class changes its ID, so Python will consider it different to the previous version of it:

>>> import sys, importlib
>>> 
>>> import xml.etree.ElementTree
>>> id(xml.etree.ElementTree.Element)
9581696
>>> 
>>> import xml.etree.ElementTree
>>> id(xml.etree.ElementTree.Element)  # stays the same
9581696
>>> 
>>> pymod = sys.modules.pop("xml.etree.ElementTree")
>>> cmod = sys.modules.pop("_elementtree")
>>> 
>>> sys.modules["_elementtree"] = None
>>> pure_pymod = importlib.import_module("xml.etree.ElementTree")
>>> sys.modules["_elementtree"] = cmod
>>> sys.modules["xml.etree.ElementTree"] = pymod
>>> 
>>> import xml.etree.ElementTree
>>> id(xml.etree.ElementTree.Element)  # now different
25336288

@waylan
Copy link
Member

waylan commented May 7, 2020

Good catch. I looked at that code, but apparently missed what it was doing. But why does the error occur only for footnotes, but not for anywhere else?

@mitya57
Copy link
Collaborator

mitya57 commented May 8, 2020

This is not specific to footnotes, I can reproduce it with abbr extension too:

import markdown
from defusedxml import ElementTree

src = '''
The HTML specification
is maintained by the W3C.

*[HTML]: Hyper Text Markup Language
*[W3C]:  World Wide Web Consortium
'''

markdown.markdown(src, extensions=['abbr'])

This happens because markdown module is loaded before the defusedxml hack, but the extension is lazily loaded after that, during the markdown.markdown call.

Moving the defusedxml import before the markdown import fixes this.

@waylan
Copy link
Member

waylan commented May 8, 2020

but the extension is lazily loaded after

Right. Of course. I had considered that at one point, but that was before I new what was causing the problem to begin with.

Moving the defusedxml import before the markdown import fixes this.

I suspected that might fix it.

@rcarmo the fix for this is for you to import nbconvert before you input markdown. As far as I am concerned, this is a bug in defusedxml as they are breaking the xml.etree.ElementTree package.

Yes, this was not an issue in previous versions of Markdown, but that was because we were importing our own copy of the etree package. We are no longer doing that (and never should have needed to). I also understand why defusedxml is doing what they are doing, but they need to not break other packages use of etree.

I am closing this as wontfix.

@waylan
Copy link
Member

waylan commented Aug 9, 2020

I haven't tested it myself, but I suspect this will be resolved when tiran/defusedxml#40 lands.

@waylan
Copy link
Member

waylan commented Mar 8, 2022

Supposedly, this was fixed this in tiran/defusedxml#60 and tiran/defusedxml#63. According to their changelog, those changes should be available in difusedxml 0.8.0, whenever that is released.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
wontfix The issue will not be fixed for the stated reasons.
Projects
None yet
Development

No branches or pull requests

4 participants