3.2.1 breaks when defusedxml is loaded #950

rcarmo · 2020-04-28T23:38:25Z

I was upgrading one of my projects from markdown==3.1 to markdown==3.2.1, and had a very weird behaviour that was quite hard to track down.

In my test case, I had this in my imports:

from markdown import Markdown
from nbconvert import HTMLExporter

def render_markdown(raw):
    return markdown_renderer.reset().convert(raw)

def render_ipynb(raw):
    exporter = HTMLExporter()
    exporter.template_file="basic"
    return smartypants(exporter.from_notebook_node(reads(renumber_notebook(raw), 4))[0]

When I called only render_markdown, I got this pretty obscure exception:

Traceback (most recent call last):
  File "markdown_test.py", line 81, in <module>
    print(render_markdown(test_block))
  File "markdown_test.py", line 79, in render_markdown
    return markdown_renderer.reset().convert(raw)
  File "/Users/rcarmo/.pyenv/versions/3.6.9/lib/python3.6/site-packages/markdown/core.py", line 269, in convert
    newRoot = treeprocessor.run(root)
  File "/Users/rcarmo/.pyenv/versions/3.6.9/lib/python3.6/site-packages/markdown/extensions/footnotes.py", line 388, in run
    footnotesDiv = self.footnotes.makeFootnotesDiv(root)
  File "/Users/rcarmo/.pyenv/versions/3.6.9/lib/python3.6/site-packages/markdown/extensions/footnotes.py", line 181, in makeFootnotesDiv
    self.parser.parseChunk(surrogate_parent, self.footnotes[id])
  File "/Users/rcarmo/.pyenv/versions/3.6.9/lib/python3.6/site-packages/markdown/blockparser.py", line 105, in parseChunk
    self.parseBlocks(parent, text.split('\n\n'))
  File "/Users/rcarmo/.pyenv/versions/3.6.9/lib/python3.6/site-packages/markdown/blockparser.py", line 123, in parseBlocks
    if processor.run(parent, blocks) is not False:
  File "/Users/rcarmo/.pyenv/versions/3.6.9/lib/python3.6/site-packages/markdown/blockprocessors.py", line 591, in run
    p = etree.SubElement(parent, 'p')
TypeError: SubElement() argument 1 must be xml.etree.ElementTree.Element, not Element
make: *** [smoke-test] Error 1

Reverting back to markdown==3.1 "fixed" it, and it took me a long time to figure out what caused it in the first place.

As it turns out, it went away when I removed the nbconvert import or moved it inside render_ipynb (which is bad form, but turned out to be the quickest workaround). I assume that nbconvert does some kind of conflicting initialization to markdown that does not go away with reset()...

But since it wasn't happening in markdown==3.1 and happens in both still versions of nbconvert I've tried (nbconvert==5.5.0 and nbconvert==5.6.1), I'm assuming something changed in markdown to cause this issue.

The text was updated successfully, but these errors were encountered:

facelessuser · 2020-04-28T23:47:24Z

You can checkout the diff: 3.1...3.2.1.

Right now I'd be inclined to say it is an nbconvert issue, but I know nothing about that package and what it does. As far as Python Markdown is concerned it works and passes all tests. You have a unique environment issue. If you are able to track down the issue, depending on what it is, maybe it could be addressed here, but we'd need more info.

rcarmo · 2020-04-28T23:50:34Z

nbconvert renders Jupyter notebooks, and as such relies itself upon Markdown - hence my suspicion that loading it causes some form of initialisation that isn't "re-entrant" in a way.

waylan · 2020-04-29T14:14:08Z

A few things immediately jump out at me when looking at your test case:

It appears that markdown_renderer, smartypants, reads, and renumber_notebook are not defined or imported. I'm not familiar with nbconvert so I don't know if that library is responsible for those or if they are coming from somewhere else. In other words, I couldn't replicate your test if I wanted to.
The Traceback suggests that the error is encountered when Markdown is parsing the contents of a footnote. Specifically, when attempting to create a <p> element it encounter's a TypeError. I'm curious if the error occurs if no footnotes exist in the document or if the footnotes extension is disabled. In other words, is this specific to footnotes or not?
The TypeError seems to revolve around the xml.etree lib in the Python standard library and suggests that an Element instance is not an instance or subclass of xml.etree.ElementTree.Element. We changed the way we import xml.etree in Simplify xml.etree.ElementTree loading #902. My guess is that this is where the issue is coming from. Could it be that nbconvert is monkey-patching xml.etree or something?

rcarmo · 2020-04-29T17:59:52Z

Hi there.

Let me take these by number:

I had removed the imports for those by the time I reached this point, since I had already established that imports alone were enough to break render_markdown.

So this is still a “valid” test case, as long as you only invoke render_markdown (we’re splitting hairs here, but you get the gist of things).

the document did have footnotes, and the extension is in fact enabled. I had fiddled with the various configuration options and markup formatting to make sure of that well before I got to this point.
I do not know what nbconvert does besides also importing markdown. All I can add is that I use lxml in the same project, and that is (naturally) not affected (I might have noticed something else in my XML processing otherwise)

waylan · 2020-04-29T18:47:25Z

I had removed the imports for those by the time I reached this point, since I had already established that imports alone were enough to break render_markdown.

So this is still a “valid” test case, as long as you only invoke render_markdown (we’re splitting hairs here, but you get the gist of things).

As render_markdown only calls methods of an undefined object markdown_renderer, this is not a valid test. I have no idea if markdown_renderer is an instance of markdown.Markdown or some custom subclass (I can only assume it is one of the two based on the calls to reset().convert()). I also have no way of knowing what arguments have been used to create the Markdown instance. So, no, I am not able to replicate the test or the error.

2 the document did have footnotes, and the extension is in fact enabled. I had fiddled with the various configuration options and markup formatting to make sure of that well before I got to this point.

This doesn't answered my question: Does the error occur if no footnotes exist in the document? If so, what is the traceback (which obviously wouldn't come from within a call to the footnote parser). I'm guessing footnotes are a red herring here, but I can't be sure with the information provided so far.

I do not know what nbconvert does besides also importing markdown. All I can add is that I use lxml in the same project, and that is (naturally) not affected (I might have noticed something else in my XML processing otherwise)

Given the lack of information above, I have no way of knowing if this is relevant or not.

rcarmo · 2020-04-29T21:46:30Z

Hello again, 1) yes, it is an instance of markdown_renderer. Please give me some credit, I am indeed trying to help. Why else would I have filed a ticket here? What more info is necessary? Why the (perceived) hostility? 2) the error does _not_ occur without footnotes (in a universe of 7000-odd documents, it only occurred in the ones with footnotes-roughly 500). I tried many variations on footnote markup, contents, with or without inline links - the mere presence of any footnote broke 3.2.1 in this situation whereas 3.1 worked with all the versions of nbconvert I tried (admittedly only two). 3) It is the information I have, within the time I had to make it available after a long series of tests. If the info as a whole is not relevant or helpful, feel free to close the ticket - I am happy to wait until this is fixed through some other means, or try to build a fully self-contained test case that satisfies your criteria at a later date. (Also, kindly note that going to and fro both Markdown versions and checking all the other libraries involved in multiple combinations did take a while, to the point where I ended up setting up several virtualenvs to pin this down - the imports and the snippet of code I shared were the smallest bits I could snip off to demonstrate - as an example - that it was indeed an interaction between at least two recent versions of nbconvert and Markdown 3.2.1 that was breaking things, and opening up a private repo and pointing to a set of branches would likely not have been as easy a way to demonstrate the issue...) Thanks!

…

On 29 Apr 2020, at 19:47, Waylan Limberg ***@***.***> wrote: I had removed the imports for those by the time I reached this point, since I had already established that imports alone were enough to break render_markdown. So this is still a “valid” test case, as long as you only invoke render_markdown (we’re splitting hairs here, but you get the gist of things). As render_markdown only calls methods of an undefined object markdown_renderer, this is not a valid test. I have no idea if markdown_renderer is an instance of markdown.Markdown or some custom subclass (I can only assume it is one of the two based on the calls to reset().convert()). I also have no way of knowing what arguments have been used to create the Markdown instance. So, no, I am not able to replicate the test or the error. 2 the document did have footnotes, and the extension is in fact enabled. I had fiddled with the various configuration options and markup formatting to make sure of that well before I got to this point. This doesn't answered my question: Does the error occur if no footnotes exist in the document? If so, what is the traceback (which obviously wouldn't come from within a call to the footnote parser). I'm guessing footnotes are a red herring here, but I can't be sure with the information provided so far. I do not know what nbconvert does besides also importing markdown. All I can add is that I use lxml in the same project, and that is (naturally) not affected (I might have noticed something else in my XML processing otherwise) Given the lack of information above, I have no way of knowing if this is relevant or not. — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or unsubscribe.

waylan · 2020-04-29T22:04:27Z

Sorry but I still have no idea what markdown_renderer is. It is not anything that comes from our library. My guess is that it is a wrapper around our library, but until I know what it does or where it comes from, I can't help you.

rcarmo · 2020-04-30T05:48:24Z

markdown_renderer = Markdown(extensions=[
        "markdown.extensions.extra",
        "markdown.extensions.toc",
        "markdown.extensions.smarty",
        "markdown.extensions.codehilite",
        "markdown.extensions.meta",
        "markdown.extensions.sane_lists"
    ], extension_configs = {
        "codehilite": {
            "css_class": "highlight"
        }
    },
    output_format="html5"
)

waylan · 2020-04-30T19:31:43Z

I can confirm I am seeing the same behavior.

I also searched through nbconvert's code and I discovered that they are using tiran/defusedxml in nbconvert/filters/strings.py

I was able to trim a failing example down to this most simple form:

import markdown
from defusedxml import ElementTree

src = '''
Markdown paragraph with a footnote.[^1]

[^1]: Single line footnote.
'''

markdown.markdown(src, extensions=['footnotes'])

I tried other variations and nothing else seems to make a difference one way or the other. The error always occurs so long as all three of the following are true:

defusedxml is imported (or another imported library imports it).
The document contains at least one footnote.
Markdown is called with the footnotes extension.

Change any one of those and the error no longer occurs. I am stumped as to the cause of the error. Below is the relevant portion of a pdb session:

> /Users/waylan/Code/md/markdown/blockprocessors.py(592)run()
-> p = etree.SubElement(parent, 'p')
(Pdb) block
'Single line footnote.'
(Pdb) parent
<Element 'div' at 0x10809b3c8>
(Pdb) parent.__class__
<class 'xml.etree.ElementTree.Element'>
(Pdb) parent.__class__.__mro__
(<class 'xml.etree.ElementTree.Element'>, <class 'object'>)
(Pdb) c
Traceback (most recent call last):
  File "mdtest.py", line 12, in <module>
    html = markdown.markdown(src, extensions=['footnotes'])
  File "/Users/waylan/Code/md/markdown/core.py", line 388, in markdown
    return md.convert(text)
  File "/Users/waylan/Code/md/markdown/core.py", line 269, in convert
    newRoot = treeprocessor.run(root)
  File "/Users/waylan/Code/md/markdown/extensions/footnotes.py", line 388, in run
    footnotesDiv = self.footnotes.makeFootnotesDiv(root)
  File "/Users/waylan/Code/md/markdown/extensions/footnotes.py", line 181, in makeFootnotesDiv
    self.parser.parseChunk(surrogate_parent, self.footnotes[id])
  File "/Users/waylan/Code/md/markdown/blockparser.py", line 105, in parseChunk
    self.parseBlocks(parent, text.split('\n\n'))
  File "/Users/waylan/Code/md/markdown/blockparser.py", line 123, in parseBlocks
    if processor.run(parent, blocks) is not False:
  File "/Users/waylan/Code/md/markdown/blockprocessors.py", line 592, in run
    p = etree.SubElement(parent, 'p')
TypeError: SubElement() argument 1 must be xml.etree.ElementTree.Element, not Element

I confirmed that the block is the contents of the footnote, the parent is the surrogate div and parent.__class__ is <class 'xml.etree.ElementTree.Element'>. And yet, we are getting an error saying it is not. I even checked the ancestor tree with the __mro__ attribute. I can't see any difference. That error is being generated from an internal check in xml.etree.ElementTree.

I also look through the disfusedxml code and they don't even import the Element class. They only provide alternate parsers.

By the way, with the example document, this is the second time this line of code is called. The first time was for the paragraph in the body of the document. No error was raised there. The only different was that the parent was the root div created here rather than the surrogate_parent from the footnotes extension. However, those two elements are created in the same way, so I don't understand what is going on here. Why does this only happen for footnotes?

waylan · 2020-05-07T19:00:29Z

@facelessuser @mitya57 could you take a look at this? I broke the issue down in my last comment and I am stumped. Thanks.

facelessuser · 2020-05-07T19:05:29Z

I may see if I can take a look sometime soon, but I just moved, so things are a little chaotic for me right now 🙂 .

mitya57 · 2020-05-07T19:45:22Z

Looks like it is related to this code: https://github.com/tiran/defusedxml/blob/v0.6.0/defusedxml/ElementTree.py#L44-L53.

After that code is executed, the Element class changes its ID, so Python will consider it different to the previous version of it:

>>> import sys, importlib
>>> 
>>> import xml.etree.ElementTree
>>> id(xml.etree.ElementTree.Element)
9581696
>>> 
>>> import xml.etree.ElementTree
>>> id(xml.etree.ElementTree.Element)  # stays the same
9581696
>>> 
>>> pymod = sys.modules.pop("xml.etree.ElementTree")
>>> cmod = sys.modules.pop("_elementtree")
>>> 
>>> sys.modules["_elementtree"] = None
>>> pure_pymod = importlib.import_module("xml.etree.ElementTree")
>>> sys.modules["_elementtree"] = cmod
>>> sys.modules["xml.etree.ElementTree"] = pymod
>>> 
>>> import xml.etree.ElementTree
>>> id(xml.etree.ElementTree.Element)  # now different
25336288

waylan · 2020-05-07T23:22:33Z

Good catch. I looked at that code, but apparently missed what it was doing. But why does the error occur only for footnotes, but not for anywhere else?

mitya57 · 2020-05-08T10:16:42Z

This is not specific to footnotes, I can reproduce it with abbr extension too:

import markdown
from defusedxml import ElementTree

src = '''
The HTML specification
is maintained by the W3C.

*[HTML]: Hyper Text Markup Language
*[W3C]:  World Wide Web Consortium
'''

markdown.markdown(src, extensions=['abbr'])

This happens because markdown module is loaded before the defusedxml hack, but the extension is lazily loaded after that, during the markdown.markdown call.

Moving the defusedxml import before the markdown import fixes this.

waylan · 2020-05-08T14:45:14Z

but the extension is lazily loaded after

Right. Of course. I had considered that at one point, but that was before I new what was causing the problem to begin with.

Moving the defusedxml import before the markdown import fixes this.

I suspected that might fix it.

@rcarmo the fix for this is for you to import nbconvert before you input markdown. As far as I am concerned, this is a bug in defusedxml as they are breaking the xml.etree.ElementTree package.

Yes, this was not an issue in previous versions of Markdown, but that was because we were importing our own copy of the etree package. We are no longer doing that (and never should have needed to). I also understand why defusedxml is doing what they are doing, but they need to not break other packages use of etree.

I am closing this as wontfix.

waylan · 2020-08-09T20:37:26Z

I haven't tested it myself, but I suspect this will be resolved when tiran/defusedxml#40 lands.

Per Python-Markdown/markdown#950

waylan · 2022-03-08T19:59:33Z

Supposedly, this was fixed this in tiran/defusedxml#60 and tiran/defusedxml#63. According to their changelog, those changes should be available in difusedxml 0.8.0, whenever that is released.

waylan closed this as completed May 8, 2020

waylan added the wontfix The issue will not be fixed for the stated reasons. label May 8, 2020

iacchus mentioned this issue Jul 20, 2020

Manual install of pelican_jupyter triggers TypeError: SubElement() argument 1 must be xml.etree.ElementTree.Element, not Element danielfrg/pelican-jupyter#124

Open

waylan mentioned this issue Aug 7, 2020

markdown footnotes plugins raise TypeError #1009

Closed

facelessuser mentioned this issue Aug 8, 2020

TOC fails #1010

Closed

waylan changed the title ~~3.2.1 breaks when nbconvert is loaded~~ 3.2.1 breaks when defusedxml is loaded Aug 9, 2020

Gittenburg mentioned this issue Aug 25, 2020

defusedxml.ElementTree breaks the xml.etree.ElementTree package tiran/defusedxml#54

Closed

waylan mentioned this issue Sep 16, 2020

Made package do specific imports from xml.etree.ElementTree to prevent collision with defusedxml #1022

Closed

greenape added a commit to greenape/mknotebooks that referenced this issue Oct 8, 2020

Switch import order of nbconvert and markdown

316447b

Per Python-Markdown/markdown#950

facelessuser mentioned this issue Nov 20, 2020

Delete and Insert Extensions throw type error facelessuser/pymdown-extensions#1147

Closed

tleonardi added a commit to tleonardi/nanocompore that referenced this issue Nov 27, 2020

Force markdown version to be <3.2 due to Python-Markdown/markdown#950

d97b431

mitya57 mentioned this issue Mar 8, 2022

Footnotes module regression? #1230

Closed

mitya57 mentioned this issue Feb 11, 2024

Rst preview does not work with pasted code retext-project/retext#589

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

3.2.1 breaks when defusedxml is loaded #950

3.2.1 breaks when defusedxml is loaded #950

rcarmo commented Apr 28, 2020 •

edited

Loading

facelessuser commented Apr 28, 2020

rcarmo commented Apr 28, 2020

waylan commented Apr 29, 2020

rcarmo commented Apr 29, 2020

waylan commented Apr 29, 2020

rcarmo commented Apr 29, 2020 via email

waylan commented Apr 29, 2020

rcarmo commented Apr 30, 2020

waylan commented Apr 30, 2020 •

edited

Loading

waylan commented May 7, 2020

facelessuser commented May 7, 2020

mitya57 commented May 7, 2020

waylan commented May 7, 2020

mitya57 commented May 8, 2020

waylan commented May 8, 2020

waylan commented Aug 9, 2020

waylan commented Mar 8, 2022 •

edited

Loading

3.2.1 breaks when defusedxml is loaded #950

3.2.1 breaks when defusedxml is loaded #950

Comments

rcarmo commented Apr 28, 2020 • edited Loading

facelessuser commented Apr 28, 2020

rcarmo commented Apr 28, 2020

waylan commented Apr 29, 2020

rcarmo commented Apr 29, 2020

waylan commented Apr 29, 2020

rcarmo commented Apr 29, 2020 via email

waylan commented Apr 29, 2020

rcarmo commented Apr 30, 2020

waylan commented Apr 30, 2020 • edited Loading

waylan commented May 7, 2020

facelessuser commented May 7, 2020

mitya57 commented May 7, 2020

waylan commented May 7, 2020

mitya57 commented May 8, 2020

waylan commented May 8, 2020

waylan commented Aug 9, 2020

waylan commented Mar 8, 2022 • edited Loading

rcarmo commented Apr 28, 2020 •

edited

Loading

waylan commented Apr 30, 2020 •

edited

Loading

waylan commented Mar 8, 2022 •

edited

Loading