Hi! First off, huge thanks for maintaining feedparser. It's legendary! We're all lucky to have it.
I hit a new (to me) AssertionError today when parsing the RSS at https://snrk.de/feed/ . Here's the relevant RSS snippet:
<content:encoded><![CDATA[ ... <p><strong>If you don’t like that, don’t use snrk.de!</strong><![dsgvo_service_control]></p> ...]]></content:encoded>
...and here's the assert:
>>> feedparser.parse(rss)
Traceback (most recent call last):
  File ".../site-packages/feedparser/api.py", line 263, in parse
    saxparser.parse(source)
  File ".../python3.11/xml/sax/expatreader.py", line 111, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File ".../python3.11/xml/sax/xmlreader.py", line 125, in parse
    self.feed(buffer)
  File ".../python3.11/xml/sax/expatreader.py", line 217, in feed
    self._parser.Parse(data, isFinal)
  File "/private/tmp/pythonA3.11-20240402-4978-3ygh5v/Python-3.11.9/Modules/pyexpat.c", line 477, in EndElement
  File ".../python3.11/xml/sax/expatreader.py", line 395, in end_element_ns
    self._cont_handler.endElementNS(pair, None)
  File ".../site-packages/feedparser/parsers/strict.py", line 124, in endElementNS
    self.unknown_endtag(localname)
  File ".../site-packages/feedparser/mixin.py", line 321, in unknown_endtag
    method()
  File ".../site-packages/feedparser/namespaces/_base.py", line 488, in _end_content
    value = self.pop_content('content')
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../site-packages/feedparser/mixin.py", line 629, in pop_content
    value = self.pop(tag)
            ^^^^^^^^^^^^^
  File ".../site-packages/feedparser/mixin.py", line 548, in pop
    output = _sanitize_html(output, self.encoding, self.contentparams.get('type', 'text/html'))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../site-packages/feedparser/sanitizer.py", line 883, in _sanitize_html
    p.feed(html_source)
  File ".../site-packages/feedparser/html.py", line 156, in feed
    super(_BaseHTMLProcessor, self).feed(data)
  File ".../site-packages/sgmllib.py", line 98, in feed
    self.goahead(0)
  File ".../site-packages/sgmllib.py", line 168, in goahead
    k = self.parse_declaration(i)
        ^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../site-packages/feedparser/html.py", line 351, in parse_declaration
    return sgmllib.SGMLParser.parse_declaration(self, i)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../python3.11/_markupbase.py", line 91, in parse_declaration
    return self.parse_marked_section(i)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../python3.11/_markupbase.py", line 154, in parse_marked_section
    raise AssertionError(
AssertionError: unknown status keyword 'dsgvo_service_control' in marked section
Is this expected? Should I catch AssertionError everywhere I use feedparser? Any other thoughts?
feedparser 6.0.11, Python 3.11.9. Maybe related to #378...but not exactly the same. Thanks in advance!
We were able to narrow down the cause of the problem to the following segment in our input.
<description >XC#<![n%</description>
We think the trigger is the character sequence <![, whether it appears literally or in an escaped form that effectively renders to <![.
The problem seems to be in the parsing of marked sections: the error trace shows that 'parse_marked_section' is mistakenly called even though the input is not a marked section.
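To check other feeds for the same trigger, a rough diagnostic sketch follows. It scans already-unescaped text for `<![` openers whose keyword is not one a marked-section parser would accept. The keyword set is our assumption, modeled on the `_markupbase` module named in the traceback rather than taken from feedparser, and `find_suspect_marked_sections` is a hypothetical helper name.

```python
import re
from typing import List, Tuple

# Assumed set of accepted marked-section keywords (an assumption modeled on
# CPython's _markupbase, not something feedparser documents).
KNOWN_KEYWORDS = {
    "temp", "cdata", "ignore", "include", "rcdata",  # closed by "]]>"
    "if", "else", "endif",                           # MS Office style, closed by "]>"
}

# "<![" followed by an optional name token.
_OPENER = re.compile(r"<!\[([A-Za-z][-_.A-Za-z0-9]*)?")


def find_suspect_marked_sections(text: str) -> List[Tuple[int, str]]:
    """Return (offset, keyword) for each '<![' not followed by a known keyword."""
    suspects = []
    for match in _OPENER.finditer(text):
        keyword = (match.group(1) or "").lower()
        if keyword not in KNOWN_KEYWORDS:
            suspects.append((match.start(), keyword))
    return suspects


if __name__ == "__main__":
    print(find_suspect_marked_sections("XC#<![n%"))
    print(find_suspect_marked_sections("<![CDATA[ok]]>"))
```

This only flags candidates; it does not repair the input or confirm that feedparser would fail on them.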