feedparser should not attempt to parse HTTP error pages #460

Open
dechamps opened this issue Jul 23, 2024 · 0 comments


$ python3 -c 'import feedparser; import pprint; pprint.pp(feedparser.parse("http://httpstat.us/500"));'
{'bozo': 1,
 'entries': [],
 'feed': {},
 'headers': {'content-length': '25',
             'connection': 'close',
             'content-type': 'text/plain',
             'date': 'Tue, 23 Jul 2024 09:26:01 GMT',
             'server': 'Kestrel',
             'set-cookie': 'ARRAffinity=0b6744c5c65f60053b4261472f06470832ebaff4bed4a8258e6eb824fe0a51e1;Path=/;HttpOnly;Domain=httpstat.us',
             'request-context': 'appId=cid-v1:3548b0f5-7f75-492f-82bb-b6eb0e864e53'},
 'href': 'http://httpstat.us/500',
 'status': 500,
 'encoding': 'us-ascii',
 'bozo_exception': SAXParseException('syntax error'),
 'version': '',
 'namespaces': {}}

feedparser reports SAXParseException('syntax error') for a response with HTTP status 500, which suggests that it attempted to parse the body of the error page as a feed. The resulting error is confusing: ideally, feedparser should not attempt to parse HTTP error pages at all, and should clearly report the HTTP error instead.

(Also, this should really raise an exception, but that's a separate issue - see #329)
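
Until feedparser handles this itself, a caller can check the 'status' key shown in the output above before trusting 'bozo_exception'. The following is only a sketch of that workaround: `FeedHTTPError` and `ensure_http_ok` are hypothetical names, not part of feedparser, and the `>= 400` threshold is an assumption about what should count as an error.

```python
from collections.abc import Mapping
from typing import Any


class FeedHTTPError(Exception):
    """Hypothetical exception for this sketch; feedparser does not define it."""

    def __init__(self, status: int, href: str = "") -> None:
        super().__init__(f"HTTP {status} while fetching {href or 'feed'}")
        self.status = status
        self.href = href


def ensure_http_ok(result: Mapping[str, Any]) -> Mapping[str, Any]:
    """Raise FeedHTTPError if a parse result carries an HTTP error status.

    `result` is any dict-like object with an optional 'status' key, such as
    the value returned by feedparser.parse() for a URL.
    """
    status = result.get("status")
    # Assumption: 'status' may be missing (e.g. input that was not fetched
    # over HTTP), in which case there is nothing to check.
    if status is not None and status >= 400:
        raise FeedHTTPError(status, result.get("href", ""))
    return result
```

With this in place, `ensure_http_ok(feedparser.parse("http://httpstat.us/500"))` would raise `FeedHTTPError` rather than leaving the caller to interpret the SAX syntax error. Note that feedparser still parses the error body in this case; the helper only changes what the caller sees.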
