RSS XML can contain invalid characters #3268
Comments
This issue has been automatically marked as stale because it has not had recent activity. The resources of the Hugo team are limited, and so we are asking for your help.
The problem still exists. I just tested with the latest master branch:
I had to change the sample program I provided that creates the problem post slightly to account for front matter changes:
It is also generating |
The problem still exists with master as of this moment:
This is still a problem with current master:
Notes from a short investigation: I attempted to use I then added a So, it looks like we'd need to add a |
Thanks for looking at this! I was actually going to take a stab at fixing it too. Your XMLEscape idea seems great! Regarding the error from xml.EscapeText: I tested with U+000B and it output the Unicode replacement character for it rather than erroring. What character did you test with that caused an error? A sanitizeXML function seems okay too if there are indeed errors.
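For reference, the behaviour described above can be checked with a small standalone program. This is only a sketch: `escapeForXML` is a hypothetical helper name and is not taken from Hugo or from either branch discussed here. It relies on `xml.EscapeText` from Go's standard library, which writes U+FFFD in place of characters that are outside the XML character range.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
)

// escapeForXML is a hypothetical helper (not Hugo's API). It returns s with
// XML special characters escaped. xml.EscapeText substitutes U+FFFD for
// characters that XML does not allow, such as U+000B.
func escapeForXML(s string) (string, error) {
	var buf bytes.Buffer
	if err := xml.EscapeText(&buf, []byte(s)); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := escapeForXML("a\vb <c>")
	if err != nil {
		fmt.Println("escape failed:", err)
		return
	}
	// Print the code points so the substituted character is easy to spot.
	for _, r := range out {
		fmt.Printf("%U ", r)
	}
	fmt.Println()
}
```

With a `bytes.Buffer` as the destination no write error is expected, which matches the observation that the vertical tab is replaced rather than rejected.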
I looked at the xml.EscapeText result rather quickly at the end, so you may be right. I'll take another look.
This is to avoid including characters invalid for XML. Fixes gohugoio#3268
I have a branch that uses the EscapeText template method: https://github.com/gohugoio/hugo/compare/master...horgh:horgh/rss-invalid-chars?expand=1 I had some trouble with the tests. For some reason, in the tests the vertical tab disappears altogether. If I build and run a test against a hugo directory it works fine though. Any ideas? Or maybe you're making a branch anyway! Edit: And I don't understand that Travis failure!
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
I found that RSS feeds that Hugo generates can contain characters that are invalid for XML.
The XML 1.0 spec defines valid characters: https://www.w3.org/TR/2006/REC-xml-20060816/#charsets
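As a rough illustration of that definition, the `Char` production in the spec allows U+0009, U+000A, U+000D, and the ranges U+0020 to U+D7FF, U+E000 to U+FFFD, and U+10000 to U+10FFFF. The sketch below encodes those ranges directly. The function names `isValidXMLChar` and `stripInvalidXMLChars` are made up for this example and are not part of Hugo or of Go's standard library; dropping the offending characters is just one possible policy.

```go
package main

import (
	"fmt"
	"strings"
)

// isValidXMLChar reports whether r is allowed by the XML 1.0 Char production.
// Hypothetical helper for illustration only.
func isValidXMLChar(r rune) bool {
	return r == 0x9 || r == 0xA || r == 0xD ||
		(r >= 0x20 && r <= 0xD7FF) ||
		(r >= 0xE000 && r <= 0xFFFD) ||
		(r >= 0x10000 && r <= 0x10FFFF)
}

// stripInvalidXMLChars drops every rune that XML 1.0 does not allow.
func stripInvalidXMLChars(s string) string {
	return strings.Map(func(r rune) rune {
		if isValidXMLChar(r) {
			return r
		}
		return -1 // a negative value tells strings.Map to drop the rune
	}, s)
}

func main() {
	fmt.Println(isValidXMLChar('\v'))                        // vertical tab, U+000B
	fmt.Printf("%q\n", stripInvalidXMLChars("hello\vworld")) // tab removed
}
```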
One I encountered in the wild in a blog using Hugo is U+000B (\v, vertical tab). (It was this blog, if you're interested: https://blog.hypriot.com/)
Trying to parse such XML raises an error with Go's decoder (which is how I noticed this in the first place):
XML syntax error on line 10: illegal character code U+000B
My environment:
Here are two small sample Go programs to help demonstrate the problem:
Create a post with an invalid character:
Use like this:
$ ./create-problem-post > ~/t/bookshelf/content/post/newpost.md
Then re-generate the site:
$ hugo
Then try to decode the RSS feed with this program:
Like so:
Thank you!