-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Drop XSS escaping because it should be aside HTML escaping. #4555
Conversation
src/html.cr
Outdated
string.gsub(SUBSTITUTIONS) | ||
def self.escape(string : String, mode : EscapeMode = EscapeMode::Default) : String | ||
subst = case mode | ||
when EscapeMode::Default then DEFAULT_ESCAPE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Default extracted as first one to optimize performance for most use case.
dd31f0d
to
aad2f4f
Compare
…Escapes '&', '"', '\'', '/', '<' and '>' chars only. XSS escaping still available as a XSS option. Short escaping like Ruby one now available as a Short option. Fixes crystal-lang#3233, refs crystal-lang#2175.
aad2f4f
to
8148350
Compare
Pull request updated (HTML::EscapeMode::Extended renamed to HTML::EscapeMode::XSS). CI failed due to unrelated reason. |
|
@RX14 HTML.escape is not security-related function. https://www.w3.org/International/questions/qa-escapes#use It is meaningful. All HTML-related software depends on it (looks like every Web framework). Rack-style escaping is most used case. XSS escaping is another issue. |
Except that the reality is that people will use |
Firstly I wants to implement separate |
Anyway previous implementation was incorrect, because unusable in practice, see #3233, for example. |
What's the significance of escaping a slash? By itself it doesn't have any "dangerous" meaning in (X)HTML. OWASP recommends to escape it but they don't give a good argument and I can't think of any. Single and double quotes are not necessarily harmful, either. Only if an escaped string is used inside an attribute value. But we can assume that it's better to be safe and escape them by default. I would add an option to disable escaping of quotes, though. But: When double quotes are escaped, single quotes ought to be escaped as well. In HTML both can be used as delimiters for attributes values. I don't see any reason to escape double quotes, but not single quotes. There are different interpretations of what is considered to be dangerous in HTML, therefore there are different sets of escape characters in common use:
The default set should be as minimal as possible and as strong as necessary. XSS is an entirely different thing and should require a detailed discussion. |
Thanks, @straight-shoota. Looks like we need four flavors:
|
Hm, I'll remove XSS escaping at all. It should be aside HTML module. |
Exactly. The current implementation is not escaping HTML, but rather Javascript (inside HTML). Therefore it should be named differently (like I'm also not sure if there should be a dedicated flavour to escape slashes. Besides Ruby/Rack there don't seem to be any big ones following the OWASP recommendation. If there are just two sets, the API could be much simplified: def escape(string : String, escape_quotes = true)
# ...
end |
I prefer enums because they extendable.
|
Introduce CGI, Default and OWASP escape modes.
Another option is to use Symbols instead of enum. Symbols preferable to extend stdlib functionality, because enum cannot be reopened AFAIR. |
I find the usage like |
@straight-shoota Ok, you convinced me, Current Default will be true. |
Issues talk about problems in Please detail actual cases and issues where this may problematic. Otherwise this pull request is dangerous, because it makes something that used to be secure, insecure unless manually configured. |
@ysbaddaden there is more disambiguation introduced by #2175. @Ryuuzakis made escaping of JavaScript etc. inside HTML. HTML requires escaping for minor subset of chars. And yes, every known implementation follows one of proposed cases. No XSS at all because it is not related to HTML itself. |
So this is just a clear extraction of HTML responsibility. |
Just my point: we're doing better! Are there actual problems to escaping XSS along with HTML? Not merely intellectual (it's not required per se) but practical: this is causing me issues in specific cases. I suppose we could add an |
I was pointed to this issue just with It was very frustrating for me. But SO has a kind of answer - https://stackoverflow.com/a/13059657/1336858 |
And anyway - escape is not security related thing. |
This PR does not make the application of Of course, with this reduced set, it can not be entirely prevented that some unwanted alternations might be introduced into a string of HTML. But neither does the existing escape set. The weak point are unquoted attributes, as explained by OWASP:
To rule out every possible hijacking of unquoted attributes,, "all characters with ASCII values less than 256" except alphanumerical characters need to be escaped. Not just the few that are in the current implementation. They make it seem to be better then just those proposed in this PR, but don't add real security, because every other special character in the above code range is still available. Implementing this strong escape strategy is highly impractical because it completely prohibits usage of those special characters inside HTML-escaped strings - think of a markup processor that prevents direct HTML input, but uses special characters in the escaped text string for HTML-based enhancements. That's why every language library and web framework I've come across uses only the few escape characters mentioned in this PR. Unquoted attributes with user-input are highly error-prone and cannot generally be made safe without heavy impacts on functionality. |
There is also |
What's holding this up? I am confident the reasons for this PR have been properly explained. |
Let's sum up:
Let's close. |
Let's be constructive:
|
ok, somebody should add |
|
Endless arguing never leads anywhere. Please implement at least the following and I'll happily merge the pull request:
|
For
I honestly don't know which one should be implemented. Though I'd tend to the direction of Rails and Phoenix. They prohibit to break out of HTML tag content or attribute contexts as well as Javascript strings. That should be all that is important for this. |
Highly inspired by Phoenix.HTML.escape_javascript.
@ysbaddaden I have added HTML escaping kept as proposed, because it's ok and shared between a lot of languages and frameworks same way (see #4555 (comment) for example). |
src/html.cr
Outdated
# ``` | ||
# require "html" | ||
# | ||
# HTML.escape_javascript("</crystal> \u2028") # => ""<\\/crystal> 
"" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
double quotes should be reduced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed, thanks
# Encodes a string with JavaScript escaping, but writes to the `IO` instance provided. | ||
# | ||
# ``` | ||
# io = IO::Memory.new |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be better to use a string builder as example.
string.each_char do |char| | ||
io << SUBSTITUTIONS.fetch(char, char) | ||
if previous_char == '\r' && char == '\n' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this would be better as a case
statement to express flow more clearly and be shorter overall.
case
when previous_char == '\r' && char == '\n'
when previous_char == '<' && char == '/'
io << '\\' << '/'
else
io << ESCAPE_JAVASCRIPT_SUBST.fetch(char, char)
end
previous_char = char
spec/std/html_spec.cr
Outdated
@@ -10,21 +10,41 @@ describe "HTML" do | |||
end | |||
|
|||
it "escapes dangerous characters from a string" do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe reword to escapes special characters from HTML string
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks.
spec/std/html_spec.cr
Outdated
str.should eq("safe_string") | ||
end | ||
|
||
it "escapes dangerous characters from a string" do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe reword to escape special characters from a JavaScript string
? The characters are not "dangerous" per se, they just might convey special meaning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks.
@@ -29,8 +38,8 @@ module HTML | |||
# | |||
# HTML.escape("Crystal & You") # => "Crystal & You" | |||
# ``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to list all character substitutions explicitly, so you don't need to look at the source code to figure out what exactly gets escaped and why.
Suggestion:
# Escapes `&`, `<`, `>`, `"` and `'` chars as `&`, `<`, `>`, `"` and `` according to
# [OWASP Rule #1](https://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet#RULE_.231_-_HTML_Escape_Before_Inserting_Untrusted_Data_into_HTML_Element_Content).
# If *escape_quotes* is `false`, `"` and `'` will not be escaped ([W3C recommendation](https://www.w3.org/International/questions/qa-escapes#use)).
The other methods should have similar explanations.
@akzhan What do you think about Since XML is more general, should we put the method only there? But most people are actually using HTML, so maybe it would be better to have it (only) there. Or make an exception and allow aliases in this case. It wouldn't be the same if HTML version would also escape |
And perhaps we should think about making escaping I was looking at markd's source code where there is a custom |
I'm tired with this PR. Feel free to propose your own. |
@akzhan Sorry you got tired. But remember: you don't have to do what everyone says here. You can accept suggestions from anyone, but you only have to make changes that are requested by core team members (and even then you can argue about your decision, in many cases core team members changed their mind). I sent #5012 to fix this situation. |
src/html.cr
Outdated
'\u2029' => "
", | ||
'\n' => "\\n", | ||
'\r' => "\\n", | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
django appears to escape slightly more? https://docs.djangoproject.com/en/1.10/_modules/django/utils/html/#escape Just asking... :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR has been closed and #5012 was merged instead. Currently there is no escape method for javascript.
Introduce escape_quotes mode.
Fixes #3233, refs #2175.
Thanks to @straight-shoota for clarification. Cite: