URI.encode gives a false sense of security; proper escaping is missing #10603
Comments
The use case for Agreed, this might not be the most common use case, even in web typical applications. And users seem to assume the method encodes restricted characters as well, which is obviously bad. So yes, claim 2) unfortunately looks very reasonable. I don't understand the example in
I suppose there's no other way to help users avoid shooting themselves in the foot, then.
I guess it wanted to escape everything except And the purpose of not escaping Side note: Python escapes everything except |
So, to move this forward, I think we need to do the following:
The implementation is trivial, we just need names. I think As an obstructive name, maybe |
In Go, there are two methods called
The internal implementation also provides escaping modes for hosts/zones, paths, user/password components, and fragments (https://cs.opensource.google/go/go/+/refs/tags/go1.17:src/net/url/url.go;l=100). It seems these modes are only used by I could see our API going in a similar direction, adding A difficulty with |
https://docs.python.org/3/library/urllib.parse.html#urllib.parse.quote_plus
-- even sounds similar to the name. For escaping path [components], Python happens to hit good usability. https://docs.python.org/3/library/urllib.parse.html#urllib.parse.quote The calls are If the method were to be named Whether the current The name Yielding |
That seems fine by me.
The main problem is to design the API for encoding URI paths and path components. So you're suggesting I like about the Another issue: The method name So I believe
I agree with @straight-shoota that it doesn't make sense to have a parameter unless we're up for a generic method. Then, whether we ought to have a flag or two different methods, I don't have strong feelings, but for consistency, I'm slightly more in favor of a flag, since we have the BTW, your report on this matter is gold, thanks a lot @oprypin 🙇 ! |
@straight-shoota Because if it's one method, let me explain why my suggestion looked like that and the issue I have with your suggestion:

    encode_path() # component
    encode_path(actually_a_path: true)

I'd prefer

    encode_path_component()
    encode_path()

You could indeed still suggest something with escaping a path by default, like

    encode_path(component: true)
    encode_path()

but I don't like these as much, and naming the parameter could be difficult. And, in the absence of opinions, I'd say let's keep the current
Yes, I'm definitely favouring two separate methods now. @beta-ziliani
@straight-shoota I see you self-assigned the task :) I just had a particular idea about the documentation: each of these methods should have an example "what if I want to do what this method does but with one particular character (say, |
Yeah, I've got a fix. It's just waiting on #11124 for facilitating proper specs. I'm not sure I entirely understand what you want to put in the doc comment. It's probably best if you show an example (can do that in the PR).
I've looked more closely at the implementation of Go's
(I have also added these two methods to the table in the OP). Should we consider the escape rules from Go's In this context, there's also the question of non-escaping |
I honestly have no interest in trying to follow what Go does if it's different from all the other examples :/
Or let's put it a different way: do you foresee more user requests saying "why did it escape this character?" or "why didn't it escape this character?" Particularly consider the latter if it escapes less than Ruby. I think it's much better to just go with the majority from these examples.
Expanding from #7997 (comment)
Consider this usecase:
Someone is making a variable part in their URL/path, such as `/items/#{name}`, and, naturally, they form the URL by writing `"/items/#{URI.encode(name)}"`.
Then there appears an item named `"item #9"`.
What they expect: `"http://example.org/items/item%20%239"`
What they get: `"http://example.org/items/item%20#9"`
So basically the escaping doesn't save you from anything, you still get a path with arbitrary "control" characters such as `#` and `?`. It's kinda worse than writing nothing, because writing nothing would trigger a comment in code review: "why didn't you escape it?".
In the big table below you can see the data behind my claim: just how few kinds of characters actually get escaped by `URI.encode`.
Crystal used to have `URI.escape` which did exactly the right thing, and that behavior is also what most programming languages offer, e.g. Python. But #7997 removed it.
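To make the difference concrete, here is a minimal Python sketch of the two behaviors. `urllib.parse.quote` is the real standard-library function; the `LENIENT_SAFE` set is only my assumption, chosen to imitate an encoder that leaves URL "control" characters alone. It is not Crystal's actual implementation.

```python
from urllib.parse import quote

name = "item #9"

# Strict escaping of a single path segment: nothing is treated as safe,
# so "#", "?", "&" and even "/" get percent-encoded.
strict = quote(name, safe="")

# Hypothetical lenient character set (an assumption for illustration):
# reserved characters are declared "safe" and therefore pass through.
LENIENT_SAFE = "/#?&=:@;,+$!*'()"
lenient = quote(name, safe=LENIENT_SAFE)

print("http://example.org/items/" + strict)   # http://example.org/items/item%20%239
print("http://example.org/items/" + lenient)  # http://example.org/items/item%20#9
```

With the lenient set, everything after `#` silently becomes a fragment, so the server would see the path `/items/item%20` instead of the intended item.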
The straightforward options available now both produce the wrong result for this context:
And these two lengthy expressions are the options that you have now to do the correct thing in the typical usecase that I've shown:
And honestly, I prefer the latter, because I'm not encoding a WWW form.
Speaking of which, `URI.encode_www_form` is very good for WWW forms (and `?` query parameters).
But as for `URI.encode`, I claim that:
#3515 (comment) shows "passing the whole URL to it" as a usecase, and lists some programming languages that have this function. But I think it misses the fact that many programming languages actually don't have this function in any shape. And that Ruby actually obsoleted it as dangerous (the comment pointing that out somehow wasn't fully considered).
To expand on (1):
I think passing the whole URL to any escaping function is usually wrong, as different parts of it are subject to different escaping requirements. E.g. the separators / control chars mustn't be escaped, the path needs to be fully escaped, the query params need to be form-escaped. Leaving some classes of characters unescaped is only a workaround to make this use case kinda work.
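A rough sketch of what "escaping different parts differently" can look like, in Python. The helper name `build_item_url` and its parameters are hypothetical, made up for this illustration; `quote` and `urlencode` are the real `urllib.parse` functions.

```python
from urllib.parse import quote, urlencode


def build_item_url(base: str, name: str, params: dict) -> str:
    """Hypothetical helper: escape each URL part by its own rules."""
    # Path segment: fully percent-encoded, including "/", "#" and "?".
    segment = quote(name, safe="")
    # Query parameters: form-encoded ("&" and "=" inside values are escaped).
    query = urlencode(params)
    # The separators themselves ("/", "?") are added unescaped, by us.
    return f"{base}/items/{segment}?{query}"


url = build_item_url("http://example.org", "item #9", {"q": "a&b=c"})
print(url)  # http://example.org/items/item%20%239?q=a%26b%3Dc
```

The point is that the separators are written by the code that assembles the URL, while every variable part goes through the escaping function that matches its position.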
Not that it's an invalid use case. Of course, for Web browsers escaping the whole URL is an everyday task. But this probably needs to be done by parsing it and escaping different parts differently, as the same comment suggests. And that's not what happens in `URI.encode`.
I also think that maybe Web browsers already deal with kinda pre-escaped URLs, so for them maybe there isn't a dangerous operation such as (what people may assume to be safe but will definitely never work) `URI.escape("http://example.org/items/#{name}")`.
Regardless of whether this usecase is satisfied (maybe it still is, I'm not sure), the main issue (2) still stands: this function is not appropriate for almost all other use cases, but people assume that it is.
Finally, we get to the table, comparing these functions in 4 programming languages:
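Functions compared (table columns):
Crystal: `URI.encode`, `URI.escape` (gone), `URI.encode_www_form`
Python: `urllib.parse.quote`, `urllib.parse.quote('')`, `urllib.parse.quote_plus`
Ruby: `URI.encode` (gone), `URI.encode_www_form`, `CGI.escape`
JavaScript: `encodeURI`, `encodeURIComponent`
Go: `url.PathEscape`, `url.QueryEscape`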
The program that made this
Elixir (not in this table) is in the same bad situation as Crystal (actually even seems less customizable), and people are arriving at the same workaround. Unfortunately, Elixir's behavior was used as an argument for adopting this behavior.
Anyway, as you can see from the table, Python and Ruby don't have a function that doesn't escape `#`, `?`, `&` (well, in Ruby it's only obsoleted w/ a warning).
Now let's expand on my claim (2) - with particular examples that it is causing danger in the wild:
For all of these cases, the payload/value can contain `&foo=bar&` at any point to specify any other arbitrary parameter.
But these simply should've been `URI.encode_www_form`.
This one, however, is the usecase that I'm talking about:
TablesQueries
Here's a full list (GitHub search, annotated by me) of what I looked through. I wasn't very attentive for the whole list though, maybe some comments are wrong.
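The parameter injection described above (a value containing `&foo=bar&`) can be sketched in Python as follows. The `LENIENT_SAFE` set is again my own assumption that imitates an encoder which does not escape `&` and `=`; `quote`, `quote_plus` and `parse_qs` are real `urllib.parse` functions.

```python
from urllib.parse import parse_qs, quote, quote_plus

# Hypothetical lenient character set (an assumption for illustration).
LENIENT_SAFE = "/#?&=:@;,+$!*'()"

payload = "x&admin=true"  # attacker-controlled value

# Lenient escaping: "&" and "=" survive, so the value splits in two.
injected = parse_qs("q=" + quote(payload, safe=LENIENT_SAFE))

# Form escaping: the whole payload stays inside the single "q" parameter.
contained = parse_qs("q=" + quote_plus(payload))

print(injected)   # {'q': ['x'], 'admin': ['true']}
print(contained)  # {'q': ['x&admin=true']}
```

In the first case the receiving side sees an extra `admin` parameter that the caller never meant to send.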
More interesting fallout that should've been predicted in #7997: many of these vulnerabilities (this one and this one at least) were introduced by simply replacing the correct `URI.escape` with the wrong `URI.encode`, as the deprecation message told.
I think the only path forward is to deprecate `URI.encode`, so people can default to safe alternatives. Even if its behavior is wanted for the rare occasion, it should be reintroduced under some long name that doesn't let you run into it accidentally.