ZnUrl replaces percent-encoded octets for reserved characters #97
Comments
Additional example:
Hi Kris, thanks a lot for your input, you are certainly on to something. But basically, you are saying that parsing/printing is not symmetrical, right? The question remains: what are we going to do, and how are we going to implement it? I believe there might be room to improve on the current situation, but I am not yet seeing it clearly. Sven
I’m not sure either. The easiest aspect of
The problem here is that it’s not possible to get
One possible question regarding
As |
ZnUrl seems to go against RFC 3986, in that it replaces percent-encoded octets for some reserved characters by those characters. Take the following block:
Examples of how this block transforms URLs:
https://example.com/?a=b%3Dc ⇒ https://example.com/?a=b%3Dc
The two URLs are exactly the same.
https://example.com/?a~b%7Ec ⇒ https://example.com/?a~b~c
The two URLs differ (%7E versus ~), but per section ‘2.3. Unreserved Characters’ in RFC 3986 they are equivalent: “URIs that differ in the replacement of an unreserved character with its corresponding percent-encoded US-ASCII octet are equivalent”.
The problem is in the third example:
https://example.com/?a;b%3Bc ⇒ https://example.com/?a;b;c
The two URLs differ (%3B versus ;), and per section ‘2.2 Reserved Characters’ in RFC 3986, they are not equivalent: “URIs that differ in the replacement of a reserved character with its corresponding percent-encoded octet are not equivalent”.
Note that the equals sign, used in the first example, is also a reserved character and is used as a delimiter in the URL-encoding of forms in HTML. As far as I understand, the intent of section 2.2 in RFC 3986 is that one could define a similar encoding that uses other reserved characters as delimiters: the queries of the URLs in the third example could be encodings of arrays of strings, in which the array #('a' 'b;c') is encoded as a;b%3Bc and the array #('a' 'b' 'c') as a;b;c.
Section ‘4.2.3. http(s) Normalization and Comparison’ in RFC 9110 states the following, for which it refers back to RFC 3986: “characters other than those in the "reserved" set are equivalent to their percent-encoded octets”.
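To make the distinction concrete, here is a minimal Python sketch, independent of ZnUrl and not a proposal for its implementation. The function names `normalize_unreserved` and `decode_array` are hypothetical, and the semicolon-delimited array encoding is the illustrative one described above, not an existing standard. The first function decodes only percent-encoded octets of unreserved characters, which is the replacement both RFCs treat as equivalence-preserving; the second shows why `a;b%3Bc` and `a;b;c` have to stay distinct.

```python
import re
from urllib.parse import unquote

# Unreserved characters per RFC 3986 section 2.3:
# ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

_PERCENT_OCTET = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_unreserved(text):
    """Decode only the percent-encoded octets of unreserved characters.

    Every other percent-encoded octet (reserved characters included) is
    kept encoded; only its hex digits are uppercased.
    """
    def replace(match):
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char
        return "%" + match.group(1).upper()

    return _PERCENT_OCTET.sub(replace, text)


def decode_array(query):
    """Hypothetical array encoding: split on literal ';' first,
    then percent-decode each element."""
    return [unquote(item) for item in query.split(";")]


# The three examples from this issue:
print(normalize_unreserved("https://example.com/?a=b%3Dc"))
print(normalize_unreserved("https://example.com/?a~b%7Ec"))
print(normalize_unreserved("https://example.com/?a;b%3Bc"))

# Under the hypothetical array encoding the two queries mean different things:
print(decode_array("a;b%3Bc"))
print(decode_array("a;b;c"))
```

With this sketch the third URL keeps its `%3B`, so a receiver that splits on `;` before decoding still recovers the two-element array. Once `%3B` has been replaced by `;`, that information is gone.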
See my comment in issue #89 for how this is related to that issue.