ZnUrl replaces percent-encoded octets for reserved characters #97

Rinzwind · 2022-06-27T12:25:28Z

ZnUrl seems to go against RFC 3986, in that it replaces percent-encoded octets for some reserved characters by those characters. Take the following block:

[ :url | (ZnUrl fromString: url) asString ]

Examples of how this block transforms URLs:

https://example.com/?a=b%3Dc ⇒ https://example.com/?a=b%3Dc
The two URLs are exactly the same.
https://example.com/?a~b%7Ec ⇒ https://example.com/?a~b~c
The two URLs differ (%7E versus ~), but per section ‘2.3. Unreserved Characters’ in RFC 3986 they are equivalent: “URIs that differ in the replacement of an unreserved character with its corresponding percent-encoded US-ASCII octet are equivalent”.

The problem is in the third example:

https://example.com/?a;b%3Bc ⇒ https://example.com/?a;b;c
The two URLs differ (%3B versus ;), and per section ‘2.2 Reserved Characters’ in RFC 3986, they are not equivalent: “URIs that differ in the replacement of a reserved character with its corresponding percent-encoded octet are not equivalent”.

Note that the equals sign, used in the first example, is also a reserved character and is used as a delimiter in the URL-encoding of forms in HTML. As far as I understand, the intent of section 2.2 in RFC 3986 is that one could define a similar encoding that uses other reserved characters as delimiters: the queries of the URLs in the third example could be encodings of arrays of strings, in which the array #('a' 'b;c') is encoded as a;b%3Bc and the array #('a' 'b' 'c') as a;b;c.
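The array encoding described above can be sketched as follows. This is only an illustration in Python, not Pharo and not part of Zinc; the names `encode_list` and `decode_list` are hypothetical.

```python
from urllib.parse import quote, unquote

def encode_list(items):
    # Hypothetical encoding: percent-encode every reserved character
    # inside an item (safe=""), then join items with a literal ";".
    return ";".join(quote(item, safe="") for item in items)

def decode_list(query):
    # Split on the literal ";" delimiter first and decode afterwards,
    # so that "%3B" is treated as data and never as a delimiter.
    return [unquote(part) for part in query.split(";")]

print(encode_list(["a", "b;c"]))     # a;b%3Bc
print(encode_list(["a", "b", "c"]))  # a;b;c
print(decode_list("a;b%3Bc"))        # ['a', 'b;c']
print(decode_list("a;b;c"))          # ['a', 'b', 'c']
```

If a;b%3Bc is rewritten to a;b;c before it reaches `decode_list`, the two-element array silently becomes a three-element one.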

Section ‘4.2.3. http(s) Normalization and Comparison’ in RFC 9110 states the following, for which it refers back to RFC 3986: “characters other than those in the "reserved" set are equivalent to their percent-encoded octets”.
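A minimal sketch of a normalization that respects this rule, written in Python for illustration only (it is not how ZnUrl works, and `normalize_pct` is a hypothetical name): percent-encoded octets are decoded only when they stand for an unreserved character, and all other percent-encoded octets are kept, with the hex digits uppercased.

```python
import re

# Unreserved characters per RFC 3986 section 2.3.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

def normalize_pct(url):
    def replace(match):
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            # Safe to decode: the result is an equivalent URI.
            return char
        # Reserved (or other) octet: keep it encoded, uppercase the hex.
        return "%" + match.group(1).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", replace, url)

print(normalize_pct("https://example.com/?a=b%3Dc"))  # https://example.com/?a=b%3Dc
print(normalize_pct("https://example.com/?a~b%7Ec"))  # https://example.com/?a~b~c
print(normalize_pct("https://example.com/?a;b%3Bc"))  # https://example.com/?a;b%3Bc
```

With this approach the third example keeps %3B, so the input and the output remain equivalent.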

See my comment in issue #89 for how this is related to that issue.


Rinzwind · 2022-06-27T13:54:01Z

Additional example:

https://example.com/?a+b%2Bc ⇒ https://example.com/?a%20b%2Bc
The two URLs differ (+ versus %20) and are not equivalent, as the plus sign is a reserved character. Note that the plus sign is replaced by %20 (space in ASCII) rather than %2B (plus sign in ASCII). Plus signs are used to encode spaces in the URL-encoding of HTML forms, but the query of a URL is not necessarily an encoded HTML form.
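The difference between generic percent-decoding and HTML form decoding can be seen in Python's standard library, used here purely as an illustration from another language: `urllib.parse.unquote` leaves a plus sign alone, while `urllib.parse.unquote_plus` applies the form convention and turns it into a space.

```python
from urllib.parse import unquote, unquote_plus

query = "a+b%2Bc"

# Generic percent-decoding: "+" is just a plus sign.
print(unquote(query))       # a+b+c

# HTML form decoding: "+" means space, "%2B" means plus.
print(unquote_plus(query))  # a b+c
```

Which of the two readings is correct depends on the application that produced the query, which is why a general-purpose URL class arguably should not pick the form reading on its own.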

svenvc · 2022-06-27T16:09:47Z

Hi Kris,

Thanks a lot for your input, you are certainly on to something.

But basically, you are saying that parsing/printing is not symmetrical, right?

The question remains: what are we going to do, and how are we going to implement it?

I believe there might be room to improve on the current situation, but I am not yet seeing it clearly.

Sven

Rinzwind · 2022-06-27T21:51:03Z

I’m not sure either. The easiest aspect of ZnUrl to look at first w.r.t. this issue is likely the #fragment: method though. Examples using x := ZnUrl fromString: 'https://example.com/' as a starting point:

x copy fragment: 'a;b'; asString ⇒ 'https://example.com/#a;b'
x copy fragment: 'a%3Bb'; asString ⇒ 'https://example.com/#a%253Bb'
x copy fragment: 'a^b'; asString ⇒ 'https://example.com/#a%5Eb'

The problem here is that it’s not possible to get 'https://example.com/#a%3Bb' (which, due to the semicolon being a reserved character, is not equivalent to 'https://example.com/#a;b'). A method #basicFragment: could allow that:

x copy basicFragment: 'a;b'; asString ⇒ 'https://example.com/#a;b'
x copy basicFragment: 'a%3Bb'; asString ⇒ 'https://example.com/#a%3Bb'
x copy basicFragment: 'a^b' ⇒ an error is signaled (as a caret cannot occur in the fragment per the ABNF in RFC 3986)
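The check that such a #basicFragment: would need can be sketched as follows, in Python rather than Pharo and only as an illustration. The function name `check_raw_fragment` is hypothetical; the pattern follows the RFC 3986 rule fragment = *( pchar / "/" / "?" ), where pchar is an unreserved character, a percent-encoded octet, a sub-delimiter, ":" or "@".

```python
import re

# One allowed literal character, or one complete percent-encoded octet.
_FRAGMENT = re.compile(
    r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})*"
)

def check_raw_fragment(fragment):
    # Accept the string unchanged if it is already a valid raw fragment;
    # nothing is encoded or decoded here.
    if _FRAGMENT.fullmatch(fragment) is None:
        raise ValueError(f"not a valid RFC 3986 fragment: {fragment!r}")
    return fragment

print("https://example.com/#" + check_raw_fragment("a;b"))    # https://example.com/#a;b
print("https://example.com/#" + check_raw_fragment("a%3Bb"))  # https://example.com/#a%3Bb

try:
    check_raw_fragment("a^b")
except ValueError as error:
    print(error)  # a caret is not allowed in a fragment
```

Because the input is taken as already encoded, a;b and a%3Bb stay distinct, which is the property the existing #fragment: cannot offer.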

One possible question regarding #basicFragment: is whether it should make a distinction between these two examples or not:

x copy basicFragment: 'a~b'; asString
x copy basicFragment: 'a%7Eb'; asString

As ~ is unreserved, 'https://example.com/#a~b' and 'https://example.com/#a%7Eb' are equivalent. But there might be cases in which one wishes to distinguish between URLs that are equivalent but not equal (for example, to deal with a server which incorrectly does not treat them as equivalent).

Rinzwind mentioned this issue Sep 12, 2023

WAUrl>>#initializeFromString: error when query parameters include scheme SeasideSt/Seaside#1216

Closed
