Terminology issue: Use IRI as resource identifier #183

Closed
tkanai opened this issue Mar 4, 2016 · 12 comments

Comments

@tkanai
Contributor

tkanai commented Mar 4, 2016

Both JSON-LD [1] and Turtle [2] clearly state that resource IDs are expressed as IRIs. So I think the IDs in the model will be written as IRIs when they are serialized into either format.
On the other hand, the model document uses "URI" in many places, which might leave developers uncertain whether IDs should be percent-encoded or decoded during serialization and parsing.
To remove this ambiguity, the term "URI" in the document should be replaced with "IRI".

[1] https://www.w3.org/TR/json-ld/#iris
[2] https://www.w3.org/TR/2014/REC-turtle-20140225/#sec-iri
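To make the ambiguity concrete, here is a minimal sketch in JavaScript that uses only the built-in `encodeURI` and `decodeURI`. The variable names are illustrative and not taken from the model document. Note that `encodeURI` only percent-encodes; it does not touch internationalized host names.

```javascript
// The same resource identifier written as an IRI and as a URI.
const iri = 'https://ja.wikipedia.org/wiki/新宿駅';
const uri = encodeURI(iri); // percent-encodes the non-ASCII path as UTF-8 bytes

console.log(uri);                    // https://ja.wikipedia.org/wiki/%E6%96%B0%E5%AE%BF%E9%A7%85
console.log(iri === uri);            // false: a plain string comparison sees two different IDs
console.log(decodeURI(uri) === iri); // true: they match once both are in the same form
```

A serializer that emits one form and a parser that expects the other would disagree on whether two annotations refer to the same resource, which is the uncertainty described above.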

@akuckartz

👍 for accuracy and precision

@tcole3
Contributor

tcole3 commented Mar 8, 2016

Please see the minutes from 4 March for more on the state of discussion of this issue.
https://www.w3.org/2016/03/04-annotation-minutes.html#item06

After reading, please weigh in with more thoughts on whether we should use URL or URI or IRI in our Recommendations.

Does someone want to add a proposal here to use URI or URL, as a counterpoint to @tkanai's proposal to use IRI throughout?

@azaroth42
Collaborator

👍 to IRI, and suggest that we put an entry in the terminology section to explain the implications

@jjett

jjett commented Mar 8, 2016

+1 to IRI and implications explanation.

@iherman
Member

iherman commented Mar 11, 2016

Discussed on call 2016-03-11: go with IRI and explain what that means (e.g., in the terminology section); revisit when more information is available.

See: http://www.w3.org/2016/03/11-annotation-irc#T16-14-53

@tkanai
Contributor Author

tkanai commented Mar 18, 2016

The examples below show that modern browsers differ in how they store a typed URL internally (document.URL, window.location).
To turn a URL into an IRI, it would be necessary to apply both IDNA's ToUnicode() [1] to the URL host name and percent-decoding [2] to the URL path.
The difference between IDNA 2003 and IDNA 2008 [3] might not affect the IRI conversion, but I'm not 100% sure.

  1. non-ASCII URL host name
  2. non-ASCII URL path

[1] https://tools.ietf.org/html/rfc3490
[2] https://tools.ietf.org/html/rfc3986
[3] http://unicode.org/reports/tr46/
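The two-step conversion described in this comment can be sketched as follows. This is only an illustration under stated assumptions: it runs on Node.js, it uses Node's built-in `url.domainToUnicode` as a stand-in for an IDNA ToUnicode implementation, and the helper name `urlToIri` is hypothetical. Userinfo is ignored, and the query and fragment are appended unchanged.

```javascript
// Assumption: Node.js with its built-in 'url' module (ICU-enabled build, the default).
const { domainToUnicode } = require('url');

// Hypothetical helper: convert an ASCII URL string into its IRI form.
function urlToIri(urlString) {
  const url = new URL(urlString);             // WHATWG parser: Punycode host, percent-encoded path
  const host = domainToUnicode(url.hostname); // step 1: IDNA ToUnicode on the host name
  const port = url.port ? `:${url.port}` : '';
  const path = decodeURI(url.pathname);       // step 2: decode percent-encoded UTF-8 in the path
  // Query and fragment are passed through untouched in this sketch.
  return `${url.protocol}//${host}${port}${path}${url.search}${url.hash}`;
}

console.log(urlToIri('https://ja.wikipedia.org/wiki/%E6%96%B0%E5%AE%BF%E9%A7%85'));
// https://ja.wikipedia.org/wiki/新宿駅
console.log(urlToIri(new URL('http://新宿駅.jp/').href));
// http://新宿駅.jp/
```

Whether the IDNA 2003 versus IDNA 2008 difference matters depends on the ToUnicode implementation in use; the sketch does not address that question.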

@iherman
Member

iherman commented Mar 18, 2016

On 18 Mar 2016, at 11:54, Takeshi Kanai wrote:

The examples below show that modern browsers differ in how they store a typed URL internally (document.URL, window.location).
To turn a URL into an IRI, it would be necessary to apply both IDNA's ToUnicode() [1] to the URL host name and percent-decoding [2] to the URL path.

I may misunderstand what you say, but isn't it the case that, if we use IRIs, implementations may have to perform [1] and [2] to turn IRIs into URLs for a precise comparison? Well, actually, this is what applications on Firefox and IE11 have to do (do you have any results for Edge?), because, it seems, Chrome and Safari (and, I presume, all other WebKit derivatives) store IRIs as URLs after all that encoding, i.e., a JavaScript program can make a comparison without further ado.
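One way to picture the comparison discussed here is to map both identifiers to their ASCII URL form before comparing. The sketch below assumes an environment with the WHATWG `URL` class (current browsers and Node.js); the helper name `sameResource` is hypothetical.

```javascript
// Hypothetical helper: compare two identifiers after mapping both to their ASCII URL form.
function sameResource(a, b) {
  // new URL() applies the WHATWG URL rules: the host is converted to its ASCII form
  // and non-ASCII path characters are percent-encoded as UTF-8.
  return new URL(a).href === new URL(b).href;
}

console.log(sameResource(
  'https://ja.wikipedia.org/wiki/新宿駅',
  'https://ja.wikipedia.org/wiki/%E6%96%B0%E5%AE%BF%E9%A7%85')); // true

console.log(sameResource(
  'https://fr.wikipedia.org/wiki/café',
  'https://fr.wikipedia.org/wiki/Caf%C3%A9')); // false: the paths differ in letter case
```

This only covers the encoding differences raised in this thread. It does not follow redirects, and other normalization questions (for example, the letter case of hex digits in existing escapes) are outside its scope.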

The difference between IDNA 2003 and IDNA 2008 [3] might not affect the IRI conversion, but I'm not 100% sure.

non-ASCII URL host name

[original] http://新宿駅.jp
IE 11: http://新宿駅.jp
Google Chrome: http://xn--oct34u4y7b.jp
Safari: http://xn--oct34u4y7b.jp
Firefox: http://新宿駅.jp

non-ASCII URL path

[original] https://ja.wikipedia.org/wiki/新宿駅
IE 11: https://ja.wikipedia.org/wiki/新宿駅
Google Chrome: https://ja.wikipedia.org/wiki/%E6%96%B0%E5%AE%BF%E9%A7%85
Safari: https://ja.wikipedia.org/wiki/%E6%96%B0%E5%AE%BF%E9%A7%85
Firefox: https://ja.wikipedia.org/wiki/%E6%96%B0%E5%AE%BF%E9%A7%85

[original] https://fr.wikipedia.org/wiki/café
IE 11: https://fr.wikipedia.org/wiki/Caf%C3%A9
Google Chrome: https://fr.wikipedia.org/wiki/Caf%C3%A9
Safari: https://fr.wikipedia.org/wiki/Caf%C3%A9
Firefox: https://fr.wikipedia.org/wiki/Caf%C3%A9

[1] https://tools.ietf.org/html/rfc3490
[2] https://tools.ietf.org/html/rfc3986
[3] http://unicode.org/reports/tr46/

@azaroth42
Collaborator

Is there an additional editorial action needed here? I don't want to lose the valuable examples, but I'm not sure what to do with them.

@iherman
Member

iherman commented Mar 19, 2016

I think we agreed that there should be, somewhere, a note detailing the consequences of using IRI (essentially, care should be taken when comparing things). As this is an informal note anyway, it seems to be a good idea to simply put a link to Takeshi's comment.

WDYT?

@tkanai
Contributor Author

tkanai commented Mar 23, 2016

+1 to adding a note, but I'm not sure whether linking to a GitHub comment is acceptable in a W3C spec.

As requested, I checked how Edge handles the URLs and confirmed that it behaves the same as IE11.
When I changed the café URL path from "café" to "Café", both IE and Edge returned strings that were not percent-encoded, i.e., the same behavior as in the Japanese URL path case. It appears that both browsers were redirected when they accessed the "café" URL and then landed on the percent-encoded address. So the "café" URL results are not appropriate examples for this issue.

To convert a URL-encoded host name to an IRI-friendly host name, I tested punycode.js and have found no errors so far.
Here are the basic steps of the conversion.

  1. url.hostname = punycode.toUnicode(url.hostname);
  2. url.pathname = decodeURI(url.pathname);

It is a workaround, and I wonder whether browsers will expose the IRI directly in the near future, for example via document.IRI or window.id APIs.
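Two properties of the built-in `decodeURI` are worth keeping in mind for step 2: it leaves escapes of reserved characters such as `%2F` untouched, and it throws a `URIError` on input that is not valid percent-encoded UTF-8. A minimal sketch that guards against the second case follows; the helper name `decodePathForIri` and the choice to fall back to the undecoded path are assumptions, not part of the steps above.

```javascript
// Hypothetical helper for step 2: decode a path without throwing on malformed input.
function decodePathForIri(pathname) {
  try {
    // decodeURI keeps escapes of reserved characters such as %2F ("/") and %23 ("#"),
    // so decoding cannot change how the path splits into segments.
    return decodeURI(pathname);
  } catch (err) {
    // A truncated or invalid UTF-8 escape sequence: keep the path as it was.
    if (err instanceof URIError) return pathname;
    throw err;
  }
}

console.log(decodePathForIri('/wiki/%E6%96%B0%E5%AE%BF%E9%A7%85')); // /wiki/新宿駅
console.log(decodePathForIri('/a%2Fb'));                            // /a%2Fb
console.log(decodePathForIri('/wiki/%E6%96'));                      // /wiki/%E6%96
```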

@iherman
Member

iherman commented Mar 23, 2016

On 22 Mar 2016, at 22:42, Takeshi Kanai wrote:

+1 to adding a note, but I'm not sure whether linking to a GitHub comment is acceptable in a W3C spec.

I think if the link is part of an informative note (and it is), then it should be fine.

As requested, I checked how Edge handles the URLs and confirmed that it behaves the same as IE11.
When I changed the café URL path from "café" to "Café", both IE and Edge returned strings that were not percent-encoded, i.e., the same behavior as in the Japanese URL path case. It appears that both browsers were redirected when they accessed the "café" URL and then landed on the percent-encoded address. So the "café" URL results are not appropriate examples for this issue.

To convert a URL-encoded host name to an IRI-friendly host name, I tested punycode.js https://github.com/bestiejs/punycode.js and have found no errors so far.
Here are the basic steps of the conversion.

  1. url.hostname = punycode.toUnicode(url.hostname);
  2. url.pathname = decodeURI(url.pathname);

It is a workaround, and I wonder whether browsers will expose the IRI directly in the near future, for example via document.IRI or window.id APIs.

I do not know…

I think having the note in the document may/will trigger comments when we get the I18N horizontal review. Maybe we will get wiser then…:-)

@azaroth42
Collaborator

Fixed in 3/31 draft. Closing.

6 participants