[Custom]: "are" custom element names ASCII characters, or MUST they be ASCII characters? #239
Comments
The current specification doesn't even mention ASCII characters anywhere: https://w3c.github.io/webcomponents/spec/custom/#dfn-custom-element-type
I think it makes sense to restrict the tag name to ASCII letters, at least in v1.
Any opinions? @annevk @travisleithead @hober |
I'd like to cast my vote for non-ascii in custom elements names! Emoji aside (https://jsbin.com/buzegi/edit?html,output) which is kind of cool, I think Kanji characters in tag names is a real use case 😊 Does the parser care if the tag name is non-ascii? Like, what makes it a hard problem (out of curiosity)? |
See issue #177. There are disagreements on the exact set of characters allowed in tag names. Someone needs to investigate the issue and come up with a safe/correct subset of characters that can be used in custom elements. I'm suggesting we restrict it to only ASCII characters in v1, since it's always safe to expand the set of allowed characters later once someone has done that work, but not vice versa.
By the way, if we're allowing exotic non-ASCII characters like emoji, we probably don't need the hyphen requirement in those tag names since the requirement exists for the forward compatibility with future HTML documents, and I don't think we'd ever add an HTML element with an emoji in its tag name. For example, '-' almost never appears in Chinese/Japanese, and it would look absolutely awful between Hanzi/Kanji/Katakana/Hiragana/etc...: Bad: |
Also, if we do allow accented characters, would we allow capital accented letters? e.g. è is allowed but È is disallowed in tag names? That would be rather confusing.
Per the HTML parser a tag name has to start with I would be okay with requiring ASCII lowercase (with at least one hyphen) as a start and then go from there. I would also be fine with allowing more, but I don't think we should do anything that requires changing the rule that it starts with an ASCII alpha. |
(See also whatwg/html#721 about making custom elements support self-closing syntax, just like SVG and MathML.) |
I'm with Anne. Starting with something like x-джэц or my-日本酒 as legal seems reasonable enough, and leaving the HTML parser alone seems to be a Good Idea™ worth trying out in reality before we start messing with it. |
That requirement doesn't exist in the XML parser so I'm inclined to say we should get rid of that requirement in XML documents because it really doesn't meet the author expectation in non-European languages. This should be an important consideration in the parser extensibility issue #113. Now, irrespective of HTML or XML documents, it doesn't make any sense to require Again, my preference would be to require ASCII lowercase letters for the entire tag name in v1, and extend it carefully in the future, since, in practice, even authors in Japan, China, etc. are going to use alphanumerical tag names in HTML documents to be consistent with other builtin elements. Having said all that, I see two sensible options:
|
First you say you don't want to constrain XML by the rules of HTML but then you say you want to use a subset of both. 1 coupled with hyphens is definitely the easiest option here. (XML is constrained by https://www.w3.org/TR/xml-names/#NT-QName whereas createElement() is constrained by https://www.w3.org/TR/xml/#NT-Name. I think the former is a subset of the latter. But XML is also not consistently implemented across engines due to the fifth edition debacle and everyone mostly stopped caring for it.) @domenic thoughts? |
Well, that's because you said you don't want to remove the leading ASCII letter requirement. I would want to remove that requirement in XML documents if we're allowing non-ASCII letters, but I'd much rather come up with something everyone agrees on than keep debating this. On that ground, lowercase ASCII letters with hyphens is the easiest one to spec. IMO, we should just go with that and move on. There are too many other important issues to tackle for v1.
Oh I think you misunderstood. I simply explained how the HTML parser is constrained and that I don't think we should change the HTML parser. I did not mean to imply that should similarly constrain the local name of custom elements. But I'm happy with the simplest thing that could possibly work. |
Oh I see. Thanks for the clarification. We should just settle on whatever safest subset we can all agree on for v1. |
I tend to agree with @rniwa that a restriction to ASCII letters in v1 makes sense. On the other hand, I was about to say "we could wait until developers ask for an expanded set and add them in the future", but then I realized @notwaldorf in this thread is a developer doing exactly that. So maybe we should be more permissive. Given how XML is a mess and I'd probably make document.defineElement just always fail in XML documents if I could, how about the following?
|
We should disallow uppercase in XML. If we want to allow more in HTML, we should use QName from xml-names per |
I don't quite follow this reasoning. Why does stuff about browser-implemented HTML impact what we do in HTML documents?
So In any case, you seem to have the best grasp on the restrictions here. With the guiding principles of:
would you mind taking over the writing of the exact algorithm? Maybe even do it as a PR after #405 lands. |
No, if we're allowing non-ASCII characters, I want to remove the restriction that the leading letter must be an ASCII lowercase letter in XML documents because it just doesn't work well in languages that don't use the Latin alphabet.
@rniwa why do you care about XML documents? |
@domenic : I don't care whether I write HTML documents or XML documents. But, as an author, I would rather use XML documents to get around the annoyance that the leading letter must be an ASCII lowercase letter, in Japanese for example. It just doesn't meet author expectation.
I'm confused. Why don't you just use HTML documents? That restriction doesn't exist there. |
@domenic : It totally does. The HTML5 parser requires the leading letter of every tag name to be ASCII, and such is not the case in the XML parser.
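To make the parser constraint concrete, here is a minimal sketch modelled on the "tag open" state of the HTML tokenizer, which decides what a `<` starts by looking at the next character. The function name and the returned labels are hypothetical, and the real state machine handles more cases than this.

```javascript
// Simplified sketch of the decision an HTML tokenizer makes right after "<".
// Only an ASCII letter begins a start tag; other characters either open
// another construct or leave the "<" as plain text.
// Assumption: the labels returned here are illustrative, not spec terms.
function classifyAfterLessThan(nextChar) {
  if (/^[A-Za-z]$/.test(nextChar)) return "start-tag";
  if (nextChar === "/") return "end-tag-open";
  if (nextChar === "!") return "markup-declaration";
  if (nextChar === "?") return "bogus-comment";
  return "text";
}

console.log(classifyAfterLessThan("m"));  // "start-tag"
console.log(classifyAfterLessThan("日")); // "text"
```

Under this sketch, markup such as `<日本酒>` never produces an element in an HTML document, which is why the thread treats the leading ASCII letter as a fixed constraint of the HTML parser.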
Ah I see, sorry, I was looking at DOM instead of HTML. That position makes sense... but I assume that restriction is in the parser for a good reason. Probably to deal with things like |
sigh I'm not sure I want to work on this, there's five sets of names, as far as I can tell, of which three are used (with two of them arguably wrong):
I see two sane approaches here:
|
There was talk at some point for trying to see if we could lift constraints on names altogether, but I don't think that ever happened. @foolip was the last to touch that potato. |
Option 2 seems rather risky because we could end up allowing names that can't be processed by HTML/XML parser and we may not even know about it. So I think we should go with option 1 for now. It's easy to expand the set of letters we can use later. |
The previous discussion was in a "Mismatch between HTML parser and createElement() et al" thread on blink-dev, spawned from an "Inconsistency in characters allowed in attribute names between setAttribute and HTML syntax specs" spec bug. ASCII alpha + ASCII hyphen seems like the safer option, really.
We should probably file an issue in HTML and figure out "the one definition". I'm more than happy to use this definition once it's ready (even in v1) but I don't want to hold up the custom elements API on that. |
"One or more a-z (lowercase), followed by a hyphen, followed by zero or more a-z (lowercase) or hyphen." |
Oh, I meant that the one definition that includes non-ASCII letters if that is even possible. |
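For reference, the ASCII-only wording quoted just above ("One or more a-z (lowercase), followed by a hyphen, followed by zero or more a-z (lowercase) or hyphen") can be transcribed literally as a regular expression. This is only a sketch of that sentence; the helper name is hypothetical and nothing here is taken from a spec.

```javascript
// Literal transcription of the quoted rule:
//   one or more a-z, then a hyphen, then zero or more a-z or hyphen.
// Assumption: isAsciiHyphenatedName is an illustrative name, not an API.
const ASCII_HYPHENATED_NAME = /^[a-z]+-[a-z-]*$/;

function isAsciiHyphenatedName(name) {
  return ASCII_HYPHENATED_NAME.test(name);
}

console.log(isAsciiHyphenatedName("my-element")); // true
console.log(isAsciiHyphenatedName("x-"));         // true
console.log(isAsciiHyphenatedName("img"));        // false
console.log(isAsciiHyphenatedName("my-日本酒"));  // false
```

Note that, read literally, the rule accepts a name that ends in a hyphen such as `x-`, and rejects digits anywhere in the name.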
I see, it really depends on what the requirements are. Does it need to be supported by the HTML parser? Does it need to be supported by |
Well, I'm saying that we should probably figure this out for |
@rniwa oh, I don't think we can change HTML to allow non-a-z at the start. That would change the parsing of |
I think you're still misunderstanding me. What I'm saying is that there should be one definition in one spec which defines what valid name means for HTML documents, which may refer to XML spec, and defines a set of valid names for HTML parser, XML parser, createDocument in HTML documents, and createDocument in XML documents. Hopefully there aren't many discrepancies between them but as you noted, they can't all be the same. Now, if there is a known definitely safe subset of all those four potentially distinct sets that we can use for custom elements, then I'm all for it. But it sounded like there isn't, or they aren't even well defined yet. So it seems that we need to do the exercise of determining those four sets first before expanding the set of valid names allowed in custom elements |
I went with a liberal-as-possible intersection set in 35086b3. See https://w3c.github.io/webcomponents/spec/custom/#valid-custom-element-name for the rendered output. We can work toward centralizing all definitions into one place (presumably DOM) later, and I think they will indeed all be distinct, but the definitions are already out there. I guess either DOM or browsers have a bug since DOM specifies XML 5th edition and browsers use XML 4th edition for createElement(NS). But for now custom elements will just use XML 5th edition like DOM does, and if we want to change both at once to align with browser reality (instead of making browsers more liberal) we can definitely do so. |
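As a rough illustration of what such an intersection set looks like, the sketch below encodes a name production of the shape `[a-z] (PCENChar)* '-' (PCENChar)*`. The character ranges are an assumption: they follow the PCENChar production as it was later published in the HTML Standard, and may not match commit 35086b3 exactly, so check them against the linked definition. The function name is hypothetical, and other requirements on valid names (such as reserved names) are not modelled.

```javascript
// Assumption: these ranges mirror the PCENChar production from the HTML
// Standard's "valid custom element name" definition; verify before relying on them.
const PCEN_CHAR = [
  "\\-", ".", "0-9", "_", "a-z", "\\u{B7}",
  "\\u{C0}-\\u{D6}", "\\u{D8}-\\u{F6}", "\\u{F8}-\\u{37D}",
  "\\u{37F}-\\u{1FFF}", "\\u{200C}-\\u{200D}", "\\u{203F}-\\u{2040}",
  "\\u{2070}-\\u{218F}", "\\u{2C00}-\\u{2FEF}", "\\u{3001}-\\u{D7FF}",
  "\\u{F900}-\\u{FDCF}", "\\u{FDF0}-\\u{FFFD}", "\\u{10000}-\\u{EFFFF}",
].join("");

// Leading ASCII lowercase letter, at least one hyphen, no ASCII uppercase.
// The "u" flag makes the character class operate on code points.
const POTENTIAL_CUSTOM_ELEMENT_NAME = new RegExp(
  `^[a-z][${PCEN_CHAR}]*-[${PCEN_CHAR}]*$`,
  "u"
);

function matchesNameProduction(name) {
  return POTENTIAL_CUSTOM_ELEMENT_NAME.test(name);
}

console.log(matchesNameProduction("x-джэц"));    // true
console.log(matchesNameProduction("my-日本酒")); // true
console.log(matchesNameProduction("日本-酒"));   // false (leading character is not a-z)
console.log(matchesNameProduction("My-element")); // false (ASCII uppercase)
```

The two names suggested earlier in the thread, `x-джэц` and `my-日本酒`, pass under these assumptions, while the HTML parser's leading ASCII letter rule is left untouched.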
Would it not be easier to say it needs to match NCName plus these other restrictions? I'm not sure introducing a whole new production is helpful here. |
Hmm, I thought a production would be much easier to read/code against than taking a production and then using prose. The other restrictions get pretty hairy to the extent the new production is not really recognizable as a NCName. |
The main thing is that browsers have code for an "NCName" check and everyone is vaguely familiar with it given createElementNS (and it's only a character different from createElement). So placing additional requirements beyond "NCName" makes it
|
I've added a non-normative note that should make it easier to see the delta. Hope that's clearer. |
Thanks, that helps. |
Why are we so worried about custom element names conflicting with possible future tag names? (Not you specifically, @rniwa.) I propose that we should be allowed to override any element we wish, and on a per-shadow-root basis (not just on document):

// file1.js
import AwesomeImageElement from 'awesome-img'
const el = document.querySelector('#someEl')
const root = el.createShadowRoot()
root.registerElement('img', AwesomeImageElement)
const img = root.createElement('img') // creates an AwesomeImageElement instance
root.appendChild(img)

// file2.js
const el = document.querySelector('#otherEl')
const root = el.createShadowRoot()
const img = root.createElement('img') // creates an HTMLImageElement instance
root.appendChild(img)

If we allow overriding of native elements, then there will be no problem introducing native elements in the future; existing apps will continue to work, having their custom elements in place. It will also give developers more freedom and flexibility. Please see the following threads for more details and examples:
@trusktr this comment is not really relevant to this issue. Having already raised the issue in question, please keep the technical discussion there, and avoid filling other issues with repeats of that information. Places like twitter, blogs, and public discussions of ideas are other relevant places to look for support or discussion of your proposal. Filling up issue discussion isn't. (If it were more relevant, a simple pointer would be enough. In this case, even that would probably be spammy). Chaals (as chair) |
Hello @chaals, thanks for the tip! |
Instead of requiring a name to match an existing HTML element, this relaxes the restrictions to:
- starting with [a-zA-Z] (matching the HTML parser WICG/webcomponents#239 (comment))
- then continuing with anything other than a space, forward slash or closing angle bracket

This is similar to the fix to the following issue in the HTML syntax highlighting repo (and actually depends on the "derivative" syntax that was created for that issue): textmate/html.tmbundle#92
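The relaxed rule described in this commit message can be approximated with a single pattern. This is a sketch of the two stated conditions only; the constant and function names are hypothetical, and the actual grammar used by the syntax highlighter is not shown in this thread.

```javascript
// First character: ASCII letter, as the HTML parser requires.
// Remaining characters: anything except space, "/" or ">".
// Assumption: looksLikeTagName is an illustrative helper, not the real grammar rule.
const RELAXED_TAG_NAME = /^[a-zA-Z][^ \/>]*$/;

function looksLikeTagName(name) {
  return RELAXED_TAG_NAME.test(name);
}

console.log(looksLikeTagName("my-日本酒")); // true
console.log(looksLikeTagName("日本酒"));    // false
console.log(looksLikeTagName("a b"));       // false
```

This is deliberately looser than the custom element name rules discussed above, since a highlighter only needs to recognise where a tag name starts and ends.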
Title: [Custom]: "are" custom element names ASCII characters, or MUST they be ASCII characters? (bugzilla: 22056)
Migrated from: https://www.w3.org/Bugs/Public/show_bug.cgi?id=22056
comment: 0
comment_url: https://www.w3.org/Bugs/Public/show_bug.cgi?id=22056#c0
Dominic Cooney wrote on 2013-05-16 06:29:42 +0000.
"is a sequence of alphanumeric ASCII characters"
This is confusing. NCName [1] includes combining characters and extenders that are not ASCII characters. These should be allowed, because custom element names MUST match the NCName production and there is no restriction on the character set.
I think "is a sequence of alphanumeric ASCII characters" should be "MUST be a sequence of ASCII characters".
[1] http://www.w3.org/TR/1999/REC-xml-names-19990114/#NT-NCName