Add unique ids to each parse error #1339
Comments
Sounds like a good idea. (I also think the parser could use more ids in general.) |
I'm OK with this. |
I've gone through the microsyntax parse errors first. There are not very many but these are the hardest to properly identify. With one exception they can all fit the 'x-in-y' pattern. What do you think of these so far?
Example: Next I'll see about listing the generalized forms for the remaining parse errors. |
I think we should consider actually uniquely naming each parse error in the body of the spec; like: <code><dfn>CommaInImgSrcset</dfn></code><span
data-x="concept-microsyntax-parse-error">parse error</span> |
Gah, we keep giving poor @haroldfredshort conflicting information :(... |
Maybe people would all agree on something like ... that is a <span data-x="concept-microsyntax-parse-error">parse error</span>
(<dfn data-x="parse-error-comma-in-img-srcset">comma in img srcset</dfn>). ??? |
oofs, yeah, sorry for not re-reading the comments in the PR more carefully
... that is a <span data-x="concept-microsyntax-parse-error">parse error</span>
(<dfn data-x="parse-error-comma-in-img-srcset">comma in img srcset</dfn>). Yeah, that would be fine by me. |
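For illustration only, here is one way a key like the one in the markup above could be derived from the human-readable name. The function name `data_x_key` and the `parse-error-` prefix rule are assumptions read off the single example in this thread, not something the spec's build tooling is known to do.

```python
def data_x_key(error_name: str) -> str:
    """Turn a human-readable error name into a hyphenated data-x key.

    Hypothetical helper: lowercases the name, collapses whitespace,
    and prepends the "parse-error-" prefix seen in the example markup.
    """
    words = error_name.lower().split()
    return "parse-error-" + "-".join(words)


key = data_x_key("comma in img srcset")
```

With the name from the comment above, `key` is `parse-error-comma-in-img-srcset`.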
I believe this bug is about parse errors in the "Parsing HTML documents" section -- not about errors in |
Regarding the comment from @zcorpan, should I not add identifiers to the seven srcset parse errors, or would it be okay to identify these differently? Like this:
The parse errors in the "Parsing HTML documents" section are simpler, but before I go further I am including an example here to make sure we are all agreed that this is how we want it to be:
|
The srcset parse errors are not errors that get implemented by HTML parsers. They instead are targeted just to the separate specific code that browsers implement in their So it would be good to have IDs for those as well, but those should be in a separate PR, because this issue was raised specifically just for the parse errors for HTML parsers as defined in the HTML parsing algorithm, and so the target is just HTML parser developers/implementations. |
@sideshowbarker Okay, thank you for the clarification. |
I think each of the HTML parse errors can be made to fit into one of the following generalized error forms: Please let me know what you think. |
I think a few of the entity parse errors might not quite fit into that, but am open to suggestions there (also, it's only three or four, so that's not really wasted effort if we decide to change them after you've made a PR!). I'm happy to go with @domenic's suggestion as to the format of the |
(Sorry for not having made this bug sufficiently clear before, by the way!) |
Is anyone working on it? If not, I would like to champion it if no one would mind. |
I've given up, go for it!
On Sat, Mar 11, 2017, 11:11 Ivan Nikulin wrote:
> Is anyone working on it? If not, I would like to champion it if no one would mind.
|
A unique id per error in the HTML parsing section seems like a good idea. When a human-facing message would be parametrized with the particular element names at hand, varying those parameters shouldn't multiply the error identities. |
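A minimal sketch of that idea, assuming a hypothetical error code `unexpected-end-tag` and a hypothetical `MESSAGES` table (neither comes from the spec): the identity of an error is its code alone, while element names only fill in the human-facing message.

```python
from dataclasses import dataclass, field

# Hypothetical message templates keyed by a stable error code.
MESSAGES = {
    "unexpected-end-tag": "Unexpected end tag </{name}>.",
}


@dataclass(frozen=True)
class ParseError:
    code: str
    # Parameters are excluded from equality and hashing, so they
    # never create a "new" error identity.
    params: dict = field(default_factory=dict, compare=False)

    def message(self) -> str:
        return MESSAGES[self.code].format(**self.params)


a = ParseError("unexpected-end-tag", {"name": "p"})
b = ParseError("unexpected-end-tag", {"name": "div"})
```

Here `a == b` holds because both carry the same code, while `a.message()` and `b.message()` differ in the element name they mention.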
OK, so here is the plan: I'll stick with error format proposed by @domenic in #1339 (comment) with
@caridy and @diervo have expressed their desire to participate: feel free to join once I've sorted out the bootstrapping-related things (first point in the TODO list) and we'll coordinate our work on this. Does anyone have any concerns before I start? |
@inikulin FWIW, there was various bits of debate about parse error format in html5lib-tests, especially giving positions (some people wanted tests for start/end positions of the token with the error, IIRC, so whatever format we have there should make sure it's extensible for that). On the whole I'm in favour of something like |
@gsnedders I'm not sure we'll always be able to associate an error with the token correctly. However, off the top of my head, the following should work in most cases:
What do you mean by |
@inikulin currently I think some have for tree construction |
@gsnedders yeah, makes more sense. |
@inikulin I spent the weekend reading all the threads and docs about the topic, so I'm ready to help with the implementation. Once you have an idea or design draft with the first integration and implementation details, I will happily help with anything you need. This is something we really need soon for our project at Salesforce, so I would gladly co-champion it with someone who has more background with the spec-related stuff. |
OK, I've finished the preparation steps and finally got my hands on the spec. Here is how it looks: I'm hesitating a bit over whether it's clear enough that what is given in the parentheses is an error code. Maybe the prefix form suggested in #1339 (comment) will work better? Or we could just clarify this somehow in the parse error definition? |
Seems like another option is to put the error name in front of the words parse error, like this:
|
Yeah, I personally like the prefix form, with @sideshowbarker's most recent variation seeming like an even nicer variant than the earlier comment. Very excited about this! We should probably also add something about the purpose of these identifiers in the parse error definition section. |
I wonder if we should enforce usage of specification-provided error codes, i.e. should we use "should" or "may" RFC 2119 keywords in:
|
I’m supportive of using a normative “should” to state that requirement for conformance checkers. |
This gives every parse error that occurs during tokenization a unique ID, and adds non-normative text explaining and exemplifying when they occur in an overview table. Part of #1339; tree construction parse errors remain before that issue is finished.
When doing the next round to add codes to the tree builder, please also add a comma and space after "e.g." here: Edit: Did this in #2728 |
@zcorpan Thank you! |
It would be convenient for the sake of testing implementations to have unique ids on each parse-error.
I know @Hixie opposed this because the spec only requires parse errors to be emitted at specific points, and some implementations merge multiple parse errors in the spec, and hence cannot distinguish them.
I don't think this is a good argument against it, as implementations can always keep a mapping saying that their parse error x corresponds to a and b in the spec.
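Such a mapping could be as small as the sketch below. Every name in it is hypothetical (`IMPL_TO_SPEC`, the implementation codes, and the placeholder spec codes `spec-error-a` and so on); it only shows how an implementation that merges several spec errors into one could still be checked against tests written in terms of spec ids.

```python
# Hypothetical mapping: one implementation-level error code may stand
# for several spec-level error ids that the implementation merges.
IMPL_TO_SPEC = {
    "impl-error-x": frozenset({"spec-error-a", "spec-error-b"}),
    "impl-error-y": frozenset({"spec-error-c"}),
}


def matches_expected(impl_code: str, expected_spec_id: str) -> bool:
    """Return True if the implementation's error covers the spec id a test expects."""
    return expected_spec_id in IMPL_TO_SPEC.get(impl_code, frozenset())


merged_ok = matches_expected("impl-error-x", "spec-error-b")
unrelated = matches_expected("impl-error-y", "spec-error-a")
```

A test harness could then accept `impl-error-x` wherever a test expects either `spec-error-a` or `spec-error-b`, and reject it elsewhere.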
At the moment html5lib-tests essentially has position and some meaningless message.
Both html5lib and html5ever have unique messages for each parse error. It would be nice to make sure we have the right errors being raised.
We probably want ids something like the keys in the dictionary E in https://github.com/html5lib/html5lib-python/blob/master/html5lib/constants.py#L7.