New and old parse errors #107
Comments
Looking at #92 again, which introduced these:
So I think in principle
@gsnedders great, thanks! That eliminates about 250 test failures for me, 120 to go!
Named character references in attributes whose last character is not `;` and for which the next input character is `=` (or an ASCII alphanumeric, but this isn't tested here) have the code points consumed as a character reference flushed _without_ adding a parse error. Named character references outside attributes whose last character is not `;` are errors regardless of the following character. These are noted in the `#new-errors` section, but without an entry in `#errors` the number of errors is wrong (see html5lib#107). Separately, this adds the missing expected-doctype-but-got-start-tag error.
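The rule described in this commit message can be sketched as a small predicate. This is only an illustration of the decision, not code from any parser in this thread; the function name `missing_semicolon_is_error` and its parameters are hypothetical, and an empty string stands in for end of input.

```python
import string

# ASCII letters and digits, i.e. "ASCII alphanumeric" in the sense used above.
ASCII_ALPHANUMERIC = frozenset(string.ascii_letters + string.digits)


def missing_semicolon_is_error(in_attribute: bool, matched_name: str, next_char: str) -> bool:
    """Return True if a matched named character reference should report a parse error.

    matched_name is the matched reference text after the ampersand, e.g. "not" or "amp;".
    next_char is the next input character, or "" at end of input (an assumption
    of this sketch).
    """
    if matched_name.endswith(";"):
        # A properly terminated reference is never a missing-semicolon error.
        return False
    if in_attribute and (next_char == "=" or next_char in ASCII_ALPHANUMERIC):
        # Historical case: the consumed code points are flushed as-is
        # and no parse error is added.
        return False
    # Every other reference without a trailing semicolon is an error.
    return True
```

For instance, `missing_semicolon_is_error(True, "not", "=")` is `False`, while the same reference outside an attribute gives `True`.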
@gsnedders Is #113 the right approach here? Two of the errors aren't errors (any longer?) and it looks like one is now but wasn't before. It's also possible that the error logic didn't actually flip as that PR would indicate and instead the test was just wrong before.
@stevecheckoway FWIW the error data is by far the most likely to be wrong bit of data in the testsuite, because very few implementations use it, so it's entirely plausible that the test was just wrong
@gsnedders Makes sense. Without standardized error names (at least until now), it's pretty tricky to test that the errors are correct. Number of errors is a pretty weak proxy so I can see why people wouldn't bother. For what it's worth, those eight PRs resolve about half of the 120 test failures from different numbers of errors I'm getting when running against the
If the `#errors` section should have the same number of lines as errors (see html5lib#107), then the NULL-character errors need to be accounted for.
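A test harness that compares error counts might tally the lines in each section roughly as follows. This is a minimal sketch under a simplified reading of the test file format: it assumes one test per string and that section headers appear on their own lines. The helper name `count_error_lines` and the sample test, including its error lines, are made up for illustration.

```python
# Section headers assumed by this sketch; anything else is treated as content.
KNOWN_HEADERS = {
    "#data", "#errors", "#new-errors",
    "#document-fragment", "#script-on", "#script-off", "#document",
}


def count_error_lines(test_text: str) -> dict:
    """Count the non-empty lines under #errors and #new-errors in one test."""
    counts = {"#errors": 0, "#new-errors": 0}
    section = None
    for line in test_text.splitlines():
        if line in KNOWN_HEADERS:
            section = line  # switch to the new section
            continue
        if section in counts and line.strip():
            counts[section] += 1
    return counts


# Hypothetical sample; the error lines are placeholders, not real test data.
sample = """#data
<?xml
#errors
placeholder old error 1
placeholder old error 2
#new-errors
placeholder new error 1
#document
| <!-- ?xml -->
"""

print(count_error_lines(sample))
```

Comparing the `#errors` count with the number of errors a parser reports is the weak proxy discussed in this thread; the sketch does not attempt to match error names.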
Any updates on updating the errors in the test to reflect the spec? The tree-construction stage still doesn't have its own error codes, unfortunately, but it'd be nice for the tests to at least have the correct number of errors.
I'm trying to test that an HTML parser produces the correct errors but some of the tests list `#errors` and `#new-errors`, where the new errors seem to be the new standard names for old errors. For example,
I'm not sure why the first error is there, but I assume an older version of the standard had something like a look ahead on the `<`. As I read the standard, there should be an `unexpected-question-mark-instead-of-tag-name` and a (currently unnamed in the standard) parse error for the missing DOCTYPE.
Is it always the case that if there's a new error that it replaces exactly one of the old errors? If not, how should these be handled?