-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Test the encoding sniffing algorithm (aka meta prescan) #130
base: master
Are you sure you want to change the base?
Test the encoding sniffing algorithm (aka meta prescan) #130
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's still a part of me that would rather we just add a #preparsed
section to the existing tests as and when it differs from the final state from the parser, as in the vast majority of cases they can be used for both the pre-parser and the whole system.
(also does scripting support matter for testing encoding? how do document.write
and <noscript>
interact here? because it so we should make sure we have expectations as required)
This change adds a `preparsed` subdirectory in the `encoding` directory, with tests for which the result of the *encoding sniffing algorithm* at https://html.spec.whatwg.org/#encoding-sniffing-algorithm is the expected result — that is, tests for which the expected result is the output of running *only* the encoding sniffing algorithm (of which the main sub-algorithm is the so-called “meta prescan”) — without also running the tokenization state machine and tree-construction stage. This change also adds a README file that explicitly documents what the expected results for the encoding tests are, based on whether or not they’re in the `preparsed` subdirectory. Without those changes, it’s unclear whether the expected results shown in the existing tests are for the output of fully parsing the test data — through the tokenization state machine and tree-construction stage — or instead just the output of the encoding sniffing algorithm only. And without those changes, we also don’t have any tests a system can use for testing only the output from the encoding sniffing algorithm. Fixes #28
bb92134
to
b6c4e3f
Compare
I thought about that too, but I think that’s less preferable because it would push off more work onto the N downstream parser implementors who’d all need to update their test-data parsers in their test harnesses — to teach the testharness about that new In contrast, putting the tests with new test expectations into a new subdirectory allows those N downstream parser implementors to just keep running their existing test harnesses on the existing test data with zero changes needed. And for handling the added
I think that question is orthogonal to the scope of this PR. But it seems to me the answer is that for the preparsed case, scripting support definitely does not matter. For the “normal” parsed case, I don’t the answer. But since the implementation I work on doesn’t have scripting support anyway, I’m not personally super-motivated to spend time investigating it… |
How much work is that for most downstream parsers?
I think the majority do do recursion?
It is. Just wondering if we might potentially want further flags later, because any way we do this has a cost to some people hence it would be nice to have it happen all at once. |
For the test harness we’re using the validator.nu parser, I guess it’d be an hour of work, versus minutes of work if we just put the tests into a different directory. But that said, it’s not absolutely a lot of work. However, I think what’s more important to consider than how much work it takes is instead, whether it will break anybody’s existing test harnesses. And in that case I can say that adding a new So I personally would prefer not to make changes that have a higher risk of breaking somebody else’s harness code, and instead mitigate that risk by adding the new test types to the existing test suite in a way that seems less likely to break anybody’s existing harness. I also recognize that it would not be a lot of work for others to update their harness code to recognize a new section name. But I would prefer not to make the choice for them unilaterally.
So even if they’re doing recursion, the relatively minor change that anybody could make for now — short of adding full support for the new test types — is to write a single one-to-three-line condition that just causes their harness to completely ignore the For the case of the validator.nu parser now, that’s what we’re already doing for the So I guess I personally would be less happy if the decision had been made previously to add a |
@inikulin Given you’re the main person who’s been most active in this repo most recently, it’d be helpful if you weigh in with an opinion on this patch. Specifically, would it better to add a new |
This change adds a
preparsed
subdirectory in theencoding
directory, with tests for which the result of the encoding sniffing algorithm at https://html.spec.whatwg.org/#encoding-sniffing-algorithm is the expected result — that is, tests for which the expected result is the output of running only the encoding sniffing algorithm (of which the main sub-algorithm is the so-called “meta prescan”) — without also running the tokenization state machine and tree-construction stage.This change also adds a README file that explicitly documents what the expected results for the encoding tests are, based on whether or not they’re in the
preparsed
subdirectory.Without those changes, it’s unclear whether the expected results shown in the existing tests are for the output of fully parsing the test data — through the tokenization state machine and tree-construction stage — or instead just the output of the encoding sniffing algorithm only. And without those changes, we also don’t have any tests a system can use for testing only the output from the encoding sniffing algorithm.
Fixes #28