Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrate Test262 #622

Closed
wants to merge 4 commits into from
Closed

Integrate Test262 #622

wants to merge 4 commits into from

Conversation

jugglinmike
Copy link

Integrate Test262

This patch is intended to resolve gh-510. It introduces three new dependencies:

  • Test262 (fetched by git SHA via npm)
  • test262-stream - A Node.js API for traversing the Test262 test suite (this creates the test code based on the contents of the Test262 directory)
  • results-interpreter - A Node.js transform stream for interpreting streaming test results (this implements the code necessary to interpret the results in terms of a whitelist file)

These projects were created to reduce duplication (both in code and in maintenance effort) between Test262 consumers. They're based on my experience integrating that project with JSHint and Babylon. Referring to Test262 via a git SHA allows this project's maintainers to control when the tests should be updated.

For this initial version, I've simply appended the test execution to the project's existing npm test command. On my system, Acorn takes about 40 seconds to run all the tests, and I'm not sure if that's "fast enough" to warrant its inclusion here. Alternatively, we could define a dedicated npm script and configure the CI environment to run it in a separate job. I'll hold off on any optimizations until I hear back from the maintainers.


Retrieve the Test262 project from its canonical location on GitHub.com.
Use the test262-stream module to create tests in accordance with that
project's interpretation guidelines. Use the results-interpreter
project to maintain a "whitelist" file of tests which Acorn is currently
expected to fail.

Retrieve the Test262 project from its canonical location on GitHub.com.
Use the `test262-stream` module to create tests in accordance with that
project's interpretation guidelines. Use the `results-interpreter`
project to maintain a "whitelist" file of tests which Acorn is currently
expected to fail.
@adrianheine
Copy link
Member

Hi, thanks for the PR! A few observations:

  • I don't think we want to run that as part of the normal tests, since they are really fast and already cover a lot. I think we will rather add new cases to the normal test suite where necessary. Running test262 on the CI is great, though.
  • You added all failing tests to the white-list, while I skipped all tests with features that acorn does not support on purpose. The latter is much faster and less arbitrary. Tests on the white-list actually correspond to bugs in my PR.

@jugglinmike
Copy link
Author

I don't think we want to run that as part of the normal tests, since they are
really fast and already cover a lot. I think we will rather add new cases to
the normal test suite where necessary. Running test262 on the CI is great,
though.

Got it. I've reverted my changes to the test npm script and inserted a new script, test-262. I've updated the project's .travis.yml file to use the "Build Matrix" feature. This will cause TravisCI to run npm test in all the Node.js environments which it previously ran that command, and it will also run Test262 in dedicated build using Node.js version 8 specifically.

You added all failing tests to the white-list, while I skipped all tests with
features that acorn does not support on purpose. The latter is much faster
and less arbitrary. Tests on the white-list actually correspond to bugs in my
PR.

I've implemented equivalent behavior using the list of excluded feature flags from your patch. Note that this patch's whitelist is much smaller. I've compare the console output from both, and I think the other version has some extra entries.

Here's the output from gh-615;

Testing complete.
Summary:
 ✔ 43747 valid programs parsed without error
 ✔ 2713 invalid programs produced a parsing error
 ✔ 352 invalid programs did not produce a parsing error (and allowed by the whitelist file)
 ✔ 109 valid programs produced a parsing error (and allowed by the whitelist file)
 ✔ 9888 programs were skipped

And here's the output from this patch:

Testing complete.
Summary:
 ✔ 43747 valid programs parsed without error
 ✔ 2713 invalid programs produced a parsing error
 ✔ 352 invalid programs did not produce a parsing error (and allowed by the whitelist file)
 ✔ 109 valid programs produced a parsing error (and allowed by the whitelist file)

Finally, I've modified the git URL used to reference Test262 to ensure that it can be successfully retrieved from the TravisCI environment.

Thanks for the review!

@jugglinmike
Copy link
Author

@adrianheine Thinking a bit more about this, I'm wondering if it would be
better to run the tests for those feature flags despite the lack of support. As
you've pointed out, this will introduce a number of additional entries in the
whitelist file. It's true that these entries won't describe bugs in Acorn, but
that fact can be communicated in the whitelist file itself using dedicated
comments.

The reason I feel this might be a better solution is that it will help with
maintenance in the long term. I'm thinking of the day when (for instance)
BigInt reaches stage 4, and Acorn implements support for it. At that time, the
folks involved with the project may forget that the appropriate Test262 tests
have been skipped. They will of course add Acorn-specific tests, but they will
move on without the benefit of the now-relevant Test262 tests.

If we extend the whitelist with expected failures here today, then when support
is added, the new Test262 "job" will fail on TravisCI (since tests that we
marked as "expected to fail" will begin to pass). At that point, they can be
automatically removed using the --update-whitelist flag, and from then on,
Acorn will move forward enjoying any additional coverage provided by Test262.

The only drawback to this is that every time Acorn updates its version of
Test262, there may be new expected failures to account for. This is a bit of an
additional maintenance burden, but it is largely mitigated by that same
--update-whitelist flag.

What do you think?

@jugglinmike
Copy link
Author

Hi there Kenny! I should start by saying that this is an attempt to satisfy a
request of the project maintainers (see gh-510), so they may have different
reasons for wanting this. But I'm happy to share my own perspective in case
you're curious.

Many of the tests are for features that have not yet reached stage 4. Other
projects have different requirements when it comes to implementation, so they
are particularly interested in those early-stage tests. As we've been
discussing above, it is Acorn's prerogative to whitelist (or even skip
outright) whatever class of tests aren't appropriate for its goals.

All the tests include meta-data describing how implementers should interpret
them. You can read more about this in the file named INTERPRETING.md of the
Test262 project, but the test262-stream module was designed so that folks
wouldn't have to study that so closely to benefit from Test262.

A vast majority of the tests are for runtime semantics. From a parser's
perspective, the semantics are impossible to verify. But that test material is
still valuable as a sort of "positive syntax" test.

When it comes to triaging issues from confused contributors, it's probably best
to take a "wait and see" approach. The situation you've described sounds like a
problem that could be addressed by clear documentation, but we'll be in a
better position to write that if/when we learn what is confusing about the
whitelist file.

Regarding the unfound files: it looks as though you are running the tests from
a Windows machine. Because the whitelist hard-codes file paths using the POSIX
separator (the forward slash character--/), then the paths generated by folks
running from Windows will not match up. I think we can resolve this in the
test-262.js script's ID generation logic--something like .replace(/\\/g, '/'). I can experiment from a Windows VM this weekend.

@jugglinmike
Copy link
Author

Hi @isiahmeadows. I'm wondering if/how our recent conversation reflects your interest in running Test262 within Acorn. Your participation has already set some improvements in motion, and I'm sure that formally integrating the test suite in Acorn will lead to still more. Actually, that's one of my reasons for making this contribution: many ECMAScript interpreters (both runtimes and parsers) stand to benefit from Acorn's participation. Is there anything more I can do to help see this through to master?

@dead-claudia
Copy link
Contributor

@jugglinmike You're probably talking to the wrong person. I have little to no involvement in any active parser project; I only wrote the script for a friend, and it was my troubles in adapting the tests that spurred me to file the Test262 issue.

The script itself in the gist, linked to from that issue, does come with basic instructions on how to integrate it. It's pretty straightforward.

@jugglinmike
Copy link
Author

Oh, my mistake! I guess maybe we should hear from @marijnh then.

Marijn: this is my attempt to resolve gh-510. Isiah has identified a few patterns in Test262 where parsing tests are expressed with eval. The effect on Acorn is that those tests don't improve coverage. They are still technically correct from a parsing standpoint, however. Because of this, I don't think either Isiah or myself consider the issue to be a blocker (please correct me if I'm wrong there, Isiah). The test execution code is configured to tolerate failures based on a whitelist, and the efficacy of this strategy is proven by the new "job" in the project's continuous integration system.

What do you think?

@adrianheine
Copy link
Member

adrianheine commented Dec 6, 2017

As it currently stands, this implementation takes 3-4 times as long as #615, but this could be fixed with bocoup/test262-stream#2. I still think skipping is better than whitelisting for tests for features acorn does not support, but I see your point with regards to not forgetting to remove the skip once the feature is implemented. I would just hope that people keep these tests in mind. Aside from that there is not much of a difference between both pull requests.

@dead-claudia
Copy link
Contributor

dead-claudia commented Dec 6, 2017

@adrianheine For context, the script that's being referenced here is in this gist. And although it wasn't mentioned here, there is an option to skip various features as well as an option to skip entire globs (relative to the test262 directory). Just thought I'd clear that up.

@adrianheine
Copy link
Member

I was referring to this pull request, not your script. Thanks for the link, though.

@marijnh
Copy link
Member

marijnh commented Dec 28, 2017

I've decided to go with #615 instead, since it appears to involve less new moving parts.

@marijnh marijnh closed this Dec 28, 2017
@jugglinmike
Copy link
Author

Yeah, I'm interested in limiting complexity, too. I recently submitted a patch to halve the dependencies in test262-stream (upon which Adrian's implementation now relies). Like any programmer, I'd like to limit duplication in code and in effort, but the important thing is that Acorn is running Test262! That's sure to make both projects better (and by extension, all the other consumers of Test262). I'll keep you posted about advances in test262-stream that this project could benefit from.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Integrate Test262 parser tests into this repo?
4 participants