
Testing test262 in different (web platform) agents #8308

Open

annevk opened this issue Nov 20, 2017 · 17 comments


annevk commented Nov 20, 2017

Over in https://github.com/dpino/gecko-dev @dpino is working on ensuring that various SharedArrayBuffer tests from test262 run across the various agents defined by the web platform. A follow-up goal is to run all test262 tests across the various agents to ensure there are no weird bugs in the various JavaScript engines.

The idea is to host this "wrapper test suite" in web-platform-tests so all user agents can benefit.

If anyone has thoughts, ideas, or concerns that'd be great to hear.

cc @jgraham @domenic @foolip @ljharb @leobalter

(Corresponding issue: dpino/gecko-dev#21.)

annevk added the js label Nov 20, 2017

foolip commented Nov 20, 2017

Is it important that everyone who uses web-platform-tests also gets test262 as part of it, or would it suffice if the tests are run on the same setup as for wpt.fyi and published either on wpt.fyi or a test262 results dashboard?


annevk commented Nov 20, 2017

As I understand it, test262 attempts to be host-agnostic, just like ECMAScript itself. So while the web platform has many agents, other hosts might have just one. So if we want to run those tests in a window, a worker, a shared worker, or a combination thereof (in the case of SharedArrayBuffer), etc., I think that has to happen on the web-platform-tests side.

Various JavaScript engines can also run test262 directly, but that doesn't exercise quite the same code paths as running them through web platform agents.


foolip commented Nov 20, 2017

Oh, so you're saying we'd run the tests at least in a window and worker context?


annevk commented Nov 20, 2017

@foolip ideally all agents, including worklets (though only possible for audio worklets I think), service workers, and shared workers. That's the long term goal.

The short term goal is making sure SharedArrayBuffer tests are tested across all agent combinations, which similarly requires this kind of wrapper setup.


annevk commented Nov 20, 2017


foolip commented Nov 20, 2017

Thanks, that does help. It seems like a good start would be to pick a browser, write a wrapper for wpt, and run the tests against the similar-origin window agent using wpt run. See if there are any differences from the results of the same tests run directly against the JS engine. Then also run against the other agents and see what other differences show up.

Most likely, new bugs will be revealed. Depending on how many bugs, the tradeoff between running the same tests many times vs. finding bugs might look different.

How long does it currently take to run all of the tests?


dpino commented Nov 20, 2017

Hi @foolip. I coded an attempt at what you suggested in dpino/gecko-dev@9641de0

Basically it's a Perl script that prints out a WPT test with a customized list of test262 tests to run. In that commit I'm only supporting a DedicatedWorker, although it could be extended to other types of workers. The main issue with this approach was that it required writing wrappers for things that test262 uses (for instance, the assert commands are slightly different from what WPT supports) and, more importantly, when I tried to build up a long list of tests to run, the whole test timed out.
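
For illustration, here is a minimal sketch of the kind of shim such a wrapper needs, mapping test262's harness asserts (from its harness/assert.js) onto testharness.js primitives. The exact mapping shown is an assumption, not the code from the commit:

```js
// Hypothetical shim: make test262's assert API delegate to testharness.js.
// test262's assert(value) requires value === true, which matches
// testharness.js's assert_true().
function assert(mustBeTrue, message) {
  assert_true(mustBeTrue, message);
}
// Both assert.sameValue and assert_equals use SameValue semantics
// (NaN equals NaN, +0 differs from -0).
assert.sameValue = function (actual, expected, message) {
  assert_equals(actual, expected, message);
};
assert.throws = function (expectedError, fn, message) {
  // assert_throws_js is the current testharness.js name for this check
  assert_throws_js(expectedError, fn, message);
};
```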

Although this approach can be interesting for trying out test262 in a browser, it does not sound like the right one to me, dunno.

Unfortunately the laptop I was using to do this work crashed today, so I cannot check how long it takes to run the whole test262 or wpt suite. I will post those numbers once my laptop gets fixed (hopefully in a day or two).


jgraham commented Nov 20, 2017

I think this makes sense. I think WASM might do something similar. The details of how the integration should work are unclear to me; how will web-platform-tests be kept in sync with the test262 tests?


dpino commented Nov 28, 2017

@foolip Running all of test262 on my laptop takes around 4 min. Not very useful information; I suppose you were asking about the time spent running all the tests as part of a CI infrastructure or similar.

```
$ ./tests/jstests.py build_OPT.OBJ/dist/bin/js test262
[27706|    0|    0| 1041] 100% ======================================>| 233.0s
PASS
```

I gave your suggestion a try (a wrapper that relies on wpt run to launch test262 in the browser). I pushed the changes to a remote branch at https://github.com/dpino/web-platform-tests/tree/test262-runner

I have several questions regarding web-platform-tests. Ideally, I think the test262 suite should be run by opening a browser once and running all the tests in that same browser instance. With the wrapper above, each test opens and closes a new browser, so running the whole suite takes a very long time (even more since the suite should run on different agents). Another approach could be to group several test262 tests together into a single WPT test. I don't know if it would be possible to have one single browser instance where every test is run and the browser communicates the results back to the command shell.


jgraham commented Nov 28, 2017

web-platform-tests generally work with one instance of the browser running multiple tests.

The most obvious way to do this integration would be to generate testharness.js wrappers for the test262 tests and check in the generated files. These would then run like any other testharness.js test. It looks like that's more or less what's on your branch, except that you don't add all the files at once and you call wpt run for every test rather than once.

There are more complex solutions we could imagine in which the templates are baked into the server like with .worker.js files. I don't know if that's worthwhile.
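
For concreteness, a hedged sketch of what one generated wrapper could look like if it followed wpt's existing .worker.js convention; the file name, the test262 harness paths and the inlined body are all assumptions:

```js
// hypothetical generated file, e.g. built-ins/Array/length.worker.js
importScripts("/resources/testharness.js");
// test262's own harness files would need to be served somewhere, too:
importScripts("/test262/harness/assert.js", "/test262/harness/sta.js");

test(() => {
  // inlined body of the original test262 file, for example:
  assert.sameValue([1, 2, 3].length, 3);
}, "built-ins/Array/length");

done();
```

A failing test262 assert throws a Test262Error, which test() catches and reports like any other testharness.js failure.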


dpino commented Dec 13, 2017

Thanks @jgraham for the clarification. Initially I thought web-platform-tests launched a new browser per test, but I was wrong.

I've updated the script quite a bit. Now I just use the script to generate the WPT wrappers from test262 test files and run them externally as normal WPT tests.

OTOH, some of the tests were failing or timing out. The issue was that some test262 tests modify built-in objects such as Array, and that had a side effect on the web-platform-tests harness code. So I actually need to parse the source of each test and add code to undo the change once the test is over. Anyway, still struggling with this.
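
A rough sketch of that save/restore idea, assuming the set of built-ins to snapshot is known in advance (everything here is hypothetical, not the actual script):

```js
// Snapshot built-ins a test might clobber before running it...
const savedArray = Array;
const savedArrayProtoDescriptors =
  Object.getOwnPropertyDescriptors(Array.prototype);

// ...and restore them once the test is over (called from the wrapper).
function restoreBuiltins() {
  self.Array = savedArray;
  Object.defineProperties(Array.prototype, savedArrayProtoDescriptors);
}
```

Note this only restores properties that existed before; properties a test adds would need to be deleted separately.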


annevk commented Dec 13, 2017

Perhaps an alternative approach is to load the test262 test in an <iframe> and then use onload to inspect the result? It might not be as nice though, and come to think of it, it would not work in a worker and such. It seems those kinds of tests would be rather hard to do properly with a harness.
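
A minimal sketch of that idea, assuming a pregenerated wrapper page that records its outcome on its own global (the path and the __test262Result property are hypothetical):

```js
async_test(t => {
  const iframe = document.createElement("iframe");
  iframe.src = "wrappers/built-ins/Array/length.html"; // assumed wrapper page
  iframe.onload = t.step_func_done(() => {
    // the wrapper page is assumed to set __test262Result when it finishes
    assert_true(iframe.contentWindow.__test262Result.passed,
                "test262 test passed inside the iframe");
  });
  document.body.appendChild(iframe);
}, "built-ins/Array/length in an iframe");
```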


dpino commented Dec 13, 2017

@annevk I can try running the tests in an iframe, at least for the same-origin window agent, and see if more tests pass. Right now, running the test262/built-ins directory, which is the largest test262 directory, I get 1000 failing tests and 35 timeouts. Maybe some tests fail because of a JS feature missing in the browser (not all of them are implemented yet). I would need to look more into the failing tests.

The good thing about running the tests in the browser as web-platform-tests is reusing all the infrastructure for running tests and retrieving reports. But everything that has to do with instrumentation (Selenium/Marionette) is actually not useful for this case, IMHO. @jugglinmike told me about https://github.com/bterlson/test262-harness, a node.js tool for running test262 in the browser (there's also https://github.com/bakkot/test262-web-runner). So maybe a similar tool that uses a WebSocket to communicate the results from the browser to a server process could be another approach. I don't know. Does it make sense? For the moment, I'm going to keep trying this approach.


annevk commented Dec 13, 2017

I'm not sure; I'm not familiar enough with all the harnesses. I'm curious whether @bakkot has looked into running test262 in a worker environment.


dpino commented Jan 4, 2018

I have a first version of the tests running. I reworked the script to run the tests inside an IFrame, then added support for other agents: child Window, DedicatedWorker and SharedWorker. ServiceWorker is not supported yet; more on that later.

I used the results of Test262-Web-Runner as a baseline to compare my results against. I ran the tests on Firefox Nightly 59.a1. First of all, here are the results for Test262-Web-Runner:

Test262-Web-Runner

| Test | Ran | Failed |
| --- | --- | --- |
| annexB | 977/1003 | 26 |
| built-ins | 12743/13446 (skipped 32) | 703 + 32 |
| harness | 94/94 | 0 |
| intl402 | 231/236 | 5 |
| language | 13917/14822 | 905 |

And here are the results of the web-platform-tests wrappers for test262 (only IFrame in this benchmark):

| Test | Ran | Expected results | Failed |
| --- | --- | --- | --- |
| annexB | 2263 tests (1003 parents, 1260 subtests) | 2230 | 33 (FAIL: 33) |
| built-ins | 40188 tests (13478 parents, 26710 subtests) | 38748 | 1440 (FAIL: 1440) |
| harness | 275 tests (94 parents, 181 subtests) | 275 | 0 |
| intl402 | 708 tests (236 parents, 472 subtests) | 698 | 10 (FAIL: 10) |
| language | 43243 tests (14898 parents, 28345 subtests) | 41559 | 1684 (FAIL: 1684) |

This summary cannot be compared directly with the results of Test262-Web-Runner. By default, test262 tests are executed both in strict mode and non-strict mode, unless a tag (onlyStrict, noStrict) indicates otherwise. For each test in the WPT wrapper, two actual tests normally run, so when a test fails it likely counts as two failing tests. On the other hand, a test that fails in Test262-Web-Runner counts only once.
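
As a sketch of where the doubling comes from (the helper name is hypothetical; the two modes are test262's documented default):

```js
// A test262 file without an onlyStrict/noStrict flag yields two runs:
// the source as-is, and the source with "use strict"; prepended.
function expandModes(testSource) {
  return [
    { mode: "non-strict", source: testSource },
    { mode: "strict", source: '"use strict";\n' + testSource },
  ];
}
```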

So to actually compare the WPT results and Test262-Web-Runner I need to normalize the results using an expression like the following:

$ grep "FAIL IFrame" annexB.output | cut -d : -f 2 | sort -u | wc -l

Here are the normalized results for IFrame:

| Test | Ran | Failed |
| --- | --- | --- |
| annexB | 1003 | 26 |
| built-ins | 13446 | 720 |
| harness | 94 | 0 |
| intl402 | 236 | 5 |
| language | 14822 | 827 |

The results are almost the same as Test262-Web-Runner (I just noticed the results for 'language' are much worse, although I used to get better results in other runs; I will look into that **). Then I started to add support for the other agents. Here are the results for each type of agent:

** 08/01/2018: The values are updated now.

Window

| Test | Ran | Failed |
| --- | --- | --- |
| annexB | 1003 | 26 |
| built-ins | 13446 | 720 |
| harness | 94 | 0 |
| intl402 | 236 | 5 |
| language | 14822 | 833 |

Worker

| Test | Ran | Failed |
| --- | --- | --- |
| annexB | 1003 | 69 |
| built-ins | 13446 | 1043 |
| harness | 94 | 1 |
| intl402 | 236 | 6 |
| language | 14822 | 3827 |

SharedWorker

| Test | Ran | Failed |
| --- | --- | --- |
| annexB | 1003 | 69 |
| built-ins | 13446 | 1059 |
| harness | 94 | 2 |
| intl402 | 236 | 6 |
| language | 14822 | 3907 |

Regarding ServiceWorker, the reason I left it out for the moment is that for the currently supported agents I generate the tests on the fly (either an HTML page for IFrame and Window, or a JavaScript file for DedicatedWorker and SharedWorker) using a Blob object. However, it's not possible to generate ServiceWorkers on the fly for security reasons. One possible workaround would be to generate the ServiceWorker files for each test beforehand. The downside is that this would double the total number of files, but I think it would work.
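
A minimal sketch of the on-the-fly generation described above, with the function name and the message shape as assumptions:

```js
// Wrap the test source in a script that reports back, load it as a
// blob: URL, and run it in a dedicated worker.
function runInDedicatedWorker(testSource) {
  const body = testSource + "\npostMessage({ passed: true });";
  const url = URL.createObjectURL(
    new Blob([body], { type: "text/javascript" }));
  return new Promise((resolve, reject) => {
    const worker = new Worker(url);
    worker.onmessage = e => resolve(e.data);
    worker.onerror = e => reject(new Error(e.message));
  });
}
```

new SharedWorker(url) accepts the same blob: URL, but navigator.serviceWorker.register() does not accept blob: URLs, which is why ServiceWorker needs pregenerated files.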


dpino commented Jan 8, 2018

I fixed the issue that affected the results of the 'language' test block. The values are updated now.


dpino commented Jan 10, 2018

I have pushed a PR with the script to generate the WPT wrappers as well as the harness code to run the tests. The PR is not ready to be merged yet, but I think it can be a starting point to get feedback and discuss what's pending to be done. PTAL #8980
