
Testing test262 in different (web platform) agents #8308

Open

annevk opened this issue Nov 20, 2017 · 17 comments


annevk commented Nov 20, 2017

Over in https://github.com/dpino/gecko-dev @dpino is working on ensuring that various SharedArrayBuffer tests from test262 run across the various agents defined by the web platform. A follow-up goal is to run all test262 tests across the various agents to ensure there are no weird bugs in the various JavaScript engines.

The idea is to host this "wrapper test suite" in web-platform-tests so all user agents can benefit.

If anyone has thoughts, ideas, or concerns that'd be great to hear.

cc @jgraham @domenic @foolip @ljharb @leobalter

(Corresponding issue: dpino/gecko-dev#21.)

annevk added the js label Nov 20, 2017

foolip commented Nov 20, 2017

Is it important that everyone who uses web-platform-tests also gets test262 as part of it, or would it suffice if the tests are run on the same setup as for wpt.fyi and published either on wpt.fyi or a test262 results dashboard?


annevk commented Nov 20, 2017

As I understand it, test262 attempts to be host-agnostic, just like ECMAScript itself. So while the web platform has many agents, other hosts might have just one. So if we want to run those tests in a window, a worker, a shared worker, or a combination thereof (in the case of SharedArrayBuffer), etc., I think that has to happen on the web-platform-tests side.

Various JavaScript engines can also run test262 directly, but that doesn't exercise quite the same code paths as running them through web platform agents.


foolip commented Nov 20, 2017

Oh, so you're saying we'd run the tests at least in a window and worker context?


annevk commented Nov 20, 2017

@foolip ideally all agents, including worklets (though only possible for audio worklets I think), service workers, and shared workers. That's the long term goal.

The short term goal is making sure SharedArrayBuffer tests are tested across all agent combinations, which similarly requires this kind of wrapper setup.


annevk commented Nov 20, 2017


foolip commented Nov 20, 2017

Thanks, that does help. It seems like a good start would be to pick a browser, write a wrapper for wpt, and run the tests against the similar-origin window agent using wpt run. See if there are any differences from the results of the same tests run directly against the JS engine. Then also run against the other agents and see what other differences show up.

Most likely, new bugs will be revealed. Depending on how many bugs, the tradeoff between running the same tests many times vs. finding bugs might look different.

How long does it currently take to run all of the tests?


dpino commented Nov 20, 2017

Hi @foolip. I coded an attempt at what you suggested in dpino/gecko-dev@9641de0

Basically it's a Perl script that prints out a WPT test with a customized list of test262 tests to run. In that commit I'm only supporting a DedicatedWorker, although it could be extended to other types of workers. The main issue with this approach was that it required writing wrappers for things that test262 uses (for instance, the assert commands are slightly different from what WPT supports) and, more importantly, when I tried to build up a long list of tests to run, the whole test timed out.
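
For illustration, here is a minimal sketch of the kind of shim such a wrapper needs, mapping test262's harness asserts (from its harness/assert.js) onto testharness.js primitives. The exact mapping shown is an assumption, not the code from the commit:

```js
// Hypothetical shim: make test262's assert API delegate to testharness.js.
// test262's assert(value) requires value === true, which matches
// testharness.js's assert_true().
function assert(mustBeTrue, message) {
  assert_true(mustBeTrue, message);
}
// Both assert.sameValue and assert_equals use SameValue semantics
// (NaN equals NaN, +0 differs from -0).
assert.sameValue = function (actual, expected, message) {
  assert_equals(actual, expected, message);
};
assert.throws = function (expectedError, fn, message) {
  // assert_throws_js is the current testharness.js name for this check
  assert_throws_js(expectedError, fn, message);
};
```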

Although this approach can be interesting for trying out test262 in a browser, it does not sound like the right one to me, dunno.

Unfortunately the laptop I was using to do this work crashed today, so I cannot check how long it takes to run the whole test262 or wpt suite. I will post those numbers once my laptop gets fixed (hopefully in a day or two).


jgraham commented Nov 20, 2017

I think this makes sense. I think WASM might do something similar. The details of how the integration should work are unclear to me; how will web-platform-tests be kept in sync with the test262 tests?


dpino commented Nov 28, 2017

@foolip Running all of test262 on my laptop takes around 4 min. Not very useful information; I suppose you were asking about the time spent running all the tests as part of a CI infrastructure or similar.

```
$ ./tests/jstests.py build_OPT.OBJ/dist/bin/js test262
[27706|    0|    0| 1041] 100% ======================================>| 233.0s
PASS
```

I gave your suggestion a try (a wrapper that relies on wpt run to launch test262 in the browser). I pushed the changes to a remote branch at https://github.com/dpino/web-platform-tests/tree/test262-runner

I have several questions regarding web-platform-tests. Ideally, I think the test262 suite should be run by opening a browser once and running all the tests in that same browser instance. With the wrapper above, each test opens and closes a new browser, so running the whole suite takes a very long time (even more since the suite should run on different agents). Another approach could be to group several test262 tests together into a single WPT test. I don't know if it would be possible to have one single browser instance where every test is run and the browser communicates the results back to the command shell.


jgraham commented Nov 28, 2017

web-platform-tests generally work with one instance of the browser running multiple tests.

The most obvious way to do this integration would be to generate testharness.js wrappers for the test262 tests and check in the generated files. These would then run like any other testharness.js test. It looks like that's more or less what's on your branch, except that you don't add all the files at once and you call wpt run for every test rather than once.

There are more complex solutions we could imagine in which the templates are baked into the server like with .worker.js files. I don't know if that's worthwhile.
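
For concreteness, a hedged sketch of what one generated wrapper could look like if it followed wpt's existing .worker.js convention; the file name, the test262 harness paths and the inlined body are all assumptions:

```js
// hypothetical generated file, e.g. built-ins/Array/length.worker.js
importScripts("/resources/testharness.js");
// test262's own harness files would need to be served somewhere, too:
importScripts("/test262/harness/assert.js", "/test262/harness/sta.js");

test(() => {
  // inlined body of the original test262 file, for example:
  assert.sameValue([1, 2, 3].length, 3);
}, "built-ins/Array/length");

done();
```

A failing test262 assert throws a Test262Error, which test() catches and reports like any other testharness.js failure.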


dpino commented Dec 13, 2017

Thanks @jgraham for the clarification. Initially I thought web-platform-tests launched a new browser per test, but I was wrong.

I've updated the script quite a bit. Now I just use the script to generate the WPT wrappers from test262 test files and run them externally as normal WPT tests.

OTOH, some of the tests were failing or timing out. The issue was that some test262 tests modify built-in objects such as Array, and that had a side effect on the web-platform-tests harness code. So I actually need to parse the source of each test and add code to undo the change once the test is over. Anyway, still struggling with this.
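
A rough sketch of that save/restore idea, assuming the set of built-ins to snapshot is known in advance (everything here is hypothetical, not the actual script):

```js
// Snapshot built-ins a test might clobber before running it...
const savedArray = Array;
const savedArrayProtoDescriptors =
  Object.getOwnPropertyDescriptors(Array.prototype);

// ...and restore them once the test is over (called from the wrapper).
function restoreBuiltins() {
  self.Array = savedArray;
  Object.defineProperties(Array.prototype, savedArrayProtoDescriptors);
}
```

Note this only restores properties that existed before; properties a test adds would need to be deleted separately.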


annevk commented Dec 13, 2017

Perhaps an alternative approach is to load the test262 test in an <iframe> and then use onload to inspect the result? It might not be as nice though, and come to think of it, it would not work in a worker and such. It seems those kinds of tests would be rather hard to do properly with a harness.
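
A minimal sketch of that idea, assuming a pregenerated wrapper page that records its outcome on its own global (the path and the __test262Result property are hypothetical):

```js
async_test(t => {
  const iframe = document.createElement("iframe");
  iframe.src = "wrappers/built-ins/Array/length.html"; // assumed wrapper page
  iframe.onload = t.step_func_done(() => {
    // the wrapper page is assumed to set __test262Result when it finishes
    assert_true(iframe.contentWindow.__test262Result.passed,
                "test262 test passed inside the iframe");
  });
  document.body.appendChild(iframe);
}, "built-ins/Array/length in an iframe");
```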


dpino commented Dec 13, 2017

@annevk I can try running the tests in an iframe, at least for the same-origin window agent, and see if more tests pass. Right now, running the test262/built-ins directory, which is the largest test262 directory, I get 1000 failing tests and 35 timeouts. Maybe some tests fail because of a JS feature missing in the browser (not all of them are implemented yet). I would need to look more into the failing tests.

The good thing about running the tests in the browser as web-platform-tests is reusing all the infrastructure for running tests and retrieving reports. But everything that has to do with instrumentation (Selenium/Marionette) is actually not useful for this case, IMHO. @jugglinmike told me about https://github.com/bterlson/test262-harness, a node.js tool for running test262 in the browser (there's also https://github.com/bakkot/test262-web-runner). So maybe a similar tool that uses a WebSocket to communicate the results from the browser to a server process could be another approach. I don't know. Does it make sense? For the moment, I'm going to keep trying this approach.


annevk commented Dec 13, 2017

I'm not sure; I'm not familiar enough with all the harnesses. I'm curious whether @bakkot has looked into running test262 in a worker environment.


dpino commented Jan 4, 2018

I have a first version of the tests running. I reworked the script to run the tests inside an IFrame, then added support for other agents: child Window, DedicatedWorker and SharedWorker. ServiceWorker is not supported yet; more on that later.

I used the results of Test262-Web-Runner as a baseline to compare my results against. I ran the tests on Firefox Nightly 59.a1. First of all, here are the results for Test262-Web-Runner:

Test262-Web-Runner

| Test | Ran | Failed |
| --- | --- | --- |
| annexB | 977/1003 | 26 |
| built-ins | 12743/13446 (skipped 32) | 703 + 32 |
| harness | 94/94 | 0 |
| intl402 | 231/236 | 5 |
| language | 13917/14822 | 905 |

And here are the results of the web-platform-tests wrappers for test262 (only IFrame in this benchmark):

| Test | Ran | Expected results | Failed |
| --- | --- | --- | --- |
| annexB | 2263 tests (1003 parents, 1260 subtests) | 2230 | 33 (FAIL: 33) |
| built-ins | 40188 tests (13478 parents, 26710 subtests) | 38748 | 1440 (FAIL: 1440) |
| harness | 275 tests (94 parents, 181 subtests) | 275 | 0 |
| intl402 | 708 tests (236 parents, 472 subtests) | 698 | 10 (FAIL: 10) |
| language | 43243 tests (14898 parents, 28345 subtests) | 41559 | 1684 (FAIL: 1684) |

This summary cannot be compared directly with the results of Test262-Web-Runner. By default, test262 tests are executed both in strict mode and non-strict mode, unless a tag (onlyStrict, noStrict) indicates otherwise. For each test in the WPT wrapper, two actual tests normally run, so when a test fails it likely counts as two failing tests. On the other hand, a test that fails in Test262-Web-Runner counts only once.
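
As a sketch of where the doubling comes from (the helper name is hypothetical; the two modes are test262's documented default):

```js
// A test262 file without an onlyStrict/noStrict flag yields two runs:
// the source as-is, and the source with "use strict"; prepended.
function expandModes(testSource) {
  return [
    { mode: "non-strict", source: testSource },
    { mode: "strict", source: '"use strict";\n' + testSource },
  ];
}
```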

So to actually compare the WPT results and Test262-Web-Runner I need to normalize the results using an expression like the following:

$ grep "FAIL IFrame" annexB.output | cut -d : -f 2 | sort -u | wc -l

Here are the normalized results for IFrame:

| Test | Ran | Failed |
| --- | --- | --- |
| annexB | 1003 | 26 |
| built-ins | 13446 | 720 |
| harness | 94 | 0 |
| intl402 | 236 | 5 |
| language | 14822 | 827 |

The results are almost the same as Test262-Web-Runner (I just noticed the results for 'language' are much worse, although I used to get better results in other runs; I will look into that **). Then I started to add support for the other agents. Here are the results for each type of agent:

** 08/01/2018: The values are updated now.

Window

| Test | Ran | Failed |
| --- | --- | --- |
| annexB | 1003 | 26 |
| built-ins | 13446 | 720 |
| harness | 94 | 0 |
| intl402 | 236 | 5 |
| language | 14822 | 833 |

Worker

| Test | Ran | Failed |
| --- | --- | --- |
| annexB | 1003 | 69 |
| built-ins | 13446 | 1043 |
| harness | 94 | 1 |
| intl402 | 236 | 6 |
| language | 14822 | 3827 |

SharedWorker

| Test | Ran | Failed |
| --- | --- | --- |
| annexB | 1003 | 69 |
| built-ins | 13446 | 1059 |
| harness | 94 | 2 |
| intl402 | 236 | 6 |
| language | 14822 | 3907 |

Regarding ServiceWorker, the reason I left it out for the moment is that for the currently supported agents I generate the tests on the fly (either an HTML page for IFrame and Window, or a JavaScript file for DedicatedWorker and SharedWorker) using a Blob object. However, it's not possible to generate ServiceWorkers on the fly for security reasons. One possible workaround would be to generate the ServiceWorker files for each test beforehand. The downside is that this would double the total number of files, but I think it would work.
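
A minimal sketch of the on-the-fly generation described above, with the function name and the message shape as assumptions:

```js
// Wrap the test source in a script that reports back, load it as a
// blob: URL, and run it in a dedicated worker.
function runInDedicatedWorker(testSource) {
  const body = testSource + "\npostMessage({ passed: true });";
  const url = URL.createObjectURL(
    new Blob([body], { type: "text/javascript" }));
  return new Promise((resolve, reject) => {
    const worker = new Worker(url);
    worker.onmessage = e => resolve(e.data);
    worker.onerror = e => reject(new Error(e.message));
  });
}
```

new SharedWorker(url) accepts the same blob: URL, but navigator.serviceWorker.register() does not accept blob: URLs, which is why ServiceWorker needs pregenerated files.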


dpino commented Jan 8, 2018

I fixed the issue that affected the results of the 'language' test block. The values are updated now.


dpino commented Jan 10, 2018

I have pushed a PR with the script to generate the WPT wrappers as well as the harness code to run the tests. The PR is not ready to be merged yet, but I think it can be a starting point to get feedback and discuss what's pending to be done. PTAL #8980
