
wptrunner: Always restart on test error #10186

Closed

Conversation

jugglinmike
Contributor

@jugglinmike jugglinmike commented Mar 27, 2018

Because a harness error may describe an unrecoverable browser state, the
browser should always be restarted when such an error is encountered
(even in the presence of the --no-restart-on-unexpected flag).


@jgraham @gsnedders If wpt run --no-restart-on-unexpected is used to execute
a test file like this:

```html
<script>
window.opener.close();
</script>
```

Then after that test fails with the status "ERROR", all subsequent tests fail
for the same reason. By interpreting the error as unrecoverable, we can avoid
invalidating test runs in these cases.

That's a contrived example, though. As an alternative, we might be able to
address it by detecting a missing window and applying the status "CRASH". I
started researching this possibility, but it started to look like this logic
would be dependent on the browser, executor, and/or transport. I'm also
thinking that there are many more subtle ways the environment could become
corrupted, and we'd probably never be able to detect them all.

In contrast, what I'm proposing here is much more heavy-handed. There are many
kinds of harness errors that don't require full restarts, so this patch will
make running WPT with --no-restart-on-unexpected slower. My inclination is to
prefer correctness and completeness over speed since test execution time
(unlike correctness) can be improved by consumers by using more machines. I
also get the sense that not many consumers have a use for this particular flag,
so I'm guessing that we wouldn't want to take on the complexity overhead of a
more nuanced solution.

What do you think?



@gsnedders
Member

Am in favour.

@ghost

ghost commented Mar 27, 2018

Build PASSED

Started: 2018-03-27 00:57:17
Finished: 2018-03-27 01:11:38


@jgraham
Contributor

jgraham commented Mar 27, 2018

I suspect this will have performance implications. The typical cause of an ERROR is "there was an unexpected exception in the test", which is often a signal that the test is for a feature the browser doesn't support; in that case we shouldn't be in an unrecoverable state.

I think if you want to do this, you should make a HARNESS-ERROR state analogous to EXTERNAL-TIMEOUT which records a more serious kind of error for which a restart is required.
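The proposed policy could be sketched roughly as follows (a minimal Python sketch with hypothetical names; this is not wptrunner's actual code, and the HARNESS-ERROR status here is only the suggestion above):

```python
# Hypothetical sketch: statuses that always force a browser restart,
# regardless of the --no-restart-on-unexpected flag. Names are
# illustrative, not wptrunner's real API.
FORCE_RESTART_STATUSES = {"CRASH", "EXTERNAL-TIMEOUT", "HARNESS-ERROR"}

def should_restart(status, expected, restart_on_unexpected):
    """Return True if the browser should be restarted after a test."""
    if status in FORCE_RESTART_STATUSES:
        # A serious error: the browser state may be unrecoverable.
        return True
    # Otherwise only restart on unexpected results when the flag allows it.
    return restart_on_unexpected and status != expected
```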

Incidentally, the example where the harness window gets closed is supposed to be handled. Does that not work correctly? Also, if you know of specific problematic tests, it's possible to set restart-after: true in the expectation metadata for that test to force a restart irrespective of the result.
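For reference, such a metadata file (the path and test name here are made up for illustration, e.g. `meta/example/window-close.html.ini`) would look roughly like:

```
[window-close.html]
  restart-after: true
```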

Contributor

@jgraham jgraham left a comment


(see other comment)

@jugglinmike
Contributor Author

I think if you want to do this, you should make a HARNESS-ERROR state
analogous to EXTERNAL-TIMEOUT which records a more serious kind of error for
which a restart is required.

Until now, I had considered "ERROR" to be reserved for cases where the test
interfered with the operation of the harness itself. This "not even wrong"
status seemed meaningfully distinct from "FAIL", where the test's expectation
was not satisfied. So to me, "HARNESS-ERROR" would be the same as "ERROR", and
it's not clear what "ERROR" would really mean in that case.

If "ERROR" is an acceptable status for failing tests, how does it differ from
"FAIL"?

Incidentally the example where the harness window gets closed is supposed to
be handled. Does that not work correctly?

It does not. In case it helps:

```
$ firefox --version
Mozilla Firefox 59.0.1
$ _venv/bin/geckodriver --version
geckodriver 0.20.0

The source code of this program is available from
testing/geckodriver in https://hg.mozilla.org/mozilla-central.

This program is subject to the terms of the Mozilla Public License 2.0.
You can obtain a copy of the license at https://mozilla.org/MPL/2.0/.
```

Also if you know of specific problematic tests it's possible to set
restart-after: true in the expectation metadata for that test to force a
restart irrespective of the result.

In the WPT Dashboard project, we do not control the test
source or the implementation source. That makes the prospect of maintaining
expectations files undesirable. Thanks for offering an alternative, though!

@jgraham
Contributor

jgraham commented Mar 27, 2018

A single test failing means that an assert in that test failed. ERROR means that an exception was thrown. In particular, if the exception is thrown outside a test, the entire file is marked as ERROR.
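That distinction can be illustrated with a Python analogy (illustrative only; testharness.js implements this logic in JavaScript, and the names below are invented):

```python
def run_test(test_body):
    """Illustrative mapping of test outcomes to wptrunner-style statuses.

    An analogy, not testharness.js itself: a failed assertion yields
    FAIL, while any other uncaught exception yields ERROR.
    """
    try:
        test_body()
        return "PASS"
    except AssertionError:
        return "FAIL"   # an assert inside the test did not hold
    except Exception:
        return "ERROR"  # an unexpected exception escaped the test
```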

In the WPT Dashboard project, we do not control the test source or the implementation source.

That situation is not very different from the one that a vendor is in. I'm not suggesting you put expectations in your expectation metadata (but you could!), but that you use the other features, e.g. the ability to disable specific problematic tests or to force a restart after known problematic tests. Having control of the implementation doesn't affect the process around using these things.

@jugglinmike
Contributor Author

In particular if the exception is thrown outside a test, the entire file is
marked as ERROR.

Speaking theoretically, would you agree that it would be better if tests did
not signal failure with uncaught errors? That's my supposition, but I'd be
happy to learn how it's incorrect.

Having control of the implementation doesn't affect the process around using
these things.

You're right, of course. We can document the rationale for a given expectation,
and it won't hurt to have stale entries in that file.

@jgraham
Contributor

jgraham commented Mar 27, 2018

Speaking theoretically, would you agree that it would be better if tests did not signal failure with uncaught errors?

Yep, much of the design of testharness.js was about avoiding this (or at least isolating different tests so that an error in one doesn't necessarily stop the entire run). But it's impossible to prevent entirely.

@foolip
Member

foolip commented Apr 11, 2018

So, I'm trying to work out what the core of the problem is here. Do harness errors and some other bad things all result in "ERROR" as the test-level status, or are we talking about the harness status being "ERROR" instead of "OK"? Can reftests be "ERROR"? Is this stuff documented anywhere? :)

I actually thought that "always restart on test error" was already the default, but clearly that's not the case. Most existing harness errors are probably no more suggestive of needing a restart than a failed test is. Is it possible to separate out the "test window gone" case, treat that as a more serious kind of error, and restart for just that?

Alternatively, how many harness errors are there, can we fix 90% of them, and does it then make sense to restart on "ERROR" always?

@jugglinmike
Contributor Author

I actually thought that "always restart on test error" was already the
default, but clearly that's not the case.

The WPTDashboard project enabled the WPT CLI's --no-restart-on-unexpected
flag in August in order to reduce the time required to collect results from
Safari on Sauce Labs. See web-platform-tests/results-collection#62

Is it possible to separate out the "test window gone" case and treat that as
a more serious kind of error and restart for just that?

To quote James above:

Incidentally the example where the harness window gets closed is supposed to
be handled.

So the problem we're experiencing with Edge on Sauce Labs is definitely a violation
of the expected behavior.

(By the way, James: I can no longer reproduce the behavior using Firefox, so
whatever issue I was facing seems to have been resolved. We're still
experiencing the issue with Edge on Sauce Labs that motivated this pull
request, but I don't yet have a way to reproduce that locally.)

@jgraham
Contributor

jgraham commented Apr 12, 2018

I created #10444 as a better alternative here. There may be better ways to deal with the general problem of e.g. the window being closed unexpectedly, but I think that approach will improve the reliability without too much additional overhead in the case where the errors are at the test level.

@jgraham
Contributor

jgraham commented Apr 17, 2018

Closing this on the basis that the INTERNAL-ERROR changes will fix things.
