
wptrunner: Always restart on test error #10186

Closed

Conversation

jugglinmike
Contributor

@jugglinmike jugglinmike commented Mar 27, 2018

Because a harness error may describe an unrecoverable browser state, the
browser should always be restarted when such an error is encountered
(even in the presence of the --no-restart-on-unexpected flag).


@jgraham @gsnedders If wpt run --no-restart-on-unexpected is used to execute
a test file like this:

```html
<script>
window.opener.close();
</script>
```

Then after that test fails with the status "ERROR", all subsequent tests fail
for the same reason. By interpreting the error as unrecoverable, we can avoid
invalidating test runs in these cases.

That's a contrived example, though. As an alternative, we might be able to
address it by detecting a missing window and applying the status "CRASH". I
started researching this possibility, but it started to look like this logic
would be dependent on the browser, executor, and/or transport. I'm also
thinking that there are many more subtle ways the environment could become
corrupted, and we'd probably never be able to detect them all.

In contrast, what I'm proposing here is much more heavy-handed. There are many
kinds of harness errors that don't require full restarts, so this patch will
make running WPT with --no-restart-on-unexpected slower. My inclination is to
prefer correctness and completeness over speed since test execution time
(unlike correctness) can be improved by consumers by using more machines. I
also get the sense that not many consumers have a use for this particular flag,
so I'm guessing that we wouldn't want to take on the complexity overhead of a
more nuanced solution.

What do you think?



@gsnedders
Member

Am in favour.

@ghost

ghost commented Mar 27, 2018

Build PASSED

Started: 2018-03-27 00:57:17
Finished: 2018-03-27 01:11:38


@jgraham
Contributor

jgraham commented Mar 27, 2018

I suspect this will have performance implications. The typical cause of an ERROR is "there was an unexpected exception in the test", which is often a signal that the test is for a feature the browser doesn't support; in that case we shouldn't be in an unrecoverable state.

I think if you want to do this, you should make a HARNESS-ERROR state analogous to EXTERNAL-TIMEOUT which records a more serious kind of error for which a restart is required.
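The proposed policy could be sketched roughly as follows (a minimal Python sketch with hypothetical names; this is not wptrunner's actual code, and the HARNESS-ERROR status here is only the suggestion above):

```python
# Hypothetical sketch: statuses that always force a browser restart,
# regardless of the --no-restart-on-unexpected flag. Names are
# illustrative, not wptrunner's real API.
FORCE_RESTART_STATUSES = {"CRASH", "EXTERNAL-TIMEOUT", "HARNESS-ERROR"}

def should_restart(status, expected, restart_on_unexpected):
    """Return True if the browser should be restarted after a test."""
    if status in FORCE_RESTART_STATUSES:
        # A serious error: the browser state may be unrecoverable.
        return True
    # Otherwise only restart on unexpected results when the flag allows it.
    return restart_on_unexpected and status != expected
```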

Incidentally, the example where the harness window gets closed is supposed to be handled. Does that not work correctly? Also, if you know of specific problematic tests, it's possible to set restart-after: true in the expectation metadata for that test to force a restart irrespective of the result.
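For reference, such a metadata file (the path and test name here are made up for illustration, e.g. `meta/example/window-close.html.ini`) would look roughly like:

```
[window-close.html]
  restart-after: true
```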

Contributor

@jgraham jgraham left a comment


(see other comment)

@jugglinmike
Contributor Author

I think if you want to do this, you should make a HARNESS-ERROR state
analogous to EXTERNAL-TIMEOUT which records a more serious kind of error for
which a restart is required.

Until now, I had considered "ERROR" to be reserved for cases where the test
interfered with the operation of the harness itself. This "not even wrong"
status seemed meaningfully distinct from "FAIL", where the test's expectation
was not satisfied. So to me, "HARNESS-ERROR" would be the same as "ERROR", and
it's not clear what "ERROR" would really mean in that case.

If "ERROR" is an acceptable status for failing tests, how does it differ from
"FAIL"?

Incidentally the example where the harness window gets closed is supposed to
be handled. Does that not work correctly?

It does not. In case it helps:

```
$ firefox --version
Mozilla Firefox 59.0.1
$ _venv/bin/geckodriver --version
geckodriver 0.20.0

The source code of this program is available from
testing/geckodriver in https://hg.mozilla.org/mozilla-central.

This program is subject to the terms of the Mozilla Public License 2.0.
You can obtain a copy of the license at https://mozilla.org/MPL/2.0/.
```

Also if you know of specific problematic tests it's possible to set
restart-after: true in the expectation metadata for that test to force a
restart irrespective of the result.

In the WPT Dashboard project, we do not control the test
source or the implementation source. That makes the prospect of maintaining
expectations files undesirable. Thanks for offering an alternative, though!

@jgraham
Contributor

jgraham commented Mar 27, 2018

A single test failing means that an assert in that test failed. ERROR means that an exception was thrown. In particular, if the exception is thrown outside a test, the entire file is marked as ERROR.
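That distinction can be illustrated with a Python analogy (illustrative only; testharness.js implements this logic in JavaScript, and the names below are invented):

```python
def run_test(test_body):
    """Illustrative mapping of test outcomes to wptrunner-style statuses.

    An analogy, not testharness.js itself: a failed assertion yields
    FAIL, while any other uncaught exception yields ERROR.
    """
    try:
        test_body()
        return "PASS"
    except AssertionError:
        return "FAIL"   # an assert inside the test did not hold
    except Exception:
        return "ERROR"  # an unexpected exception escaped the test
```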

In the WPT Dashboard project, we do not control the test source or the implementation source.

That situation is not very different from the one that a vendor is in. I'm not suggesting you put expectations in your expectation metadata (but you could!), but that you use the other features, e.g. the ability to disable specific problematic tests or to force a restart after known problematic tests. Having control of the implementation doesn't affect the process around using these things.

@jugglinmike
Contributor Author

In particular if the exception is thrown outside a test, the entire file is
marked as ERROR.

Speaking theoretically, would you agree that it would be better if tests did
not signal failure with uncaught errors? That's my supposition, but I'd be
happy to learn how it's incorrect.

Having control of the implementation doesn't affect the process around using
these things.

You're right, of course. We can document the rationale for a given expectation,
and it won't hurt to have stale entries in that file.

@jgraham
Contributor

jgraham commented Mar 27, 2018

Speaking theoretically, would you agree that it would be better if tests did not signal failure with uncaught errors?

Yep, much of the design of testharness.js was about avoiding this (or at least isolating different tests so that an error in one doesn't necessarily stop the entire run). But it's impossible to prevent entirely.

@foolip
Member

foolip commented Apr 11, 2018

So, I'm trying to work out what the core of the problem is here. Do harness errors and some other bad things all result in "ERROR" as the test-level status, or are we talking about the harness status being "ERROR" instead of "OK"? Can reftests be "ERROR"? Is this stuff documented anywhere? :)

I actually thought that "always restart on test error" was already the default, but clearly that's not the case. Most existing harness errors are probably no more suggestive of needing a restart than a failed test is. Is it possible to separate out the "test window gone" case, treat that as a more serious kind of error, and restart for just that?

Alternatively, how many harness errors are there, can we fix 90% of them, and does it then make sense to restart on "ERROR" always?

@jugglinmike
Contributor Author

I actually thought that "always restart on test error" was already the
default, but clearly that's not the case.

The WPTDashboard project enabled the WPT CLI's --no-restart-on-unexpected
flag in August in order to reduce the time required to collect results from
Safari on Sauce Labs. See web-platform-tests/results-collection#62

Is it possible to separate out the "test window gone" case and treat that as
a more serious kind of error and restart for just that?

To quote James above:

Incidentally the example where the harness window gets closed is supposed to
be handled.

So the problem we're experiencing with Edge on Sauce Labs is definitely a violation
of the expected behavior.

(By the way, James: I can no longer reproduce the behavior using Firefox, so
whatever issue I was facing seems to have been resolved. We're still
experiencing the issue with Edge on Sauce Labs that motivated this pull
request, but I don't yet have a way to reproduce that locally.)

@jgraham
Contributor

jgraham commented Apr 12, 2018

I created #10444 as a better alternative here. There may be better ways to deal with the general problem of e.g. the window being closed unexpectedly, but I think that approach will improve the reliability without too much additional overhead in the case where the errors are at the test level.

@jgraham
Contributor

jgraham commented Apr 17, 2018

Closing this on the basis that the INTERNAL-ERROR changes will fix things.
