
[wptrunner] Add option to skip subtest checks for incomplete tests #44134

Conversation

jonathan-j-lee
Contributor

Some testharness/wdspec tests can racily time out on different subtests between runs, which makes it difficult to write stable subtest `TIMEOUT` and `NOTRUN` expectations that won't flake.

`--no-check-incomplete-subtests` discards subtest results for tests that time out or crash, meaning only the harness-level statuses contribute to determining whether those tests ran as expected. Subtests are always checked when the harness status is `OK`/`ERROR`, since those subtest results are generally more stable.

See also: https://crrev.com/c/5112639

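For illustration, a minimal sketch of the intended behavior (the helper name and the exact status set are assumptions, not the actual patch):

```python
# Illustrative sketch only; names and the status set are assumptions.
INCOMPLETE_STATUSES = {"TIMEOUT", "CRASH", "EXTERNAL-TIMEOUT", "INTERNAL-ERROR"}

def filter_subtest_results(harness_status, subtest_results,
                           check_incomplete_subtests=True):
    """Drop subtest results for tests that did not run to completion.

    With check_incomplete_subtests=False, a timed-out or crashed test is
    judged only by its harness-level status; subtests are still checked
    when the harness status is OK/ERROR, where results are more stable.
    """
    if not check_incomplete_subtests and harness_status in INCOMPLETE_STATUSES:
        return []
    return subtest_results
```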
@jonathan-j-lee jonathan-j-lee marked this pull request as ready for review January 23, 2024 02:45
@wpt-pr-bot wpt-pr-bot added the infra and wptrunner labels Jan 23, 2024
@foolip
Member

foolip commented Jan 23, 2024

@jonathan-j-lee is the important part here that we don't fail (non-zero exit code) in these cases, or that we don't include the results in the report? If the report is the important part, I'm wondering if it would be an option to put the logic in the code that interprets the report? I suspect not, but want to make sure.

@WeizhongX
Contributor

I had been thinking of changing the way unexpected_fail_tests is calculated [1] for this scenario. The bad (or good) side of that is that it would affect every vendor, so we would need their approval too. The general idea is that if the overall result is an expected Timeout or Crash, a mismatch at the subtest level should not matter. @jgraham, can you please share your thoughts here?

The change in this PR tries to discard all subtest results when needed. The question is whether those results would sometimes be useful. We might also need to find a better name for check_incomplete_subtests.

[1] https://github.com/web-platform-tests/wpt/blob/master/tools/wptrunner/wptrunner/testrunner.py#L765

@jgraham
Contributor

jgraham commented Jan 26, 2024

We have also seen this problem, but I'm not a big fan of dealing with it this way. It means that if you have a test with, say, 5 subtests that consistently pass and then two subtests which can run in a random order but where both time out (resulting in an overall test timeout, but a combination of TIMEOUT/NOTRUN results for the two subtests), you end up throwing away the results of the 5 stable subtests. The same applies if you have a test with a large number of subtests, say 900 stable results, where the overall test times out somewhere between subtests 900 and 1000: the stable results get discarded as well.

I definitely wouldn't be in favour of making this behaviour the default / only behaviour for tests that time out or crash. The ideal situation is of course that we identify and fix the problematic tests.

I can imagine different rules that would be more specific in dealing with this situation, e.g.: if the overall test times out and the subtest has a status of TIMEOUT or NOTRUN, then we consider that a known intermittent by adding the actual status to the expected statuses. But I'm not sure that works well, and I don't love adding that kind of special case, which makes the overall logic of the system harder to reason about and understand.
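A minimal sketch of that rule, assuming plain status strings (illustrative only, not a committed design):

```python
def allowed_statuses(harness_status, expected, known_intermittent, actual):
    # Sketch of the rule above: when the overall test times out, treat a
    # racy TIMEOUT/NOTRUN subtest status as a known intermittent by
    # adding it to the accepted set.
    allowed = {expected, *known_intermittent}
    if harness_status == "TIMEOUT" and actual in ("TIMEOUT", "NOTRUN"):
        allowed.add(actual)
    return allowed
```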

@WeizhongX
Contributor

jgraham@, thanks for the feedback.

@WeizhongX
Contributor

I think maybe we should accept what jgraham@ suggested, i.e. when a testharness test expectedly times out, the baseline for that test still matters, and a mismatch against the baseline would cause the test to fail. This would diverge from the behavior in RWT, and we should document that.

This might also require a change to the rebaseline tool.

@jonathan-j-lee
Contributor Author

I think maybe we should accept what jgraham@ suggested

Ok, sounds good. I tried that in https://crrev.com/c/5112639/15, but it seemed to require many (70+) baselines.

@jonathan-j-lee jonathan-j-lee deleted the wptrunner/no-check-incomplete-subtests branch January 30, 2024 18:17
@WeizhongX
Contributor

WeizhongX commented Jan 31, 2024

I later realized there might still be some issues. When a Chromium gardener sees a testharness test time out on some CI builders and decides to add a Timeout expectation for it, that would not make the test run as an expected Timeout; instead, it turns the Timeout into a Failure, and the gardener (or someone else) would need to add a Failure expectation for the test again.

I can imagine different rules that would be more specific in dealing with this situation, e.g.: if the overall test times out and the subtest has a status of TIMEOUT or NOTRUN, then we consider that a known intermittent by adding the actual status to the expected statuses.

jgraham@, do you think we could implement the above as you suggested? We could add plenty of comments to explain the code. And I think we only need to handle the case where the overall result is Timeout or Crash.

I am not in favor of another command-line argument for this. We already have too many command-line arguments.

@WeizhongX
Contributor

jgraham@, it looks like the approach you mentioned is something in the middle: we would still catch a change from PASS => FAIL, but not a PASS => TIMEOUT. For the simplicity of the system, though, that might be a worthwhile tradeoff.

@WeizhongX
Contributor

I think we should move the code that determines whether a test ran as expected into a separate function. We could then override that function in Chromium. That way we would not need to change upstream behavior.
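A rough sketch of that refactor (class and method names are illustrative, not the real testrunner layout):

```python
class TestRunnerManager:
    # Illustrative: pull the "ran as expected" check into one
    # overridable method instead of inlining it in the result handler.
    def test_ran_as_expected(self, test, result):
        expected = test.expected(result.name)
        known_intermittent = test.known_intermittent(result.name)
        return result.status == expected or result.status in known_intermittent

class ChromiumTestRunnerManager(TestRunnerManager):
    def test_ran_as_expected(self, test, result):
        # Chromium could relax the subtest check here when the harness
        # status is an expected TIMEOUT/CRASH, matching RWT behavior.
        ...
```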

@jonathan-j-lee
Contributor Author

Currently, Chromium's wptrunner wrapper has expectation logic that augments:

# Any subtest can time out
[a.html]
  expected: TIMEOUT
  [b]
    expected: PASS
  [c]
    expected: TIMEOUT
  [d]
    expected: NOTRUN

to:

[a.html]
  expected: TIMEOUT
  [b]
    expected: [PASS, TIMEOUT, NOTRUN]
  [c]
    expected: [TIMEOUT, NOTRUN]
  [d]
    expected: [TIMEOUT, NOTRUN]

This should be equivalent to what @jgraham proposed, except the TIMEOUT and NOTRUNs are being added up-front?

I think the main question we need to answer is whether we care about unexpected PASS/FAIL during a timeout (e.g., b fails or c passes in the example above). From https://chromium-review.googlesource.com/c/chromium/src/+/5112639/comment/657f0e3d_1329be25/, it seems our answer was "no" because that's the behavior of Chromium's run_web_tests.py harness. run_web_tests.py checks subtest results if and only if the test ran to completion (i.e., at runtime), which is not possible to emulate in wptrunner with static expectations alone.

It might make sense to accept a difference in behavior where, for wptrunner-run tests, failures preceding a timeout need to be suppressed separately from adding the timeout expectation. Since most tests pass, this hopefully shouldn't be too onerous for the build gardeners, and requiring explicit failure expectations seems better anyway for tracking by test owners.
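For concreteness, a sketch of that up-front translation (a hypothetical helper, not Chromium's actual implementation):

```python
def augment_subtest_expected(harness_expected, subtest_expected):
    # Hypothetical helper mirroring the metadata example above: when the
    # harness is expected to TIMEOUT, any subtest may racily TIMEOUT or
    # NOTRUN depending on where the timeout lands.
    if "TIMEOUT" not in harness_expected:
        return list(subtest_expected)
    augmented = list(subtest_expected)
    for status in ("TIMEOUT", "NOTRUN"):
        if status not in augmented:
            augmented.append(status)
    return augmented

# e.g. augment_subtest_expected(["TIMEOUT"], ["PASS"])
#      -> ["PASS", "TIMEOUT", "NOTRUN"]
```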

@WeizhongX
Contributor

WeizhongX commented Jan 31, 2024

This should be equivalent to what @jgraham proposed

Agreed

From https://chromium-review.googlesource.com/c/chromium/src/+/5112639/comment/657f0e3d_1329be25/, it seems our answer was "no" because that's the behavior of Chromium's run_web_tests.py harness.

I think I changed my position here because of what jgraham@ said above, and that is why I said we need to document that.

I think we should move the code that determines whether a test ran as expected into a separate function.

Do you think we can do this? If we can, we can stick with RWT's behavior.

@jonathan-j-lee
Contributor Author

Do you think we can do this?

I think so (with some refactoring). The main obstacle is that testrunner reads expectations directly from wpttest.Test, an object that's shared between reruns (so it shouldn't carry any per-run state):

expected = test.expected(result.name)
known_intermittent = test.known_intermittent(result.name)

Instead, you'd want to plumb the expectations through wpttest.Test -> wpttest.(Subtest)Result -> testrunner (which AFAICT aren't actually used right now):

self.expected = expected
self.known_intermittent = known_intermittent if known_intermittent is not None else []

... then give the monkey-patched convert_result() hook a chance to dynamically update the expectations (similar to this PR).
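A sketch of what such a hook could do (attribute names follow the snippets above; `subtest_results` and the overall shape are assumptions, not the final design):

```python
def chromium_convert_result(test, result):
    # Hypothetical monkey-patched hook: with expectations carried on the
    # result objects, a vendor can adjust them per run without mutating
    # the shared wpttest.Test.
    if result.status == "TIMEOUT":
        for subtest in result.subtest_results:  # assumed attribute
            if (subtest.status in ("TIMEOUT", "NOTRUN")
                    and subtest.status not in subtest.known_intermittent):
                subtest.known_intermittent.append(subtest.status)
    return result
```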

jonathan-j-lee added a commit to jonathan-j-lee/wpt that referenced this pull request Feb 6, 2024
This allows vendors to override expectations at runtime via
browser-specific result converters. Callers that want to simply pass
through the testloader-read expectations should use the newly introduced
`Test.make_{subtest_,}result()` to construct `(Subtest)Result`s instead
of invoking the constructor directly (each pair has the same signature).

This is a pure refactor; no functional changes intended.

Successor to: web-platform-tests#44134 (comment)
jonathan-j-lee added a commit to jonathan-j-lee/wpt that referenced this pull request Feb 6, 2024
This allows vendors to override expectations at runtime via
browser-specific result converters. Instead of constructing
`(Subtest)Result`s directly, executors should use the newly introduced
`Test.make_{subtest_,}result()`, which have the same signatures but
default to passing through the testloader-read expectations.

This is a pure refactor; no functional changes intended.

Successor to: web-platform-tests#44134 (comment)
jonathan-j-lee added a commit that referenced this pull request Feb 6, 2024
…44424)

This allows vendors to override expectations at runtime via
browser-specific result converters. Instead of constructing
`(Subtest)Result`s directly, executors should use the newly introduced
`Test.make_{subtest_,}result()`, which have the same signatures but
default to passing through the testloader-read expectations.

This is a pure refactor; no functional changes intended.

Successor to: #44134 (comment)
aarongable pushed a commit to chromium/chromium that referenced this pull request Feb 8, 2024
This matches the `run_web_tests.py` behavior. `run_wpt_tests.py`
previously tried to emulate this in the expectation translation layer,
but handling subtest results on test timeout is fundamentally a decision
that can only be made at runtime. Hook into the testrunner machinery to
do so.

See also: web-platform-tests/wpt#44134

Bug: 40943761
Cq-Include-Trybots: luci.chromium.try:linux-wpt-chromium-rel
Change-Id: I70e5e02c20dab56cbdaa7c376242b32b8806e17a
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/5251248
Reviewed-by: Weizhong Xia <[email protected]>
Commit-Queue: Jonathan Lee <[email protected]>
Cr-Commit-Position: refs/heads/main@{#1257694}
moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this pull request Feb 13, 2024
jamienicol pushed a commit to jamienicol/gecko that referenced this pull request Feb 15, 2024
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified that referenced this pull request Feb 16, 2024
mbrodesser-Igalia pushed a commit to mbrodesser-Igalia/wpt that referenced this pull request Feb 19, 2024
gecko-dev-updater pushed a commit to marco-c/gecko-dev-comments-removed that referenced this pull request Feb 20, 2024
marcoscaceres pushed a commit that referenced this pull request Feb 23, 2024