
[wptrunner] Add option to skip subtest checks for incomplete tests #44134

Conversation

jonathan-j-lee
Contributor

Some testharness/wdspec tests can racily time out on different subtests between runs, which makes it difficult to write stable subtest `TIMEOUT` and `NOTRUN` expectations that won't flake.

`--no-check-incomplete-subtests` discards subtest results for tests that time out or crash, meaning only the harness-level statuses contribute to determining whether those tests ran as expected. Subtests are always checked when the harness status is `OK`/`ERROR`, since those subtest results are generally more stable.

See also: https://crrev.com/c/5112639

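For illustration, a minimal sketch of the intended behavior (the helper name and the exact status set are assumptions, not the actual patch):

```python
# Illustrative sketch only; names and the status set are assumptions.
INCOMPLETE_STATUSES = {"TIMEOUT", "CRASH", "EXTERNAL-TIMEOUT", "INTERNAL-ERROR"}

def filter_subtest_results(harness_status, subtest_results,
                           check_incomplete_subtests=True):
    """Drop subtest results for tests that did not run to completion.

    With check_incomplete_subtests=False, a timed-out or crashed test is
    judged only by its harness-level status; subtests are still checked
    when the harness status is OK/ERROR, where results are more stable.
    """
    if not check_incomplete_subtests and harness_status in INCOMPLETE_STATUSES:
        return []
    return subtest_results
```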
@jonathan-j-lee jonathan-j-lee marked this pull request as ready for review January 23, 2024 02:45
@wpt-pr-bot wpt-pr-bot added the infra and wptrunner labels Jan 23, 2024
@foolip
Member

foolip commented Jan 23, 2024

@jonathan-j-lee is the important part here that we don't fail (non-zero exit code) in these cases, or that we don't include the results in the report? If the report is the important part, I'm wondering if it would be an option to put the logic in the code that interprets the report? I suspect not, but want to make sure.

@WeizhongX
Contributor

I had been thinking of changing the way unexpected_fail_tests is calculated [1] for this scenario. The bad (or good) side of that is that it would affect every vendor, so we would need their approval too. The general idea is that if the overall result is an expected Timeout or Crash, a mismatch at the subtest level should not matter. @jgraham, can you please share your thoughts here?

The change in this PR tries to discard all subtest results when needed. The question is whether those results would sometimes be useful. We might also need to find a better name for check_incomplete_subtests.

[1] https://github.com/web-platform-tests/wpt/blob/master/tools/wptrunner/wptrunner/testrunner.py#L765

@jgraham
Contributor

jgraham commented Jan 26, 2024

We have also seen this problem, but I'm not a big fan of dealing with it this way. It means that if you have a test with, say, 5 subtests that consistently pass and then two subtests which can run in a random order but where both time out (resulting in an overall test timeout, but a combination of TIMEOUT/NOTRUN results for the two subtests), you end up throwing away the results of the 5 stable subtests. The same applies if you have a test with a large number of subtests, say 900 stable results, where the overall test times out somewhere between subtests 900 and 1000: the stable results get discarded as well.

I definitely wouldn't be in favour of making this behaviour the default / only behaviour for tests that time out or crash. The ideal situation is of course that we identify and fix the problematic tests.

I can imagine different rules that would be more specific in dealing with this situation, e.g.: if the overall test times out and the subtest has a status of TIMEOUT or NOTRUN, then we consider that a known intermittent by adding the actual status to the expected statuses. But I'm not sure that works well, and I don't love adding that kind of special case, which makes the overall logic of the system harder to reason about and understand.
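A minimal sketch of that rule, assuming plain status strings (illustrative only, not a committed design):

```python
def allowed_statuses(harness_status, expected, known_intermittent, actual):
    # Sketch of the rule above: when the overall test times out, treat a
    # racy TIMEOUT/NOTRUN subtest status as a known intermittent by
    # adding it to the accepted set.
    allowed = {expected, *known_intermittent}
    if harness_status == "TIMEOUT" and actual in ("TIMEOUT", "NOTRUN"):
        allowed.add(actual)
    return allowed
```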

@WeizhongX
Contributor

jgraham@, thanks for the feedback.

@WeizhongX
Contributor

I think maybe we should accept what jgraham@ suggested, i.e. when a testharness test expectedly times out, the baseline for that test still matters, and a mismatch against the baseline would cause the test to fail. This would diverge from the behavior in RWT, and we should document that.

This might also require a change to the rebaseline tool.

@jonathan-j-lee
Contributor Author

I think maybe we should accept what jgraham@ suggested

Ok, sounds good. I tried that in https://crrev.com/c/5112639/15, but it seemed to require many (70+) baselines.

@jonathan-j-lee jonathan-j-lee deleted the wptrunner/no-check-incomplete-subtests branch January 30, 2024 18:17
@WeizhongX
Contributor

WeizhongX commented Jan 31, 2024

I later realized there might still be some issues. When a Chromium gardener sees a testharness test time out on some CI builders and decides to add a Timeout expectation for it, that would not make the test run as an expected Timeout; instead, it turns the Timeout into a Failure, and the gardener (or someone else) would need to add a Failure expectation for the test again.

I can imagine different rules that would be more specific in dealing with this situation, e.g.: if the overall test times out and the subtest has a status of TIMEOUT or NOTRUN, then we consider that a known intermittent by adding the actual status to the expected statuses.

jgraham@, do you think we could implement the above as you suggested? We could add plenty of comments to explain the code. And I think we only need to handle the case where the overall result is Timeout or Crash.

I am not in favor of another command-line argument for this. We already have too many command-line arguments.

@WeizhongX
Contributor

jgraham@, it looks like the approach you mentioned is something in the middle: we would still catch a change from PASS => FAIL, but not a PASS => TIMEOUT. For the simplicity of the system, though, that might be a worthwhile tradeoff.

@WeizhongX
Contributor

I think we should move the code that determines whether a test ran as expected into a separate function. We could then override that function in Chromium. That way we would not need to change upstream behavior.
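A rough sketch of that refactor (class and method names are illustrative, not the real testrunner layout):

```python
class TestRunnerManager:
    # Illustrative: pull the "ran as expected" check into one
    # overridable method instead of inlining it in the result handler.
    def test_ran_as_expected(self, test, result):
        expected = test.expected(result.name)
        known_intermittent = test.known_intermittent(result.name)
        return result.status == expected or result.status in known_intermittent

class ChromiumTestRunnerManager(TestRunnerManager):
    def test_ran_as_expected(self, test, result):
        # Chromium could relax the subtest check here when the harness
        # status is an expected TIMEOUT/CRASH, matching RWT behavior.
        ...
```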

@jonathan-j-lee
Contributor Author

Currently, Chromium's wptrunner wrapper has expectation logic that augments:

# Any subtest can time out
[a.html]
  expected: TIMEOUT
  [b]
    expected: PASS
  [c]
    expected: TIMEOUT
  [d]
    expected: NOTRUN

to:

[a.html]
  expected: TIMEOUT
  [b]
    expected: [PASS, TIMEOUT, NOTRUN]
  [c]
    expected: [TIMEOUT, NOTRUN]
  [d]
    expected: [TIMEOUT, NOTRUN]

This should be equivalent to what @jgraham proposed, except the TIMEOUT and NOTRUNs are being added up-front?

I think the main question we need to answer is whether we care about unexpected PASS/FAIL during a timeout (e.g., b fails or c passes in the example above). From https://chromium-review.googlesource.com/c/chromium/src/+/5112639/comment/657f0e3d_1329be25/, it seems our answer was "no" because that's the behavior of Chromium's run_web_tests.py harness. run_web_tests.py checks subtest results if and only if the test ran to completion (i.e., at runtime), which is not possible to emulate in wptrunner with static expectations alone.

It might make sense to accept a difference in behavior where, for wptrunner-run tests, failures preceding a timeout need to be suppressed separately from adding the timeout expectation. Since most tests pass, this hopefully shouldn't be too onerous for the build gardeners, and requiring explicit failure expectations seems better anyway for tracking by test owners.
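For concreteness, a sketch of that up-front translation (a hypothetical helper, not Chromium's actual implementation):

```python
def augment_subtest_expected(harness_expected, subtest_expected):
    # Hypothetical helper mirroring the metadata example above: when the
    # harness is expected to TIMEOUT, any subtest may racily TIMEOUT or
    # NOTRUN depending on where the timeout lands.
    if "TIMEOUT" not in harness_expected:
        return list(subtest_expected)
    augmented = list(subtest_expected)
    for status in ("TIMEOUT", "NOTRUN"):
        if status not in augmented:
            augmented.append(status)
    return augmented

# e.g. augment_subtest_expected(["TIMEOUT"], ["PASS"])
#      -> ["PASS", "TIMEOUT", "NOTRUN"]
```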

@WeizhongX
Contributor

WeizhongX commented Jan 31, 2024

This should be equivalent to what @jgraham proposed

Agreed

From https://chromium-review.googlesource.com/c/chromium/src/+/5112639/comment/657f0e3d_1329be25/, it seems our answer was "no" because that's the behavior of Chromium's run_web_tests.py harness.

I think I changed my position here because of what jgraham@ said above, and that is why I said we need to document that.

I think we should move the code that determines whether a test ran as expected into a separate function.

Do you think we can do this? If we can, we can stick with RWT's behavior.

@jonathan-j-lee
Contributor Author

Do you think we can do this?

I think so (with some refactoring). The main obstacle is that testrunner reads expectations directly from wpttest.Test, an object that's shared between reruns (so it shouldn't carry any per-run state):

expected = test.expected(result.name)
known_intermittent = test.known_intermittent(result.name)

Instead, you'd want to plumb the expectations through wpttest.Test -> wpttest.(Subtest)Result -> testrunner (which AFAICT aren't actually used right now):

self.expected = expected
self.known_intermittent = known_intermittent if known_intermittent is not None else []

... then give the monkey-patched convert_result() hook a chance to dynamically update the expectations (similar to this PR).
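A sketch of what such a hook could do (attribute names follow the snippets above; `subtest_results` and the overall shape are assumptions, not the final design):

```python
def chromium_convert_result(test, result):
    # Hypothetical monkey-patched hook: with expectations carried on the
    # result objects, a vendor can adjust them per run without mutating
    # the shared wpttest.Test.
    if result.status == "TIMEOUT":
        for subtest in result.subtest_results:  # assumed attribute
            if (subtest.status in ("TIMEOUT", "NOTRUN")
                    and subtest.status not in subtest.known_intermittent):
                subtest.known_intermittent.append(subtest.status)
    return result
```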

jonathan-j-lee added a commit to jonathan-j-lee/wpt that referenced this pull request Feb 6, 2024
This allows vendors to override expectations at runtime via
browser-specific result converters. Callers that want to simply pass
through the testloader-read expectations should use the newly introduced
`Test.make_{subtest_,}result()` to construct `(Subtest)Result`s instead
of invoking the constructor directly (each pair has the same signature).

This is a pure refactor; no functional changes intended.

Successor to: web-platform-tests#44134 (comment)
jonathan-j-lee added a commit to jonathan-j-lee/wpt that referenced this pull request Feb 6, 2024
This allows vendors to override expectations at runtime via
browser-specific result converters. Instead of constructing
`(Subtest)Result`s directly, executors should use the newly introduced
`Test.make_{subtest_,}result()`, which have the same signatures but
default to passing through the testloader-read expectations.

This is a pure refactor; no functional changes intended.

Successor to: web-platform-tests#44134 (comment)
jonathan-j-lee added a commit that referenced this pull request Feb 6, 2024
…44424)

This allows vendors to override expectations at runtime via
browser-specific result converters. Instead of constructing
`(Subtest)Result`s directly, executors should use the newly introduced
`Test.make_{subtest_,}result()`, which have the same signatures but
default to passing through the testloader-read expectations.

This is a pure refactor; no functional changes intended.

Successor to: #44134 (comment)
aarongable pushed a commit to chromium/chromium that referenced this pull request Feb 8, 2024
This matches the `run_web_tests.py` behavior. `run_wpt_tests.py`
previously tried to emulate this in the expectation translation layer,
but handling subtest results on test timeout is fundamentally a decision
that can only be made at runtime. Hook into the testrunner machinery to
do so.

See also: web-platform-tests/wpt#44134

Bug: 40943761
Cq-Include-Trybots: luci.chromium.try:linux-wpt-chromium-rel
Change-Id: I70e5e02c20dab56cbdaa7c376242b32b8806e17a
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/5251248
Reviewed-by: Weizhong Xia <[email protected]>
Commit-Queue: Jonathan Lee <[email protected]>
Cr-Commit-Position: refs/heads/main@{#1257694}
moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this pull request Feb 13, 2024
jamienicol pushed a commit to jamienicol/gecko that referenced this pull request Feb 15, 2024
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified that referenced this pull request Feb 16, 2024
mbrodesser-Igalia pushed a commit to mbrodesser-Igalia/wpt that referenced this pull request Feb 19, 2024
gecko-dev-updater pushed a commit to marco-c/gecko-dev-comments-removed that referenced this pull request Feb 20, 2024
marcoscaceres pushed a commit that referenced this pull request Feb 23, 2024