[wptrunner] Add option to skip subtest checks for incomplete tests #44134
Conversation
Some testharness/wdspec tests can racily time out on different subtests between runs, which makes it difficult to write stable subtest `TIMEOUT` and `NOTRUN` expectations that won't flake. `--no-check-incomplete-subtests` discards subtest results for tests that time out or crash, meaning only the harness-level statuses contribute to determining whether those tests ran expectedly. Subtests are always checked if the harness status is `OK`/`ERROR`, since subtest results in those cases are generally more stable. See also: https://crrev.com/c/5112639
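A minimal sketch of the decision this option introduces (helper names and structure are hypothetical, not the actual wptrunner code):

```python
# Sketch of the --no-check-incomplete-subtests behavior described above.
HARNESS_INCOMPLETE = {"TIMEOUT", "CRASH"}

def statuses_to_check(harness_status, subtest_statuses, check_incomplete_subtests):
    """Return the statuses that count towards "ran as expected".

    When the harness times out or crashes and the option is enabled,
    subtest results are discarded and only the harness status matters.
    Subtests are always checked for OK/ERROR harness statuses.
    """
    if harness_status in HARNESS_INCOMPLETE and not check_incomplete_subtests:
        return [harness_status]
    return [harness_status, *subtest_statuses]
```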
@jonathan-j-lee is the important part here that we don't fail (non-zero exit code) in these cases, or that we don't include the results in the report? If the report is the important part, I'm wondering if it would be an option to put the logic in the code that interprets the report? I suspect not, but want to make sure.
I had been thinking of changing the way [...]. The change in this PR tries to discard all subtest results when needed. The question would be: will the results be useful sometimes? We might also need to find a better name for [...].
We have also seen this problem, but I'm not a big fan of dealing with it this way. It means that if you have a test with, say, 5 subtests that consistently pass and then two subtests which can run in a random order but where both time out (resulting in an overall test timeout, but a combination of TIMEOUT/NOTRUN results for the two subtests), you end up throwing away the results of the 5 stable subtests. Similarly, if you have a test with a large number of subtests and you have, say, 900 stable results but the overall test times out somewhere between subtests 900 and 1000.

I definitely wouldn't be in favour of making this behaviour into the default / only behaviour for tests that time out or crash. The ideal situation is of course that we identify and fix the problematic tests.

I can imagine a different rule that would be more specific in dealing with this situation, e.g. something like: if the overall test times out and the subtest has a status of TIMEOUT or NOTRUN, then we consider that a known intermittent by adding the actual status to the expected statuses. But I'm not sure that works well, and I don't love adding that kind of special case that makes the overall logic of the system harder to reason about and understand.
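For comparison, a rough sketch of that more targeted rule (hypothetical helper, assuming plain string statuses), which keeps the stable subtest results instead of discarding everything:

```python
# Sketch of the "known intermittent" rule suggested above: when the overall
# test times out, a subtest whose actual status is TIMEOUT or NOTRUN is
# tolerated by adding that status to its expected statuses.
def effective_expected(harness_status, subtest_status, expected):
    """Return the set of statuses accepted for a subtest."""
    accepted = set(expected)
    if harness_status == "TIMEOUT" and subtest_status in ("TIMEOUT", "NOTRUN"):
        accepted.add(subtest_status)
    return accepted

# A subtest that normally PASSes but hit NOTRUN during an overall timeout:
assert "NOTRUN" in effective_expected("TIMEOUT", "NOTRUN", {"PASS"})
```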
jgraham@, thanks for the feedback.
I think maybe we should accept what jgraham@ suggested, i.e. when testharness tests expectedly Timeout, the baseline for that testharness test still matters, and a mismatch in baseline would cause the test to fail. This would diverge from the behavior in RWT and we should document that. This might also require a change to the rebaseline tool.
Ok, sounds good. I tried that in https://crrev.com/c/5112639/15, but it seemed to require many (70+) baselines.
I later realized there might still be some issues. When a chromium gardener sees a testharness test time out on some CI builders, they might decide to add a Timeout expectation for the test. This would not make the test run as an expected Timeout; instead, it turns the Timeout into a Failure, and that gardener or another one would need to add a Failure expectation for the test again.
jgraham@, do you think we could implement the above as you suggested? We could add lots of comments to explain the code. And I think we only need to handle the case when the overall result is Timeout or Crash. I am not in favor of another command-line argument for this; we already have too many command-line arguments.
jgraham@, looks like the approach you mentioned is something in the middle: we will still catch a change from PASS => FAIL, but won't catch a PASS => TIMEOUT. But for the simplicity of the system, it might be worth the tradeoff.
I think we should move the code that determines whether a test ran as expected into a separate function. We could then override that function in Chromium. In this way we do not need to change upstream behavior.
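A sketch of that refactoring idea (class and function names are hypothetical): pull the check into one overridable place so a vendor wrapper can replace it without changing upstream behavior:

```python
from dataclasses import dataclass

@dataclass
class Result:
    status: str

class ExpectedResultChecker:
    def result_is_expected(self, result, expected_statuses):
        """Default upstream behaviour: the status must be explicitly expected."""
        return result.status in expected_statuses

class ChromiumResultChecker(ExpectedResultChecker):
    def result_is_expected(self, result, expected_statuses):
        # Vendor-specific policy could live here, e.g. tolerating subtest
        # TIMEOUT/NOTRUN when the harness timed out expectedly.
        return super().result_is_expected(result, expected_statuses)

checker = ExpectedResultChecker()
assert checker.result_is_expected(Result("PASS"), {"PASS"})
```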
Currently, Chromium's wptrunner wrapper has expectation logic that augments [...] to [...].
This should be equivalent to what @jgraham proposed, except the [...]. I think the main question we need to answer is whether we care about unexpected [...]. It might make sense to accept a difference in behavior where, for wptrunner-run tests, failures preceding a timeout need to be suppressed separately from adding the timeout expectation. Since most tests pass, this hopefully shouldn't be too onerous for the build gardeners, and requiring explicit failure expectations seems better anyway for tracking by test owners.
Agreed
I think I changed my position here due to what jgraham@ said above, and that is why I said we need to document that.
Do you think we can do this? If we can, we can stick with RWT's behavior.
I think so (with some refactoring). The main obstacle is that [...] (see `wpt/tools/wptrunner/wptrunner/testrunner.py`, lines 694 to 695 in 231f825).
Instead, you'd want to plumb the expectations through [...] (see `wpt/tools/wptrunner/wptrunner/wpttest.py`, lines 28 to 29 in 231f825) ... then give the monkey-patched [...].
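This plumbing later landed as the `Test.make_{subtest_,}result()` helpers described in the commits below. A simplified sketch of the idea (field names and signatures are illustrative, not the real wptrunner classes): the `Test` object constructs results itself, defaulting to the testloader-read expectations, so a vendor can override result construction at runtime.

```python
# Simplified, hypothetical sketch; the real wptrunner classes carry more fields.
class SubtestResult:
    def __init__(self, name, status, message=None, expected=None):
        self.name = name
        self.status = status
        self.message = message
        self.expected = expected or ["PASS"]

class Test:
    def __init__(self, test_id, expected_by_subtest=None):
        self.id = test_id
        self._expected = expected_by_subtest or {}

    def make_subtest_result(self, name, status, message=None, expected=None):
        # Default to the expectations read by the testloader; a vendor
        # override can substitute its own expectations at runtime.
        if expected is None:
            expected = self._expected.get(name, ["PASS"])
        return SubtestResult(name, status, message, expected)

# Usage: an executor asks the Test to build the result instead of calling
# the SubtestResult constructor directly.
result = Test("example.html", {"subtest 1": ["FAIL"]}).make_subtest_result("subtest 1", "FAIL")
assert result.expected == ["FAIL"]
```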
This allows vendors to override expectations at runtime via browser-specific result converters. Callers that want to simply pass through the testloader-read expectations should use the newly introduced `Test.make_{subtest_,}result()` to construct `(Subtest)Result`s instead of invoking the constructor directly (each pair has the same signature). This is a pure refactor; no functional changes intended. Successor to: web-platform-tests#44134 (comment)
This allows vendors to override expectations at runtime via browser-specific result converters. Instead of constructing `(Subtest)Result`s directly, executors should use the newly introduced `Test.make_{subtest_,}result()`, which have the same signatures but default to passing through the testloader-read expectations. This is a pure refactor; no functional changes intended. Successor to: web-platform-tests#44134 (comment)
This matches the `run_web_tests.py` behavior. `run_wpt_tests.py` previously tried to emulate this in the expectation translation layer, but handling subtest results on test timeout is fundamentally a decision that can only be made at runtime. Hook into the testrunner machinery to do so. See also: web-platform-tests/wpt#44134

Bug: 40943761
Cq-Include-Trybots: luci.chromium.try:linux-wpt-chromium-rel
Change-Id: I70e5e02c20dab56cbdaa7c376242b32b8806e17a
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/5251248
Reviewed-by: Weizhong Xia <[email protected]>
Commit-Queue: Jonathan Lee <[email protected]>
Cr-Commit-Position: refs/heads/main@{#1257694}
…{Test -> (Subtest)Result}`, a=testonly

Automatic update from web-platform-tests: [wptrunner] Plumb expectations `wpttest.{Test -> (Subtest)Result}` (#44424)

This allows vendors to override expectations at runtime via browser-specific result converters. Instead of constructing `(Subtest)Result`s directly, executors should use the newly introduced `Test.make_{subtest_,}result()`, which have the same signatures but default to passing through the testloader-read expectations. This is a pure refactor; no functional changes intended. Successor to: web-platform-tests/wpt#44134 (comment)

wpt-commits: a68f313a50899795e1e6660b7398544d340f1652
wpt-pr: 44424