[stability] Report duplicate tests as "excess" #5305
Conversation
Previously, the "stability checker" script would report results for duplicate test names in the same format as results for non-duplicate test names, without any additional commentary. For example, a test run for a test that included two stable subtests with the same name ("test #1") and one unstable subtest ("test #2") might produce the following output: | Test | Subtest | Results | Messages | |--------------------|-----------|----------------------------|----------------------------------------| | `/infra/demo.html` | `test #1` | **PASS: 20/10** | | | `/infra/demo.html` | `test #2` | **FAIL: 5/10, PASS: 5/10** | `assert_true: expected true got false` | Without proper context, these results caused confusion for contributors. Extend the rendered output to communicate the reason for failure more directly by including the count of tests in excess, e.g.: | Test | Subtest | Results | Messages | |--------------------|-----------|----------------------------|----------------------------------------| | `/infra/demo.html` | `test #1` | **PASS: 20/10**, EXCESS:10 | | | `/infra/demo.html` | `test #2` | **FAIL: 5/10, PASS: 5/10** | `assert_true: expected true got false` |
This commit is intended for demonstration purposes only and should *not* be merged to `master`.
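For reference, here is a minimal sketch of how the Results column could carry the excess count. The `format_results` helper, its signature, and the hard-coded iteration count of 10 are illustrative assumptions, not the actual wpt stability-checker code:

```python
# Hypothetical sketch: summarize one subtest's status counts for the
# stability-report table. Names and structure are illustrative only.
def format_results(status_counts, expected_iterations=10):
    """Render e.g. {'PASS': 20} as '**PASS: 20/10**, EXCESS: 10'."""
    parts = [
        "%s: %s/%s" % (status, count, expected_iterations)
        for status, count in sorted(status_counts.items())
    ]
    rendered = "**%s**" % ", ".join(parts)

    total = sum(status_counts.values())
    if total > expected_iterations:
        # Duplicate test names (or non-deterministic test generation)
        # produce more runs than iterations; flag the surplus explicitly.
        rendered += ", EXCESS: %s" % (total - expected_iterations)
    return rendered


print(format_results({"PASS": 20}))            # **PASS: 20/10**, EXCESS: 10
print(format_results({"FAIL": 5, "PASS": 5}))  # **FAIL: 5/10, PASS: 5/10**
```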
Firefox (nightly channel): Testing web-platform-tests at revision 1c54819. Unstable results: the following table lists tests that exhibited inconsistent results after 10 iterations. All results: 2 tests ran:
- /infra/demo.html
- /service-workers/service-worker/navigation-preload/empty-preload-response-body.https.html

Chrome (unstable channel): Testing web-platform-tests at revision 1c54819. Unstable results: the following table lists tests that exhibited inconsistent results after 10 iterations. All results: 2 tests ran:
- /infra/demo.html
- /service-workers/service-worker/navigation-preload/empty-preload-response-body.https.html
We should have something somewhere about "hey, here's what normally causes excess results and here's what normally causes missing results", whether that's inline or linked from the comment.
These tests are now available on w3c-test.org
@gsnedders excellent idea. I've decided to add a paragraph to explain the table a bit--see above. What do you think?
I think adding this message to every log is too much and people will ignore it. Maybe something like
That sounds a little too unobtrusive. I don't think anyone will find it. What about:

> **Explanation**
> The following table lists tests that exhibited inconsistent results after 10 iterations. The label `MISSING` indicates that the referenced test ran fewer than 10 times. This can occur if tests are generated via logic that is non-deterministic. The label `EXCESS` indicates that the referenced test ran more than 10 times. This may also be due to non-determinism in test generation logic, but it is more likely the result of duplicated test names.
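To make the two labels concrete, here is a small sketch of how they could be derived from a test's observed run count. The function name, default iteration count, and structure are assumptions for illustration, not the stability checker's actual implementation:

```python
# Hypothetical sketch of how the MISSING/EXCESS labels described above
# could be chosen from an observed run count.
def classify_run_count(times_ran, iterations=10):
    """Return the label (if any) for a test's observed run count."""
    if times_ran < iterations:
        # Fewer runs than iterations, e.g. non-deterministic test generation.
        return "MISSING"
    if times_ran > iterations:
        # More runs than iterations, usually duplicated test names.
        return "EXCESS"
    return None


assert classify_run_count(8) == "MISSING"
assert classify_run_count(20) == "EXCESS"
assert classify_run_count(10) is None
```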
I think that's still rather wordy. I think a details dropdown would probably be reasonable.
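As a sketch of that suggestion, the bot could wrap the explanation in a collapsible `<details>` block when assembling its markdown comment, so the results table stays prominent and the wordier text is hidden by default. The helper names and the explanation string below are assumptions, not the actual wpt code:

```python
# Hypothetical sketch: hide the explanation behind a <details> dropdown
# in the generated GitHub comment.
EXPLANATION = (
    "The label `MISSING` indicates that the referenced test ran fewer than "
    "the expected number of times; `EXCESS` indicates that it ran more, "
    "most often because of duplicated test names."
)


def wrap_in_details(summary, body):
    """Render a GitHub-flavored-markdown collapsible section."""
    return "<details>\n<summary>%s</summary>\n\n%s\n\n</details>" % (summary, body)


def build_comment(results_table):
    """Place the results table first, with the explanation collapsed below it."""
    return "%s\n\n%s" % (results_table, wrap_in_details("Explanation", EXPLANATION))
```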
I hadn't realised we never landed this. @jugglinmike you're back to working on WPT next week, right? Any chance you could have a go at rebasing this and dealing with the above? |
Closed by #15330. |
Great! |
Previously, the "stability checker" script would report results for
duplicate test names in the same format as results for non-duplicate test
names, without any additional commentary. For example, a test run for a
test that included two stable subtests with the same name ("
test #1
") andone unstable subtest ("
test #2
") might produce the following output:Without proper context, these results caused confusion for contributors.
Extend the rendered output to communicate the reason for failure more
directly by including the count of tests in excess, e.g.: