[stability] Report duplicate tests as "excess" #5305

Closed

Conversation

@jugglinmike (Contributor) commented Mar 31, 2017

Previously, the "stability checker" script would report results for
duplicate test names in the same format as results for non-duplicate test
names, without any additional commentary. For example, a test run for a
test that included two stable subtests with the same name ("test #1") and
one unstable subtest ("test #2") might produce the following output:

|       Test         |   Subtest |         Results            |                Messages                |
|--------------------|-----------|----------------------------|----------------------------------------|
| `/infra/demo.html` | `test #1` | **PASS: 20/10**            |                                        |
| `/infra/demo.html` | `test #2` | **FAIL: 5/10, PASS: 5/10** | `assert_true: expected true got false` |

Without proper context, these results caused confusion for contributors.

Extend the rendered output to communicate the reason for failure more
directly by including the count of tests in excess, e.g.:

|       Test         |   Subtest |         Results            |                Messages                |
|--------------------|-----------|----------------------------|----------------------------------------|
| `/infra/demo.html` | `test #1` | **PASS: 20/10**, EXCESS: 10 |                                        |
| `/infra/demo.html` | `test #2` | **FAIL: 5/10, PASS: 5/10** | `assert_true: expected true got false` |

This commit is intended for demonstration purposes only and should *not*
be merged to `master`.
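
For illustration only, here is a minimal sketch of the kind of tallying that produces rows like the ones above. It is not the actual stability-checker code; the `tally_results` helper and its input format (one list of `((test, subtest), status)` pairs per iteration) are assumptions.

```python
from collections import Counter

# Minimal sketch, NOT the wpt stability checker itself: the input format
# (one list of ((test, subtest), status) pairs per iteration) is assumed.
def tally_results(runs, iterations=10):
    """Aggregate per-subtest statuses across repeated runs and annotate
    counts that do not add up to the expected number of iterations."""
    counts = {}
    for run in runs:
        for key, status in run:
            counts.setdefault(key, Counter())[status] += 1

    rows = []
    for (test, subtest), by_status in sorted(counts.items()):
        total = sum(by_status.values())
        parts = ["%s: %d/%d" % (status, n, iterations)
                 for status, n in sorted(by_status.items())]
        if total > iterations:
            # More results than iterations: almost always a duplicated
            # test name, so report the surplus as EXCESS.
            parts.append("EXCESS: %d" % (total - iterations))
        elif total < iterations:
            # Fewer results than iterations: the subtest did not run in
            # every iteration (e.g. non-deterministic test generation).
            parts.append("MISSING: %d/%d" % (iterations - total, iterations))
        rows.append((test, subtest, ", ".join(parts)))
    return rows


# Two subtests named "test #1" in each of 10 runs produce
# "PASS: 20/10, EXCESS: 10" for that row.
runs = [[(("/infra/demo.html", "test #1"), "PASS"),
         (("/infra/demo.html", "test #1"), "PASS")]
        for _ in range(10)]
for test, subtest, results in tally_results(runs):
    print(test, subtest, results)
```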
@ghost commented Mar 31, 2017

View the complete job log.

Firefox (nightly channel)

Testing web-platform-tests at revision 1c54819
Using browser at version BuildID 20170331102157; SourceStamp 8df9fabf2587b7020889755acb9e75b664fe13cf
Starting 10 test iterations

Unstable results

The following table lists tests that exhibited inconsistent results after 10 iterations. The label MISSING indicates that the referenced test ran fewer than 10 times. This can occur if tests are generated via logic that is non-deterministic. The label EXCESS indicates that the referenced test ran more than 10 times. This may also be due to non-determinism in test generation logic, but it is more likely the result of duplicated test names.

| Test | Subtest | Results | Messages |
|------|---------|---------|----------|
| /infra/demo.html | duplicate | PASS: 20/10, EXCESS: 10 | |
| /infra/demo.html | ghost | PASS: 8/10, MISSING: 2/10 | |
| /infra/demo.html | flakey | FAIL: 3/10, PASS: 7/10 | assert_true: expected true got false |

All results

2 tests ran

/infra/demo.html

| Subtest | Results | Messages |
|---------|---------|----------|
| | ERROR | |
| duplicate | PASS: 20/10, EXCESS: 10 | |
| ghost | PASS: 8/10, MISSING: 2/10 | |
| flakey | FAIL: 3/10, PASS: 7/10 | assert_true: expected true got false |
| always pass | PASS | |
| always fail | FAIL | assert_true: expected true got false |

/service-workers/service-worker/navigation-preload/empty-preload-response-body.https.html

| Subtest | Results | Messages |
|---------|---------|----------|
| | OK | |
| Navigation Preload empty response body. | FAIL | assert_equals: expected "[]" but got "" |

@ghost commented Mar 31, 2017

View the complete job log.

Chrome (unstable channel)

Testing web-platform-tests at revision 1c54819
Using browser at version 59.0.3053.3 dev
Starting 10 test iterations

Unstable results

The following table lists tests that exhibited inconsistent results after 10 iterations. The label MISSING indicates that the referenced test ran fewer than 10 times. This can occur if tests are generated via logic that is non-deterministic. The label EXCESS indicates that the referenced test ran more than 10 times. This may also be due to non-determinism in test generation logic, but it is more likely the result of duplicated test names.

| Test | Subtest | Results | Messages |
|------|---------|---------|----------|
| /infra/demo.html | duplicate | PASS: 20/10, EXCESS: 10 | |
| /infra/demo.html | ghost | PASS: 5/10, MISSING: 5/10 | |
| /infra/demo.html | flakey | FAIL: 5/10, PASS: 5/10 | assert_true: expected true got false |

All results

2 tests ran

/infra/demo.html

| Subtest | Results | Messages |
|---------|---------|----------|
| | ERROR | |
| duplicate | PASS: 20/10, EXCESS: 10 | |
| ghost | PASS: 5/10, MISSING: 5/10 | |
| flakey | FAIL: 5/10, PASS: 5/10 | assert_true: expected true got false |
| always pass | PASS | |
| always fail | FAIL | assert_true: expected true got false |

/service-workers/service-worker/navigation-preload/empty-preload-response-body.https.html

| Subtest | Results | Messages |
|---------|---------|----------|
| | OK | |
| Navigation Preload empty response body. | FAIL | assert_equals: expected "[]" but got "" |

@gsnedders (Member) commented:
We should have something somewhere about "hey here's what normally causes excess results and here's what normally causes missing results", whether that's inline or linked from the comment.

@ghost commented Mar 31, 2017

These tests are now available on w3c-test.org

@jugglinmike (Contributor, Author) commented:
@gsnedders excellent idea. I've decided to add a paragraph to explain the table a bit--see above. What do you think?

@jgraham (Contributor) commented Apr 6, 2017

I think adding this message to every log is too much and people will ignore it. Maybe something like `<span title="Excessive results are usually caused by duplicate test names.">PASS 20/10</span>`?

@jugglinmike (Contributor, Author) commented:
> I think adding this message to every log is too much and people will ignore
> it. Maybe something like `<span title="Excessive results are usually caused by duplicate test names.">PASS 20/10</span>`?

That sounds a little too unobtrusive. I don't think anyone will find it. What
do you think about using another `<details>` element, as in:

<details>
<summary>Explanation</summary>
The following table lists tests that exhibited inconsistent results after 10 iterations. The label `MISSING` indicates that the referenced test ran fewer than 10 times. This can occur if tests are generated via logic that is non-deterministic. The label `EXCESS` indicates that the referenced test ran more than 10 times. This may also be due to non-determinism in test generation logic, but it is more likely the result of duplicated test names.
</details>

@jgraham (Contributor) commented Apr 20, 2017

I think that's still rather wordy. I think a details dropdown like

    <details>
    <summary>Details</summary>
    Excessive results are usually caused by duplicate test names.
    </details>

would probably be reasonable.
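
For reference, a small sketch of how the script that builds the Markdown comment could emit such a dropdown. The `excess_note_details` helper is hypothetical, not part of the existing tooling.

```python
def excess_note_details(summary="Details"):
    """Return a collapsed <details> block for the results comment.

    GitHub renders raw HTML in comments, so the note stays hidden
    unless the reader expands the dropdown.
    """
    return ("<details>\n"
            "<summary>%s</summary>\n"
            "Excessive results are usually caused by duplicate test names.\n"
            "</details>" % summary)
```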

@gsnedders (Member) commented:
I hadn't realised we never landed this. @jugglinmike you're back to working on WPT next week, right? Any chance you could have a go at rebasing this and dealing with the above?

@wpt-pr-bot requested a review from jgraham on November 29, 2017
@gsnedders (Member) commented Oct 30, 2019

Closed by #15330.

@gsnedders closed this on Oct 30, 2019
@jugglinmike (Contributor, Author) commented:
Great!
