[stability] Report duplicate tests as "excess" #5305

Closed

Conversation

@jugglinmike (Contributor) commented Mar 31, 2017

Previously, the "stability checker" script would report results for
duplicate test names in the same format as results for non-duplicate test
names, without any additional commentary. For example, a test run for a
test that included two stable subtests with the same name ("test #1") and
one unstable subtest ("test #2") might produce the following output:

|       Test         |   Subtest |         Results            |                Messages                |
|--------------------|-----------|----------------------------|----------------------------------------|
| `/infra/demo.html` | `test #1` | **PASS: 20/10**            |                                        |
| `/infra/demo.html` | `test #2` | **FAIL: 5/10, PASS: 5/10** | `assert_true: expected true got false` |

Without proper context, these results caused confusion for contributors.

Extend the rendered output to communicate the reason for failure more
directly by including the count of tests in excess, e.g.:

|       Test         |   Subtest |         Results            |                Messages                |
|--------------------|-----------|----------------------------|----------------------------------------|
| `/infra/demo.html` | `test #1` | **PASS: 20/10**, EXCESS: 10 |                                        |
| `/infra/demo.html` | `test #2` | **FAIL: 5/10, PASS: 5/10** | `assert_true: expected true got false` |

This commit is intended for demonstration purposes only and should *not*
be merged to `master`.
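
For illustration only, here is a minimal sketch of the kind of tallying that produces rows like the ones above. It is not the actual stability-checker code; the `tally_results` helper and its input format (one list of `((test, subtest), status)` pairs per iteration) are assumptions.

```python
from collections import Counter

# Minimal sketch, NOT the wpt stability checker itself: the input format
# (one list of ((test, subtest), status) pairs per iteration) is assumed.
def tally_results(runs, iterations=10):
    """Aggregate per-subtest statuses across repeated runs and annotate
    counts that do not add up to the expected number of iterations."""
    counts = {}
    for run in runs:
        for key, status in run:
            counts.setdefault(key, Counter())[status] += 1

    rows = []
    for (test, subtest), by_status in sorted(counts.items()):
        total = sum(by_status.values())
        parts = ["%s: %d/%d" % (status, n, iterations)
                 for status, n in sorted(by_status.items())]
        if total > iterations:
            # More results than iterations: almost always a duplicated
            # test name, so report the surplus as EXCESS.
            parts.append("EXCESS: %d" % (total - iterations))
        elif total < iterations:
            # Fewer results than iterations: the subtest did not run in
            # every iteration (e.g. non-deterministic test generation).
            parts.append("MISSING: %d/%d" % (iterations - total, iterations))
        rows.append((test, subtest, ", ".join(parts)))
    return rows


# Two subtests named "test #1" in each of 10 runs produce
# "PASS: 20/10, EXCESS: 10" for that row.
runs = [[(("/infra/demo.html", "test #1"), "PASS"),
         (("/infra/demo.html", "test #1"), "PASS")]
        for _ in range(10)]
for test, subtest, results in tally_results(runs):
    print(test, subtest, results)
```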
@ghost commented Mar 31, 2017

View the complete job log.

Firefox (nightly channel)

Testing web-platform-tests at revision 1c54819
Using browser at version BuildID 20170331102157; SourceStamp 8df9fabf2587b7020889755acb9e75b664fe13cf
Starting 10 test iterations

Unstable results

The following table lists tests that exhibited inconsistent results after 10 iterations. The label MISSING indicates that the referenced test ran fewer than 10 times. This can occur if tests are generated via logic that is non-deterministic. The label EXCESS indicates that the referenced test ran more than 10 times. This may also be due to non-determinism in test generation logic, but it is more likely the result of duplicated test names.

| Test | Subtest | Results | Messages |
|------|---------|---------|----------|
| /infra/demo.html | duplicate | PASS: 20/10, EXCESS: 10 | |
| /infra/demo.html | ghost | PASS: 8/10, MISSING: 2/10 | |
| /infra/demo.html | flakey | FAIL: 3/10, PASS: 7/10 | assert_true: expected true got false |

All results

2 tests ran

/infra/demo.html

| Subtest | Results | Messages |
|---------|---------|----------|
| | ERROR | |
| duplicate | PASS: 20/10, EXCESS: 10 | |
| ghost | PASS: 8/10, MISSING: 2/10 | |
| flakey | FAIL: 3/10, PASS: 7/10 | assert_true: expected true got false |
| always pass | PASS | |
| always fail | FAIL | assert_true: expected true got false |

/service-workers/service-worker/navigation-preload/empty-preload-response-body.https.html

| Subtest | Results | Messages |
|---------|---------|----------|
| | OK | |
| Navigation Preload empty response body. | FAIL | assert_equals: expected "[]" but got "" |

@ghost commented Mar 31, 2017

View the complete job log.

Chrome (unstable channel)

Testing web-platform-tests at revision 1c54819
Using browser at version 59.0.3053.3 dev
Starting 10 test iterations

Unstable results

The following table lists tests that exhibited inconsistent results after 10 iterations. The label MISSING indicates that the referenced test ran fewer than 10 times. This can occur if tests are generated via logic that is non-deterministic. The label EXCESS indicates that the referenced test ran more than 10 times. This may also be due to non-determinism in test generation logic, but it is more likely the result of duplicated test names.

| Test | Subtest | Results | Messages |
|------|---------|---------|----------|
| /infra/demo.html | duplicate | PASS: 20/10, EXCESS: 10 | |
| /infra/demo.html | ghost | PASS: 5/10, MISSING: 5/10 | |
| /infra/demo.html | flakey | FAIL: 5/10, PASS: 5/10 | assert_true: expected true got false |

All results

2 tests ran

/infra/demo.html

| Subtest | Results | Messages |
|---------|---------|----------|
| | ERROR | |
| duplicate | PASS: 20/10, EXCESS: 10 | |
| ghost | PASS: 5/10, MISSING: 5/10 | |
| flakey | FAIL: 5/10, PASS: 5/10 | assert_true: expected true got false |
| always pass | PASS | |
| always fail | FAIL | assert_true: expected true got false |

/service-workers/service-worker/navigation-preload/empty-preload-response-body.https.html

| Subtest | Results | Messages |
|---------|---------|----------|
| | OK | |
| Navigation Preload empty response body. | FAIL | assert_equals: expected "[]" but got "" |

@gsnedders (Member) commented:
We should have something somewhere about "hey here's what normally causes excess results and here's what normally causes missing results", whether that's inline or linked from the comment.

@ghost commented Mar 31, 2017

These tests are now available on w3c-test.org

@jugglinmike (Contributor, Author) commented:
@gsnedders excellent idea. I've decided to add a paragraph to explain the table a bit--see above. What do you think?

@jgraham (Contributor) commented Apr 6, 2017

I think adding this message to every log is too much and people will ignore it. Maybe something like `<span title="Excessive results are usually caused by duplicate test names.">PASS 20/10</span>`?

@jugglinmike (Contributor, Author) commented:
> I think adding this message to every log is too much and people will ignore
> it. Maybe something like `<span title="Excessive results are usually caused by duplicate test names.">PASS 20/10</span>`?

That sounds a little too unobtrusive. I don't think anyone will find it. What
do you think about using another `<details>` element, as in:

<details>
<summary>Explanation</summary>
The following table lists tests that exhibited inconsistent results after 10 iterations. The label `MISSING` indicates that the referenced test ran fewer than 10 times. This can occur if tests are generated via logic that is non-deterministic. The label `EXCESS` indicates that the referenced test ran more than 10 times. This may also be due to non-determinism in test generation logic, but it is more likely the result of duplicated test names.
</details>

@jgraham (Contributor) commented Apr 20, 2017

I think that's still rather wordy. I think a details dropdown like

    <details>
    <summary>Details</summary>
    Excessive results are usually caused by duplicate test names.
    </details>

would probably be reasonable.
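
For reference, a small sketch of how the script that builds the Markdown comment could emit such a dropdown. The `excess_note_details` helper is hypothetical, not part of the existing tooling.

```python
def excess_note_details(summary="Details"):
    """Return a collapsed <details> block for the results comment.

    GitHub renders raw HTML in comments, so the note stays hidden
    unless the reader expands the dropdown.
    """
    return ("<details>\n"
            "<summary>%s</summary>\n"
            "Excessive results are usually caused by duplicate test names.\n"
            "</details>" % summary)
```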

@gsnedders (Member) commented:
I hadn't realised we never landed this. @jugglinmike you're back to working on WPT next week, right? Any chance you could have a go at rebasing this and dealing with the above?

@wpt-pr-bot requested a review from jgraham on November 29, 2017
@gsnedders (Member) commented Oct 30, 2019

Closed by #15330.

@gsnedders closed this on Oct 30, 2019
@jugglinmike (Contributor, Author) commented:
Great!
