
Add "expects" (not just "asserts") to testharness #9180

Open
henbos opened this issue Jan 24, 2018 · 19 comments

Comments

@henbos
Contributor

henbos commented Jan 24, 2018

In WPT it would be useful to be able to "expect" something - producing a PASS or FAIL line in the -expected.txt - without necessarily aborting the test on failure the way "asserts" do.

For example, in WebRTC, features are often shipped with partial support. In particular, WebRTC has gone through three stages for how to handle media - "addStream & getLocalStreams" (deprecated), "addTrack & getSenders" (current) and now also "addTransceiver & getTransceivers" (recent addition).

I'm simplifying here for the sake of argument, but the majority of behaviors can be tested with "was a track added?"-type tests, while a fully spec-compliant test asks "was a track and a transceiver added?".

When an "assert" aborts the test we lose all the test coverage for the related behaviors. For example, Chrome fails all tests related to "ontrack" firing because the test for "ontrack fires when a track is added" (which is supported) also asserts "track events contain a transceiver" (Chrome does not support transceivers yet).

Another example is "when a track is added we get track stats" failing if "all stats related to tracks are fully implemented" fails.

In order to use WPT as our main test tool we have to either:

  1. Design all tests to test a particular behavior instead of testing all outcomes of an operation. This means each test tests as little as possible, so only that particular behavior passes or fails, but it also means that many tests have to be copied and run multiple times, looking at different sets of outcomes of the same operation.
  2. Support a way not to abort a test on assert failure. E.g. add "expect" operations to testharness.js. A single test can produce multiple -expected.txt lines, both PASS and FAIL.
  3. Avoid asserting everything we would like to assert until it is implemented.

I've focused on 1) and rewritten a large number of tests related to tracks. Being able to do 2) would be nice in some cases. I've seen examples of 3), but that is bad because you end up with in-code TODOs and limit what is tested based on particular browser implementations.
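
To make the abort problem concrete, here is a minimal sketch (a hypothetical test, not from this thread) in testharness.js style; the canvas captureStream() setup is just an easy way to obtain a track:

// Sketch only: once the transceiver assert throws, the remaining
// assertions in this test body are never evaluated or reported.
promise_test(async t => {
  const pc = new RTCPeerConnection();
  t.add_cleanup(() => pc.close());
  const stream = document.createElement('canvas').captureStream();
  const [track] = stream.getVideoTracks();
  const sender = pc.addTrack(track, stream);

  assert_true(pc.getSenders().includes(sender), 'sender is listed');
  // Throws in an implementation without transceiver support,
  // aborting the whole subtest:
  assert_equals(pc.getTransceivers().length, 1, 'one transceiver created');
  assert_equals(pc.getTransceivers()[0].sender, sender, 'transceiver wraps the sender');
}, 'addTrack exposes the track via senders and transceivers');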

@henbos
Contributor Author

henbos commented Jan 24, 2018

FYI @foolip @alvestrand we talked about this, I filed this bug.

@foolip
Member

foolip commented Jan 26, 2018

@henbos, would this pattern work for you?

async_test(t => {
  var track = something;
  track.onload = t.step_func(event => {
    assert_stuff();
    test(() => {
      assert_things_about(event.transceiver);
    }, 'inner test');
    assert_more_stuff();
  });
}, 'outer test');

It would show up as two tests, but if the transceiver bits fail, the outer test will still continue.

@henbos
Contributor Author

henbos commented Jan 26, 2018

Oh that's interesting! That would probably do it, yes.

@foolip
Member

foolip commented Jan 26, 2018

@henbos, can you try it and see if it's good enough? The trouble with adding "expects" is that we'd have to figure out how to report at least the failures separately, and this pattern already achieves that.

I'm going to label this as roadmap, and if it still seems good enough when it's revisited in 60 days, we'll close it.

@jugglinmike
Contributor

I advise against authoring tests in such a way that the number of sub-tests depends on the implementation status. Doing so makes results very hard to interpret:

  • Within implementations: someone vetting implementation progress over time will be in for a surprise when their pass/fail rate decreases following the completion of some feature.
  • Between implementations: web developers trying to understand support for a given feature across the web platform will have a hard time deriving meaning from results like "5 out of 10", "20 out of 50" and "7 out of 30".

The number of subtests can generally be expected to increase with each commit made to WPT, so "correctness" is always a moving target. But for the use cases above, we're holding the WPT revision constant. It's important that the "number of subtests" be a consistent denominator in these cases. That way, everyone shares the same expectations.

Of the alternatives listed by @henbos, the first seems most conducive to both of those use cases. Writing multiple tests that reach some intermediate state before testing different expectations may involve more code, but any maintainability concerns with this practice are traditionally addressed by defining "setup" functions.
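
As an illustration of that first alternative, here is a minimal sketch (hypothetical helper and test names, assuming testharness.js): every subtest repeats the same setup, so the subtest count stays the same regardless of implementation status.

// Hypothetical shared setup helper; each subtest calls it, so no
// subtest's verdict depends on another subtest having run first.
async function setupAddedTrack(t) {
  const pc = new RTCPeerConnection();
  t.add_cleanup(() => pc.close());
  const stream = document.createElement('canvas').captureStream();
  const [track] = stream.getVideoTracks();
  return {pc, sender: pc.addTrack(track, stream)};
}

promise_test(async t => {
  const {pc, sender} = await setupAddedTrack(t);
  assert_true(pc.getSenders().includes(sender), 'sender is listed');
}, 'addTrack creates a sender');

promise_test(async t => {
  const {pc} = await setupAddedTrack(t);
  assert_equals(pc.getTransceivers().length, 1, 'a transceiver is created');
}, 'addTrack creates a transceiver');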

To be honest, I don't know whether the pattern @foolip suggested is used frequently in WPT. It may be that the ship has already sailed on this. But it's something I'd be interested in investigating in the coming weeks, especially in terms of the https://wpt.fyi project.

@foolip
Member

foolip commented Feb 23, 2018

#8894 is an example of this being done in practice, which I had no hand in encouraging, FWIW. In this case, the number of subtests doesn't depend on implementation status.

However, I also think the ship has pretty much sailed on achieving a consistent number of subtests irrespective of implementation status; it is just too easy to write tests that fail to achieve this. My rule of thumb is rather that there should be some failing test if all is not well. If fixing that test causes new failing tests to appear, that may be surprising, but if there is a true dependency, the alternative is to set up all tests up front and fail them in the right places (or just rely on timeouts, making the run slower).

With wpt.fyi, since we can take the union of all tests and subtests, I think we have a path forward for making such comparisons that still allows writing tests with dependencies in the most straightforward manner. I for one would rather LGTM a single test with such a dependency than two with shared setup code, if that setup code is async and not super fast.

But it's up to the people maintaining each part of WPT to make a judgement on this; I don't think we need a policy as such.

@henbos
Contributor Author

henbos commented Feb 23, 2018

It should be easy to make subtests fail if not supported rather than skipping them entirely; it's just a matter of where to place the prerequisite condition - in the subtest or outside of it.
This is worth discussing, but this worry is not an argument against subtests in principle.
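
As an illustration (hypothetical tests, not from this thread), the two placements look like this; with the check inside the subtest, an unsupported implementation produces a FAIL line instead of the subtest never existing.

// Prerequisite inside the subtest: unsupported implementations FAIL here.
promise_test(async t => {
  const pc = new RTCPeerConnection();
  assert_true('getTransceivers' in pc, 'getTransceivers is implemented');
  assert_equals(pc.getTransceivers().length, 0, 'no transceivers before addTrack');
}, 'getTransceivers starts empty');

// Prerequisite outside the subtest: unsupported implementations never
// create the subtest at all, so nothing is reported for it.
if ('getTransceivers' in RTCPeerConnection.prototype) {
  promise_test(async t => {
    const pc = new RTCPeerConnection();
    assert_equals(pc.getTransceivers().length, 0, 'no transceivers before addTrack');
  }, 'getTransceivers starts empty (only created when supported)');
}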

@henbos
Contributor Author

henbos commented Feb 23, 2018

Sorry, the event might not fire, so it is a concern. But maybe this can be solved with a helper utility function? Hmm.

@henbos
Contributor Author

henbos commented Feb 23, 2018

Is there anything preventing the design of something like this?

subtest(track.onload, () => { ... }, 'inner test 1');
subtest(track.onload, () => { ... }, 'inner test 2');
subtest(track.onload, () => { ... }, 'inner test 3');

As long as you don't have subtests inside of subtests...

@henbos
Contributor Author

henbos commented Feb 23, 2018

Thinking out loud - how about something like this?

// Creates a handler for the subtest and wires up some mechanism to make sure that
// if the test times out, any subtest that has not executed produces a FAIL line at
// the end.
let subtest1 = subtest('inner test 1');

async_test(() => {
  ...
  track.onload = () => {
    subtest1.run = () => {
      assert_blah();
    }
  }
}, 'test');

@foolip
Member

foolip commented Feb 24, 2018

@henbos, those patterns already work, except it's not called subtest but just test, async_test or promise_test. You can do test2 = async_test('bla') up front, or create a test from within another; nesting works. It's not very obvious that this should work, though, so documenting such uses would be a good idea.
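
For illustration, a minimal sketch of both shapes (names like transceiverTest are made up; it relies on async_test returning a Test object with step() and done()):

// Created up front: this test exists even if the track event never fires.
const transceiverTest = async_test('track event exposes a transceiver');

async_test(t => {
  const pc = new RTCPeerConnection();
  pc.ontrack = t.step_func_done(event => {
    assert_true(event.track instanceof MediaStreamTrack, 'event has a track');
    // Complete the pre-created test from inside this one; calling
    // test(() => { ... }, 'nested test') here would also work.
    transceiverTest.step(() => {
      assert_not_equals(event.transceiver, undefined, 'event has a transceiver');
    });
    transceiverTest.done();
  });
  // ... negotiation that triggers ontrack is elided in this sketch ...
}, 'track event fires when a track is added');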

@foolip
Member

foolip commented Feb 24, 2018

That's what I was thinking about with "set up all tests up front". Again, the downside is that you have to explicitly cause them to fail later if they can't be run, or it'll be a timeout. I tend to think just creating the tests at the end of another is better overall.
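
For illustration, a sketch of that trade-off (hypothetical names; innerTest is created up front): if the event never fires, the pre-created test is failed explicitly instead of being left to time out.

const innerTest = async_test('transceiver details');

async_test(t => {
  const pc = new RTCPeerConnection();
  pc.ontrack = t.step_func_done(event => {
    innerTest.step(() => assert_not_equals(event.transceiver, undefined));
    innerTest.done();
  });
  // If the event has not fired after a while, fail both tests explicitly
  // rather than letting the harness time out. step_timeout runs its
  // callback as a step of the outer test.
  t.step_timeout(() => {
    innerTest.step(() => assert_unreached('track event did not fire'));
    assert_unreached('track event did not fire');
  }, 3000);
  // ... negotiation that triggers ontrack is elided in this sketch ...
}, 'track event fires');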

@jgraham
Contributor

jgraham commented Apr 23, 2018

I think the conclusion is that this is supported by existing constructs using the right patterns?

@zcorpan
Member

zcorpan commented Sep 28, 2018

@zcorpan zcorpan reopened this Sep 28, 2018
@youennf
Contributor

youennf commented Oct 15, 2018

There have been some similar concerns in WebKit, since this is one thing the js-test harness supports. Wrapping every assertion in a test() would do the trick.

@youennf
Contributor

youennf commented Oct 23, 2018

Talking with @jgraham, we might want to promote this issue to an RFC issue once RFC issues exist.

@foolip
Member

foolip commented Oct 23, 2018

@youennf, I think that would make sense, yes. https://github.com/foolip/rfcs is the proposal for how that will look. A concrete change proposal to evaluate would be great even before that process is in place, though.

@mdittmer
Contributor

Friendly ping. @foolip, it looks like this is priority:roadmap but has no owner. Any volunteers?

@foolip
Member

foolip commented Jan 25, 2019

Downgrading. web-platform-tests/rfcs#1 is in progress, and that might then be the forum to hammer this out; after that it'll need a champion who isn't me :)
