
Add "expects" (not just "asserts") to testharness #9180

Open
henbos opened this issue Jan 24, 2018 · 19 comments

Comments

@henbos
Contributor

henbos commented Jan 24, 2018

In WPT it would be useful to be able to "expect" something - producing a PASS or FAIL line in the -expected.txt - without necessarily aborting the test on failure the way "asserts" do.

For example, in WebRTC, features are often shipped with partial support. In particular, WebRTC has gone through three stages for how to handle media - "addStream & getLocalStreams" (deprecated), "addTrack & getSenders" (current) and now also "addTransceiver & getTransceivers" (recent addition).

I'm simplifying here for the sake of argument, but the majority of behaviors can be tested with "was a track added?"-type tests, while a fully spec-compliant test asks "was a track and a transceiver added?".

When an "assert" aborts the test we lose all the test coverage for the related behaviors. For example, Chrome fails all tests related to "ontrack" firing because the test for "ontrack fires when a track is added" (which is supported) also asserts "track events contain a transceiver" (Chrome does not support transceivers yet).

Another example is "when a track is added we get track stats" failing if "all stats related to tracks are fully implemented" fails.

In order to use WPT as our main test tool we have to either:

  1. Design all tests to test a particular behavior instead of testing all outcomes of an operation. This means each test tests as little as possible, so only that particular behavior passes or fails, but it also means that many tests have to be copied and run multiple times, looking at different sets of outcomes of the same operation.
  2. Support a way not to abort a test on assert failure. E.g. add "expect" operations to testharness.js. A single test can produce multiple -expected.txt lines, both PASS and FAIL.
  3. Avoid asserting everything we would like to assert until it is implemented.

I've focused on 1) and rewritten a large number of tests related to tracks. Being able to do 2) would be nice in some cases. I've seen examples of 3), but that is bad because you end up with in-code TODOs and limit what is tested based on particular browser implementations.
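
To make the abort problem concrete, here is a minimal sketch (a hypothetical test, not from this thread) in testharness.js style; the canvas captureStream() setup is just an easy way to obtain a track:

// Sketch only: once the transceiver assert throws, the remaining
// assertions in this test body are never evaluated or reported.
promise_test(async t => {
  const pc = new RTCPeerConnection();
  t.add_cleanup(() => pc.close());
  const stream = document.createElement('canvas').captureStream();
  const [track] = stream.getVideoTracks();
  const sender = pc.addTrack(track, stream);

  assert_true(pc.getSenders().includes(sender), 'sender is listed');
  // Throws in an implementation without transceiver support,
  // aborting the whole subtest:
  assert_equals(pc.getTransceivers().length, 1, 'one transceiver created');
  assert_equals(pc.getTransceivers()[0].sender, sender, 'transceiver wraps the sender');
}, 'addTrack exposes the track via senders and transceivers');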

@henbos
Contributor Author

henbos commented Jan 24, 2018

FYI @foolip @alvestrand we talked about this, I filed this bug.

@foolip
Member

foolip commented Jan 26, 2018

@henbos, would this pattern work for you?

async_test(t => {
  var track = something;
  track.onload = t.step_func(event => {
    assert_stuff();
    test(() => {
      assert_things_about(event.transceiver);
    }, 'inner test');
    assert_more_stuff();
  });
}, 'outer test');

It would show up as two tests, but if the transceiver bits fail, the outer test will still continue.

@henbos
Contributor Author

henbos commented Jan 26, 2018

Oh that's interesting! That would probably do it, yes.

@foolip
Member

foolip commented Jan 26, 2018

@henbos, can you try it and see if it's good enough? The trouble with adding "expects" is that we'd have to figure out how to report at least the failures separately, and this pattern already achieves that.

I'm going to label this as roadmap, and if it still seems good enough when it's revisited in 60 days, we'll close it.

@jugglinmike
Contributor

I advise against authoring tests in such a way that the number of sub-tests depends on the implementation status. Doing so makes results very hard to interpret:

  • Within implementations: someone vetting implementation progress over time will be in for a surprise when their pass/fail rate decreases following the completion of some feature.
  • Between implementations: web developers trying to understand support for a given feature across the web platform will have a hard time deriving meaning from results like "5 out of 10", "20 out of 50" and "7 out of 30".

The number of subtests can generally be expected to increase with each commit made to WPT, so "correctness" is always a moving target. But for the use cases above, we're holding the WPT revision constant. It's important that the "number of subtests" be a consistent denominator in these cases. That way, everyone shares the same expectations.

Of the alternatives listed by @henbos, the first seems most conducive to both of those use cases. Writing multiple tests that reach some intermediate state before testing different expectations may involve more code, but any maintainability concerns with this practice are traditionally addressed by defining "setup" functions.
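
As an illustration of that first alternative, here is a minimal sketch (hypothetical helper and test names, assuming testharness.js): every subtest repeats the same setup, so the subtest count stays the same regardless of implementation status.

// Hypothetical shared setup helper; each subtest calls it, so no
// subtest's verdict depends on another subtest having run first.
async function setupAddedTrack(t) {
  const pc = new RTCPeerConnection();
  t.add_cleanup(() => pc.close());
  const stream = document.createElement('canvas').captureStream();
  const [track] = stream.getVideoTracks();
  return {pc, sender: pc.addTrack(track, stream)};
}

promise_test(async t => {
  const {pc, sender} = await setupAddedTrack(t);
  assert_true(pc.getSenders().includes(sender), 'sender is listed');
}, 'addTrack creates a sender');

promise_test(async t => {
  const {pc} = await setupAddedTrack(t);
  assert_equals(pc.getTransceivers().length, 1, 'a transceiver is created');
}, 'addTrack creates a transceiver');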

To be honest, I don't know whether the pattern @foolip suggested is used frequently in WPT. It may be that the ship has already sailed on this. But it's something I'd be interested in investigating in the coming weeks, especially in terms of the https://wpt.fyi project.

@foolip
Member

foolip commented Feb 23, 2018

#8894 is an example of this being done in practice, which I had no hand in encouraging, FWIW. In this case, the number of subtests doesn't depend on implementation status.

However, I also think the ship has pretty much sailed on achieving a consistent number of subtests irrespective of implementation status; it is just too easy to write tests that fail to achieve this. My rule of thumb is rather that there should be some failing test if all is not well. If fixing that test causes new failing tests to appear, that may be surprising, but if there is a true dependency, the alternative is to set up all tests up front and fail them in the right places (or just rely on timeouts, making the run slower).

With wpt.fyi, since we can take the union of all tests and subtests, I think we have a path forward for making such comparisons that still allows writing tests with dependencies in the most straightforward manner. I for one would rather LGTM a single test with such a dependency than two with shared setup code, if that setup code is async and not super fast.

But it's up to the people maintaining each part of WPT to make a judgement on this; I don't think we need a policy as such.

@henbos
Contributor Author

henbos commented Feb 23, 2018

It should be easy to make subtests fail if not supported rather than skipping them entirely; it's just a matter of where to place the prerequisite condition - in the subtest or outside of it.
This is worth discussing, but this worry is not an argument against subtests in principle.
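
As an illustration (hypothetical tests, not from this thread), the two placements look like this; with the check inside the subtest, an unsupported implementation produces a FAIL line instead of the subtest never existing.

// Prerequisite inside the subtest: unsupported implementations FAIL here.
promise_test(async t => {
  const pc = new RTCPeerConnection();
  assert_true('getTransceivers' in pc, 'getTransceivers is implemented');
  assert_equals(pc.getTransceivers().length, 0, 'no transceivers before addTrack');
}, 'getTransceivers starts empty');

// Prerequisite outside the subtest: unsupported implementations never
// create the subtest at all, so nothing is reported for it.
if ('getTransceivers' in RTCPeerConnection.prototype) {
  promise_test(async t => {
    const pc = new RTCPeerConnection();
    assert_equals(pc.getTransceivers().length, 0, 'no transceivers before addTrack');
  }, 'getTransceivers starts empty (only created when supported)');
}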

@henbos
Contributor Author

henbos commented Feb 23, 2018

Sorry, the event might not fire, so it is a concern. But maybe this can be solved with a helper utility function? Hmm.

@henbos
Contributor Author

henbos commented Feb 23, 2018

Is there anything preventing the design of something like this?

subtest(track.onload, () => { ... }, 'inner test 1');
subtest(track.onload, () => { ... }, 'inner test 2');
subtest(track.onload, () => { ... }, 'inner test 3');

As long as you don't have subtests inside of subtests...

@henbos
Contributor Author

henbos commented Feb 23, 2018

Thinking out loud - how about something like this?

// Creates a handler for the subtest and wires up some mechanism to make sure that
// if the test times out, any subtest that has not executed produces a FAIL line at
// the end.
let subtest1 = subtest('inner test 1');

async_test(() => {
  ...
  track.onload = () => {
    subtest1.run = () => {
      assert_blah();
    }
  }
}, 'test');

@foolip
Member

foolip commented Feb 24, 2018

@henbos, those patterns already work, except it's not called subtest but just test, async_test or promise_test. You can do test2 = async_test('bla') up front, or create a test from within another; nesting works. It's not very obvious that this should work, though, so documenting such uses would be a good idea.
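
For illustration, a minimal sketch of both shapes (names like transceiverTest are made up; it relies on async_test returning a Test object with step() and done()):

// Created up front: this test exists even if the track event never fires.
const transceiverTest = async_test('track event exposes a transceiver');

async_test(t => {
  const pc = new RTCPeerConnection();
  pc.ontrack = t.step_func_done(event => {
    assert_true(event.track instanceof MediaStreamTrack, 'event has a track');
    // Complete the pre-created test from inside this one; calling
    // test(() => { ... }, 'nested test') here would also work.
    transceiverTest.step(() => {
      assert_not_equals(event.transceiver, undefined, 'event has a transceiver');
    });
    transceiverTest.done();
  });
  // ... negotiation that triggers ontrack is elided in this sketch ...
}, 'track event fires when a track is added');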

@foolip
Member

foolip commented Feb 24, 2018

That's what I was thinking about with "set up all tests up front". Again, the downside is that you have to explicitly cause them to fail later if they can't be run, or it'll be a timeout. I tend to think just creating the tests at the end of another is better overall.
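
For illustration, a sketch of that trade-off (hypothetical names; innerTest is created up front): if the event never fires, the pre-created test is failed explicitly instead of being left to time out.

const innerTest = async_test('transceiver details');

async_test(t => {
  const pc = new RTCPeerConnection();
  pc.ontrack = t.step_func_done(event => {
    innerTest.step(() => assert_not_equals(event.transceiver, undefined));
    innerTest.done();
  });
  // If the event has not fired after a while, fail both tests explicitly
  // rather than letting the harness time out. step_timeout runs its
  // callback as a step of the outer test.
  t.step_timeout(() => {
    innerTest.step(() => assert_unreached('track event did not fire'));
    assert_unreached('track event did not fire');
  }, 3000);
  // ... negotiation that triggers ontrack is elided in this sketch ...
}, 'track event fires');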

@jgraham
Contributor

jgraham commented Apr 23, 2018

I think the conclusion is that this is supported by existing constructs using the right patterns?

@zcorpan
Member

zcorpan commented Sep 28, 2018

@zcorpan zcorpan reopened this Sep 28, 2018
@youennf
Contributor

youennf commented Oct 15, 2018

There have been some similar concerns in WebKit, since this is one thing the js-test harness supports. Wrapping every assertion in a test() would do the trick.

@youennf
Contributor

youennf commented Oct 23, 2018

Talking with @jgraham, we might want to promote this issue to an RFC issue once RFC issues exist.

@foolip
Member

foolip commented Oct 23, 2018

@youennf, I think that would make sense, yes. https://github.com/foolip/rfcs is the proposal for how that will look. A concrete change proposal to evaluate would be great even before that process is in place, though.

@mdittmer
Contributor

Friendly ping. @foolip, it looks like this is priority:roadmap but has no owner. Any volunteers?

@foolip
Member

foolip commented Jan 25, 2019

Downgrading. web-platform-tests/rfcs#1 is in progress, and that might then be the forum to hammer this out; after that it'll need a champion who isn't me :)
