Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Command line option -n to run a test by number #301

Closed
jtlapp opened this issue Jul 18, 2016 · 31 comments
Closed

Command line option -n to run a test by number #301

jtlapp opened this issue Jul 18, 2016 · 31 comments

Comments

@jtlapp
Copy link
Contributor

jtlapp commented Jul 18, 2016

I've written this and having it working but can't seem to get a test suite running. UPDATE: It's now working at the link below. Just waiting for decision on PR.

I absolutely love the simplicity of tape, but it asks me to do something that I find anathema: modify code to run a particular test and remember to remove my modifications when I'm done. If I forgot to remove the .only constraint, I may check it in. That's easy to fix. The worst scenario is unwittingly changing something in the test while it's in the editor. This happens.

So while I respect that some people are really comfortable with the .only feature, I suspect I'm not the only one who is really uncomfortable with it. Fortunately, there's an easy fix.

All we need is a -n command line option. If you leave the option off, everything runs as it has. If you include a simple -n, all the tests run as usual, except that a test number is prefixed to the title of each test.

To run a particular test, just place -nN on the command line, where N is the test number shown in the title.

This approach allows tape to satisfy die-hards like me who prefer not to make edits that shouldn't be checked in.

@jtlapp
Copy link
Contributor Author

jtlapp commented Jul 18, 2016

I've written it already, and you can play with it. However, the test suite test/only-number.js does not work, and I have yet to figure out why. The data events aren't being emitted. Help is appreciated! You'll find it here: https://github.com/jtlapp/tape/tree/only-by-number

@jtlapp
Copy link
Contributor Author

jtlapp commented Jul 18, 2016

Okay, I found the problem. I load the tape module from the module in which the test in installed, in order to communicate the desired test number to it, but when testing the tape module itself, there is no tape module installed. Hmm....

@jtlapp
Copy link
Contributor Author

jtlapp commented Jul 18, 2016

The test suite is now working. You can see the proposal at https://github.com/jtlapp/tape/tree/only-by-number. Let me know if you'd like me to do a PR.

@jtlapp
Copy link
Contributor Author

jtlapp commented Jul 18, 2016

Okay, I think I now have the readme updated and prettied for this proposal.

@jtlapp
Copy link
Contributor Author

jtlapp commented Jul 18, 2016

For the record, my -n solution depends on my fix for issue #299.

@jtlapp
Copy link
Contributor Author

jtlapp commented Jul 18, 2016

Here is how to modify faucet to support -n. https://github.com/jtlapp/faucet/tree/tape-options-support

@jtlapp
Copy link
Contributor Author

jtlapp commented Jul 18, 2016

-n also facilitates team development. I can now check code into a branch and ask someone, such as somebody remote, to help me debug a particular test number. However, I suppose if we were using different operating systems and multiple test files, there's a chance that the numbers may not correspond. I suspect they usually will, though.

(That is, another person wouldn't have to go find a particular file and test name. I could just give the number assigned from faucet. For confirmation, I could also give the test name, but they'd only need to look for that test name if running that test number didn't produce this name.)

(I guess this is also an argument for using .only though, because I could check in the .only to flag the particular test. We'd just have to remember to remove it and re-check the file in.)

@jtlapp
Copy link
Contributor Author

jtlapp commented Jul 23, 2016

I will be creating a publicly available module and would like the test suite based on tape. However, I'd also like the benefit of the -n option. So I'm wondering if a decision can be made on this in the near-term, or whether I should offer my version of tape under a new module name.

@ljharb
Copy link
Collaborator

ljharb commented Jul 23, 2016

Your point here is the most compelling one to me - although I still think there's a concern that people will rely on test numbers that can easily change.

@jtlapp how would -n work when not using the tape runner? ie, would there be any way to get this behavior when using node to invoke a test file?

@substack @Raynos, what do you think about tape -n?

@jtlapp
Copy link
Contributor Author

jtlapp commented Jul 23, 2016

Thanks @ljharb. I don't think we could support -n on node-run tests without having the test javascript itself explicitly assert that -n on the test should pass on to tape. Otherwise we can't know that -n doesn't conflict with some argument that the module being run might use.

@ljharb
Copy link
Collaborator

ljharb commented Jul 23, 2016

perhaps an environment variable that -n also sets, and then that tape itself respects?

@jtlapp
Copy link
Contributor Author

jtlapp commented Jul 23, 2016

I think that would work. Is something compelling you to want support outside of the tape command? Is there a reason one would have to type node instead of tape (or faucet)?

Developers might think, "Hey, I want to run by number, so I'll use tape rather than node."

Also, before committing my PR, be sure to look at my way of avoiding use of node globals. See lines 32 and 35 in https://github.com/jtlapp/tape/blob/only-by-number/bin/tape

@ljharb
Copy link
Collaborator

ljharb commented Jul 24, 2016

I prefer only using node in my projects and almost never use the tape runner. It would certainly be fine to only support that feature using the runner.

@jtlapp
Copy link
Contributor Author

jtlapp commented Jul 30, 2016

Examining the TAP spec now, I'm seeing that we have to be careful with language we use to describe the -n feature. TAP clearly already calls for numbering individual tests. The spec variously refers to these numbers as "test point numbers" and "test numbers."

The tape module allows us to organize these test points into convenient groups. The TAP spec has no notion of these groupings. I see that tape accommodates this shortcoming by preceding each group with a TAP comment providing the group name.

The test readme appears to use the word 'test' exclusively to refer to a test group, rather than a test or test point of TAP. That might suggest it's okay to say that -n refers to a "test number" in the context of tape.

Except for one inconsistency. tape output concludes with a comment indicating the number of "tests". This is a tape addition, not part of the TAP spec. So tape is itself being inconsistent about what a "test" is.

In any case, I think we're best off being consistent with the TAP spec. That suggests:

  • -n identifies a "test group number" or just a "group number"
  • The readme should be revised to refer to "test groups" rather than "tests."

The one remaining problem with this is that tape seems have established a convention of beginning each group with the keyword test, as in test("test group name", ...). Interpreting this use of "test" as a verb seems to eliminate that ambiguity, but I don't know if that's what people do. I suspect that the verb interpretation allows us to live with this convention and yet still be consistent with TAP terminology.

@jtlapp
Copy link
Contributor Author

jtlapp commented Jul 30, 2016

Alternatively, we call each "test" of TAP an "assertion" within tape, and leave the readme use of "test" alone. This would also allow -n to refer to a "test number."

I think the addition of -n forces us to directly address the appearance of two independent sets of numbers in the TAP output. We should say what each set of numbers is -- "test (point)" & "test group", or "assertion" & "test".

@ljharb
Copy link
Collaborator

ljharb commented Jul 30, 2016

I think I misread your original post. Yes, numbers are already printed out, and I assumed you were asking to identify those numbers.

Definitely we do not want to output any different or additional numbers - if we use any numbers, it should be the numbering that the TAP output already has.

@jtlapp
Copy link
Contributor Author

jtlapp commented Jul 30, 2016

Except that it is not possible to run only one TAP test. It is only possible to run a tape group, which often consists of multiple TAP tests.

I'm also pointing out that the tape readme is using "test" in a very different sense from TAP. In the readme, each "test" refers to a group of TAP tests.

@ljharb
Copy link
Collaborator

ljharb commented Jul 30, 2016

yeah, that's a good point.

I'd think it would be OK to do -n 123 and run the entire test group that contains test 123?

@jtlapp
Copy link
Contributor Author

jtlapp commented Jul 30, 2016

Clever. I wonder if that wouldn't invite confusion or frustration, as the programmer repeatedly attempts to run just -n123 but keeps getting 120-130.

@jtlapp
Copy link
Contributor Author

jtlapp commented Jul 30, 2016

Groups could also be given letter sequences, instead of numbers. A-Z, then AA-ZZ, etc.

@jtlapp
Copy link
Contributor Author

jtlapp commented Jul 30, 2016

There's another problem with running the containing group. Once you run just that containing group, the test points get renumbered. Test point numbers don't have the appearance of being unique identifiers. I can see that even confusing me on occasion.

@jtlapp
Copy link
Contributor Author

jtlapp commented Jul 30, 2016

Still another option: generate a hash for each test group, based on the test name. If we tracked hashes, we could even guarantee their uniqueness. Main drawback is that they are longer, so to run a particular test, we'll find ourselves copying and pasting, rather than just reciting a number.

@jtlapp
Copy link
Contributor Author

jtlapp commented Jul 30, 2016

The purpose of -n is to make life easier. Hashes don't do that. Letters are workable, but I suspect numbers are easier to communicate. Here's one way to eliminate confusion with having two sets of numbers in the output:

  • Change the concluding line from # tests N to # assertions N.
  • Add a concluding line # test groups N which shows the number of test groups.

I think things could be made still less confusing by always including the test group number in the output (bracketed), so that the user isn't only periodically becoming aware of the second numbering scheme.

There's another argument for always showing the group number. If after running a test sequence I decide to rerun just one test, I don't have to rerun all the tests first with -n to figure out which test number it is.

Maybe this feature is better off in a fork after all.

@ljharb
Copy link
Collaborator

ljharb commented Jul 31, 2016

I think this was my initial confusion when you first requested this feature.

I think it's sufficiently complex that I agree it might be best left to a fork. I think that "by name" is probably good enough after #299 is fixed, ideally with the implementation suggested here.

@jtlapp
Copy link
Contributor Author

jtlapp commented Jul 31, 2016

Okay, I'll get you the reference-based fix for .only and make a new module for the -n fork. Thanks for all your input along the way!

@ljharb
Copy link
Collaborator

ljharb commented Jul 31, 2016

@jtlapp if it's possible to provide event hooks in tape so that instead of a fork, you can just make a plugin/wrapper/complement, please let me know.

@ljharb ljharb closed this as completed Jul 31, 2016
@jtlapp
Copy link
Contributor Author

jtlapp commented Jul 31, 2016

I don't see how it is possible to create a second module that people would load to change the behavior of an executable script.

We could add a second executable script. The tape source proper would need to support the options available to this script, so we would need to stick with my number-based test IDs to make it work. But I'm also planning to add a -s option to stop after the first failed test (group).

A second executable script has the advantage of giving everyone who is currently using tape the option of using these new command line arguments. Otherwise people will be stuck renaming their tape module to whatever fork module name I come up with.

@jtlapp
Copy link
Contributor Author

jtlapp commented Jul 31, 2016

I'm going to look at doing something along these lines:

This would be ideal. The executable tape command cannot be extended, but it should be possible to create a new command that builds directly on tape by extension.

@jtlapp
Copy link
Contributor Author

jtlapp commented Aug 1, 2016

I found a way to implement -n via a single new hook, which I'm tentatively calling onPreTest(). It takes a test name and a boolean indicating whether tape has scheduled the test to run, and it returns either a test name or null. It returns null to prevent a scheduled test from running. The test name it returns replaces the prescribed test name. The hook can be used to arbitrarily filter tests based on command line arguments and test name. I have not looked into whether it could be used to provide per-test test setup.

Here is a approximately what onPreTest() would look like in my forthcoming alternative to the tape executable. It is here patched into the tape module for simplicity:

var nextTestNumber = 0; // TBD: temporary
Results.prototype.onPreTest = function (title, unscheduled) {
    if (onlyTestNumber !== false && nextTestNumber === 0) { // first run only
        if (onlyTestNumber === 0 || onlyTestNumber > this.tests.length) {
            exitWithError("test "+ onlyTestNumber +" not found");
        }
    }
    ++ nextTestNumber;
    if (numbered) {
        title = '['+ nextTestNumber +'] '+ title;
        if (onlyTestNumber) {
            if (nextTestNumber !== onlyTestNumber)
                title = null;
            else if (unscheduled)
                exitWithError("test "+ onlyTestNumber +" not scheduled to run");
        }
    }
    return title;
};

There were two main consequences to getting this hook to work:

  1. The tape-established only and skip scheduling pre-filters the available tests. onPreTest() will receive notice of each test, but it can't have tape run a test that it was not previously intending to run. In other words.
  2. I had to centralize control over which tests ran in a single function. This was previously divided up between this block and this block.

You can see the new approach in my dash-n-hook branch. The most significant changes are to those two blocks. In particular, I inserted an arbitrator for cancelling and/or renaming tests.

@jtlapp
Copy link
Contributor Author

jtlapp commented Aug 2, 2016

Woo! I was able to shorten the scheduling arbitrator by quite a bit by allowing skipped tests to run if they were skipped by argument flag. The tape test suite requires this strange feature, which complicated my original version of this function. The feature has a # SKIP comment show when a flag is used and no comment show otherwise. The effect is that when the SKIP comment shows, the test gets filtered by onPreTest(), and when not, onPreTest() never learns of the skipped test.

@jtlapp
Copy link
Contributor Author

jtlapp commented Aug 31, 2016

Hey @ljharb! I thought you might like to know where all this effort lead. I have written a faucet-like runner for tap, but far more sophisticated. It's called subtap.

Thank you for all your input during the early conception!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants