testscript: panic: fail now! in stop command on Windows #228

bep · 2023-05-12T19:45:44Z

I'm creating this issue in the hope that someone could give me some helping hints before I give in and start up my Windows VM.

The short take is:

I have rewritten Hugo's CLI and added testscripts for all of the commands. That process has been a real pleasure.
All of them is smooth sailing on *nix.
The server tests (using the background process support), fails on Windows with a not so informational error (see below).
One example of such failing test is: https://github.com/bep/hugo/blob/ee8985cc3fdfab60c4c24e028cb33a33a3ce6ee5/testscripts/commands/server_render_to_memory.txt#L4

--- FAIL: TestCommands (2.00s)
    --- FAIL: TestCommands/server_render_to_memory (1.11s)
        testscript.go:534: # Test the hugo server command.
            # We run these tests in parallel so let Hugo decide which port to use.
 (1.102s)
            > hugo server &
            > waitServer
            > httpget $HUGOTEST_BASEURL_0 'Title: Hugo Server Test' $HUGOTEST_BASEURL_0
            > ! exists public/index.html
            > ! exists public/mystatic.txt
            > stop
            stop
            [background] hugo.exe server: exit status 1
            [stdout]
            Start building sites … 
            hugo v0.112.0-DEV windows/386 BuildDate=unknown
            
                               | EN  
            -------------------+-----
              Pages            |  2  
              Paginator pages  |  0  
              Non-page files   |  0  
              Static files     |  1  
              Processed images |  0  
              Aliases          |  0  
              Sitemaps         |  0  
              Cleaned          |  0  
            
            Built in 91 ms
            Watching for changes in $WORK\{layouts,static}
            Watching for config changes in $WORK\hugo.toml
            Running in Fast Render Mode. For full rebuilds on change: hugo server --disableFastRender
            Web Server is available at http://localhost:1313/ (bind address 127.0.0.1)
            Press Ctrl+C to stop
            FAIL: testscripts\commands\server_render_to_memory.txt:13: unexpected command failure
            
panic: fail now! [recovered]
	panic: fail now!

goroutine 45 [running]:
testing.tRunner.func1.2({0x1b175c0, 0x14847970})
	C:/hostedtoolcache/windows/go/1.19.9/x64/src/testing/testing.go:1[39](https://github.com/gohugoio/hugo/actions/runs/4961873656/jobs/8879210277?pr=10602#step:17:40)6 +0x2ab
testing.tRunner.func1()
	C:/hostedtoolcache/windows/go/1.19.9/x64/src/testing/testing.go:1399 +0x[41](https://github.com/gohugoio/hugo/actions/runs/4961873656/jobs/8879210277?pr=10602#step:17:42)f
panic({0x1b175c0, 0x14847970})
	C:/hostedtoolcache/windows/go/1.19.9/x64/src/runtime/panic.go:884 +0x1ba
github.com/rogpeppe/go-internal/testscript.(*TestScript).Fatalf(0x149e6900, {0x1d91689, 0x1a}, {0x0, 0x0, 0x0})
	C:/Users/runneradmin/go/pkg/mod/github.com/rogpeppe/[email protected][43](https://github.com/gohugoio/hugo/actions/runs/4961873656/jobs/8879210277?pr=10602#step:17:44)74f/testscript/testscript.go:1104 +0x1[49](https://github.com/gohugoio/hugo/actions/runs/4961873656/jobs/8879210277?pr=10602#step:17:50)
github.com/rogpeppe/go-internal/testscript.(*TestScript).waitBackground(0x149e[69](https://github.com/gohugoio/hugo/actions/runs/4961873656/jobs/8879210277?pr=10602#step:17:70)00, 0x1)

The text was updated successfully, but these errors were encountered:

mvdan · 2023-05-12T21:34:23Z

Do they always fail like that on Windows, or just sometimes?

Do you have a minimal reproducer? Beyond making the investigation easier, it would also serve as a regression test for the eventual fix.

mvdan · 2023-05-12T21:35:46Z

Also, could you share an entire goroutine trace? I think the problem is that we don't set up the recover for failNow in one of the goroutines.

bep · 2023-05-13T08:52:32Z

Do they always fail like that on Windows, or just sometimes?

I don't have a lot of data on this (I have ran it on GitHub 3-4 times, it always fails on Windows; I have since disabled these tests on Windows, and they always passes on Linux/MacOs, which is good), but I will close my eyes and boot up my Windows VM later today and see what happens.

mvdan · 2023-05-13T09:02:54Z

FYI I think the problem is that waitBackground uses ts.Fatalf, which is subtly different from ts.t.Fatalf. The former is our own implementation, which calls panic(failNow) to mimic the "stop now" behavior from testing.T. Within a testscript command, ts.Fatalf should be used to properly use testscript's "fail" logic. In the other setup/teardown code, ts.t.Fatalf should be used, because in many of those cases, the deferred recover to handle failNow may not exist.

waitBackground is used in both scenarios - both by the wait command, and by the teardown code when the entire script finishes. In the latter case I think that's where you run into issues, because it uses ts.Fatalf.

Closes rogpeppe#228

myitcv · 2023-05-13T10:30:44Z

Drive-by comment unrelated to the error you are seeing, @bep: I've not had a particularly enjoyable time trying to test server processes using a command to start the server, and then another one to close it. Instead, I've tended to use special files in the testscript itself to indicate that a server should be started for the purposes of a given test, and then tear down happens at the end. This also means that the setup command can more easily wait until the server is known to be ready before starting the script.

bep · 2023-05-13T11:49:24Z

can more easily wait until the server is known to be ready before starting the script.

Yes, I'm actually doing that. My current problem is the stopping of the server, which seem to somehow end with ProcessState.Success() == false on Windows -- even if the test itself passes. I could possibly add a custom "non signal" way to stop the server during tests, as well, but I will try to dig a little further first.

mvdan · 2023-05-13T12:03:01Z

To clarify, I think Paul means starting and stopping your server program from Go code, via testscript's Setup function. Then you have full control of it: block until it's ready, capture its output, what signal to give it to stop, etc. testscript's own background process support is okay, just slightly limited depending on what you want to do.

bep · 2023-05-13T13:39:04Z

This test repo always fails on Windows:

https://github.com/bep/repro-testscript-228

Log here:

https://github.com/bep/repro-testscript-228/actions/runs/4967165173/jobs/8889003644#step:4:75

Not sure why this is falsy on Windows:

go-internal/testscript/cmd.go

Line 540 in a4f6fab

if bg.cmd.ProcessState.Success() {

bep · 2023-05-13T16:18:33Z

OK, I guess this is a what to be expected on Windows when killing the process (as in, no SIGINT). I will find a workaround, but I'll keep this issue open if someone has a fix.

myitcv · 2023-05-13T17:11:37Z

I think Paul means starting and stopping your server program from Go code, via testscript's Setup function

Exactly so.

myitcv · 2023-05-13T17:15:28Z

OK, I guess this is a what to be expected on Windows when killing the process (as in, no SIGINT). I will find a workaround, but I'll keep this issue open if someone has a fix.

@bep I'd strongly consider doing the server setup and teardown in the Setup() function with corresponding use of Defer(). It will not only make things more robust, but it will also tidy up a number of your scripts to remove the stutter of:

hugo server &
waitServer

...

stop

You can "mark" scripts that expect a server to be running using a special file in the script archive:


-- .run_a_server --

Check for the existence of such a file in the Setup() function and run a server accordingly.

myitcv · 2023-05-13T17:16:54Z

For a fairly involved example I've recently written https://github.com/cue-lang/cuelang.org/blob/a0df30d1ef4858b51bdfe96b7f6c507a0f08fb81/internal/fsnotify/fsnotify_test.go#L57 (which coincidentally might be a package of interest to Hugo).

bep · 2023-05-13T17:24:55Z

It will not only make things more robust, but it will also tidy up a number of your scripts to remove the stutter of:

@myitcv Maybe. The thing is:

Having test scripts that's self contained and as close to the "real thing" as possible is valuable to me, and I don't mind a little stutter if I can get that.
My server tests are mostly to verify that the CLI is wired up correctly, so adding variations with hugo server --renderToMemory & is incredibly powerful and self documenting.

myitcv · 2023-05-13T19:23:45Z

Having test scripts that's self contained and as close to the "real thing" as possible is valuable to me, and I don't mind a little stutter if I can get that.

My server tests are mostly to verify that the CLI is wired up correctly, so adding variations with hugo server --renderToMemory & is incredibly powerful and self documenting.

Just to highlight that you don't necessarily lose any capabilities with the special file approach. You could just as easily put the arguments to the server in the special file, extract them using code like this, then pass the resulting []string to your command with almost zero special handling required.

incredibly powerful and self documenting

I'm totally with you on this, just sharing my experience of having tested a number of commands where a server (or equivalent mode) became too painful.

As you will notice from the fsnotify tests I linked above, it also makes it easier to orchestrate commands that can interact more precisely with the running server. @ldemailly that partially addresses the question you asked in the linked issue.

bep · 2023-05-13T19:52:00Z

@myitcv I'm a little confused. I see the Setup and the Defer -- but how do you pass arguments from the test script to those? Having them in some other "special file" is not what i talk about when I use the term "self contained test scripts".

myitcv · 2023-05-14T04:41:44Z

I'm a little confused. I see the Setup and the Defer -- but how do you pass arguments from the test script to those?

Just to clarify, I'm be talking about the example from fsnotify. The testscript scripts live within the testdata directory:

https://github.com/cue-lang/cuelang.org/tree/a0df30d1ef4858b51bdfe96b7f6c507a0f08fb81/internal/fsnotify/testdata

The Setup() function delegates to:

https://github.com/cue-lang/cuelang.org/blob/a0df30d1ef4858b51bdfe96b7f6c507a0f08fb81/internal/fsnotify/fsnotify_test.go#L57

When each script runs, before Setup() is called, the script archive is extracted to a temporary directory. The Setup() function can therefore do things based on the files within the archive, which now (at the time Setup() is called) exist on disk.

In the fsnotify example, as you can see in the Setup() function, it checks for the existence of a file named .batched in the root of the archive:

https://github.com/cue-lang/cuelang.org/blob/a0df30d1ef4858b51bdfe96b7f6c507a0f08fb81/internal/fsnotify/fsnotify_test.go#L99-L113

Note the use of (*testscript.Env).Cd for the path to the directory into which the archive has been extracted.

If the file exists, its contents are used to parse a time.Duration which is then used in the creation of a batched watcher. If the file does not exist, then we create a regular watcher.

Here is an example of one of the scripts in the testdata directory that contains one of the .batched special files:

https://github.com/cue-lang/cuelang.org/blob/a0df30d1ef4858b51bdfe96b7f6c507a0f08fb81/internal/fsnotify/testdata/batched_directory_moved_into_watch_tree.txtar#L31-L32

The file .batched is what I would refer to as a special file within a test archive. Its existence or otherwise, can be used to help signal things to the Setup() function. In this case it's not just the existence, but the contents of the file that are used.

Hence in my suggestion re the running of a server, you could use the existence of a special file within a testscript script archive to signal that Setup() should run Hugo in serve mode. Furthermore, you could use the contents of that file to drive the args to serve, in the way that I suggested in #228 (comment)

myitcv · 2023-05-14T06:52:19Z

Updated my latest comment with a link to one of the .batched special files to make the point slightly clearer.

bep · 2023-05-14T08:50:09Z

@myitcv thanks for the detailed explanation. OK, I understand how to set it up. The thing is, though, when I compare the 2 scripts below:

hugo server --renderToMemory &
sleep
! stderr .
stop
# I can even start more servers here if I want to.

And:

sleep
! stderr .
-- magicserverfile --
--renderToMemory

I would much, much prefer the first variant any day of the week. And one great thing about it is that the average developer coming in to fix 1 thing in a project he/she has no experience with, can read that script and understand exactly what it does (also, it does not require 1K lines of custom test setup behind the scenes).

This particular issue (#228) is obviously a bug. It may be hard (impossible?) to fix for the testscript.RunMain commands, but perhaps the Cmds commands could have support for the & "operator" and a stop would run the Defer? Or something.

My current workaround is a custom stopServer command that just sends a HEAD request to a "stop server" endpoint.

myitcv · 2023-05-14T10:59:56Z

At the risk of derailing this bug report still further, just to point out a couple of things:

sleep
! stderr .

I've found sleep to in general be a very brittle way of waiting for a server to become available. You either need to be very generous on the sleep time which slows down all server tests, or risk it being too small and CI failing intermittently. Hence, when I start such a server in Setup(), I have the setup phase wait until the server is definitely ready. Then you can elide this step entirely from the script (which again is more robust).

The checking of stderr will not work for a backgrounded task FWIW. Again, you need a custom command to interact with the server process. That's why in the fsnotify tests I define a log command that directly interacts with the "server" (in that case):

https://github.com/cue-lang/cuelang.org/blob/a0df30d1ef4858b51bdfe96b7f6c507a0f08fb81/internal/fsnotify/fsnotify_test.go#L52

I take your point about it being unusual to not have a command at the start of a script, but it's a "problem" that is (again in my experience) relatively easily overcome with a script naming format like serve_*.txtar and comments that generally appear at the top of each such script explaining what is going on, what is being tested etc.

Again, none of what I am saying is a rule in any way, just sharing my experience from having written a number of such test setups.

bep · 2023-05-14T11:41:43Z

I've found sleep to in general be a very brittle way of waiting for a server to become available

Which is why in my real tests have both a waitServer (which uses polling) and a stopServer (which uses a HTTP head request + a small timeout), which has proven to be rock stable (even on Windows).

The checking of stderr will not work for a backgrounded task FWIW.

Not that my tests rely on this, but what you say isn't obvious when reading the docs:

If the last token is '&', the program executes in the background. The standard output and standard error of the previous command is cleared, but the output of the background process is buffered — and checking of its exit status is delayed — until the next call to 'wait', 'skip', or 'stop' or the end of the test.
Apply the grep command (see above) to the standard error from the most recent exec or wait command.

I suggest, however, that we instead focus on the issue. The testscript package is incredibly useful, it would be even better if everyone didn't have to create elaborate workarounds to test servers, which is a common enough requirement.

mvdan · 2023-05-14T11:43:43Z

I agree that we should fix the panics in any case. Whether or not testscript's support for background commands should be redesigned (or avoided?) is a different matter.

bep · 2023-05-14T14:13:32Z

Just to summarise my findings (so people don't have to wade through all of the above to get the gist of it:

I suggest leaving the panic as is -- the stack trace is more useful than a unexpected command failure message.
I don't think it's possible to get https://github.com/bep/repro-testscript-228 green on Windows by using any variation of Process.Signal(interrupt). There's a ton of research on this on Go's issue tracker, but no great conclusion.

When an entire script runs and the end is reached, any background processes begun with a '&' command get interrupted or killed, depending on the platform and timeout, and we wait for them to finish. We also checked their resulting status code and failed if they didn't exit with a status code of 0. However, as explained in the comment, this would always fail on Windows, given that it doesn't have interrupt signals so we would kill directly, causing a "signal: killed" error. Worse, any failures here caused a `panic: fail now!` as that is how we bubble up errors when a script command is being run, but such panics were not being recovered once we reached the end of a script. Now that we don't check the result anymore here, the panics are gone. Fixes rogpeppe#228. Fixes rogpeppe#260.

When an entire script runs and the end is reached, any background processes begun with a '&' command get interrupted or killed, depending on the platform and timeout, and we wait for them to finish. We also checked their resulting status code and failed if they didn't exit with a status code of 0. However, as explained in the comment, this would always fail on Windows, given that it doesn't have interrupt signals so we would kill directly, causing a "signal: killed" error. Worse, any failures here caused a `panic: fail now!` as that is how we bubble up errors when a script command is being run, but such panics were not being recovered once we reached the end of a script. Now that we don't check the result anymore here, the panics are gone. Fixes #228. Fixes #260.

bep mentioned this issue May 13, 2023

testscript: handle the "fail now!" panic in cmdWait #229

Closed

bep added a commit to bep/go-internal that referenced this issue May 13, 2023

testscript: handle the "fail now!" panic in cmdWait

98bb3d5

Closes rogpeppe#228

bep changed the title ~~testscript: panic: fail now! on background processes on Windows~~ testscript: panic: fail now! in stop command on Windows May 13, 2023

ldemailly mentioned this issue May 13, 2023

testscript: signal or some other way to interrupt running command #195

Closed

rogpeppe mentioned this issue Jun 13, 2024

testscript: interrupted background process should not cause a test to fail #260

Closed

mvdan mentioned this issue Jul 9, 2024

testscript: ignore result when interrupting background processes #265

Merged

mvdan closed this as completed in #265 Jul 9, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

testscript: panic: fail now! in stop command on Windows #228

testscript: panic: fail now! in stop command on Windows #228

bep commented May 12, 2023 •

edited

Loading

mvdan commented May 12, 2023

mvdan commented May 12, 2023

bep commented May 13, 2023

mvdan commented May 13, 2023

myitcv commented May 13, 2023

bep commented May 13, 2023 •

edited

Loading

mvdan commented May 13, 2023

bep commented May 13, 2023

bep commented May 13, 2023

myitcv commented May 13, 2023

myitcv commented May 13, 2023

myitcv commented May 13, 2023

bep commented May 13, 2023

myitcv commented May 13, 2023

bep commented May 13, 2023

myitcv commented May 14, 2023 •

edited

Loading

myitcv commented May 14, 2023

bep commented May 14, 2023 •

edited

Loading

myitcv commented May 14, 2023

bep commented May 14, 2023

mvdan commented May 14, 2023

bep commented May 14, 2023

testscript: panic: fail now! in stop command on Windows #228

testscript: panic: fail now! in stop command on Windows #228

Comments

bep commented May 12, 2023 • edited Loading

mvdan commented May 12, 2023

mvdan commented May 12, 2023

bep commented May 13, 2023

mvdan commented May 13, 2023

myitcv commented May 13, 2023

bep commented May 13, 2023 • edited Loading

mvdan commented May 13, 2023

bep commented May 13, 2023

bep commented May 13, 2023

myitcv commented May 13, 2023

myitcv commented May 13, 2023

myitcv commented May 13, 2023

bep commented May 13, 2023

myitcv commented May 13, 2023

bep commented May 13, 2023

myitcv commented May 14, 2023 • edited Loading

myitcv commented May 14, 2023

bep commented May 14, 2023 • edited Loading

myitcv commented May 14, 2023

bep commented May 14, 2023

mvdan commented May 14, 2023

bep commented May 14, 2023

bep commented May 12, 2023 •

edited

Loading

bep commented May 13, 2023 •

edited

Loading

myitcv commented May 14, 2023 •

edited

Loading

bep commented May 14, 2023 •

edited

Loading