Add support for cleanup hook. #8

andrey-malets · 2018-01-27T16:30:55Z

The hook may be useful if the embedder wishes to clean up some global state
(which may leak out of flaky tests) to prevent interferention with other tests
that may depend on this state.

The hook is run once after tearing down all the processes in parallel pool and
after each test in async pool to allow cleanup after each test is run (may be
essential to fully isolate tests in serial retry runs).

The hook may be useful if the embedder wishes to clean up some global state (which may leak out of flaky tests) to prevent interferention with other tests that may depend on this state. The hook is run once after tearing down all the processes in parallel pool and after each test in async pool to allow cleanup after each test is run (may be essential to fully isolate tests in serial retry runs).

dpranke · 2018-01-29T23:54:22Z

Thanks for the patch! I'm trying to understand it and have some questions :)

It seems like this patch is trying to do two different things.

First, there's a "clean up at the end of everything" use case, where cleanup_fn is called at the end of each test run (i.e., once after the parallel round, and once after each serial retry round). Normally I'd expect the caller to just clean up whatever they needed to after the test run (i.e., after run() returned); however, that does not give you a chance to do any additional cleanup in between the initial parallel round and any subsequent serial rounds). Is that the scenario you're trying to address?

Second, there's the "clean up in between invocations" use case, but that only takes effect in the serial mode (--jobs=1, i.e., using AsyncPool). If you are worried about cleaning up after each test run, shouldn't that be used in each worker as well? And is there some reason you feel like you need the additional cleanup beyond the work done by the normal teardown/teardownClass/teardownProcess ?

It seems like it might be a bad idea to use the same mechanism for both cases, but I'm not sure about this yet, since I'm not sure I fully understand the intent.

In the normal case (after tearing down all parallel processes), couldn't the caller just as easily invoke the cleanup function themselves after calling join()?

andrey-malets · 2018-01-30T12:52:36Z

Is that the scenario you're trying to address?

Partly, I'll try to explain: some tests may brake and leak some common resource, which is required to be in certain state by the other ones. The real world example I'm facing in Telemetry tests in Chromium is the following:

There are a lot of tests which execute the browser as a part of their run. Normally, the test will start the browser, do whatever it needs to do, and finally close the browser, regardless of the outcome (close normally in successful case and kill in the case of a failure). Let's call those tests simple.
There are some tests that require the browser process to not exist at the start of their run. One example is the smoke test for cold browser startup benchmark: it executes the utility to clear system cache from browser artifacts before actually starting it. If the utility cannot reliably clear the cache (for example, Windows will not allow eviction of the file which is opened in another process), it complains about it, and everything fails. Therefore, such tests will almost always fail while executing along with another ones. Let's call such a test compilcated.

Since Telemetry tests have serial retries by default, such failures are not a big deal: 99% of simple tests do not impose any exotic requirements and happily execute in the parallel run, while 1% of complicated ones may fail and finally succeed in the serial retry round.

Now suppose that something breaks and one of simple tests fails and leaks some browser process in 1% of the runs. Now there's a big chance that this will lead to failures for some innocent complicated tests: after the resource leaks, it will be in such state until some outer level process cleans up (this may be the final cleanup after the test run on the buildbot, for example). But that will obviously happen too late, after all the serial retries, which in this case won't do any help.

There appear to be a couple of solutions:

free all the resources after each test run. Sounds great, but practically may not be 100% achievable due to complex nature of the tests. Besides that, the cleanup may not be applicable in the parallel run: suppose that "the ultimate cleanup solution" to the process leak described above is to kill all the processes holding files in browser directory after every test run. This will obviously not work with a bunch of tests executing in parallel.
place the cleanup code in teardown_fn hook. This also is not a 100% working solution since
a) I'm not sure there is any guarantee that teardown_fn in every test process will be called exactly after all the parallel processes finish running tests (this is required to implement "the ultimate solution")
b) the serial case is not covered at all: the teardown_fn is executed only once after each serial run, and there is no chance to do any cleanup after each test run.

So the proposed solution tries to fit this restrictions by modelling a set of independent test runs each of which ends with a cleanup:

the parallel stage is one big single "step", and cleanup is executed after all the parallel tests finish
each test in every serial run is considered to be a single "step", so cleanup runs after each serial test invocation.

I surely expect there may be other solutions to the problem described above, and will be very glad if you can offer better alternative.

dpranke · 2018-01-30T22:55:49Z

Thanks, that more-or-less confirms what I thought you were trying to do.

I think it makes sense to have a cleanup function that is invoked between running each set of tests, at least (i.e., between the parallel round and the first serial round, and in between each serial round). I'm less convinced that it makes sense to run an additional cleanup in between each serially executed test. The setup/teardown code for each test should be trying to ensure that the environment is in the state it needs to be in.

There is a slight difference you're describing, as well, in the idea that some tests can be executed in parallel, and some can't. The TYP APIs allow you to indicate which sort of test is which, so if you know ahead of time that some tests have this restriction, you can tell typ that and it'll do the right thing.

andrey-malets · 2019-01-17T09:30:49Z

(a little less than one year passed, and all of a sudden I decided to show up again and reply :))

The TYP APIs allow you to indicate which sort of test is which, so if you know ahead of time that some tests have this restriction, you can tell typ that and it'll do the right thing.

I have been thinking about it for a while and, eventually, came up with solution which does not require modifications in typ:

Mark all the "special" tests as serial-only.
Instead of doing cleanup after each test, run it before "special" tests to ensure we have no dangling busy resources left from previous runs.

In this case we do not need any special cleanup, however, downside is that if we forget to mark some "special" test as serial-only, that test will try to run in parallel with others, killing innocent processes and almost certainly causing other tests to fail with mysterious reasons.

Do you find this reasonable and doing a better job than cleaning up after each run? Or maybe there's another durable solution to this problem?

dpranke · 2019-02-02T02:10:21Z

Sorry for the delay in replying ...

I think what you suggest could work. I think it is perhaps a bit more clunky than I would like. Looking at the code now -- which is a long while after I wrote it -- I'm realizing that the semantics around setup_fn and teardown_fn aren't very clearly documented. I think I meant for them to be things that were guaranteed to be called once at the start of every subprocess and once at the end of every subprocess, but there's not anything that really says when subprocesses are created, and so that isn't as useful as you might hope.

I think adding a callback function that could be invoked in between the parallel set and the serial set still feels like a good idea. An alternative would be to break apart Runner.run() and expose the different phases of it so that the caller could effectively interpose stuff in between the calls of ._run_one_set(), but that exposes a lot of complexity just to be able to interject the one hook.

The latter could also let the caller do mostly arbitrary things between each test, like what you were originally asking for, but that doesn't actually seem like a great idea to me.

However, if you can make it work for now without needing modifications to TYP, I encourage you to do that and, once you have some experience with it, then maybe that can help us decide what we'd want to add and where.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for cleanup hook. #8

Add support for cleanup hook. #8

andrey-malets commented Jan 27, 2018

dpranke commented Jan 29, 2018

andrey-malets commented Jan 30, 2018

dpranke commented Jan 30, 2018

andrey-malets commented Jan 17, 2019

dpranke commented Feb 2, 2019

Add support for cleanup hook. #8

Are you sure you want to change the base?

Add support for cleanup hook. #8

Conversation

andrey-malets commented Jan 27, 2018

dpranke commented Jan 29, 2018

andrey-malets commented Jan 30, 2018

dpranke commented Jan 30, 2018

andrey-malets commented Jan 17, 2019

dpranke commented Feb 2, 2019