Create common scaffolding for saga undo tests #3904

gjcolombo · 2023-08-18T20:49:35Z

Several of our saga undo tests have previously been found to have defects in
which the test verifies that all attempts to execute a saga fail but doesn't
verify that saga executions fail at the expected failure point. This loses test
coverage, since the tests don't actually execute all of the undo actions of the
saga under test.

To try to prevent this problem, add a set of saga test helpers that provide
common logic for writing undo tests. The scaffold functions handle the business
of constructing a DAG, deciding where to inject an error, and verifying that
errors occur in the right places. The callers provide a saga type and factory
functions that run before and after each saga execution to set up state, provide
saga parameters, check invariants, and clean up after each test iteration.

Convert existing tests of these types to use the new scaffolds. This revealed
that the no-pantry version of the disk snapshot test has a similar bug: the saga
requires the disk being snapshotted to be attached to an instance, but the test
wasn't creating an instance, so the saga never got past its "look up the
instance that owns the disk" step. Fix this issue.

Finally, coalesce some common instance operations (start, stop, delete,
simulate) that were used by multiple saga tests into the test helpers.

This is a test-only change, tested both by running cargo test and finding no
errors and by introducing bugs into some sagas and verifying that tests run with
the scaffolds catch those bugs.

Fixes #3896.

The no-pantry flavor of the snapshot unwind test requires the test disk to be attached to an instance. Make sure that happens. Also add a bunch of instance manipulation functions to the test helpers to avoid duplicating code between the migrate and snapshot sagas.

Tested by injecting a bug into the create saga's undo-is-idempotent test and verifying that the new harness caught it.

smklein

Thanks for doing this -- looks great, with a couple comments about generate_params being a little complex below

smklein · 2023-08-21T21:07:49Z

nexus/src/app/sagas/test_helpers.rs

+    initial_params: S::Params,
+    generate_params: G,


Why bother separating these arguments? Couldn't we just generate initial_params using generate_params?

(I can see that the snapshot test looks a bit more complex in this area, but most tests appear to simply call generate_params to create initial_params anyway).

This is a somewhat clumsy attempt to require at compile time that the DAGs under test are the same as the DAG that was used to figure out how many nodes to iterate over. I haven't yet found an elegant way to do this; they all have some problems:

Create a single DAG in the test and reuse it: this is incorrect if the parameters have to change each time, as in the instance create saga

Create parameters for the "query" DAG using the pre-iteration callback: this is incorrect if the callback has side effects, which is allowed (the snapshot test uses them)

Create a DAG in the caller and provide it to the harness: this lets the caller pass any SagaDag with the underlying saga type erased, though this can be mitigated somewhat by checking that the saga names are the same

Make the caller pass an S::Params that's independent of the ones generated by the pre-iteration callback: this avoids problem (2) but, as you've noted, looks weird; we're creating fake parameters just to throw the generated DAG away

I ended up going with a different kludge in e5030eb:

Take the DAG created by the first test iteration, count its nodes, and use those for loop bounds; in each iteration, assert that the expected node count didn't change

The downside with this method is that we have to do the loop indexing by hand and so are more vulnerable to off-by-one errors than we otherwise would be. But I think I've avoided them, and we get the nice benefits that (a) we know the DAG has the same length (if not the same structure) during each iteration, and (b) we can remove a bunch of useless parameter generation from the callers.

I like this kludge more! It at least internalizes these details, rather than forcing them on the caller. And IMO the new argument name makes a lot of sense.

smklein · 2023-08-21T21:15:28Z

nexus/src/app/sagas/snapshot_create.rs

+        >(
+            nexus,
+            params,
+            || {


This function is documented as:

/// - generate_params: A callback called at the beginning of each loop
/// iteration that returns a future that yields the saga parameters to use for
/// that loop iteration.

This seems to kinda be treading into the realm of after_saga, by actually modifying state.

IMO it's kinda surprising to have a "generate_params" function also act as a "test setup" function -- if we want this, should we update the name accordingly?

Renamed in e5030eb.

jmpesp

This is awesome! One suggestion: it's (embarrassingly) occurring to me now that verify_clean_slate is different for each of these tests. I added a no_stuck_sagas check in nexus/src/app/sagas/disk_create.rs, and my suggestion is to move that from there into these new helpers so that all tested sagas are checked to see if unwinding failed.

gjcolombo · 2023-08-24T00:00:12Z

I added a no_stuck_sagas check in nexus/src/app/sagas/disk_create.rs, and my suggestion is to move that from there into these new helpers so that all tested sagas are checked to see if unwinding failed.

Done in 92649a5. Thanks for the suggestion! I also looked at combining the verify_clean_slate functions more generally, but it wasn't obvious to me that that could be done very elegantly, since each module with such a routine has a different set of things that it's checking for.

I did take the liberty of combining all of the existing forward-idempotency tests in 04d3c9c; these are a lot simpler than the undo tests, but there's still no reason to have them copy-pasted into each saga module.

gjcolombo added 5 commits August 18, 2023 15:28

add wrapper for test_action_failure_can_unwind

12daa6a

clean up & standardize instance helper functions

8412385

Add and use helper for undo idempotency

8b12cc3

Tested by injecting a bug into the create saga's undo-is-idempotent test and verifying that the new harness caught it.

cleanup

d305eba

gjcolombo requested review from jmpesp and smklein August 18, 2023 20:49

smklein approved these changes Aug 21, 2023

View reviewed changes

gjcolombo added 2 commits August 21, 2023 23:51

avoid creating a template DAG to count nodes; improve parameter names

e5030eb

remove unneeded comment line

ca1b71d

jmpesp approved these changes Aug 23, 2023

View reviewed changes

gjcolombo added 2 commits August 23, 2023 23:16

consolidate repeated-node idempotency tests

04d3c9c

use 'no undo steps failed' assertion in all undo tests

92649a5

gjcolombo merged commit 3412cb1 into main Aug 24, 2023

gjcolombo deleted the gjcolombo/templated-saga-tests branch August 24, 2023 01:31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Create common scaffolding for saga undo tests #3904

Create common scaffolding for saga undo tests #3904

gjcolombo commented Aug 18, 2023

smklein left a comment

smklein Aug 21, 2023

gjcolombo Aug 21, 2023

smklein Aug 21, 2023

smklein Aug 21, 2023

gjcolombo Aug 21, 2023

jmpesp left a comment

gjcolombo commented Aug 24, 2023

Create common scaffolding for saga undo tests #3904

Create common scaffolding for saga undo tests #3904

Conversation

gjcolombo commented Aug 18, 2023

smklein left a comment

Choose a reason for hiding this comment

smklein Aug 21, 2023

Choose a reason for hiding this comment

gjcolombo Aug 21, 2023

Choose a reason for hiding this comment

smklein Aug 21, 2023

Choose a reason for hiding this comment

smklein Aug 21, 2023

Choose a reason for hiding this comment

gjcolombo Aug 21, 2023

Choose a reason for hiding this comment

jmpesp left a comment

Choose a reason for hiding this comment

gjcolombo commented Aug 24, 2023