`sagas::instance_create::test::test_action_failure_can_unwind` doesn't test failure after all saga nodes #3265

gjcolombo · 2023-05-31T17:02:43Z

Code:

omicron/nexus/src/app/sagas/instance_create.rs

Lines 1647 to 1691 in 241c673

    #[nexus_test(server = crate::Server)]
    async fn test_action_failure_can_unwind(
        cptestctx: &ControlPlaneTestContext,
    ) {
        DiskTest::new(cptestctx).await;
        let log = &cptestctx.logctx.log;

        let client = &cptestctx.external_client;
        let nexus = &cptestctx.server.apictx().nexus;
        let project_id = create_org_project_and_disk(&client).await;

        // Build the saga DAG with the provided test parameters
        let opctx = test_opctx(&cptestctx);

        let params = new_test_params(&opctx, project_id);
        let dag = create_saga_dag::<SagaInstanceCreate>(params).unwrap();

        for node in dag.get_nodes() {
            // Create a new saga for this node.
            info!(
                log,
                "Creating new saga which will fail at index {:?}", node.index();
                "node_name" => node.name().as_ref(),
                "label" => node.label(),
            );

            let runnable_saga =
                nexus.create_runnable_saga(dag.clone()).await.unwrap();

            // Inject an error instead of running the node.
            //
            // This should cause the saga to unwind.
            nexus
                .sec()
                .saga_inject_error(runnable_saga.id(), node.index())
                .await
                .unwrap();
            nexus
                .run_saga(runnable_saga)
                .await
                .expect_err("Saga should have failed");

            verify_clean_slate(&cptestctx).await;
        }
    }

While trying to figure out whether this test could have caught #3260, I noticed that although it passes, it never executes any node in the instance create saga past N005 (the "create instance record" step). Failures injected into any later node are never reached, because the "create instance record" node always fails first. The test still passes because it checks only that the saga failed, not that it failed at the node where the failure was injected.

This is happening because the instance create saga creates a new instance ID when the saga DAG is created (and not as part of saga execution itself):

omicron/nexus/src/app/sagas/instance_create.rs

Lines 144 to 155 in 241c673

    fn make_saga_dag(
        params: &Self::Params,
        mut builder: steno::DagBuilder,
    ) -> Result<steno::Dag, SagaInitError> {
        let instance_id = Uuid::new_v4();

        builder.append(Node::constant(
            "instance_id",
            serde_json::to_value(&instance_id).map_err(|e| {
                SagaInitError::SerializeError(String::from("instance_id"), e)
            })?,
        ));

The same DAG is reused for every iteration of the test loop, so the instance ID doesn't change between iterations. After the "create record" node fails for the first time, the instance is in the Destroyed state, and all subsequent attempts to recreate it fail. (Outside the test, attempting to recreate an instance with the same name as a failed instance will create a new saga with a new instance ID, avoiding the conflict.)
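To make the masking effect concrete, here is a minimal standalone sketch. It does not use steno or Nexus; `ToyDag`, `ToyDb`, `make_toy_dag`, `run_toy_saga`, and `failure_points` are all hypothetical names invented for illustration. It assumes a simplified model in which an instance ID stays taken once any run has created the record, even after unwinding, so the exact iteration at which masking begins may differ from the real saga. The function returns, for each iteration, the node where the failure was injected alongside the node that actually failed.

```rust
use std::collections::HashSet;

/// Toy stand-in for a saga DAG (hypothetical, not steno's type).
/// As in `make_saga_dag`, the instance ID is fixed when the DAG is built.
#[derive(Clone)]
struct ToyDag {
    instance_id: u64,
    node_count: usize,
}

/// Toy stand-in for the database. Assumption: an ID stays taken once a
/// record has been created with it, even if the saga later unwinds.
#[derive(Default)]
struct ToyDb {
    used_ids: HashSet<u64>,
}

/// Index of the toy "create instance record" node.
const CREATE_RECORD_NODE: usize = 1;

/// Builds a DAG with a fresh ID (a counter stands in for `Uuid::new_v4()`).
fn make_toy_dag(next_id: &mut u64) -> ToyDag {
    *next_id += 1;
    ToyDag { instance_id: *next_id, node_count: 5 }
}

/// Runs the nodes in order and returns the index of the node that failed.
fn run_toy_saga(dag: &ToyDag, inject_at: usize, db: &mut ToyDb) -> Result<(), usize> {
    for index in 0..dag.node_count {
        if index == inject_at {
            // Injected failure: the node's action is not run.
            return Err(index);
        }
        if index == CREATE_RECORD_NODE && !db.used_ids.insert(dag.instance_id) {
            // The ID was already used by an earlier iteration: conflict.
            return Err(index);
        }
    }
    Ok(())
}

/// Returns (injected index, observed failure index) for each iteration.
fn failure_points(rebuild_dag_each_iteration: bool) -> Vec<(usize, usize)> {
    let mut next_id = 0;
    let mut db = ToyDb::default();
    let shared = make_toy_dag(&mut next_id);
    (0..shared.node_count)
        .map(|inject_at| {
            let dag = if rebuild_dag_each_iteration {
                make_toy_dag(&mut next_id)
            } else {
                shared.clone()
            };
            let failed_at = run_toy_saga(&dag, inject_at, &mut db)
                .expect_err("saga should have failed");
            (inject_at, failed_at)
        })
        .collect()
}
```

In this toy model, `failure_points(false)` reports the later iterations failing at node 1 rather than at the injected node, while `failure_points(true)` reports every failure at the injected node. Comparing the two indices is one way a test could notice that an injected failure was never reached, which a bare "the saga failed" check cannot do.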

gjcolombo linked a pull request Jun 6, 2023 that will close this issue

instance create saga: create new DAG for each test iteration #3309

Merged

gjcolombo closed this as completed in #3309 Jun 7, 2023

gjcolombo mentioned this issue Aug 17, 2023

want generic framework for common saga tests #3896

Closed
