Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: disable reconciliation before scaling in SaaS #476

Merged
merged 1 commit into from
Jan 12, 2024

Conversation

deepthidevaki
Copy link
Contributor

@deepthidevaki deepthidevaki commented Jan 10, 2024

While testing E2E test with scaling, noticed that the current zbchaos cluster scale command do not work in SaaS because reconciliation is not disabled.

@ChrisKujawa
Copy link
Member

Hm can we add a test for this?

@deepthidevaki
Copy link
Contributor Author

@Zelldon
Any idea how to properly mock both zeebe crd and statefulset in tests? I get several errors

  1. In pause reconciliation - Object 'Kind' is missing in '{"metadata":{"labels":{"cloud.camunda.io/pauseReconciliation":"true"},"name":"testNamespace"}}'
  2. When scaling - E0111 11:41:29.861024 51681 runtime.go:79] Observed a panic: &runtime.TypeAssertionError{_interface:(*runtime._type)(0x18fb140), concrete:(*runtime._type)(0x1ae5360), asserted:(*runtime._type)(0x1ae7400), missingMethod:""} (interface conversion: runtime.Object is *v1.StatefulSet, not *v1.Scale) goroutine 10 [running]:

@ChrisKujawa
Copy link
Member

@deepthidevaki
Copy link
Contributor Author

Yes. I'm already using the available helper methods.

	// given
	k8Client := CreateFakeClient()
	k8Client.createSaaSCRD(t)
	k8Client.CreateStatefulSetWithLabelsAndName(t, &metav1.LabelSelector{}, "zeebe")

	// when
	_, err := k8Client.ScaleZeebeCluster(3)

@ChrisKujawa
Copy link
Member

Do you have your changes somewhere that I can take a look?

@deepthidevaki
Copy link
Contributor Author

There are no new changes. You can add the following test in statefulset_test.go for reproducing this.

func Test_ShouldPauseReconciliationBeforeScaling(t *testing.T) {
	// given
	k8Client := CreateFakeClient()
	k8Client.createSaaSCRD(t)
	k8Client.CreateStatefulSetWithLabelsAndName(t, &metav1.LabelSelector{}, "zeebe")

	// when
	_, err := k8Client.ScaleZeebeCluster(3)
	require.NoError(t, err)
	statefulset, err := k8Client.GetZeebeStatefulSet()

	// then
	require.NoError(t, err)
	assert.NotNil(t, statefulset)
}

@deepthidevaki
Copy link
Contributor Author

I think so. It has worked fine during our manual chaos tests. And it is also running fine with our daily automated tests (because reconciliation is already paused by a previous experiment).

@ChrisKujawa
Copy link
Member

Ok then I guess the only problem is that we have to setup the fake statefulset correctly

@ChrisKujawa
Copy link
Member

@deepthidevaki I played a bit with it and did my changes here https://github.com/zeebe-io/zeebe-chaos/tree/ck-scaling-test

Basically what we have to do is to use the update API if we want to test it properly. GetScale is not supported by the FakeClient.

Furthermore, we have to update the statefulset which we create to set a correct replica value.

The awaiting of the available replicas I moved out, since this makes it harder to test, the status is not updated with the fake clients. Likely we need to call this separate in the other places.

I also saw in the backup commands that we call the disabling of reconcile separately from the scale (so before), we could also do this in our scale command.

@@ -55,6 +56,14 @@ func (c K8Client) ScaleZeebeCluster(replicas int) (int, error) {
if err != nil {
return 0, err
}

if c.SaaSEnv {
pauseError := c.PauseReconciliation()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized that we do this always for backups in the command already, I would be fine if we do this in the scale command as well and not here.

https://github.com/zeebe-io/zeebe-chaos/blob/main/go-chaos/cmd/backup.go#L95-L98

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. But then I wouldn't test anything specific for this PR, as testing the whole command will be cumbersome.

@deepthidevaki
Copy link
Contributor Author

Thanks @Zelldon for looking into it further. But I think, I will leave the testing out because not pausing is done in the command. So writing a unit test is more difficult.

Copy link
Member

@ChrisKujawa ChrisKujawa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @deepthidevaki 🚀 👍🏼

@deepthidevaki deepthidevaki merged commit 54d137c into main Jan 12, 2024
3 checks passed
@deepthidevaki deepthidevaki deleted the dd-fix-scaling-saas branch January 12, 2024 12:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants