[EPIC] A reusable fault-injector and resolver #139

ChrisKujawa · 2022-05-06T14:28:35Z

Motivation

Currently we have several shell scripts to execute chaos experiments with chaostoolkit. The scripts work well, but they are hard to maintain, especially for people who are not familiar enough with bash.

This is why we already migrated some of them to a Kotlin-based chaos worker. We haven't done this for the scripts that interact directly with the Kubernetes API, though. The problem is that we would still need an executable CLI that the chaostoolkit experiments can reference, so that they can be run locally. Furthermore, interacting with Kubernetes from Go is, I would say, easier/better.

Solution

We create a new Go CLI with cobra. The CLI can be executed by chaostoolkit locally. Furthermore, we use the Zeebe Go worker API so that we can register it as a worker on the testbench. We use the Go Kubernetes client to interact with the Kubernetes API, and we retry operations as we do in the shell scripts to make the experiments less flaky.

A further benefit is that we become more familiar with Go and the Go client we provide.
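The retry behaviour mentioned in the solution could look roughly like the sketch below. It is only an illustration using the Go standard library: `Retry` and `ErrNotReady` are hypothetical names, not part of zbchaos, and the fixed delay between attempts is an assumption (the shell scripts may use a different strategy).

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// ErrNotReady is a hypothetical sentinel error, used here only for illustration.
var ErrNotReady = errors.New("not ready")

// Retry calls op up to attempts times and sleeps for delay between
// failed attempts. It returns nil as soon as op succeeds, otherwise
// the last error wrapped with the number of attempts.
func Retry(attempts int, delay time.Duration, op func() error) error {
	if attempts < 1 {
		return errors.New("attempts must be at least 1")
	}
	var err error
	for i := 1; i <= attempts; i++ {
		if err = op(); err == nil {
			return nil
		}
		if i < attempts {
			time.Sleep(delay)
		}
	}
	return fmt.Errorf("gave up after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	// Simulates an operation that only succeeds on the third call,
	// e.g. a pod that needs some time to become ready.
	err := Retry(5, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return ErrNotReady
		}
		return nil
	})
	fmt.Println(calls, err)
}
```

Wrapping the last error with `%w` keeps it inspectable via `errors.Is`, so a caller can still distinguish a pod that never became ready from other failures.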

Todo's left:

Feature parity
Clean up:

Inventory

To see what is left and what is missing, here is a table of the scripts, their functionality, and the related mapping in zbchaos.

| Script | Function | zbchaos counterpart |
|---|---|---|
| `apply_net_admin.sh` | Applies the NET_ADMIN capability to the Zeebe brokers | Part of `zbchaos disconnect` |
| `await-message-correlation.sh` | Deploys a model with a message catch event and awaits the completion of the instance | Can be done via `zbchaos verify steady-state --awaitResult --processModelPath`, since we can define the model and await the completion |
| `await-processes-with-result.sh` | Deploys a model, creates a process instance and awaits its completion | `zbchaos verify steady-state --awaitResult` |
| `connect-leaders.sh` | Connects the brokers after disconnecting them | `zbchaos connect brokers` |
| `connect-standalone-gateway.sh` | Connects the standalone gateway again | `zbchaos connect gateway` |
| `deploy-different-versions.sh` | Deploys different versions of a certain model | `zbchaos deploy process` |
| `deploy-model.sh` | Deploys a process model | Part of `zbchaos verify steady-state` |
| `corrupt*` | Corrupts a follower's snapshot | Not part of zbchaos, since it is no longer in use |
| `disconnect-leaders-one-way.sh` | Disconnects leaders asymmetrically | `zbchaos disconnect brokers --one-direction` |
| `disconnect-leaders.sh` | Disconnects leaders bi-directionally | `zbchaos disconnect brokers` |
| `disconnect-standalone-gateway.sh` | Disconnects a standalone gateway from the brokers | `zbchaos disconnect gateway --all` |
| `publish-message.sh` | Publishes a message to partition one | `zbchaos publish`. This command also supports specifying different partitions and different message names. |
| `shutdown-gracefully-partition.sh` | Shuts down a broker with the given partition and role | `zbchaos restart`. This command allows specifying a broker via nodeId or via partitionId and role. |
| `start-instance-on-partition-with-version.sh` | Starts an instance with a specific version on a specific partition | `zbchaos verify steady-state --version` |
| `start-many-instances.sh` | Starts many instances in the Zeebe cluster | Not supported right now, and not used in our current experiments |
| `stress-cpu.sh` | Stresses the CPU with extra workload on a specific node (gateway or broker) | `zbchaos stress gateway/broker --cpu` |
| `terminate-partition.sh` | Terminates a broker with the given partition and role | `zbchaos terminate`. This command allows specifying a broker via nodeId or via partitionId and role. |
| `terminate-workers.sh` | Terminates workers in the Zeebe cluster | `zbchaos terminate worker` |
| `util*` | Contains utility functions | Does not need to be ported |
| `verify-readiness.sh` | Verifies the readiness, i.e. checks whether the gateway has an Available deployment and the brokers have ready pods | `zbchaos verify readiness` |
| `verify-steady-state.sh` | Verifies the steady state, i.e. deploys a process model and creates instances until a required partition is reached | `zbchaos verify steady-state` |
| `zbctl-start-instances.sh` | Used to create instances on a pod | Does not need to be ported, part of `start-many-instances.sh` |

To show which experiments zbchaos supports right now and which are missing, I list them in the following table. Be aware that I only mention the Production-S experiments, since these are the only experiments that we have automated.

| Experiment | Supported by zbchaos | Details |
|---|---|---|
| deployment-distribution | YES | - |
| follower-restart | YES | - |
| follower-terminate | YES | - |
| leader-restart | YES | - |
| leader-terminate | YES | - |
| msg-correlation | YES | - |
| multiple-leader-restart | YES | - |
| stress-cpu-on-broker | YES | - |
| worker-restart | YES | - |

What else is missing:

The current Kotlin worker also does some other things that we need to port before we can remove it completely.

- Read all experiments and return them as variables. This is necessary for the chaos experiment automation, to know which experiments are executed and what they look like, i.e. which actions need to be executed etc.
- Deploy workers as part of the chaos worker. It might make sense to add this as an extra subcommand to zbchaos, to deploy workers which can complete instances.
- Adjust the experiments such that they use the zbchaos commands instead of referencing the scripts. This can be done incrementally, see "Adjust all chaos experiments to use the zbchaos tool" #237.
- Use JSON logging in the chaos worker (zbchaos).
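As a rough idea of what "read all experiments and return them as variables" involves, the sketch below decodes a chaostoolkit-style experiment JSON and extracts the names of its actions. The struct fields cover only a small subset of the experiment format, the sample document is made up for illustration, and `ParseExperiment` and `ActionNames` are hypothetical helpers, not the zbchaos implementation.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Provider describes how an activity is executed (subset of fields).
type Provider struct {
	Type string `json:"type"`
	Path string `json:"path"`
}

// Activity is a single step of the experiment method, e.g. an action or a probe.
type Activity struct {
	Type     string   `json:"type"`
	Name     string   `json:"name"`
	Provider Provider `json:"provider"`
}

// Experiment holds the parts of an experiment this sketch cares about.
type Experiment struct {
	Title  string     `json:"title"`
	Method []Activity `json:"method"`
}

// sampleExperiment is a made-up document, used only for illustration.
const sampleExperiment = `{
  "title": "Leader termination (hypothetical sample)",
  "method": [
    {"type": "action", "name": "Terminate leader",
     "provider": {"type": "process", "path": "zbchaos"}},
    {"type": "probe", "name": "Check readiness",
     "provider": {"type": "process", "path": "zbchaos"}}
  ]
}`

// ParseExperiment decodes one experiment; unknown JSON fields are ignored.
func ParseExperiment(data []byte) (Experiment, error) {
	var exp Experiment
	if err := json.Unmarshal(data, &exp); err != nil {
		return Experiment{}, fmt.Errorf("parse experiment: %w", err)
	}
	return exp, nil
}

// ActionNames returns the names of all activities of type "action".
func ActionNames(exp Experiment) []string {
	names := []string{}
	for _, activity := range exp.Method {
		if activity.Type == "action" {
			names = append(names, activity.Name)
		}
	}
	return names
}

func main() {
	exp, err := ParseExperiment([]byte(sampleExperiment))
	if err != nil {
		panic(err)
	}
	fmt.Println(exp.Title, ActionNames(exp))
}
```

A worker could return the decoded experiments as process variables, so that the automation knows which actions to run for each experiment.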

Done

Q2 2022 KR A reusable fault-injector and resolver is implemented and used in the Zeebe E2E and chaos tests

Implement a new Go application which can be used locally as a CLI and as a worker library for testbench
- Define how to test go fault injector #140
- Split up https://github.com/zeebe-io/zeebe-chaos/pulls/122
- Add CI (GitHub Actions)
- TODO...


ChrisKujawa · 2022-08-08T06:48:44Z

ChrisKujawa · 2022-08-10T12:37:19Z

To test the internal backend, which talks to the Kubernetes API through the k8s client, we can use a fake client: https://medium.com/the-phi/mocking-the-kubernetes-client-in-go-for-unit-testing-ddae65c4302

https://pkg.go.dev/k8s.io/client-go/kubernetes/fake#Clientset.AppsV1

Example:

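```go
package internal // assumed package name; K8Client is defined alongside this test

import (
	"context"
	"testing"

	"github.com/stretchr/testify/require"
	v1 "k8s.io/api/core/v1"
	v12 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes/fake"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/tools/clientcmd/api"
)

// testClientConfig is a stub for clientcmd.ClientConfig that only
// supports resolving the namespace.
type testClientConfig struct {
	namespace          string
	namespaceSpecified bool
	err                error
}

func (c *testClientConfig) Namespace() (string, bool, error) {
	return c.namespace, c.namespaceSpecified, c.err
}

func (c *testClientConfig) RawConfig() (api.Config, error) {
	panic("implement me")
}

func (c *testClientConfig) ClientConfig() (*rest.Config, error) {
	panic("implement me")
}

func (c *testClientConfig) ConfigAccess() clientcmd.ConfigAccess {
	panic("implement me")
}

func Test_GetBrokerPodNames(t *testing.T) {
	// given: a fake clientset instead of a real cluster connection
	k8Client := K8Client{Clientset: fake.NewSimpleClientset(), ClientConfig: &testClientConfig{namespace: "default"}}

	// The pod name is a placeholder; the fake clientset only stores the object in memory.
	_, err := k8Client.Clientset.CoreV1().Pods(k8Client.GetCurrentNamespace()).Create(context.TODO(), &v1.Pod{
		ObjectMeta: v12.ObjectMeta{Name: "broker-0"},
	}, v12.CreateOptions{})
	require.NoError(t, err)

	// when
	names, err := k8Client.GetBrokerPodNames()

	// then
	require.NoError(t, err)
	require.NotNil(t, names)
}
```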

https://www.youtube.com/watch?v=reDCJYbxtRg&ab_channel=CNCF%5BCloudNativeComputingFoundation%5D

ChrisKujawa · 2022-12-21T09:32:03Z

Happy to announce that the EPIC is done and we have released v1.0.0: https://github.com/zeebe-io/zeebe-chaos/releases/tag/zbchaos-v1.0.0

ChrisKujawa added the kind/epic label May 6, 2022

ChrisKujawa mentioned this issue May 6, 2022

Define how to test go fault injector #140

Closed

This was referenced Nov 11, 2022

I can disconnect the gateway via Zbchaos #225

Closed

I can deploy models via Zbchaos #226

Closed

Remove kotlin chaos worker #227

Closed

Remove chaos scripts #228

Closed

ChrisKujawa self-assigned this Nov 11, 2022

ChrisKujawa mentioned this issue Nov 17, 2022

[EPIC] Go Chaos Worker #121

Closed

15 tasks

ChrisKujawa mentioned this issue Dec 8, 2022

Adjust all chaos experiments to use the zbchaos tool #237

Closed

9 tasks

ChrisKujawa closed this as completed Dec 21, 2022
