Migrate Deployment distribution experiment (#268)
Based on #267 (blocked by it).

Related to #237.

-------

**Fixed some smaller issues along the way:**

 * Removed `one-direction` from the mutually exclusive flag group of `disconnect brokers`
 * Made it possible to run the workers against self-managed clusters
 * Return the actual error on `connect brokers` when it is not one of the expected exit codes

**Migrated the Deployment distribution experiment, as the first one, to zbchaos.**

The experiment was executed and verified via the integration test
against a self-managed cluster.
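
Support for self-managed clusters comes down to the worker only adding the Camunda Cloud access arguments when a cluster ID is set; with an empty cluster ID, zbchaos simply uses the current kubernetes context. A minimal sketch of that decision with simplified, illustrative names (the actual change is in `go-chaos/worker/chaos_worker.go`, shown in the diff further below):

```go
// Minimal sketch (simplified names): how the worker decides between
// Camunda Cloud and a self-managed cluster.
package main

import "fmt"

// authDetails stands in for the job's authentication variables.
type authDetails struct {
	ClientId, ClientSecret, Audience string
}

// buildCommandArgs prepends cluster access arguments only when a cluster ID
// is set; otherwise zbchaos runs against the current kubernetes context,
// i.e. a self-managed cluster.
func buildCommandArgs(clusterId string, auth authDetails, providerArgs []string) []string {
	var clusterAccessArgs []string
	if clusterId != "" {
		clusterAccessArgs = append(clusterAccessArgs,
			"--namespace", clusterId+"-zeebe",
			"--clientId", auth.ClientId,
			"--clientSecret", auth.ClientSecret,
			"--audience", auth.Audience)
	} // else: self-managed, rely on the current k8s context
	return append(clusterAccessArgs, providerArgs...)
}

func main() {
	// Self-managed run: only the experiment's provider arguments remain.
	fmt.Println(buildCommandArgs("", authDetails{}, []string{"verify", "readiness"}))
	// Output: [verify readiness]
}
```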

I moved the experiment into the `chaos-experiments/camunda-cloud/test/`
folder and migrated it there. With that approach I was able to execute the
experiment with `eze`, running against my self-managed `zell-chaos`
Zeebe cluster.

Log output:
```

Deploy file bpmn/chaos/actionRunner.bpmn (size: 8788 bytes).
Deployed process model bpmn/chaos/actionRunner.bpmn successful with key 2251799813685249.
Deploy file bpmn/chaos/chaosExperiment.bpmn (size: 21403 bytes).
Deployed process model bpmn/chaos/chaosExperiment.bpmn successful with key 2251799813685251.
Deploy file bpmn/chaos/chaosToolkit.bpmn (size: 11031 bytes).
Deployed process model bpmn/chaos/chaosToolkit.bpmn successful with key 2251799813685253.
Create ChaosToolkit instance
Open workers: [zbchaos, readExperiments].
Handle read experiments job [key: 2251799813685265]
Read experiments successful, complete job with: {"experiments":[{"contributions":{"availability":"high","reliability":"high"},"description":"Zeebe deployment distribution should be fault-tolerant. Zeebe should be able to handle network outages and fail-overs and distribute the deployments after partitions are available again.","method":[{"name":"Create network partition between leaders","provider":{"arguments":["disconnect","brokers","--broker1PartitionId","1","--broker2PartitionId","3","--one-direction"],"path":"zbchaos","timeout":900,"type":"process"},"type":"action"},{"name":"Deploy different deployment versions.","provider":{"arguments":["deploy","process","--multipleVersions","10"],"path":"zbchaos","timeout":900,"type":"process"},"type":"action"},{"name":"Connect leaders again","provider":{"arguments":["connect","brokers"],"path":"zbchaos","timeout":900,"type":"process"},"type":"action"},{"name":"Create process instance of latest version on partition one","provider":{"arguments":["verify","instance-creation","--bpmnProcessId","multiVersion","--version","10","--partitionId","1"],"path":"zbchaos","timeout":900,"type":"process"},"tolerance":0,"type":"probe"},{"name":"Create process instance of latest version on partition two","provider":{"arguments":["verify","instance-creation","--bpmnProcessId","multiVersion","--version","10","--partitionId","2"],"path":"zbchaos","timeout":900,"type":"process"},"tolerance":0,"type":"probe"},{"name":"Create process instance of latest version on partition three","provider":{"arguments":["verify","instance-creation","--bpmnProcessId","multiVersion","--version","10","--partitionId","3"],"path":"zbchaos","timeout":900,"type":"process"},"tolerance":0,"type":"probe"}],"rollbacks":[],"steady-state-hypothesis":{"probes":[{"name":"All pods should be ready","provider":{"arguments":["verify","readiness"],"path":"zbchaos","timeout":900,"type":"process"},"tolerance":0,"type":"probe"}],"title":"Zeebe is alive"},"title":"Zeebe deployment distribution","version":"0.1.0"},{"contributions":{"availability":"high","reliability":"high"},"description":"This fake experiment is just to test the integration with Zeebe and zbchaos workers","method":[{"name":"Show again the version","pauses":{"after":5},"provider":{"arguments":["version"],"path":"zbchaos","timeout":900,"type":"process"},"tolerance":0,"type":"action"}],"rollbacks":[],"steady-state-hypothesis":{"probes":[{"name":"Show version","provider":{"arguments":["version"],"path":"zbchaos","timeout":900,"type":"process"},"tolerance":0,"type":"probe"}],"title":"Zeebe is alive"},"title":"This is a fake experiment","version":"0.1.0"}]}.
Handle zbchaos job [key: 2251799813685328]
Running command with args: [verify readiness] 
Connecting to zell-chaos
Running experiment in self-managed environment.
All Zeebe nodes are running.
Handle zbchaos job [key: 2251799813685376]
Running command with args: [disconnect brokers --broker1PartitionId 1 --broker2PartitionId 3 --one-direction] 
Connecting to zell-chaos
Running experiment in self-managed environment.
Did not find zeebe cluster to pause reconciliation, ignoring. 
Patched statefulset
Successfully created port forwarding tunnel
Found Broker zell-chaos-zeebe-1 as LEADER for partition 1.
Found Broker zell-chaos-zeebe-2 as LEADER for partition 3.
Execute ["apt" "-qq" "update"] on pod zell-chaos-zeebe-1
Execute ["apt" "-qq" "install" "-y" "iproute2"] on pod zell-chaos-zeebe-1
Execute ["ip" "route" "replace" "unreachable" "10.0.4.223"] on pod zell-chaos-zeebe-1
Disconnect zell-chaos-zeebe-1 from zell-chaos-zeebe-2
Handle zbchaos job [key: 2251799813685585]
Running command with args: [deploy process --multipleVersions 10] 
Connecting to zell-chaos
Running experiment in self-managed environment.
Successfully created port forwarding tunnel
Deploy 10 versions of different type of models.
Deployed [2/10] versions.
Deployed [4/10] versions.
Deployed [6/10] versions.
Deployed [8/10] versions.
Deployed [10/10] versions.
Deployed different process models of different types and versions to zeebe!
Handle zbchaos job [key: 2251799813685677]
Running command with args: [connect brokers] 
Connecting to zell-chaos
Running experiment in self-managed environment.
Execute ["sh" "-c" "command -v ip"] on pod zell-chaos-zeebe-0
Error on connection Broker: zell-chaos-zeebe-0. Error: Execution exited with exit code 127 (Command not found). It is likely that the broker was not disconnected or restarted in between.
Execute ["sh" "-c" "command -v ip"] on pod zell-chaos-zeebe-1
Execute ["sh" "-c" "ip route | grep -m 1 unreachable"] on pod zell-chaos-zeebe-1
Execute ["sh" "-c" "ip route del "] on pod zell-chaos-zeebe-1
Error on connection Broker: zell-chaos-zeebe-1. Error: command terminated with exit code 255
Execute ["sh" "-c" "command -v ip"] on pod zell-chaos-zeebe-2
Error on connection Broker: zell-chaos-zeebe-2. Error: Execution exited with exit code 127 (Command not found). It is likely that the broker was not disconnected or restarted in between.
Handle zbchaos job [key: 2251799813685732]
Running command with args: [verify instance-creation --bpmnProcessId multiVersion --version 10 --partitionId 1] 
Connecting to zell-chaos
Running experiment in self-managed environment.
Successfully created port forwarding tunnel
Create process instance with BPMN process ID multiVersion and version 10 [variables: '', awaitResult: false]
Created process instance with key 2251799815354503 on partition 1, required partition 1.
The steady-state was successfully verified!
Handle zbchaos job [key: 2251799813685779]
Running command with args: [verify instance-creation --bpmnProcessId multiVersion --version 10 --partitionId 2] 
Connecting to zell-chaos
Running experiment in self-managed environment.
Successfully created port forwarding tunnel
Create process instance with BPMN process ID multiVersion and version 10 [variables: '', awaitResult: false]
Created process instance with key 4503599628043172 on partition 2, required partition 2.
The steady-state was successfully verified!
Handle zbchaos job [key: 2251799813685825]
Running command with args: [verify instance-creation --bpmnProcessId multiVersion --version 10 --partitionId 3] 
Connecting to zell-chaos
Running experiment in self-managed environment.
Successfully created port forwarding tunnel
Create process instance with BPMN process ID multiVersion and version 10 [variables: '', awaitResult: false]
Created process instance with key 2251799815355181 on partition 1, required partition 3.
Created process instance with key 2251799815355311 on partition 1, required partition 3.
Created process instance with key 6755399441722396 on partition 3, required partition 3.
The steady-state was successfully verified!
Handle zbchaos job [key: 2251799813685877]
Running command with args: [verify readiness] 
Connecting to zell-chaos
Running experiment in self-managed environment.
All Zeebe nodes are running.
Handle zbchaos job [key: 2251799813685969]
Running command with args: [version] 
zbchaos development (commit: HEAD)
Handle zbchaos job [key: 2251799813686012]
Running command with args: [version] 
zbchaos development (commit: HEAD)
Handle zbchaos job [key: 2251799813686157]
Running command with args: [version] 
zbchaos development (commit: HEAD)
Instance 2251799813685255 [definition 2251799813685253 ] completed
--- PASS: Test_ShouldBeAbleToRunExperiments (36.97s)
PASS

Process finished with the exit code 0

```
ChrisKujawa authored Dec 6, 2022
2 parents fc2d60a + 8d913d6 commit 2ab1ff4
Showing 5 changed files with 34 additions and 39 deletions.
2 changes: 1 addition & 1 deletion go-chaos/cmd/disconnect.go
@@ -40,7 +40,7 @@ func init() {
disconnectBrokers.Flags().IntVar(&broker2NodeId, "broker2NodeId", -1, "Specify the nodeId of the second Broker")
// general
disconnectBrokers.Flags().BoolVar(&oneDirection, "one-direction", false, "Specify whether the network partition should be setup only in one direction (asymmetric)")
- disconnectBrokers.MarkFlagsMutuallyExclusive("broker2PartitionId", "broker2NodeId", "one-direction")
+ disconnectBrokers.MarkFlagsMutuallyExclusive("broker2PartitionId", "broker2NodeId")

// disconnect gateway
disconnect.AddCommand(disconnectGateway)
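
The `disconnect.go` change above removes `one-direction` from the mutually exclusive flag group, which is what allows the migrated experiment to combine `--broker2PartitionId 3` with `--one-direction`. A minimal cobra sketch of the resulting flag setup, with descriptions and command wiring simplified for illustration:

```go
// Minimal sketch (simplified wiring): the two "second broker" selectors stay
// mutually exclusive, but --one-direction is no longer part of that group.
package main

import "github.com/spf13/cobra"

func main() {
	var broker2PartitionId, broker2NodeId int
	var oneDirection bool

	disconnectBrokers := &cobra.Command{
		Use: "brokers",
		Run: func(cmd *cobra.Command, args []string) { /* disconnect logic elided */ },
	}

	disconnectBrokers.Flags().IntVar(&broker2PartitionId, "broker2PartitionId", -1, "partition id of the second broker")
	disconnectBrokers.Flags().IntVar(&broker2NodeId, "broker2NodeId", -1, "node id of the second broker")
	disconnectBrokers.Flags().BoolVar(&oneDirection, "one-direction", false, "set up an asymmetric network partition")

	// Only the two ways of addressing the second broker exclude each other.
	disconnectBrokers.MarkFlagsMutuallyExclusive("broker2PartitionId", "broker2NodeId")

	_ = disconnectBrokers.Execute()
}
```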
1 change: 1 addition & 0 deletions go-chaos/integration/integration_test.go
@@ -48,6 +48,7 @@ func Test_ShouldBeAbleToDeployChaosModels(t *testing.T) {
func Test_ShouldBeAbleToRunExperiments(t *testing.T) {
// given
internal.Verbosity = true
+ cmd.Verbose = true
ctx := context.Background()
container := CreateEZEContainer(t, ctx)
defer container.StopLogProducer()
@@ -15,57 +15,42 @@
"tolerance": 0,
"provider": {
"type": "process",
"path": "verify-readiness.sh",
"path": "zbchaos",
"arguments": ["verify", "readiness"],
"timeout": 900
}
}
]
},
"method": [
- {
- "type": "action",
- "name": "Enable net_admin capabilities",
- "provider": {
- "type": "process",
- "path": "apply_net_admin.sh"
- },
- "pauses": {
- "after": 180
- }
- },
- {
- "name": "All pods should be ready",
- "type": "probe",
- "tolerance": 0,
- "provider": {
- "type": "process",
- "path": "verify-readiness.sh",
- "timeout": 900
- }
- },
{
"type": "action",
"name": "Create network partition between leaders",
"provider": {
"type": "process",
"path": "disconnect-leaders-one-way.sh"
"path": "zbchaos",
"arguments": ["disconnect", "brokers", "--broker1PartitionId", "1", "--broker2PartitionId", "3", "--one-direction"],
"timeout": 900
}
},
{
"type": "action",
"name": "Deploy different deployment versions.",
"provider": {
"type": "process",
"path": "deploy-different-versions.sh",
"arguments": ["Follower", "3"]
"path": "zbchaos",
"arguments": ["deploy", "process", "--multipleVersions", "10"],
"timeout": 900
}
},
{
"type": "action",
"name": "Delete network partition",
"name": "Connect leaders again",
"provider": {
"type": "process",
"path": "connect-leaders.sh"
"path": "zbchaos",
"arguments": ["connect", "brokers"],
"timeout": 900
}
},
{
@@ -74,8 +59,8 @@
"tolerance": 0,
"provider": {
"type": "process",
"path": "start-instance-on-partition-with-version.sh",
"arguments": ["1", "10"],
"path": "zbchaos",
"arguments": ["verify", "instance-creation", "--bpmnProcessId", "multiVersion", "--version", "10", "--partitionId", "1"],
"timeout": 900
}
},
@@ -85,8 +70,8 @@
"tolerance": 0,
"provider": {
"type": "process",
"path": "start-instance-on-partition-with-version.sh",
"arguments": ["2", "10"],
"path": "zbchaos",
"arguments": ["verify", "instance-creation", "--bpmnProcessId", "multiVersion", "--version", "10", "--partitionId", "2"],
"timeout": 900
}
},
@@ -96,8 +81,8 @@
"tolerance": 0,
"provider": {
"type": "process",
"path": "start-instance-on-partition-with-version.sh",
"arguments": ["3", "10"],
"path": "zbchaos",
"arguments": ["verify", "instance-creation", "--bpmnProcessId", "multiVersion", "--version", "10", "--partitionId", "3"],
"timeout": 900
}
}
14 changes: 10 additions & 4 deletions go-chaos/internal/network.go
@@ -115,15 +115,21 @@ func MakeIpReachableForPod(k8Client K8Client, podName string) error {
// we use replace to not break the execution, since add will return an exit code > 0 if the route exist
err := k8Client.ExecuteCmdOnPod([]string{"sh", "-c", "command -v ip"}, podName)

- if err != nil && strings.Contains(err.Error(), "exit code 127") {
- return errors.New("Execution exited with exit code 127 (Command not found). It is likely that the broker was not disconnected or restarted in between.")
+ if err != nil {
+ if strings.Contains(err.Error(), "exit code 127") {
+ return errors.New("Execution exited with exit code 127 (Command not found). It is likely that the broker was not disconnected or restarted in between.")
+ }
+ return err
}

var buf bytes.Buffer
err = k8Client.ExecuteCmdOnPodWriteIntoOutput([]string{"sh", "-c", "ip route | grep -m 1 unreachable"}, podName, &buf)

- if err != nil && strings.Contains(err.Error(), "exit code 1") {
- return errors.New("Execution exited with exit code 1 (ip route not found). It is likely that the broker was not disconnected or restarted in between.")
+ if err != nil {
+ if strings.Contains(err.Error(), "exit code 1") {
+ return errors.New("Execution exited with exit code 1 (ip route not found). It is likely that the broker was not disconnected or restarted in between.")
+ }
+ return err
}

err = k8Client.ExecuteCmdOnPodWriteIntoOutput([]string{"sh", "-c", fmt.Sprintf("ip route del %s", strings.TrimSpace(buf.String()))}, podName, &buf)
5 changes: 4 additions & 1 deletion go-chaos/worker/chaos_worker.go
@@ -68,7 +68,10 @@ func HandleZbChaosJob(client worker.JobClient, job entities.Job, commandRunner C
commandCtx, cancelCommand := context.WithTimeout(ctx, timeout)
defer cancelCommand()

- clusterAccessArgs := append([]string{}, "--namespace", *jobVariables.ClusterId+"-zeebe", "--clientId", jobVariables.AuthenticationDetails.ClientId, "--clientSecret", jobVariables.AuthenticationDetails.ClientSecret, "--audience", jobVariables.AuthenticationDetails.Audience)
+ var clusterAccessArgs []string
+ if *jobVariables.ClusterId != "" {
+ clusterAccessArgs = append(clusterAccessArgs, "--namespace", *jobVariables.ClusterId+"-zeebe", "--clientId", jobVariables.AuthenticationDetails.ClientId, "--clientSecret", jobVariables.AuthenticationDetails.ClientSecret, "--audience", jobVariables.AuthenticationDetails.Audience)
+ } // else we run local against our k8 context
commandArgs := append(clusterAccessArgs, jobVariables.Provider.Arguments...)

err = commandRunner(commandArgs, commandCtx)
