Scripts not resilient to gateway restarts #136
Comments
Might make sense to use the service to be more resilient. Or use the helper |
The pod |
It is already using |
Yeah because getGateway is not included in the loop. |
Why don't we execute |
Currently, it is independent of where and against what it is executed. Local, helm, cloud/saas etc. |
I think this is no longer an issue: if we run into a problem, the zbchaos worker will restart and retry later. Gateways are now chosen randomly (#297). |
Here the script finds one gateway:
https://github.com/zeebe-io/zeebe-chaos/blob/aee26dc8070b93e31a37d14798504149bc867498/chaos-workers/chaos-experiments/scripts/start-instance-on-partition-with-version.sh#L19
Then it tries to exec into that gateway:
https://github.com/zeebe-io/zeebe-chaos/blob/aee26dc8070b93e31a37d14798504149bc867498/chaos-workers/chaos-experiments/scripts/start-instance-on-partition-with-version.sh#L31
Between the execution of these two lines, however, the gateway pod was terminated and a new pod was started to replace it. The script still tried to access the terminated gateway and eventually timed out, failing the experiment.
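One possible way to close that window is to move the gateway lookup inside the retry loop, so each attempt resolves the pod again before using it. The following is only a rough sketch, not the repository's actual code: `retry_on_fresh_gateway`, `find_gateway`, and `run_on_gateway` are hypothetical names, and the latter two are assumed to wrap whatever the script uses today to look up the gateway pod and exec into it.

```shell
#!/usr/bin/env bash
# Sketch only: re-resolve the gateway on every attempt instead of once up front.
# find_gateway and run_on_gateway are hypothetical hooks that the caller defines;
# they are assumed to wrap the existing pod lookup and exec steps.

retry_on_fresh_gateway() {
  local max_attempts="$1"
  local delay_seconds="$2"
  shift 2

  local attempt=1
  local gateway
  while [ "$attempt" -le "$max_attempts" ]; do
    # The lookup happens inside the loop, so a replaced pod is picked up
    # on the next attempt instead of reusing a stale pod name.
    if gateway="$(find_gateway)" && [ -n "$gateway" ] \
        && run_on_gateway "$gateway" "$@"; then
      return 0
    fi
    echo "attempt ${attempt}/${max_attempts} failed, retrying" >&2
    attempt=$((attempt + 1))
    sleep "$delay_seconds"
  done
  return 1
}
```

With both hooks defined, a call such as `retry_on_fresh_gateway 5 2 <command>` would try up to five times with a two second pause between attempts. This narrows the race but does not remove it: the pod can still be terminated between the lookup and the exec within a single attempt, which is why a failed attempt is retried rather than treated as fatal.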