Chaos: Terminating Gateway Stops Workflow Processing #336
Comments
Hey @shahamit, sorry for the late reply. Regarding:
What does "it stops" mean? Does it recover afterwards, eventually, after some time?
The effect will never be zero, because some requests might fail or time out, but yes, after a retry it should work and take the next gateway, I agree.
Yes, after a few seconds the workflows did start getting processed again. In between, though, some workflows do fail (indicated by the backpressure % increasing). Given that the gateway replicas are behind a k8s service, shouldn't requests automatically go to the next gateway instance instead of the workflows failing?
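To make the "retry and take the next gateway" idea concrete, here is a minimal client-side sketch of retrying a transient failure with exponential backoff. It is not the Zeebe client API: `TransientError`, `call_with_retry`, and `make_flaky_request` are hypothetical names, and the flaky request only simulates a gateway that is gone for the first few calls.

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for a timeout or 'unavailable' style failure."""


def call_with_retry(request, attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call request() until it succeeds or the attempts are exhausted."""
    for attempt in range(attempts):
        try:
            return request()
        except TransientError:
            if attempt == attempts - 1:
                raise  # give up after the last attempt
            # Exponential backoff: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** attempt)


def make_flaky_request(failures):
    """Simulate a gateway that fails `failures` times and then answers."""
    state = {"calls": 0}

    def request():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise TransientError("gateway terminated")
        return f"ok after {state['calls']} calls"

    return request
```

With something like this in the client, a request that hits the terminated gateway would surface as a short delay rather than a failure, assuming the retried call ends up on a healthy gateway instance.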
Do you have any metrics to show? What type of load are we speaking of? 🤔
Yes if a new request comes in I would expect something like that.
Be aware that the process instances are not failing; they are just not continued, right?
Sorry for the late reply, @Zelldon.
We are running the benchmarking tool against a Zeebe cluster of 7 brokers and 2 gateways. We observed a throughput of 170 PI/s.
This is hard to find out, since the benchmarking tool starts around 170 process instances per second. If you can think of a way to find this out, please let me know. Thanks.
Chaos Experiment
When running the terminate chaos experiment against a Zeebe cluster that was under load, we observed that the cluster stops processing any workflows thereafter.
Config: 6 brokers, 2 gateways, 6 partitions, replication factor 2.
Note that we don't have an ingress controller configured in front of the zeebe-gateway. Since our client (the benchmarking tool in this case) runs within the same k8s cluster as Zeebe, this should be fine, given that the k8s service (zeebe-gateway) is supposed to load-balance between the gateway instances (which isn't happening, but that's a separate issue).
We were hoping that, since the client (benchmarking tool) connects to the k8s zeebe-gateway service, terminating one of the gateway instances wouldn't have any impact on the client. I don't understand why we see errors on the client. Please share more insights.
Thanks.
Benchmarking tool logs
Terminate command output