fix: allow running as root to inject chaos #525
d0fa86f to b2a5178
I'm still getting errors in testbench. Can't reproduce locally though.
Output:
|
One issue was that we did not disable reconciliation before enabling root access. Still, even after fixing this, I did not get any output from running the apt commands to install stress and procps. Executing manually in a zeebe pod that was set up for root access and then running
I think at this point there are three unresolved issues:
|
One idea I have to overcome this is to attach ephemeral debug containers that have the necessary tools installed.
I was also thinking about this but thought it might not work with permissions. Another idea: could we leverage eBPF?
I think I got the permissions working:
|
@oleschoenburg was it working? Can we get it merged?
@Zelldon I have something ready locally that seems to work more or less, but now the scaling test doesn't work anymore?! I need to test this manually a bit more.
ce187fb to d145b85
This overwrites the security context on both the container and the deployment/statefulset. Only works because reconciliation is already disabled at that point.
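As a rough illustration of what such an overwrite could look like, the sketch below builds a strategic merge patch body that sets a root security context on both the pod template and one named container. The exact fields this PR sets are not shown in the conversation, so the choice of `runAsUser` and `runAsNonRoot` is an assumption, and `rootAccessPatch` is a hypothetical helper name.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rootSecurityContext returns security context fields that allow running as
// root. Which fields the PR actually overwrites is an assumption here.
func rootSecurityContext() map[string]any {
	return map[string]any{
		"runAsUser":    0,
		"runAsNonRoot": false,
	}
}

// rootAccessPatch builds a strategic merge patch for a Deployment or
// StatefulSet. It sets the security context on the pod template and on the
// container identified by containerName.
func rootAccessPatch(containerName string) ([]byte, error) {
	patch := map[string]any{
		"spec": map[string]any{
			"template": map[string]any{
				"spec": map[string]any{
					"securityContext": rootSecurityContext(),
					"containers": []map[string]any{
						{"name": containerName, "securityContext": rootSecurityContext()},
					},
				},
			},
		},
	}
	return json.Marshal(patch)
}

func main() {
	body, err := rootAccessPatch("zeebe")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

If an operator were still reconciling the resource, a patch like this would presumably be reverted, which is why the ordering noted above matters.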
a195ce6 to 859486b
Due to camunda/camunda#17347 we can't rely on a sensible value
859486b to 755f387
Thanks @oleschoenburg, if this really works, that would be great! Could you check my comments?
cmd := []string{"ip", "route", "replace", "unreachable", podIp}
cmdWithSetup := []string{"sh", "-c", "apt update && apt install -y iproute2 && " + strings.Join(cmd, " ")}
var containerName string
if strings.Contains(podName, "gateway") { |
Is this to work with SaaS/SM?
We also have functions to get the right pod name.
This is to attach to the correct container, both on SaaS and self-managed. We already have the right podName here. Do you mean we have helper functions to get the right container name?
The pod name you use is actually only correct in SaaS; in SM it is different. I'm not sure whether you use the pod name here anyhow, though.
This code here uses the pod name to figure out the correct container name. If the pod name includes gateway, we assume that the container name is zeebe-gateway, otherwise we expect it to be zeebe. I think this should work in both SaaS and SM, right?
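The rule described in this comment can be sketched as a small helper. The function name `containerNameFor` is hypothetical; the diff only shows the start of the `strings.Contains` check, so the rest is reconstructed from the description above.

```go
package main

import (
	"fmt"
	"strings"
)

// containerNameFor derives the container to attach to from the pod name:
// pods whose name contains "gateway" are assumed to run the "zeebe-gateway"
// container, all other pods are assumed to run the "zeebe" container.
func containerNameFor(podName string) string {
	if strings.Contains(podName, "gateway") {
		return "zeebe-gateway"
	}
	return "zeebe"
}

func main() {
	for _, pod := range []string{"zeebe-gateway-5489b4cdcf-ptfl9", "zeebe-1"} {
		fmt.Printf("%s -> %s\n", pod, containerNameFor(pod))
	}
}
```

This relies on a naming convention, so it would break if a broker pod name ever contained "gateway" or if the container names differed between environments.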
@@ -302,6 +303,35 @@ func (c K8Client) createPortForwardUrl(names []string) *url.URL {
return portForwardCreateURL
}

func (c K8Client) ExecuteCommandViaDebugContainer(podName string, containerName string, debugImage string, cmd []string) error { |
Did you verify whether it was really disrupting the network? Or maybe it was just silently failing? 🤔 For example, did the brokers show that they can't connect?
Not yet! That's one of the things I still have to do :-/
I've tested it now with a SaaS cluster 🎉
$ zbchaos --namespace 8fe648b5-1137-4d6a-af0d-be8b467fd67e-zeebe disconnect brokers --broker2NodeId 2 --broker1NodeId 1 --verbose
Flags: {1 LEADER -1 10 msg false 1 LEADER 1 2 LEADER 2 1 1713797975967 false false true false false 30 false -1 benchmark 30 8fe648b5-1137-4d6a-af0d-be8b467fd67e-zeebe 1 1 benchmark-task 0 0 0 1 -1 true}
Connecting to 8fe648b5-1137-4d6a-af0d-be8b467fd67e-zeebe
Running experiment in SaaS environment.
Patched statefulset
Port forward to zeebe-gateway-5489b4cdcf-ptfl9
Successfully created port forwarding tunnel from 46535 (local) to 26500 (remote)
Found Broker zeebe-1 with node id 1.
Found Broker zeebe-2 with node id 2.
Debug container debug-lvccl4 is running command [sh -c apt update && apt install -y iproute2 && ip route replace unreachable 10.64.73.38]
Disconnect zeebe-1 from zeebe-2
Debug container debug-658dxp is running command [sh -c apt update && apt install -y iproute2 && ip route replace unreachable 10.64.53.29]
Disconnect zeebe-2 from zeebe-1
$ zbchaos --namespace 8fe648b5-1137-4d6a-af0d-be8b467fd67e-zeebe connect brokers --verbose
Flags: {1 LEADER -1 10 msg false 1 LEADER -1 2 LEADER -1 1 1713798188959 false false true false false 30 false -1 benchmark 30 8fe648b5-1137-4d6a-af0d-be8b467fd67e-zeebe 1 1 benchmark-task 0 0 0 1 -1 true}
Connecting to 8fe648b5-1137-4d6a-af0d-be8b467fd67e-zeebe
Running experiment in SaaS environment.
Debug container debug-cv444p is running command [sh -c apt update && apt install -y iproute2 && ip route del $(ip route | grep -m 1 unreachable)]
Connected zeebe-0 again, removed unreachable routes.
Debug container debug-zg45fm is running command [sh -c apt update && apt install -y iproute2 && ip route del $(ip route | grep -m 1 unreachable)]
Connected zeebe-1 again, removed unreachable routes.
Debug container debug-tm9pdj is running command [sh -c apt update && apt install -y iproute2 && ip route del $(ip route | grep -m 1 unreachable)]
Connected zeebe-2 again, removed unreachable routes.
Really great stuff @oleschoenburg ❤️ thanks for your efforts!
Closes #520