After some more investigation, the issue we're experiencing seems to be an unintended consequence of #10. I'm not deeply familiar with Lease objects or with how csi-lib-utils's leader election package interacts with them, but it looks like something like the following is happening (rough sketch of the lock setup after the list):
1. csi-resizer: acquires resizer lease
2. volumemodifier: acquires resizer lease
3. csi-resizer: attempts to renew resizer lease
4. csi-resizer: fails to renew resizer lease, as the object has been modified by volumemodifier
5. csi-resizer: shuts down
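For context, here's a minimal sketch of how I understand the lock to be set up. csi-lib-utils wraps client-go's leader election, and the lease name and namespace below are taken from the resizer logs further down; the timing values, the `POD_NAME` identity, and everything else are placeholders rather than the sidecars' actual settings. If both sidecars end up constructing the lock with the same lease name, they compete over a single Lease object:

```go
package main

import (
	"context"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Both sidecars appear to end up on the same lease name, derived from the
	// CSI driver name ("ebs.csi.aws.com" -> "external-resizer-ebs-csi-aws-com").
	lock := &resourcelock.LeaseLock{
		LeaseMeta: metav1.ObjectMeta{
			Name:      "external-resizer-ebs-csi-aws-com",
			Namespace: "storage",
		},
		Client: client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{
			// Placeholder identity: each pod uses its own, so the two sidecars
			// look like competing candidates for the same lock.
			Identity: os.Getenv("POD_NAME"),
		},
	}

	leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
		Lock:          lock,
		LeaseDuration: 15 * time.Second, // placeholder values, not the sidecars' real settings
		RenewDeadline: 10 * time.Second,
		RetryPeriod:   5 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) {
				// start the controller loop
			},
			OnStoppedLeading: func() {
				// losing the lease ends up here; this matches the
				// "stopped leading" fatal in the csi-resizer logs below
			},
		},
	})
}
```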
I tested this theory by reverting volumemodifier from 0.1.2 to 0.1.1 on one of the affected clusters.
Here are the csi-resizer logs when running 0.1.2 of volumemodifier. The pod crashes at the end of the logs.
```
I0927 18:54:51.868483 1 main.go:93] Version : v1.8.0
I0927 18:54:51.868523 1 feature_gate.go:249] feature gates: &{map[]}
I0927 18:54:51.869880 1 common.go:111] Probing CSI driver for readiness
I0927 18:54:51.872105 1 main.go:141] CSI driver name: "ebs.csi.aws.com"
I0927 18:54:51.872871 1 common.go:111] Probing CSI driver for readiness
I0927 18:54:51.874756 1 leaderelection.go:245] attempting to acquire leader lease storage/external-resizer-ebs-csi-aws-com...
I0927 18:54:51.902162 1 leaderelection.go:255] successfully acquired lease storage/external-resizer-ebs-csi-aws-com
I0927 18:54:51.902274 1 leader_election.go:178] became leader, starting
I0927 18:54:51.902306 1 controller.go:255] Starting external resizer ebs.csi.aws.com
E0927 19:04:34.559684 1 leaderelection.go:364] Failed to update lock: Operation cannot be fulfilled on leases.coordination.k8s.io "external-resizer-ebs-csi-aws-com": the object has been modified; please apply your changes to the latest version and try again
E0927 19:04:39.606718 1 leaderelection.go:364] Failed to update lock: Operation cannot be fulfilled on leases.coordination.k8s.io "external-resizer-ebs-csi-aws-com": the object has been modified; please apply your changes to the latest version and try again
I0927 19:04:44.456735 1 leaderelection.go:280] failed to renew lease storage/external-resizer-ebs-csi-aws-com: timed out waiting for the condition
F0927 19:04:44.456776 1 leader_election.go:182] stopped leading
I0927 19:04:44.456863 1 controller.go:274] Shutting down external resizer ebs.csi.aws.com
```
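For anyone who wants to watch the contention directly, the Lease's `holderIdentity` and `renewTime` show who currently holds the lock and when it was last renewed (`kubectl get lease -n storage external-resizer-ebs-csi-aws-com -o yaml` works just as well). Here's a minimal client-go sketch, not taken from either sidecar, with the namespace and lease name copied from the logs above:

```go
// Minimal sketch: print the current holder of the resizer lease and when it
// was last renewed. Run with in-cluster credentials or adapt to a kubeconfig.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	lease, err := client.CoordinationV1().Leases("storage").Get(
		context.Background(), "external-resizer-ebs-csi-aws-com", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}

	holder, renewed := "", ""
	if lease.Spec.HolderIdentity != nil {
		holder = *lease.Spec.HolderIdentity
	}
	if lease.Spec.RenewTime != nil {
		renewed = lease.Spec.RenewTime.String()
	}
	fmt.Printf("holder=%s renewed=%s\n", holder, renewed)
}
```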
Here are the csi-resizer logs when running 0.1.1 of volumemodifier. The pod stays healthy this time, and I can see that the resizer Lease object is actively being renewed (renewed 1 minute ago, no controller crashes):
/kind bug
See description of issue here.
Let me know how I can help or if I need to provide any more information.