After some more investigation, the issue we're experiencing seems to be an unintended consequence of #10. I'm not deeply familiar with Lease objects or with how csi-lib-utils's leader election package interacts with them, but it looks like something like the following is happening (rough sketch of the lock setup after the list):
1. csi-resizer: acquires resizer lease
2. volumemodifier: acquires resizer lease
3. csi-resizer: attempts to renew resizer lease
4. csi-resizer: fails to renew resizer lease, as the object has been modified by volumemodifier
5. csi-resizer: shuts down
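For context, here's a minimal sketch of how I understand the lock to be set up. csi-lib-utils wraps client-go's leader election, and the lease name and namespace below are taken from the resizer logs further down; the timing values, the `POD_NAME` identity, and everything else are placeholders rather than the sidecars' actual settings. If both sidecars end up constructing the lock with the same lease name, they compete over a single Lease object:

```go
package main

import (
	"context"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Both sidecars appear to end up on the same lease name, derived from the
	// CSI driver name ("ebs.csi.aws.com" -> "external-resizer-ebs-csi-aws-com").
	lock := &resourcelock.LeaseLock{
		LeaseMeta: metav1.ObjectMeta{
			Name:      "external-resizer-ebs-csi-aws-com",
			Namespace: "storage",
		},
		Client: client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{
			// Placeholder identity: each pod uses its own, so the two sidecars
			// look like competing candidates for the same lock.
			Identity: os.Getenv("POD_NAME"),
		},
	}

	leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
		Lock:          lock,
		LeaseDuration: 15 * time.Second, // placeholder values, not the sidecars' real settings
		RenewDeadline: 10 * time.Second,
		RetryPeriod:   5 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) {
				// start the controller loop
			},
			OnStoppedLeading: func() {
				// losing the lease ends up here; this matches the
				// "stopped leading" fatal in the csi-resizer logs below
			},
		},
	})
}
```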
I tested this theory by reverting volumemodifier from 0.1.2 to 0.1.1 on one of the affected clusters.
Here are the csi-resizer logs when running 0.1.2 of volumemodifier. The pod crashes at the end of the logs.
```
I0927 18:54:51.868483 1 main.go:93] Version : v1.8.0
I0927 18:54:51.868523 1 feature_gate.go:249] feature gates: &{map[]}
I0927 18:54:51.869880 1 common.go:111] Probing CSI driver for readiness
I0927 18:54:51.872105 1 main.go:141] CSI driver name: "ebs.csi.aws.com"
I0927 18:54:51.872871 1 common.go:111] Probing CSI driver for readiness
I0927 18:54:51.874756 1 leaderelection.go:245] attempting to acquire leader lease storage/external-resizer-ebs-csi-aws-com...
I0927 18:54:51.902162 1 leaderelection.go:255] successfully acquired lease storage/external-resizer-ebs-csi-aws-com
I0927 18:54:51.902274 1 leader_election.go:178] became leader, starting
I0927 18:54:51.902306 1 controller.go:255] Starting external resizer ebs.csi.aws.com
E0927 19:04:34.559684 1 leaderelection.go:364] Failed to update lock: Operation cannot be fulfilled on leases.coordination.k8s.io "external-resizer-ebs-csi-aws-com": the object has been modified; please apply your changes to the latest version and try again
E0927 19:04:39.606718 1 leaderelection.go:364] Failed to update lock: Operation cannot be fulfilled on leases.coordination.k8s.io "external-resizer-ebs-csi-aws-com": the object has been modified; please apply your changes to the latest version and try again
I0927 19:04:44.456735 1 leaderelection.go:280] failed to renew lease storage/external-resizer-ebs-csi-aws-com: timed out waiting for the condition
F0927 19:04:44.456776 1 leader_election.go:182] stopped leading
I0927 19:04:44.456863 1 controller.go:274] Shutting down external resizer ebs.csi.aws.com
```
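For anyone who wants to watch the contention directly, the Lease's `holderIdentity` and `renewTime` show who currently holds the lock and when it was last renewed (`kubectl get lease -n storage external-resizer-ebs-csi-aws-com -o yaml` works just as well). Here's a minimal client-go sketch, not taken from either sidecar, with the namespace and lease name copied from the logs above:

```go
// Minimal sketch: print the current holder of the resizer lease and when it
// was last renewed. Run with in-cluster credentials or adapt to a kubeconfig.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	lease, err := client.CoordinationV1().Leases("storage").Get(
		context.Background(), "external-resizer-ebs-csi-aws-com", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}

	holder, renewed := "", ""
	if lease.Spec.HolderIdentity != nil {
		holder = *lease.Spec.HolderIdentity
	}
	if lease.Spec.RenewTime != nil {
		renewed = lease.Spec.RenewTime.String()
	}
	fmt.Printf("holder=%s renewed=%s\n", holder, renewed)
}
```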
Here are the csi-resizer logs when running 0.1.1 of volumemodifier. The pod stays healthy this time, and I can see that the resizer Lease object is actively being renewed (renewed 1 minute ago, no controller crashes):
/kind bug
See description of issue here.
Let me know how I can help or if I need to provide any more information.