Leader election conflict with csi-resizer #13

Closed
walkbooth opened this issue Sep 27, 2023 · 0 comments · Fixed by #14

/kind bug

See description of issue here.

After some more investigation, the issue we're experiencing seems to be an unintended consequence of #10. I'm not deeply familiar with Lease objects or with how the csi-lib-utils leader election package interacts with them, but the sequence appears to be:

  1. csi-resizer: acquires resizer lease
  2. volumemodifier: acquires resizer lease
  3. csi-resizer: attempts to renew resizer lease
  4. csi-resizer: fails to renew resizer lease, as the object has been modified by volumemodifier
  5. csi-resizer: shuts down
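
To make the suspected sequence concrete, here is a minimal toy model of it. It is not client-go or csi-lib-utils code: `FakeAPIServer`, `Lease`, and `simulate` are hypothetical names made up for illustration, and the only behavior modeled is that an update based on a stale `resourceVersion` is rejected. It does not try to explain why the second writer is allowed to take the lease in the first place.

```python
from dataclasses import dataclass


class Conflict(Exception):
    """Raised when an update carries a stale resourceVersion."""


@dataclass
class Lease:
    holder: str = ""
    resource_version: int = 0


class FakeAPIServer:
    """Hypothetical stand-in for the API server, holding a single Lease."""

    def __init__(self):
        self._lease = Lease()

    def get(self):
        # Return a copy so each caller holds its own snapshot.
        return Lease(self._lease.holder, self._lease.resource_version)

    def update(self, lease):
        # Optimistic concurrency: reject writes built on an outdated snapshot.
        if lease.resource_version != self._lease.resource_version:
            raise Conflict("the object has been modified")
        self._lease = Lease(lease.holder, lease.resource_version + 1)
        return self.get()


def simulate():
    server = FakeAPIServer()
    events = []

    # Step 1: csi-resizer acquires the lease and keeps the returned snapshot.
    snap = server.get()
    snap.holder = "csi-resizer"
    resizer_view = server.update(snap)
    events.append("csi-resizer acquired")

    # Step 2: volumemodifier writes to the same lease name.
    snap = server.get()
    snap.holder = "volumemodifier"
    server.update(snap)
    events.append("volumemodifier acquired")

    # Steps 3-4: csi-resizer renews from its stale snapshot and is rejected.
    try:
        server.update(resizer_view)
        events.append("csi-resizer renewed")
    except Conflict as err:
        events.append(f"csi-resizer renew failed: {err}")
    return events


if __name__ == "__main__":
    for line in simulate():
        print(line)
```

In this model the third event is always a failed renew, which matches the "the object has been modified" error that csi-resizer reports before shutting down.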

I tested this theory by reverting volumemodifier from 0.1.2 to 0.1.1 on one of the affected clusters.

Here are the csi-resizer logs when running volumemodifier 0.1.2. The pod crashes at the end of the logs.

I0927 18:54:51.868483       1 main.go:93] Version : v1.8.0
I0927 18:54:51.868523       1 feature_gate.go:249] feature gates: &{map[]}
I0927 18:54:51.869880       1 common.go:111] Probing CSI driver for readiness
I0927 18:54:51.872105       1 main.go:141] CSI driver name: "ebs.csi.aws.com"
I0927 18:54:51.872871       1 common.go:111] Probing CSI driver for readiness
I0927 18:54:51.874756       1 leaderelection.go:245] attempting to acquire leader lease storage/external-resizer-ebs-csi-aws-com...
I0927 18:54:51.902162       1 leaderelection.go:255] successfully acquired lease storage/external-resizer-ebs-csi-aws-com
I0927 18:54:51.902274       1 leader_election.go:178] became leader, starting
I0927 18:54:51.902306       1 controller.go:255] Starting external resizer ebs.csi.aws.com
E0927 19:04:34.559684       1 leaderelection.go:364] Failed to update lock: Operation cannot be fulfilled on leases.coordination.k8s.io "external-resizer-ebs-csi-aws-com": the object has been modified; please apply your changes to the latest version and try again
E0927 19:04:39.606718       1 leaderelection.go:364] Failed to update lock: Operation cannot be fulfilled on leases.coordination.k8s.io "external-resizer-ebs-csi-aws-com": the object has been modified; please apply your changes to the latest version and try again
I0927 19:04:44.456735       1 leaderelection.go:280] failed to renew lease storage/external-resizer-ebs-csi-aws-com: timed out waiting for the condition
F0927 19:04:44.456776       1 leader_election.go:182] stopped leading
I0927 19:04:44.456863       1 controller.go:274] Shutting down external resizer ebs.csi.aws.com
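
As a rough check on the timing of the failure, the gaps between the last log lines can be computed directly. This is only a sketch: the three timestamps are copied from the logs above, while `parse_klog` and `seconds_between` are hypothetical helpers, and the year is an assumption supplied by hand because the log prefix does not include one.

```python
from datetime import datetime

# Timestamps copied from the last three leaderelection.go lines above.
FIRST_FAILURE = "0927 19:04:34.559684"
SECOND_FAILURE = "0927 19:04:39.606718"
GAVE_UP = "0927 19:04:44.456735"


def parse_klog(stamp, year=2023):
    """Parse an 'MMDD HH:MM:SS.ffffff' stamp; the year is supplied by the caller."""
    return datetime.strptime(f"{year}{stamp}", "%Y%m%d %H:%M:%S.%f")


def seconds_between(start, end):
    """Return the elapsed seconds between two log stamps."""
    return (parse_klog(end) - parse_klog(start)).total_seconds()


if __name__ == "__main__":
    print(f"retry gap:  {seconds_between(FIRST_FAILURE, SECOND_FAILURE):.3f}s")
    print(f"total span: {seconds_between(FIRST_FAILURE, GAVE_UP):.3f}s")
```

A gap of about 5 s between attempts and about 10 s before giving up would be consistent with a 5 s retry period and a 10 s renew deadline. Those settings are an assumption on my part and are not shown in the logs.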

Here are the csi-resizer logs when running volumemodifier 0.1.1. The pod stays healthy this time:

I0927 15:43:51.026325       1 main.go:93] Version : v1.8.0
I0927 15:43:51.026363       1 feature_gate.go:249] feature gates: &{map[]}
I0927 15:43:51.027634       1 common.go:111] Probing CSI driver for readiness
I0927 15:43:51.033194       1 main.go:141] CSI driver name: "ebs.csi.aws.com"
I0927 15:43:51.034014       1 common.go:111] Probing CSI driver for readiness
I0927 15:43:51.035848       1 leaderelection.go:245] attempting to acquire leader lease storage/external-resizer-ebs-csi-aws-com...
I0927 15:44:07.650383       1 leaderelection.go:255] successfully acquired lease storage/external-resizer-ebs-csi-aws-com
I0927 15:44:07.650497       1 leader_election.go:178] became leader, starting
I0927 15:44:07.650523       1 controller.go:255] Starting external resizer ebs.csi.aws.com

I can also see that the resizer Lease object is being actively renewed (last renewed 1 minute ago, with no controller crashes):

apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  creationTimestamp: "2023-09-08T20:11:12Z"
  name: external-resizer-ebs-csi-aws-com
  namespace: storage
  resourceVersion: "23932620"
  uid: c7785a54-f545-416c-b108-12b34769074a
spec:
  acquireTime: "2023-09-27T15:44:07.631053Z"
  holderIdentity: ebs-csi-controller-768b67bc76-cxxkv
  leaseDurationSeconds: 15
  leaseTransitions: 7
  renewTime: "2023-09-27T19:35:20.084818Z"
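
For reference, the time the lease has been held without changing hands can be read off the two timestamps in the spec. A small sketch, with the values copied from the Lease above and `parse_micro_time` and `held_for` as hypothetical helper names:

```python
from datetime import datetime

# Values copied from the Lease spec above.
ACQUIRE_TIME = "2023-09-27T15:44:07.631053Z"
RENEW_TIME = "2023-09-27T19:35:20.084818Z"


def parse_micro_time(value):
    """Parse a UTC timestamp with microseconds, in the form printed in the Lease."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")


def held_for(acquire_time, renew_time):
    """Return how long the current holder has kept the lease as of the last renewal."""
    return parse_micro_time(renew_time) - parse_micro_time(acquire_time)


if __name__ == "__main__":
    print(held_for(ACQUIRE_TIME, RENEW_TIME))
```

The `acquireTime` lines up with the "successfully acquired lease" entry in the 0.1.1 logs, and the difference works out to a little under four hours of uninterrupted renewals.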

Let me know how I can help or if I need to provide any more information.
