
Automatic Update of Nodes #1716

Closed
DWSR opened this issue Apr 23, 2022 · 8 comments
Labels
feature New feature or request

Comments

@DWSR
Contributor

DWSR commented Apr 23, 2022

Tell us about your request

As a cluster operator, when a new AMI version is released for my cluster, I want Karpenter to slowly and gracefully update all nodes to the latest version without waiting for the node TTL to expire.

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
I'd like to ensure that OS updates are applied to our clusters as soon as possible, without the undue node churn that comes from setting an aggressive TTL (aggressive node churn also has other drawbacks at the moment because there is no max-in-flight limit). This would avoid the toil of making sure these patches get applied.

Are you currently working around this issue?

Yes, by either manually cycling nodes or waiting until node TTL.
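
For reference, the TTL workaround looks roughly like this (a sketch assuming the current v1alpha5 Provisioner API; the 7-day value and the amiFamily field are illustrative):

```bash
# Sketch of the TTL-based workaround (v1alpha5 Provisioner; values illustrative).
# Expired nodes are gracefully replaced, and the replacements pick up whatever
# AMI happens to be the latest at that point in time.
kubectl apply -f - <<'EOF'
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  ttlSecondsUntilExpired: 604800   # recycle nodes after 7 days
  provider:
    amiFamily: AL2                 # EKS-optimized Amazon Linux 2 AMIs
EOF
```

Manual cycling amounts to deleting the Karpenter-managed Node objects and relying on Karpenter's termination handling to cordon and drain them before the instances are terminated.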

Additional context
There is currently no mechanism to subscribe to updates to EKS Optimized AMIs: aws/containers-roadmap#734

Attachments
N/A

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@DWSR added the feature (New feature or request) label on Apr 23, 2022
@spring1843
Contributor

This is a duplicate of #1018

@DWSR
Contributor Author

DWSR commented Apr 27, 2022

I disagree that this is a duplicate. My proposal is about Karpenter periodically checking for a new AMI when using amiFamily and automatically rolling nodes over when a new one is found. In this scenario, there is no change to the Provisioner spec.
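
For illustration, this is the kind of check I have in mind, sketched against the public SSM parameter AWS publishes for the EKS-optimized Amazon Linux 2 AMI (the 1.22 path is only an example):

```bash
# Sketch: look up the latest recommended EKS-optimized AL2 AMI for a given
# Kubernetes minor version via the public SSM parameter.
aws ssm get-parameter \
  --name /aws/service/eks/optimized-ami/1.22/amazon-linux-2/recommended/image_id \
  --query 'Parameter.Value' \
  --output text
```

Karpenter could periodically perform the equivalent lookup for the configured amiFamily and roll any node whose AMI no longer matches the recommended image.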

The linked issue, and the issues linked from it, are about reconciling nodes when the spec of the Provisioner that spawned them is updated.

@spring1843 reopened this on Apr 27, 2022
@bwagner5
Contributor

I think this issue is a part of what is being discussed here: #1457

@njtran
Contributor

njtran commented Apr 27, 2022

Hey @DWSR, you’re right that these issues are different, but they are all related.

#1457 asks for a feature to reconcile nodes that are out of spec with their Provisioner.
#1018 proposes a solution to rate-limit node rolling, on the assumption that Karpenter implements the behavior described in #1457.

This issue seems to ask for a new case, similar to #1457, that adds another condition for when to roll nodes, in this case AMI changes. #1457 asks for a signal native to Kubernetes, while this issue asks for a signal native to AWS. I think this issue, #1457, and #1018 should all be aggregated into one issue (or at least linked from one) that discusses the full set of signals for when to roll nodes, along with the controls for doing so.

This would let us think through the cases where Karpenter should attempt to automatically reconcile "out-of-spec" capacity, and how Karpenter could control that behavior or surface controls for it.

@DWSR
Contributor Author

DWSR commented Apr 27, 2022

I think aggregating all of the issues together makes a lot of sense; I just disagree that this request is already covered by the existing issues.

@bwagner5 The difference between #1457 and this request is that #1457 specifically mentions nodes that are "out of spec". In the scenario I am describing, the nodes are still technically "within spec" because they are using the correct AMI family, etc.

@njtran
Contributor

njtran commented Apr 28, 2022

@DWSR: Opened #1738

@prashil-g

+1

@ellistarn
Contributor

Closing in favor of #1738
