Provide ability on when to actually rotate expired nodes #903
Comments
Hey @mikesir87, thanks for writing this up. I'm definitely interested in supporting some sort of "maintenance window" for scaling down. We've talked in the team about the idea of a NodeDisruptionBudget that would work similarly to a PDB, but with additional fields such as maintenance windows. For your use case, there may be an easy alternative. Karpenter respects Pod Disruption Budgets as well as each pod's terminationGracePeriodSeconds. When node termination is triggered, the node is cordoned and its pods are evicted. PDBs can block eviction, and Karpenter will never force terminate a pod, instead delegating to the pod's termination grace period. Do you think these existing mechanisms are suitable for your use case? I'd be happy to discuss either of these ideas further, either here or in the working group: https://github.com/aws/karpenter/blob/main/WORKING_GROUP.md
@ellistarn - Good to know about those options. But I don't think they'll work for this use case, as we may not always know what value the grace period should have. Imagine a node that expires at noon, but we want to rotate it at 3am. Then a scale-up event adds another node, which will later expire at 4pm. Because each node sits in its own sliding window time-wise, anything seconds-based will start to drift over time and won't ensure the nodes rotate in the correct window (without something else monitoring the values and adjusting them). And thanks for pointing me to the working group. I'll have to swing by and say hello to the team! 😄
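To make the noon/4pm scenario above concrete, here is a minimal sketch of anchoring rotation to a wall-clock window rather than to a seconds-based offset. The function name `next_rotation_time` and its parameters are hypothetical and are not part of Karpenter; the sketch assumes naive datetimes in a single time zone and a daily window shorter than 24 hours.

```python
from datetime import datetime, time, timedelta


def next_rotation_time(expires_at: datetime,
                       window_start: time,
                       window_length: timedelta) -> datetime:
    """Return the earliest moment at or after expires_at inside the daily window.

    Hypothetical helper for illustration only; not a Karpenter API.
    """
    # Also look at the window that opened the previous day,
    # in case the window spans midnight.
    for day_offset in (-1, 0, 1):
        day = expires_at.date() + timedelta(days=day_offset)
        start = datetime.combine(day, window_start)
        end = start + window_length
        if expires_at < start:
            # Expired before this window opens: wait for it.
            return start
        if expires_at < end:
            # Expired while the window is open: rotate right away.
            return expires_at
    raise AssertionError("unreachable for windows shorter than 24 hours")


# A node expiring at noon and one expiring at 4pm both land on the same 3am window.
window_start = time(3, 0)
window_length = timedelta(hours=2)
noon_rotation = next_rotation_time(datetime(2021, 12, 1, 12, 0), window_start, window_length)
afternoon_rotation = next_rotation_time(datetime(2021, 12, 1, 16, 0), window_start, window_length)
```

Both nodes end up scheduled for the same window regardless of when they were created, which is the property a fixed grace period cannot provide on its own.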
Yeah this makes sense to me. What I'm hearing is that while it would be nice to specify everything at the pod level, it's important for the ops team to be able to protect dev teams with broadly applied policy. Thoughts on the NodeDisruptionBudget approach?
Yeah... if we (as a platform/ops team) are going to automatically expire and rotate nodes, we want to make sure it doesn't occur during a time that affects our customers. I'll have to see what a NodeDisruptionBudget might look like, but the hypothetical sounds reasonable so far.
Closing in favor of #1738
solved with kubernetes-sigs/karpenter#849
Tell us about your request
I'd love to have the ability on a Provisioner to configure when expiration/rotation of nodes should actually occur. The idea would be to separate when a node is marked as expired from when it is actually rotated (defaulting to rotating as soon as it expires).
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
Many of the applications I support are not able to run in HA modes and require sticky sessions. But most of those systems currently restart once a day, early in the morning, when they have very little traffic. While the app team loves the idea of automatically rotating nodes to ensure they are running the latest AMIs, they worry the expiration might happen at a bad time and impact users.
Are you currently working around this issue?
Currently, we're planning to not use expiration on those nodes but then use our own job that runs during the maintenance window and deletes nodes, effectively managing our own expiration time.
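As a rough illustration of that workaround, the selection logic of such a job might look like the sketch below. The `Node` type, the function names, and the TTL and window values are all hypothetical; listing and deleting nodes through the cluster API is deliberately left out, and times are assumed to be naive datetimes in one time zone.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for whatever node record the job reads."""
    name: str
    created_at: datetime


def in_window(now: datetime, window_start: time, window_end: time) -> bool:
    """True if now falls in the daily window [window_start, window_end)."""
    current = now.time()
    if window_start <= window_end:
        return window_start <= current < window_end
    # The window spans midnight, e.g. 23:00 to 01:00.
    return current >= window_start or current < window_end


def nodes_to_rotate(nodes: List[Node], now: datetime, ttl: timedelta,
                    window_start: time, window_end: time) -> List[str]:
    """Names of nodes older than ttl, but only while the window is open."""
    if not in_window(now, window_start, window_end):
        return []
    return [node.name for node in nodes if now - node.created_at >= ttl]


nodes = [
    Node("old-node", datetime(2021, 11, 1, 12, 0)),
    Node("new-node", datetime(2021, 11, 30, 16, 0)),
]
ttl = timedelta(days=14)
during_window = nodes_to_rotate(nodes, datetime(2021, 12, 1, 3, 30), ttl, time(3, 0), time(5, 0))
outside_window = nodes_to_rotate(nodes, datetime(2021, 12, 1, 12, 0), ttl, time(3, 0), time(5, 0))
```

Run on a schedule inside the maintenance window, a job like this only ever returns nodes that have outlived the TTL, and returns nothing at all outside the window.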
Additional context
Not that I can think of currently.