Cut v0.2.4 release #415

Merged 1 commit on May 19, 2021

ellistarn
Contributor

Issue #, if available:

Description of changes:

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@ellistarn ellistarn merged commit e02166b into aws:main May 19, 2021
@ellistarn ellistarn deleted the release branch May 19, 2021 20:48
gfcroft pushed a commit to gfcroft/karpenter-provider-aws that referenced this pull request Nov 25, 2023
There was a race with deprovisioning with steps 2 & 3 where we could:
1) See the cluster as unconsolidated
2) Run the deprovisioners unsuccessfully
3) Something happens that makes consolidation possible (e.g. pod
   deletion)
4) We mark the cluster as unconsolidatable
5) On the next round, do nothing since cluster is unconsolidatable

The end result is that we mark the cluster as unconsolidatable, which
stops us from trying again (the marker exists to save CPU time).
Thankfully, this was built with a 5 minute timeout to recover from ICE
situations automatically, which limited the severity of the problem.

This change modifies the code to use a monotonically increasing
timestamp. The timestamp increases whenever the cluster state changes
(or when nothing has changed for five minutes).

The race above is now prevented:

1) See the cluster state as X
2) Run the deprovisioners unsuccessfully
3) Something happens that makes consolidation possible (e.g. pod
   deletion), which increases the cluster state to X+1
4) We mark the cluster as unconsolidatable as of X
5) On the next round, see the cluster state as X+1 and retry since
   we last tried at state X