Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AWS launch template deletion on cache eviction #1278
AWS launch template deletion on cache eviction #1278
Changes from all commits
a9dd97d
8991905
9b548de
b1b9e14
7c13273
d66c6ee
9f0602d
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to lock since the cache eviction is already threadsafe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cache eviction is threadsafe, however onCacheEvicted is not threadsafe. For example, if an LT is evicted from the cache, ensureLaunchTemplate can be executed that will receive a cache miss, and then query LTs and find the LT that was evicted but has not been deleted yet. After it finds that LT, onCacheEvicted can run (if the lock is removed) and delete the LT before it is used which will propagate as a launch failure in Fleet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lock isn't saving us in this case, since the LT creation has already happened. The only thing saving us from this race is the expiration timeout -- I guess this is fine (necessary?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the lock does save us in the scenario I mentioned. The LT creation I mentioned would occur in the
ensureLaunchTemplate
func which also takes a lock. Locking in both of these funcs ensures that the cache is always consistent with the state of EC2.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any way we'll want to retry this deletion if we fail? Is this called within a controller somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not retried (doesn't reconcile) since this is only executed on cache eviction, BUT it does rehydrate on startup, so if something did happen, a restart of Karpenter would clean them up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should make some upgrade instructions to help users migrate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we definitely should... not sure how we'd structure them. I was thinking release notes, but another way would be adding an upgrade section in the versioned docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just want a single command to run which updates my IAM. Maybe cfn deploy works out of the box.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah cfn should work fine, but I suspect most users are integrating our template (cfn or terraform or cdk) into their own infrastructure-as-code so there won't be a one-size fits all solution.