-
Notifications
You must be signed in to change notification settings - Fork 995
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: create kubernetes events in case of insufficient capacity errors #3955
Conversation
✅ Deploy Preview for karpenter-docs-prod canceled.
|
We need to have improved visibility of ICE events when karpenter encounters them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution! Just one thought on making this apply generally.
return nil, cloudprovider.NewInsufficientCapacityError(fmt.Errorf("all requested instance types were unavailable during launch")) | ||
} | ||
instance, err := c.instanceProvider.Create(ctx, nodeTemplate, machine, instanceTypes) | ||
if err != nil { | ||
if cloudprovider.IsInsufficientCapacityError(err) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible for us to surface this over in karpenter-core? I'm wondering if this might be a good place to surface the event when launch fails generally for this reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, we were looking at this again today, a couple of things we noticed which we wanted to share with you if we are in the right direction.
What we would like to add
- the ability to see the instance family, instance type to the event when karpenter is not able to provision for the same.
As from what we could see right now, in karpeneter-core here in the launchMachine step here https://github.com/aws/karpenter-core/blob/v0.28.0-rc.2/pkg/controllers/machine/lifecycle/launch.go#L107, we would not have enough information for the same to be planted inside the k8s event, as we just have access to the machine name there, which we have added as of now.
What we did notice was here in https://github.com/aws/karpenter/blob/main/pkg/cloudprovider/cloudprovider.go#L88, we could possibly make use of the types present there, what do you folks feel about the same? cc @Fedosin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the ability to see the instance family, instance type to the event when karpenter
What's your ultimate goal here? Is it to track down all the instance types and offerings that are ICEd at a given time? If that's the case, then I suspect something like this might be a better mechanism to surface that.
I'm not opposed to surfacing those details also in the K8s event, I'm just curious what your ultimate goal is here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jonathan-innis we are buildin a kubernetes controller around these things, so prometheus metrics are not really helpful for us. We'd like to have a kubernetes event object that we can react to.
Technically I don't have preferences where to create it - in karpenter or in karpenter-core. Both ways work for us :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, makes sense. I'm wondering if we should just create a generic event for each individual ICE error based on the instance type rather than creating the event directly on the Machine. It seems like what you are interested in here is getting the high-level eventing detail that there has been a generic ICE error on an instance-type/AZ combination and you aren't necessarily interested that a Machine launch failed generically due to ICE.
What are your thoughts on firing an event with no involved object that details the instance type and az of the instance type that received the ICE?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like what you are interested in here is getting the high-level eventing detail that there has been a generic ICE error on an instance-type/AZ combination and you aren't necessarily interested that a Machine launch failed generically due to ICE.
Hello there, @jonathan-innis . Thanks for the summary, yes, we are trying to have some more visibility added for the ICE event on the AZ/instance type combination.
What we are thinking in the direction further with this event is that, with that visibility, we would want our controller to stop recommending the instance family which we are trying to decide as a scheduling constraint for the application spec. To prevent a scenario where the application would stay in pending state due to non-availability of the instance-types in that instance family.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jonathan-innis Hello there, we tried addressing it here #3999, please have a look.
@jonathan-innis Thank you for your suggestion! I moved this code to karpenter-core: kubernetes-sigs/karpenter#356 |
It looks like other PRs are going to implement this change, so closing this one. |
Fixes #
Description
We need to have improved visibility of ICE events when karpenter encounters them.
Does this change impact docs?
Release Note
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.