karpenter scale up new node with EFS #5691
Comments
You should use a startupTaint.
Hey, I tested using startupTaint, forgot to mention. With startupTaints it is not working, but I think this is more of a ticket for EFS, since there is no way to spawn a node before the limit is reached.
Was the pod being scheduled? Was the EFS driver removing the startupTaint? If there are any more details about this race you could give, hopefully we can definitively say whether it's on Karpenter or the efs-csi-driver.
"Was the EFS driver removing the startupTaint?" i assume yes, cuz it is set in nodeclaim, and right now node don't have any taints. kubectl get pods
describe
kubectl get nodes -o json | jq '.items[].spec.taints' kubectl get nodeclaim hxps-66ae7155-6992-40d7-a887-ae5a746c5c25-dev-mseries-btnkm -o yaml
|
If a node is being provisioned and the pod is being scheduled, I'm going to wager the problem is on the EFS side. Without pod logs it's of course impossible to tell for sure; it may be good to take this up with the EFS folks.
@jmdeal and it seems that efs/karpenter removed this taint too early.
Karpenter is not responsible for removing the taint; at that point it seems like it would be an aws-efs-csi-driver issue. I see you already opened an issue there, so I'm going to close this one out for the time being. Feel free to reopen if something points it back to Karpenter.
Description
Observed Behavior:
We have a cluster with one node at its full pod limit (58/58), but I believe the same problem will occur with more nodes.
We are installing additional pods that use EFS, but the pods fail because of a race condition with EFS on new nodes; related ticket: kubernetes-sigs/aws-efs-csi-driver#1069.
The EFS CSI driver already provides a workaround for this bug using node startup taints; see the EFS docs:
https://github.com/kubernetes-sigs/aws-efs-csi-driver/blob/master/docs/README.md#configure-node-startup-taint
So I added this taint to the Karpenter NodePool as a startupTaint (see the sketch below). With this taint in place, Karpenter will not scale up a new node because no pods are in the Pending state; the pod just waits with a FailedScheduling event.
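For reference, a minimal sketch of what this NodePool configuration might look like, assuming the karpenter.sh/v1beta1 NodePool API; the taint key is taken from the linked EFS docs and the nodeClassRef name is illustrative, not copied from the affected cluster:

```yaml
# Sketch: NodePool that applies the EFS startup taint via startupTaints
# (taint key assumed from the linked EFS docs; nodeClass name is illustrative).
# Karpenter expects startupTaints to be present when the node registers and
# ignores them when simulating pod scheduling; the EFS CSI driver, not
# Karpenter, is expected to remove the taint once its node pod is ready.
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      startupTaints:
        - key: efs.csi.aws.com/agent-not-ready
          effect: NoExecute
      nodeClassRef:
        name: default   # illustrative EC2NodeClass name
```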
Expected Behavior:
Some way to trigger the creation of extra nodes before the maximum number of pods per node is reached, or some other idea (one possible workaround is sketched below).
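One possible workaround (not a Karpenter feature, just a sketch of a common headroom pattern) is a low-priority placeholder deployment: its pause pods occupy capacity, real workloads preempt them, and the then-Pending placeholders trigger Karpenter to provision a node before the real pods hit the limit. All names below are hypothetical:

```yaml
# Sketch of an overprovisioning placeholder (all names hypothetical):
# low-priority pause pods reserve headroom; real workloads preempt them,
# and the resulting Pending placeholders make Karpenter add a node early.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -10
globalDefault: false
description: "Placeholder pods that real workloads may preempt."
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning-placeholder
spec:
  replicas: 2
  selector:
    matchLabels:
      app: overprovisioning-placeholder
  template:
    metadata:
      labels:
        app: overprovisioning-placeholder
    spec:
      priorityClassName: overprovisioning
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          resources:
            requests:
              cpu: 100m
              memory: 64Mi
```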
Reproduction Steps (Please include YAML):
Versions:
Kubernetes Version (kubectl version):