Debug node autoprovisioning: did not match Pod's node affinity #677
From my understanding of these docs, the auto-provisioner should create nodes with the same tolerations/node selectors as the pod that is trying to spin up: https://cloud.google.com/kubernetes-engine/docs/how-to/node-auto-provisioning#workload_separation
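For reference, the pattern described in those docs is: the pod requests a `nodeSelector` plus a matching toleration, and node auto-provisioning is expected to create a node pool carrying that label and taint. A minimal sketch, where the `team: pangeo` label/taint key and value are made up for illustration:

```yaml
# Sketch of the workload-separation pattern from the GKE docs.
# The label/taint pair (team: pangeo) is hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: example-user-pod
spec:
  nodeSelector:
    team: pangeo          # auto-provisioning should create a node pool with this label...
  tolerations:
    - key: team
      operator: Equal
      value: pangeo
      effect: NoSchedule  # ...and a matching NoSchedule taint
  containers:
    - name: notebook
      image: busybox
      command: ["sleep", "3600"]
```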
I did the following:
Edited the YAML and removed the
And that seemed to spin up fine.
So, nodes can't be spun up because the node auto-provisioner is expecting to create nodes with the label In the Pangeo hub config file, we've tried setting the following:
But none of those were successful at removing the node selector from the user pod. Instead, I removed the following lines from our
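The exact settings tried aren't shown above, but for context, user-pod node placement in a Zero to JupyterHub config is usually controlled by options like these (the values shown here are illustrative, not the ones from the Pangeo hub config):

```yaml
# Illustrative Zero to JupyterHub (z2jh) options that influence which
# nodes user pods are scheduled on; values are examples only.
jupyterhub:
  singleuser:
    nodeSelector: {}        # e.g. {hub.jupyter.org/node-purpose: user}
  scheduling:
    userPods:
      nodeAffinity:
        # prefer / require / ignore the node-purpose label;
        # "ignore" stops z2jh from injecting the node affinity at all
        matchNodePurpose: ignore
```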
@sgibson91 For what it's worth, in QHub we haven't tried node auto-provisioning. Instead, we have explicitly defined node pools and pods get scheduled on them. Wish we could be of more help here.
Thanks @tylerpotts - that is 2i2c's default too. But when I raised the question about appropriate machine sizes for those pools in #666, we found out Pangeo are using auto-provisioning, and we didn't really have any data to hand for optimising the machine sizes to expected load.
@sgibson91 We have recently been struggling with the problem of optimising workload to node size as well. For the most part we have been allocating a single node per user pod/dask pod, which has helped somewhat on the larger-scale clusters. As far as determining the allocatable resources available on the nodes, we have quite a bit of research detailed here that you may find useful: nebari-dev/nebari#792. Unfortunately there doesn't seem to be a linear formula, as Kubernetes reserves variable amounts of millicpu and RAM depending on the size of the node.
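On the "no linear formula" point: GKE does publish a piecewise schedule for kube-reserved CPU and memory, which can be sketched as below. The percentages are taken from GKE's cluster-architecture docs as I understand them; treat them as approximate, and note they exclude the eviction threshold (~100 MiB) and OS/system overhead.

```python
def reserved_memory_gib(machine_gib: float) -> float:
    """Approximate GKE kube-reserved memory (GiB) for a node of the given size.

    Piecewise schedule: 25% of the first 4 GiB, 20% of the next 4 GiB,
    10% of the next 8 GiB, 6% of the next 112 GiB, 2% of anything above
    128 GiB. Machines under 1 GiB reserve a flat 255 MiB.
    """
    if machine_gib < 1:
        return 255 / 1024
    brackets = [(4, 0.25), (4, 0.20), (8, 0.10), (112, 0.06), (float("inf"), 0.02)]
    reserved, remaining = 0.0, machine_gib
    for width, rate in brackets:
        take = min(remaining, width)
        reserved += take * rate
        remaining -= take
        if remaining <= 0:
            break
    return reserved


def reserved_cpu_cores(cores: int) -> float:
    """Approximate GKE kube-reserved CPU: 6% of core 1, 1% of core 2,
    0.5% each of cores 3-4, 0.25% of each core beyond 4."""
    reserved = 0.0
    for i in range(1, cores + 1):
        if i == 1:
            reserved += 0.06
        elif i == 2:
            reserved += 0.01
        elif i <= 4:
            reserved += 0.005
        else:
            reserved += 0.0025
    return reserved
```

By this schedule a 4-vCPU / 16 GiB node reserves roughly 0.08 vCPU and 2.6 GiB before the eviction threshold, which matches the non-linear behaviour described above.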
I've added an update to the top comment, to reflect that we're manually provisioning the Pangeo hub for now!
Description
In #670 we enabled node auto-provisioning. In practice what we are seeing when trying to create pods is the following message:
This is preventing any new nodes from coming up.
Value / benefit
We need to spin nodes up!
Implementation details
No response
Tasks to complete
No response
Updates