gcp, dask-worker-nodes: pangeo-hubs to use single dask worker node type #3024
Conversation
The apply steps are something like below I think:

# A GCP account with permissions to the GCP project columbia at
# https://console.cloud.google.com/iam-admin/iam?project=columbia
# is required, and we don't have access with our @2i2c.org accounts
gcloud auth login --update-adc
gh pr checkout 3024
cd terraform/gcp
rm -rf .terraform
terraform init -backend-config backends/pangeo-backend.hcl
terraform workspace list
terraform workspace select pangeo-hubs
terraform apply --var-file projects/pangeo-hubs.tfvars
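Before running the apply it can help to review the plan on its own, so that unexpected replacements show up before anything is changed. A minimal sketch, assuming the same workspace and tfvars file as above:

# Preview what terraform intends to change without applying anything
terraform plan --var-file projects/pangeo-hubs.tfvars
# Optionally save the plan and later apply exactly that saved plan
terraform plan --var-file projects/pangeo-hubs.tfvars -out pangeo-hubs.tfplan
terraform apply pangeo-hubs.tfplan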
Thank you @GeorgianaElena for working on this!!! Hmmm, I don't get why the core node pool is being replaced. Only the dask worker node pools are meant to be. If you can get only the dask-worker node pools destroyed, this can be applied in my mind.
If you check out the master branch, does terraform apply cause a replacement of the core node pools as well? It could be that we have some state mismatch unrelated to this PR's change. Ah... that is the case! I see the core node pool type is …
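One way to double-check that only the dask worker node pools would be touched is to scope the plan with -target. This is only a sketch; the resource address below is hypothetical and would need to be replaced with the real node pool addresses found in the state:

# Find the exact resource addresses of the node pools in the state
terraform state list | grep node_pool
# Hypothetical address -- plan only against the dask worker node pool resources
terraform plan --var-file projects/pangeo-hubs.tfvars -target 'google_container_node_pool.dask_worker'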
The plan looks to be updating dask nodepools now, as expected:
Merging this PR will trigger the following deployment actions. Support and Staging deployments
Production deployments
@consideRatio, I still don't see any dask nodes running. Ready to do a terraform apply?
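For reference, one way to check whether any dask worker nodes currently exist; the node label used here is an assumption about how the dask worker pools are labelled, not something confirmed in this PR:

# List nodes carrying the (assumed) dask worker node-purpose label;
# an empty result means no dask worker nodes are running
kubectl get nodes -l k8s.dask.org/node-purpose=worker
# Alternatively, list all nodes and look at the node pool names
kubectl get nodes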
Wieee yes!!!
Feel free to merge whenever you are ready @consideRatio 🚀
Thank you @GeorgianaElena!!!!!! Massive help!
🎉🎉🎉🎉 Monitor the deployment of the hubs here 👉 https://github.com/2i2c-org/infrastructure/actions/runs/5973999496
pangeo-hubs is the last 2i2c cluster that has multiple dask worker node types, so with this terraform change applied and merged we can fix #2687. If we get all clusters to use a single node type with 16 CPU and 128 GB of memory (r5.4xlarge / n2-highmem-16), we can provide good defaults for users of dask-gateway when they decide how powerful their workers should be. This is planned in #2687.
I'm not able to get this all the way through myself, though, as I lack access to the infrastructure.
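Once applied, the result could also be verified from the GCP side by listing the cluster's node pools and their machine types. A sketch with the cluster name and zone left as placeholders, since I don't have them at hand:

# List node pools and their machine types (placeholders for cluster name and zone)
gcloud container node-pools list --cluster <CLUSTER_NAME> --zone <ZONE> --project columbia --format="table(name, config.machineType)"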
Action plan
Current activity
Grafana dashboard at https://grafana.gcp.pangeo.2i2c.cloud is down because prometheus is crashing, so I can't tell whether there is a history of always having dask worker nodes active or similar. I can get a brief response before it crashes, but it indicates no data is available anyhow...
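To debug why prometheus keeps crashing, the first step would be to look at the pod status, events, and the previous container's logs. A sketch, assuming prometheus runs in a support namespace and with the pod name left as a placeholder:

# Check pod status and restart counts in the (assumed) support namespace
kubectl -n support get pods
# Events often reveal OOMKills or failing probes (placeholder pod name)
kubectl -n support describe pod <PROMETHEUS_SERVER_POD>
# Logs from the previously crashed container
kubectl -n support logs <PROMETHEUS_SERVER_POD> --previous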