failing to launch dask worker pods on AWS #870
Comments
perhaps i need to re-apply dask-gateway CRDs, but i thought these things are all wrapped up in the helm deploy step now...? #584 (comment)
Any logs in the dask-gateway API server or controller?
What versions of dask-gateway are being used in the singleuser / worker images, and what version is on the cluster?
On Nov 9, 2020, at 12:59 PM, Scott Henderson wrote:
> perhaps i need to re-apply dask-gateway CRDs, but i thought these things are all wrapped up in the helm deploy step now...? #584 (comment)
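One way to answer the version question from inside each image is a short script run in the singleuser and worker containers. This is only a sketch: it uses the standard-library `importlib.metadata`, and the package list (`dask-gateway`, `dask-gateway-server`, `dask`, `distributed`) is an assumption about which distributions matter here. The version deployed on the cluster itself would still have to be read from the Helm release or the gateway pod image.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumed list of PyPI distribution names worth comparing across images.
    for name in ("dask-gateway", "dask-gateway-server", "dask", "distributed"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```

Running the same script in the notebook image and in a worker image and comparing the output side by side makes a client/worker mismatch easy to spot.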
I am getting this too. Thanks for opening an issue.
gives
But no workers ever become available. Logs just say
Everything is running the same image (pangeo/pangeo-notebook:2020.10.27 --> ecr.us-west-2.amazonaws.com/pangeo:b364ec4). I'm not seeing anything obvious in any of the logs, but it looks like there is a typo here (it shouldn't have 'staging'). I'll see if that fixes things... (from https://github.com/pangeo-data/pangeo-cloud-federation/pull/846/files)
pangeo-cloud-federation/deployments/icesat2/config/prod.yaml, lines 41 to 43 in b2c04c5
sure enough, it was that typo on the schedulerName. we're back up and running!
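For context, Kubernetes leaves a pod in Pending when its `spec.schedulerName` names a scheduler that is not running, because nothing ever picks the pod up. A small check like the one below could catch this kind of copy-paste slip before deploying. It is a sketch only: the nested keys and scheduler names in `prod_config` are hypothetical stand-ins (the real lines from prod.yaml are not reproduced in this thread), and the config is assumed to be already parsed into a Python dict.

```python
from typing import Any, Iterator, List, Tuple


def find_key(node: Any, key: str, path: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (dotted_path, value) for every occurrence of `key` in nested dicts/lists."""
    if isinstance(node, dict):
        for k, v in node.items():
            child = f"{path}.{k}" if path else str(k)
            if k == key:
                yield child, v
            yield from find_key(v, key, child)
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from find_key(item, key, f"{path}[{i}]")


def suspicious_scheduler_names(config: dict, forbidden: str) -> List[Tuple[str, str]]:
    """Return (path, value) for each schedulerName containing the `forbidden` substring."""
    return [
        (path, value)
        for path, value in find_key(config, "schedulerName")
        if isinstance(value, str) and forbidden in value
    ]


# Hypothetical parsed prod config; keys and names are illustrative, not the real file.
prod_config = {
    "worker": {"extraPodConfig": {"schedulerName": "example-staging-user-scheduler"}},
    "scheduler": {"extraPodConfig": {"schedulerName": "example-prod-user-scheduler"}},
}

if __name__ == "__main__":
    for path, value in suspicious_scheduler_names(prod_config, "staging"):
        print(f"prod config references a staging scheduler at {path}: {value}")
```

The substring test is deliberately crude; it only flags values for a human to review and does not verify that the named scheduler actually exists in the cluster.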
Thank you!!!
Good catch, sorry about that!
The last update to prod on AWS is failing to launch dask workers. Not sure what is wrong, but this is the first time we've deployed to prod via GitHub Actions. Everything is still working fine on staging.
https://github.com/pangeo-data/pangeo-cloud-federation/runs/1375800740?check_suite_focus=true
On prod, dask-worker pods remain in the 'Pending' state with nothing in the logs. No error messages in the Jupyter notebook either. Only by digging into other dask-gateway related pods do I see some error messages. cc @TomAugspurger @rsignell-usgs
kubectl logs -n icesat2-prod traefik-icesat2-prod-dask-gateway-84bd7f7c7-97j5l
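When pods sit in Pending with empty logs, the pod spec itself is often more informative than the logs. The sketch below assumes the output of `kubectl get pods -n icesat2-prod -o json` has been loaded into a dict, and lists each Pending pod together with the scheduler it is waiting on. The `sample` data is hypothetical and only mirrors the standard pod list fields (`metadata.name`, `spec.schedulerName`, `status.phase`).

```python
from typing import Dict, List


def pending_pods(pod_list: dict) -> List[Dict[str, str]]:
    """Return the name and schedulerName of every pod whose phase is Pending."""
    rows = []
    for pod in pod_list.get("items", []):
        if pod.get("status", {}).get("phase") == "Pending":
            rows.append(
                {
                    "name": pod.get("metadata", {}).get("name", "?"),
                    "schedulerName": pod.get("spec", {}).get("schedulerName", "?"),
                }
            )
    return rows


# Hypothetical pod list in the shape produced by `kubectl get pods -o json`.
sample = {
    "items": [
        {
            "metadata": {"name": "dask-worker-example"},
            "spec": {"schedulerName": "example-staging-user-scheduler"},
            "status": {"phase": "Pending"},
        },
        {
            "metadata": {"name": "dask-scheduler-example"},
            "spec": {"schedulerName": "default-scheduler"},
            "status": {"phase": "Running"},
        },
    ]
}

if __name__ == "__main__":
    for row in pending_pods(sample):
        print(f"{row['name']} is Pending, waiting on scheduler {row['schedulerName']}")
```

A scheduler name in this list that does not correspond to any running scheduler in the cluster would explain pods that never leave Pending.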