Cleanup AWS GPU docs & add T4s for uwhackweeks #1787

Merged · 1 commit merged on Oct 18, 2022


yuvipanda (Member)

Fixes #1784

yuvipanda requested a review from a team on October 18, 2022, 01:46
github-actions bot commented on Oct 18, 2022

Merging this PR will trigger the following deployment actions.

Support and Staging deployments

| Cloud Provider | Cluster Name | Upgrade Support? | Reason for Support Redeploy | Upgrade Staging? | Reason for Staging Redeploy |
|---|---|---|---|---|---|
| gcp | awi-ciroh | Yes | Support helm chart has been modified | No | |
| gcp | pangeo-hubs | Yes | Support helm chart has been modified | No | |
| aws | uwhackweeks | Yes | Support helm chart has been modified | Yes | Following helm chart values files were modified: common.values.yaml |
| gcp | leap | Yes | Support helm chart has been modified | No | |
| kubeconfig | utoronto | Yes | Support helm chart has been modified | No | |
| gcp | linked-earth | Yes | Support helm chart has been modified | No | |
| gcp | 2i2c | Yes | Support helm chart has been modified | No | |
| gcp | cloudbank | Yes | Support helm chart has been modified | No | |
| gcp | 2i2c-uk | Yes | Support helm chart has been modified | No | |
| aws | openscapes | Yes | Support helm chart has been modified | No | |
| gcp | callysto | Yes | Support helm chart has been modified | No | |
| gcp | m2lines | Yes | Support helm chart has been modified | No | |
| aws | carbonplan | Yes | Support helm chart has been modified | No | |
| gcp | meom-ige | Yes | Support helm chart has been modified | No | |

Production deployments

| Cloud Provider | Cluster Name | Hub Name | Reason for Redeploy |
|---|---|---|---|
| aws | uwhackweeks | snowex | Following helm chart values files were modified: common.values.yaml |

yuvipanda (Member, Author)

I also opened eksctl-io/eksctl#5797 to upgrade the version of the nvidia device plugin on eksctl lol

yuvipanda (Member, Author)

I've deployed and tested this, and it works OK!

- The bug we reported upstream to eksctl (eksctl-io/eksctl#5277)
  has been fixed! So eksctl is now responsible for setting up the
  GPU driver, not us. Yay for fixing things upstream! This also
  means that eksctl, not us, is responsible for keeping these
  versions up to date. We bump the required eksctl version to
  account for this.
- Based on pangeo-data/pangeo-docker-images#390
  and many other discussions (linked from there), NVIDIA T4s are
  now preferred over older K80s. We update the AWS GPU docs to
  reflect this.
- Add PyTorch & TensorFlow images as options to the GPU profile here,
  so end users can choose!

Fixes 2i2c-org#1784
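The last bullet offers PyTorch and TensorFlow images as choices under the GPU profile. The actual config from this PR is not shown here, so the following is only a sketch, assuming the hub uses KubeSpawner's `profile_list` with `profile_options`. The image names (`pangeo/ml-notebook`, `pangeo/pytorch-notebook`), the `latest` tags, and the `image_for_choice` helper are illustrative assumptions, not values taken from this PR. `g4dn.xlarge` is an EC2 instance type with one NVIDIA T4 GPU, and `nvidia.com/gpu` is the resource name exposed by the NVIDIA device plugin.

```python
# Hypothetical KubeSpawner profile_list: one T4 GPU profile with a choice of images.
# In a real jupyterhub_config.py this would be assigned as:
#     c.KubeSpawner.profile_list = profile_list
# Image names and tags are illustrative assumptions, not the PR's actual values.

profile_list = [
    {
        "display_name": "NVIDIA Tesla T4, ~16 GB RAM, ~4 CPUs",
        "description": "Start a container on a g4dn.xlarge node with one T4 GPU",
        # User-selectable image, shown as a dropdown under this profile
        "profile_options": {
            "image": {
                "display_name": "Image",
                "choices": {
                    "tensorflow": {
                        "display_name": "Pangeo TensorFlow ML notebook",
                        # Assumed image name; the tag is a placeholder
                        "kubespawner_override": {"image": "pangeo/ml-notebook:latest"},
                    },
                    "pytorch": {
                        "display_name": "Pangeo PyTorch ML notebook",
                        # Assumed image name; the tag is a placeholder
                        "kubespawner_override": {"image": "pangeo/pytorch-notebook:latest"},
                    },
                },
            }
        },
        # Settings applied whichever image the user picks
        "kubespawner_override": {
            # Schedule only onto nodes of the T4-backed instance type
            "node_selector": {"node.kubernetes.io/instance-type": "g4dn.xlarge"},
            # Request one GPU, as advertised by the NVIDIA device plugin
            "extra_resource_limits": {"nvidia.com/gpu": "1"},
        },
    }
]


def image_for_choice(profile, choice):
    """Hypothetical helper: return the image a given image choice would select."""
    choices = profile["profile_options"]["image"]["choices"]
    return choices[choice]["kubespawner_override"]["image"]


if __name__ == "__main__":
    for name in profile_list[0]["profile_options"]["image"]["choices"]:
        print(name, "->", image_for_choice(profile_list[0], name))
```

Putting the node selector and GPU limit in the profile-level `kubespawner_override` keeps them shared, so each image choice only needs to override `image`.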
yuvipanda merged commit d0fbd48 into 2i2c-org:master on Oct 18, 2022
yuvipanda (Member, Author)

Thanks @sgibson91 and @GeorgianaElena

github-actions bot

🎉🎉🎉🎉

Monitor the deployment of the hubs here 👉 https://github.com/2i2c-org/infrastructure/actions/runs/3274138959

Linked issue: Provide NVIDIA T4 GPU support for snowex.uwhackweeks hub