Update AWS and GCP GPU images to have cuda 12.1 #49

Merged
merged 3 commits into from
Nov 15, 2023
21 changes: 20 additions & 1 deletion README.md
@@ -6,11 +6,14 @@

## Automatic Catalog Fetching

Catalogs for AWS and GCP are automatically fetched & refreshed from the cloud provider, implemented as GitHub Actions. Other clouds can implement [catalog fetchers](https://github.com/skypilot-org/skypilot/tree/master/sky/clouds/service_catalog/data_fetchers) and a corresponding Action to add auto-refresh.
Catalogs for AWS and GCP are automatically fetched & refreshed from the cloud provider, implemented as GitHub Actions. Other clouds can implement [catalog fetchers](https://github.com/skypilot-org/skypilot/tree/master/sky/clouds/service_catalog/data_fetchers) and a corresponding [Action](./.github/workflows/) to add auto-refresh.

Catalogs are updated **every 7 hours**.





## Schema V5

The catalogs for each cloud in [v5](v5) include the following files:
@@ -44,3 +47,19 @@ To supply your own custom pricing or custom regions/zones, you can update vms.cs
| `ImageId` | string | The ID of the image that is used to launch the instance in the cloud. |
| `CreationDate` | string | The creation date of the image (mainly for tracking purpose). |


#### Update Images

For AWS, the images are updated automatically by the catalog fetcher. To change how those images are selected, update [fetch_aws.py](https://github.com/skypilot-org/skypilot/blob/master/sky/clouds/service_catalog/data_fetchers/fetch_aws.py) in the SkyPilot repository.

For GCP, the images are updated manually. To check the latest images, please run the following command:
```bash
gcloud compute images list \
--project deeplearning-platform-release \
--no-standard-images --uri
```
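The command above can print many URIs. As a rough sketch (not part of the catalog tooling), the Python helper below picks the newest image for a given CUDA version from that output. It assumes each output line ends in `projects/<project>/global/images/<name>` and that image names follow the `common-cu<cuda>-v<date>-debian-11-py310` pattern seen in this PR; `latest_image` is a hypothetical name.

```python
import re
from typing import Iterable, Optional

# Assumed naming scheme, inferred from the image names in this PR:
#   common-cu<cuda>-v<YYYYMMDD>-debian-11-py310
_IMAGE_RE = re.compile(
    r'images/common-cu(?P<cuda>\d+)-v(?P<date>\d{8})-debian-11-py310$')


def latest_image(uris: Iterable[str], cuda: str) -> Optional[str]:
    """Return the newest image path for the given CUDA version, or None."""
    best_date, best_path = '', None
    for uri in uris:
        uri = uri.strip()
        match = _IMAGE_RE.search(uri)
        start = uri.find('projects/')
        if match is None or start == -1 or match.group('cuda') != cuda:
            continue
        # YYYYMMDD strings sort chronologically when compared as text.
        if match.group('date') > best_date:
            best_date = match.group('date')
            # Keep only the `projects/...` part, as used in images.csv.
            best_path = uri[start:]
    return best_path
```

One way to use it is to pipe the `gcloud` output into a script that calls `latest_image(sys.stdin, '121')`.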
A common reason to update the images is to support a newer CUDA version. To do so, change the image link for the tag `skypilot:gpu-debian-11` in [images.csv](./catalogs/v5/gcp/images.csv) to the latest image listed by the command above. To keep a history, also add a tag `skypilot:cuda<version>-debian-11` that points to the same image link, for example:
```csv
skypilot:cuda121-debian-11,,debian,11,projects/deeplearning-platform-release/global/images/common-cu121-v20231105-debian-11-py310,20231105
skypilot:gpu-debian-11,,debian,11,projects/deeplearning-platform-release/global/images/common-cu121-v20231105-debian-11-py310,20231105
```
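The two rows above differ only in their tag, so they can be generated from a single image link. The following is a minimal Python sketch under stated assumptions: the column order `Tag,Region,OS,OSVersion,ImageId,CreationDate` is inferred from the rows in this PR, the image name is assumed to contain `common-cu<cuda>-v<date>-debian-11`, and `gcp_gpu_rows` is a hypothetical helper, not something the repository provides.

```python
import re
from typing import List

# Assumed image-name pattern, e.g. common-cu121-v20231105-debian-11-py310.
# Older names such as common-gpu-v20230615-... do not match and are rejected.
_NAME_RE = re.compile(r'common-cu(?P<cuda>\d+)-v(?P<date>\d{8})-debian-11')


def gcp_gpu_rows(image_id: str) -> List[str]:
    """Build the history-tag row and the default-tag row for one image."""
    match = _NAME_RE.search(image_id)
    if match is None:
        raise ValueError(f'Unrecognized image name: {image_id}')
    cuda, date = match.group('cuda'), match.group('date')
    tags = [f'skypilot:cuda{cuda}-debian-11', 'skypilot:gpu-debian-11']
    # The region column is left empty, matching the existing GCP rows.
    return [f'{tag},,debian,11,{image_id},{date}' for tag in tags]
```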
44 changes: 22 additions & 22 deletions catalogs/v5/aws/images.csv
@@ -21,28 +21,28 @@ skypilot:gpu-ubuntu-1804,us-east-1,ubuntu,18.04,ami-0d8c624d9d0f9af69,20221114
skypilot:gpu-ubuntu-1804,us-east-2,ubuntu,18.04,ami-05d40cbc5bf043e8b,20221114
skypilot:gpu-ubuntu-1804,us-west-1,ubuntu,18.04,ami-0257857e3e25bf4d8,20221114
skypilot:gpu-ubuntu-1804,us-west-2,ubuntu,18.04,ami-059fe31b45cbb1483,20221114
skypilot:gpu-ubuntu-2004,af-south-1,ubuntu,20.04,ami-03b7197252bfd0320,20230103
skypilot:gpu-ubuntu-2004,ap-east-1,ubuntu,20.04,ami-0193b21bfaf7e401c,20230103
skypilot:gpu-ubuntu-2004,ap-northeast-1,ubuntu,20.04,ami-0892c5a506ae63501,20230103
skypilot:gpu-ubuntu-2004,ap-northeast-2,ubuntu,20.04,ami-07dff00092ca8c4c9,20230103
skypilot:gpu-ubuntu-2004,ap-northeast-3,ubuntu,20.04,ami-01dfc9b0804ef2ec5,20230103
skypilot:gpu-ubuntu-2004,ap-south-1,ubuntu,20.04,ami-05130f503ad57c4dc,20230103
skypilot:gpu-ubuntu-2004,ap-southeast-1,ubuntu,20.04,ami-0030191060d56fffe,20230103
skypilot:gpu-ubuntu-2004,ap-southeast-2,ubuntu,20.04,ami-01adf038dd5ff7b7c,20230103
skypilot:gpu-ubuntu-2004,ap-southeast-3,ubuntu,20.04,ami-0c1dc319769ab397b,20230103
skypilot:gpu-ubuntu-2004,ca-central-1,ubuntu,20.04,ami-07840f2f4f417b52f,20230103
skypilot:gpu-ubuntu-2004,eu-central-1,ubuntu,20.04,ami-0a5cee07e1edc0cd2,20230103
skypilot:gpu-ubuntu-2004,eu-north-1,ubuntu,20.04,ami-0686a450407912fe6,20230103
skypilot:gpu-ubuntu-2004,eu-south-1,ubuntu,20.04,ami-0caca10c78d4dedd6,20230103
skypilot:gpu-ubuntu-2004,eu-west-1,ubuntu,20.04,ami-0cba489155488a29e,20230103
skypilot:gpu-ubuntu-2004,eu-west-2,ubuntu,20.04,ami-09c51a13d5ada2047,20230103
skypilot:gpu-ubuntu-2004,eu-west-3,ubuntu,20.04,ami-03b529a91c98c13fe,20230103
skypilot:gpu-ubuntu-2004,me-central-1,ubuntu,20.04,ami-0b111b0207a2e8998,20230103
skypilot:gpu-ubuntu-2004,me-south-1,ubuntu,20.04,ami-0c6c47801d2e9535e,20230103
skypilot:gpu-ubuntu-2004,us-east-1,ubuntu,20.04,ami-0b7e0d9b36f4e8f14,20230103
skypilot:gpu-ubuntu-2004,us-east-2,ubuntu,20.04,ami-0692f9ae92252aab9,20230103
skypilot:gpu-ubuntu-2004,us-west-1,ubuntu,20.04,ami-0b61d2979f583d63d,20230103
skypilot:gpu-ubuntu-2004,us-west-2,ubuntu,20.04,ami-06b81ce928c07a34f,20230103
skypilot:gpu-ubuntu-2004,af-south-1,ubuntu,20.04,ami-0abc73eadd231f5b8,20231103
Member

Maybe we should add a note to https://github.com/skypilot-org/skypilot-catalog/blob/281a6b5febf770d7726dde1a6dc05d7b99180d2a/README.md on how to update the various images.csv files (e.g., the command used to get all these new AMI IDs)?

Collaborator Author

The images.csv for AWS is meant to be updated by fetch_aws.py, so this manual change is only nice to have; the file will keep being updated by fetch_aws.py. Added a description to README.md.

skypilot:gpu-ubuntu-2004,ap-east-1,ubuntu,20.04,ami-07b9c768e7fd9e1c4,20231103
skypilot:gpu-ubuntu-2004,ap-northeast-1,ubuntu,20.04,ami-05494db6fe40e0cfd,20231103
skypilot:gpu-ubuntu-2004,ap-northeast-2,ubuntu,20.04,ami-06c2362f3f6842b93,20231103
skypilot:gpu-ubuntu-2004,ap-northeast-3,ubuntu,20.04,ami-0d0f85163134c668e,20231103
skypilot:gpu-ubuntu-2004,ap-south-1,ubuntu,20.04,ami-071323fe2bf59945b,20231103
skypilot:gpu-ubuntu-2004,ap-southeast-1,ubuntu,20.04,ami-052b2045207f966fd,20231103
skypilot:gpu-ubuntu-2004,ap-southeast-2,ubuntu,20.04,ami-0ab4b182c87dec1f4,20231103
skypilot:gpu-ubuntu-2004,ap-southeast-3,ubuntu,20.04,ami-05e4b487950d33238,20231103
skypilot:gpu-ubuntu-2004,ca-central-1,ubuntu,20.04,ami-0806f81daae35b77a,20231103
skypilot:gpu-ubuntu-2004,eu-central-1,ubuntu,20.04,ami-04de0c1a97eeeb0cb,20231103
skypilot:gpu-ubuntu-2004,eu-north-1,ubuntu,20.04,ami-0fa526e5f242077b7,20231103
skypilot:gpu-ubuntu-2004,eu-south-1,ubuntu,20.04,ami-0bb968520723db74e,20231103
skypilot:gpu-ubuntu-2004,eu-west-1,ubuntu,20.04,ami-0eee12f2bb3531eab,20231103
skypilot:gpu-ubuntu-2004,eu-west-2,ubuntu,20.04,ami-00509eb77c05ed212,20231103
skypilot:gpu-ubuntu-2004,eu-west-3,ubuntu,20.04,ami-0412902f1d729aa71,20231103
skypilot:gpu-ubuntu-2004,me-central-1,ubuntu,20.04,ami-08684cb12bc6c56c8,20231103
skypilot:gpu-ubuntu-2004,me-south-1,ubuntu,20.04,ami-0ae99f0dea3745018,20231103
skypilot:gpu-ubuntu-2004,us-east-1,ubuntu,20.04,ami-0ac1f653c5b6af751,20231103
skypilot:gpu-ubuntu-2004,us-east-2,ubuntu,20.04,ami-0aa5328ffcf5d34ac,20231103
skypilot:gpu-ubuntu-2004,us-west-1,ubuntu,20.04,ami-0bfb2ef7f314185e4,20231103
skypilot:gpu-ubuntu-2004,us-west-2,ubuntu,20.04,ami-0c95e55075f3c7f51,20231103
skypilot:k80-ubuntu-1804,af-south-1,ubuntu,18.04,ami-0ad58a43b5ecfa67b,20211208
skypilot:k80-ubuntu-1804,ap-east-1,ubuntu,18.04,ami-0f2f173232b208f63,20211208
skypilot:k80-ubuntu-1804,ap-northeast-1,ubuntu,18.04,ami-0636c1b24fd00de58,20211208
5 changes: 3 additions & 2 deletions catalogs/v5/gcp/images.csv
@@ -3,5 +3,6 @@ skypilot:cpu-debian-10,,debian,10,projects/deeplearning-platform-release/global/
skypilot:k80-debian-10,,debian,10,projects/deeplearning-platform-release/global/images/common-cu113-v20220701,20220701
skypilot:gpu-debian-10,,debian,10,projects/deeplearning-platform-release/global/images/common-cu113-v20221215,20221215
skypilot:cuda118-debian-11,,debian,11,projects/deeplearning-platform-release/global/images/common-gpu-v20230615-debian-11-py310,20230615
skypilot:cpu-debian-11,,debian,11,projects/deeplearning-platform-release/global/images/common-cpu-v20230615-debian-11-py310,20230615
skypilot:gpu-debian-11,,debian,11,projects/deeplearning-platform-release/global/images/common-gpu-v20230615-debian-11-py310,20230615
skypilot:cuda121-debian-11,,debian,11,projects/deeplearning-platform-release/global/images/common-cu121-v20231105-debian-11-py310,20231105
Member

Q: when do we add a new row, skypilot:cuda121-debian-11? Is it always when we're upgrading the default image's CUDA version? Worth adding to README too.

Collaborator Author

Added the readme. PTAL. : )

Member

LGTM

skypilot:cpu-debian-11,,debian,11,projects/deeplearning-platform-release/global/images/common-cpu-v20231105-debian-11-py310,20231105
skypilot:gpu-debian-11,,debian,11,projects/deeplearning-platform-release/global/images/common-cu121-v20231105-debian-11-py310,20231105
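
The README change in this PR asks that the default GPU tag and the history tag point to the same image. A small check along those lines might look like the Python sketch below. It assumes header-less (or header-tolerant) rows laid out as `Tag,Region,OS,OSVersion,ImageId,CreationDate` with one row per tag, as in the GCP file; both function names are hypothetical.

```python
import csv
import io
from typing import Dict


def image_by_tag(csv_text: str) -> Dict[str, str]:
    """Map each tag to its image ID (assumes one row per tag, as for GCP)."""
    images = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 6:
            continue  # Skip blank or malformed lines.
        tag, _region, _os, _os_version, image_id, _date = row
        images[tag] = image_id
    return images


def default_matches_tracking(csv_text: str, default_tag: str,
                             tracking_tag: str) -> bool:
    """True if both tags exist and point to the same image."""
    images = image_by_tag(csv_text)
    return (default_tag in images and
            images.get(default_tag) == images.get(tracking_tag))
```

For the rows in this PR, `skypilot:gpu-debian-11` and `skypilot:cuda121-debian-11` would be the pair to compare.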