[GCP] Machine image does not work with --disk-size #2488

Closed
Michaelvll opened this issue Aug 30, 2023 · 2 comments
Labels: bug (Something isn't working), good first issue (Good for newcomers)

Comments

@Michaelvll
Collaborator

Michaelvll commented Aug 30, 2023

A user encountered the following issue:

The machine image was created from an instance with a 256 GB disk. Launching with the following command creates an instance with a 256 GB disk instead of the requested 500 GB.

sky launch -c test-machine-image --cloud gcp --image-id projects/<project-id>/global/machineImages/<image-id> --disk-size 500 --cpus 2
@Michaelvll added the bug and good first issue labels on Aug 30, 2023
@jackyk0708

I've reviewed the reported bug and can confirm it: when launching a GCP cluster from a machine image that originates from a 150 GB disk, specifying --disk-size 200 results in an instance that keeps the original 150 GB disk. I checked the cluster_config_file used when provisioning the cluster via the ray up command, and the associated YAML configuration appears to be correct. I also confirmed that the disk size matched the expected value when retrieving the machine image info.

According to GCP's documentation on resizing persistent disks, when we use a custom Linux image, we have to manually increase the size of the boot and non-boot disks. With a public image, however, Compute Engine resizes the boot disk automatically.

Given this, the following command, which uses a public image, works as expected:
sky launch -c test-image --cloud gcp --image-id projects/deeplearning-platform-release/global/images/common-cpu-v20230807-debian-11-py310 --disk-size 200 --dryrun True
However, for custom images, the boot disk isn't resized:
sky launch -c custom-image --cloud gcp --image-id projects/resonant-gizmo-205304/global/machineImages/skyerror --disk-size 200 --dryrun True
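
One visible difference between the two commands above is the resource path of the image ID: the first points at .../global/images/..., the second at .../global/machineImages/.... A minimal sketch of a check on that path is shown below. The function name image_kind is hypothetical, and this is only a heuristic based on the two paths in this thread, not SkyPilot's or Ray's actual logic.

```shell
#!/usr/bin/env bash
# Heuristic sketch: classify a GCP image ID by its resource path.
# image_kind is a hypothetical helper, not part of SkyPilot or gcloud.
image_kind() {
  case "$1" in
    */global/machineImages/*) echo "machine-image" ;;
    */global/images/*)        echo "image" ;;
    *)                        echo "unknown" ;;
  esac
}

# The two image IDs used in the commands above.
image_kind "projects/deeplearning-platform-release/global/images/common-cpu-v20230807-debian-11-py310"
image_kind "projects/resonant-gizmo-205304/global/machineImages/skyerror"
```

A check like this could be used to warn that --disk-size may not take effect before launching from a machine image.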

Would it be more appropriate to report this issue under Ray for further investigation? Meanwhile, as a potential workaround, we can manually increase the size of the boot disk after the instance is created. This can be done with the following command:
gcloud compute disks resize DISK_NAME --size=NEW_SIZE --zone=ZONE
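
The workaround above can be sketched as a small script. Resizing the disk does not by itself grow the partition or file system on a custom image, so the sketch also prints a follow-up step. Everything specific here is an assumption: the helper name resize_steps, the placeholder instance, disk, and zone values, and the boot partition being /dev/sda1 with an ext4 file system (check with lsblk and df -Th first). The function only prints the commands so they can be reviewed before running.

```shell
#!/usr/bin/env bash
# Sketch: print the manual resize steps for a boot disk (dry run, nothing is executed).
# Assumptions: boot partition is /dev/sda1 and uses ext4; adjust for your image.
resize_steps() {
  local instance="$1" disk="$2" size_gb="$3" zone="$4"
  # Step 1: grow the persistent disk itself.
  printf 'gcloud compute disks resize %s --size=%sGB --zone=%s\n' \
    "$disk" "$size_gb" "$zone"
  # Step 2: grow the partition and the file system inside the VM.
  printf 'gcloud compute ssh %s --zone=%s --command="sudo growpart /dev/sda 1 && sudo resize2fs /dev/sda1"\n' \
    "$instance" "$zone"
}

# Placeholder values; replace with the real instance name, disk name, and zone.
resize_steps INSTANCE_NAME DISK_NAME 200 ZONE
```

Once the printed commands look right, they can be run one at a time; running df -h / on the instance afterwards shows whether the root file system picked up the new size.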

@Michaelvll
Collaborator Author

This issue has been fixed by #2718. Closing.
