Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[UX] Fail early from provisioning when rsync is uninstalled #2168

Conversation

landscapepainter
Copy link
Collaborator

@landscapepainter landscapepainter commented Jul 2, 2023

This PR closes #2113

When rsync is not installed in the user's machine, ray_up() is reattempted for _MAX_RAY_UP_RETRY number of times without failing. As this is time consuming, this PR fails early in the provisioning step when rsync is not installed and gently hints the user to install rsync.

Considered failing from need_ray_up() as well, but at this point, state.db is already updated with INIT status of the cluster attempted to provisioned which isn't the most optimal scenario. Also, it takes time for single failure before reaching need_ray_up().

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • manally tested with running sky launch after deleting rsync with and w/o --dryrun option

Copy link
Collaborator

@Michaelvll Michaelvll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the fix @landscapepainter! We may need to move the check into the upper level?

sky/backends/cloud_vm_ray_backend.py Outdated Show resolved Hide resolved
sky/backends/cloud_vm_ray_backend.py Outdated Show resolved Hide resolved
Copy link
Collaborator Author

@landscapepainter landscapepainter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Michaelvll Ready for another look. Moved the check to _provision and doing the check only when it's not a dryrun. Tested with and without --dryrun flag as well after uninstalling rsync.

@Michaelvll
Copy link
Collaborator

@Michaelvll Ready for another look. Moved the check to _provision and doing the check only when it's not a dryrun. Tested with and without --dryrun flag as well after uninstalling rsync.

Thanks for the update @landscapepainter! Why do we only check it when it's not a dryrun? I feel like we can have it for the dryrun case as well. Aslo, can we have a separate function for the checking and call that function in the _provision? Reason: in the future, we may want to have more backend-specific prerequisite tests.

Copy link
Collaborator Author

@landscapepainter landscapepainter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Michaelvll I guess it makes sense if the user can check while dryrun as well. Ready for another look!

Copy link
Collaborator

@Michaelvll Michaelvll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix @landscapepainter! LGTM. Left two comments.

sky/backends/cloud_vm_ray_backend.py Outdated Show resolved Hide resolved
@landscapepainter landscapepainter merged commit fd4baa6 into skypilot-org:master Jul 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[UX] Fail early when rsync is not installed locally
2 participants