-
Notifications
You must be signed in to change notification settings - Fork 554
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stop/down clusters with new provisioner #2121
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for separating this PR out from the new provisioner PR @suquark! I really like how tidy these codes look. Left several questions and thought, mainly about how we decouple things. We can jump on a call when more discussion is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates @suquark! The code mostly looks good to me. While I met the following error during the test with pytest tests/test_smoke.py --aws
, i.e. the self.run_on_head
does not work well when the cluster does not exist.
File "/home/gcpuser/skypilot/sky/backends/cloud_vm_ray_backend.py", line 3295, in teardown_no_lock
if not self.run_on_head(handle, 'ray stop --force'):
File "/home/gcpuser/skypilot/sky/utils/common_utils.py", line 256, in _record
return f(*args, **kwargs)
File "/home/gcpuser/skypilot/sky/backends/cloud_vm_ray_backend.py", line 3632, in run_on_head
head_ip = backend_utils.get_head_ip(handle, _FETCH_IP_MAX_ATTEMPTS)
File "/home/gcpuser/skypilot/sky/utils/common_utils.py", line 256, in _record
return f(*args, **kwargs)
File "/home/gcpuser/skypilot/sky/backends/backend_utils.py", line 1575, in get_head_ip
head_ip = _query_head_ip_with_retries(handle.cluster_yaml, max_attempts)
File "/home/gcpuser/skypilot/sky/backends/backend_utils.py", line 1375, in _query_head_ip_with_retries
raise RuntimeError('Failed to get head ip') from e
RuntimeError: Failed to get head ip
Thanks for your review! could you also tell me the failed tests? I can test if by myself |
900dd3a
to
6469dd5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix @suquark! Just passed the test smoothly pytest tests/test_smoke.py
. LGTM.
Woohoo!! Very excited to see this @suquark! |
This PR is part of the new provisioner PR, which handles stopping/terminating of clusters. This PR only includes the implementation for AWS, but it can serve as a starting point for other clouds.
Tested (run the relevant ones):
bash format.sh
pytest tests/test_smoke.py
pytest tests/test_smoke.py::test_fill_in_the_name
bash tests/backward_comaptibility_tests.sh