-
Notifications
You must be signed in to change notification settings - Fork 554
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Lambda] Fix termination for lambda #3410
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching this @Michaelvll - I tested with sky launch -c myclus --cloud lambda --gpus A10
and sky down myclus
and it works.
@@ -172,6 +172,10 @@ def _get_internal_ip(node: Dict[str, Any]): | |||
self.metadata.refresh([node['id'] for node in vms]) | |||
self._guess_and_add_missing_tags(vms) | |||
nodes = [_extract_metadata(vm) for vm in filter(_match_tags, vms)] | |||
nodes = [ | |||
node for node in nodes | |||
if node['status'] not in ['terminating', 'terminated'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
curious - is terminating
a new state added by lambda?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if it is newly added, but I saw the terminating
state in the response from the lambda API during the state query. It should be added now. : )
{'_content': b'{"data": [{"id": "9acba6b6ea0a4f01a832d0d6936c2d4f", "name": "test-lambda-084e-head", "ip": "129.158.54.150", "region": {"name": "us-east-1", "description": "Virginia, USA"}, "instance_type": {"name": "gpu_1x_a10", "price_cents_per_hour": 75, "description": "1x A10 (24 GB PCIe)", "specs": {"vcpus": 30, "memory_gib": 200, "storage_gib": 1400, "gpus": 1}},
"status": "terminating",
"ssh_key_names": ["sky-key-084e3d6c"], "file_system_names": [], "hostname": "129-158-54-150.cloud.lambdalabs.com", "jupyter_token": "4a2aa85687fc4ff3bfe4e870c99b38c0", "jupyter_url": "https://jupyter-743363eeca7c4968b525ea825951ea2b.lambdaspaces.com/?token=4a2aa85687fc4ff3bfe4e870c99b38c0"}]}', '_content_consumed': True, '_next': None, 'status_code': 200, 'headers': {'Date': 'Tue, 02 Apr 2024 21:11:47 GMT', 'Content-Type': 'application/json', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'vary': 'Accept-Encoding, Cookie', 'x-frame-options': 'SAMEORIGIN', 'strict-transport-security': 'max-age=300; includeSubDomains', 'x-content-type-options': 'nosniff', 'referrer-policy': 'same-origin', 'Content-Encoding': 'gzip', 'CF-Cache-Status': 'DYNAMIC', 'Server': 'cloudflare', 'CF-RAY': '86e3d037aa330622-IAD'}, 'raw': <urllib3.response.HTTPResponse object at 0x14dae6beb2e0>, 'url': 'https://cloud.lambdalabs.com/api/v1/instances', 'encoding': 'utf-8', 'history': [], 'reason': 'OK', 'cookies': <RequestsCookieJar[]>, 'elapsed': datetime.timedelta(microseconds=609443), 'request': <PreparedRequest [GET]>, 'connection': <requests.adapters.HTTPAdapter object at 0x14dae6bea2f0>}
This includes the
terminating
status for Lambda Cloud, andnon_terminated_nodes
should filter out the terminating and terminated nodes.sky launch -c test-lambda --cloud lambda --gpus A10; sky down test-lambda
Error out with:
Tested (run the relevant ones):
bash format.sh
sky launch -c test-lambda --cloud lambda --gpus A10; sky down test-lambda
pytest tests/test_smoke.py
pytest tests/test_smoke.py::test_fill_in_the_name
bash tests/backward_comaptibility_tests.sh