[TPU VM] Cannot sky down a TPU VM when it does not appear on GCP #1514

Closed
Michaelvll opened this issue Dec 12, 2022 · 1 comment · Fixed by #1500
Labels
bug Something isn't working

Comments

@Michaelvll
Collaborator

After running tests/run_smoke_tests.sh, the TPU VM test fails and the following entry remains in the status table.

test-tpu-vm-pod-3446f302-b9                57 mins ago     1x GCP(TPU-VM[Spot], {'tpu-v2-32': 1}, accelerator_args={'runtime_vers...  INIT    -         sky launch -y -c test-tpu...

When I tried to sky down test-tpu-vm-pod-3446f302-b9, it printed:

sky down test-tpu-vm-pod-3446f302-b9
Terminating 1 cluster: test-tpu-vm-pod-3446f302-b9. Proceed? [Y/n]:
Terminating cluster test-tpu-vm-pod-3446f302-b9...failed. Please check the logs and try again.
Terminating 1 cluster ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% -:--:--

Even with -p, it cannot be removed from the cluster table. cc @infwinston

@Michaelvll Michaelvll added the bug Something isn't working label Dec 12, 2022
@infwinston
Member

Ah, thanks. I think the smoke test failed because it couldn't get any TPU v2-32 at the time (could you confirm?). But I'm not sure why the cluster was left in the status table. Could you paste the entire log?

As for the sky down -p issue, this should be fixed in #1500, which adds a safeguard for such situations.
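To illustrate the kind of safeguard meant here, below is a minimal sketch. It is not the code from #1500. It only assumes that termination should treat "resource not found on the cloud" as already done and still drop the stale local record. Every name in it (ResourceNotFound, terminate_on_cloud, down, and the dict/set stand-ins for the cluster table and the cloud) is hypothetical and not part of SkyPilot.

```python
class ResourceNotFound(Exception):
    """Hypothetical error: the cloud has no resource with this name."""


def terminate_on_cloud(cloud_resources: set, name: str) -> None:
    # Stand-in for a real cloud API call (hypothetical); it raises when
    # the resource no longer exists on the cloud side.
    if name not in cloud_resources:
        raise ResourceNotFound(name)
    cloud_resources.remove(name)


def down(cluster_table: dict, cloud_resources: set, name: str) -> str:
    """Terminate `name` and remove its local record.

    Returns "terminated" if the cloud resource existed, or
    "already gone" if the cloud had nothing under that name.
    """
    try:
        terminate_on_cloud(cloud_resources, name)
        outcome = "terminated"
    except ResourceNotFound:
        # Nothing is left to terminate, so the stale local record can be
        # removed instead of failing the whole operation.
        outcome = "already gone"
    cluster_table.pop(name, None)
    return outcome


if __name__ == "__main__":
    table = {"test-tpu-vm-pod-3446f302-b9": "INIT"}
    cloud = set()  # the TPU VM never appeared on the cloud
    print(down(table, cloud, "test-tpu-vm-pod-3446f302-b9"))
    print(table)
```

With this shape, a cluster stuck in INIT that never appeared on the cloud no longer blocks removal from the local table, while errors other than "not found" still propagate.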
