[Bug]: Instances that are temporarily not available should be skipped during provisioning a job #1234

Closed
peterschmidt85 opened this issue May 17, 2024 · 2 comments · Fixed by #1286
Labels: bug (Something isn't working)

Comments

@peterschmidt85
Contributor

Steps to reproduce

  1. Add an instance, e.g. via dstack pool add-ssh, and wait until the instance is shown as idle
  2. Shut down the instance manually (via the cloud console or by unplugging the power)
  3. Call dstack run with --reuse

Actual behaviour

  1. The run is assigned to the instance, which is unavailable when the run is submitted but is still shown as idle
  2. The run gets stuck in provisioning

Expected behaviour

  1. If the instance is temporarily unavailable, mark it with temporarily_unavailable=true (and show it in the console/UI)
  2. Skip such instances when provisioning new jobs
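
To make the expected behaviour concrete, here is a minimal sketch of the filtering step. It is not dstack's actual code: the `Instance` class, its `unreachable` field (a stand-in for the flag proposed above), and `candidates_for_reuse` are all hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instance:
    # Hypothetical model, not dstack's real one.
    name: str
    status: str                # e.g. "idle" or "busy"
    unreachable: bool = False  # stand-in for the proposed flag


def candidates_for_reuse(instances: List[Instance]) -> List[Instance]:
    """Return idle instances, skipping those marked as unreachable."""
    return [i for i in instances if i.status == "idle" and not i.unreachable]


pool = [
    Instance("a", "idle"),
    Instance("b", "idle", unreachable=True),  # shut down manually, still "idle"
    Instance("c", "busy"),
]
selected = [i.name for i in candidates_for_reuse(pool)]
```

With this filter in place, a run submitted with --reuse would never be assigned to instance "b", so it would not get stuck in provisioning.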

dstack version

Any

Server logs

No response

Additional information

No response

peterschmidt85 added the bug label on May 17, 2024
@r4victor
Collaborator

Especially relevant after #1200. Currently, dstack may resubmit the run to a cloud instance that is no longer available (e.g. was interrupted or deleted). dstack removes the instance only after it has not heard from it for 20 minutes.
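
A rough sketch of how an intermediate state could sit in front of the 20-minute removal. Only the 20-minute deadline comes from this thread; the `HEALTHCHECK_GRACE` value, the state names, and the `next_state` function are assumptions made for illustration, not dstack's implementation.

```python
from datetime import datetime, timedelta

TERMINATION_DEADLINE = timedelta(minutes=20)  # from this thread
HEALTHCHECK_GRACE = timedelta(seconds=30)     # assumed value, illustration only


def next_state(last_seen: datetime, now: datetime) -> str:
    """Classify an instance by how long it has been silent."""
    silence = now - last_seen
    if silence >= TERMINATION_DEADLINE:
        return "terminate"    # silent for the full deadline: remove it
    if silence > HEALTHCHECK_GRACE:
        return "unreachable"  # silent, but not long enough to remove
    return "reachable"


now = datetime(2024, 5, 17, 12, 0, 0)
states = [
    next_state(now - timedelta(seconds=5), now),
    next_state(now - timedelta(minutes=3), now),
    next_state(now - timedelta(minutes=25), now),
]
```

The point of the middle state is that an instance can be excluded from scheduling well before the deadline expires, without being deleted prematurely.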

@r4victor
Collaborator

@peterschmidt85, any preference on how to display instances that are temporarily unavailable? We could use an unavailable status, but it can easily be mistaken for busy (both mean unavailable for provisioning). I suggest we show it as unreachable.
