
[k8s] Skip SSH setup for faster provisioning #4225

Closed
romilbhardwaj opened this issue Oct 31, 2024 · 1 comment

Comments

@romilbhardwaj
Collaborator

Even though #4158 significantly reduces multi-node provisioning time on k8s by parallelizing SSH setup, large jobs (50+ nodes) can still take a long time (~10 min, depending on the degree of parallelism, i.e. the number of CPU cores) to get SSH up and running on all pods.
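Why parallelism alone does not remove the delay: if every pod takes roughly the same time to set up and setup runs with a fixed number of concurrent workers, wall-clock time grows with the number of rounds, ceil(nodes / workers). A minimal sketch, assuming uniform per-pod setup time; the function name and all numbers are illustrative, not measurements from SkyPilot.

```python
import math


def estimate_ssh_setup_seconds(num_nodes: int, workers: int, per_pod_seconds: float) -> float:
    """Rough wall-clock estimate for parallel SSH setup across pods.

    Assumes (hypothetically) that every pod takes the same per_pod_seconds,
    so `workers` concurrent tasks finish in ceil(num_nodes / workers) rounds.
    """
    if num_nodes < 0 or workers < 1 or per_pod_seconds < 0:
        raise ValueError("need num_nodes >= 0, workers >= 1, per_pod_seconds >= 0")
    rounds = math.ceil(num_nodes / workers)
    return rounds * per_pod_seconds


# Illustrative only: 50 pods, 8 workers, 90 s per pod -> 7 rounds -> 630 s (~10.5 min).
print(estimate_ssh_setup_seconds(50, 8, 90))
```

Doubling the workers roughly halves the rounds, but the cost still scales with cluster size, which is why making SSH setup optional is attractive.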

From a user:

Is it possible to make it (SSH setup) "on demand"? For example, sky ssh host_name that sets up the SSH connection and then connects to it?

My 2 cents is that an SSH connection is usually not necessary for these long-running training jobs, or at least not at launch time if it's mostly for user convenience. Additionally, we could SSH in using tools like k9s. So it's desirable to cut the setup time as much as possible by making this optional. This also reduces the chance of timeouts, etc.
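The "on demand" proposal amounts to lazy initialization: launch returns without touching SSH, and setup runs for a single pod the first time someone connects to it. A minimal Python sketch, assuming a hypothetical `setup_fn(pod)` that prepares SSH on one pod; `LazySSHSetup` and `ensure_ready` are illustrative names, not SkyPilot APIs.

```python
import threading
from typing import Callable, Dict, Set


class LazySSHSetup:
    """Hypothetical sketch: defer per-pod SSH setup until first connection."""

    def __init__(self, setup_fn: Callable[[str], None]) -> None:
        self._setup_fn = setup_fn  # hypothetical: prepares SSH on one pod
        self._ready: Set[str] = set()  # pods whose setup has completed
        self._locks: Dict[str, threading.Lock] = {}  # one lock per pod
        self._guard = threading.Lock()  # protects creation of per-pod locks

    def _lock_for(self, pod: str) -> threading.Lock:
        with self._guard:
            return self._locks.setdefault(pod, threading.Lock())

    def ensure_ready(self, pod: str) -> None:
        """Run setup for `pod` once, on first use; later calls are no-ops."""
        with self._lock_for(pod):
            if pod not in self._ready:
                self._setup_fn(pod)  # if this raises, pod stays un-ready and is retried
                self._ready.add(pod)


# Usage: only the pod you actually connect to pays the setup cost.
calls = []
lazy = LazySSHSetup(calls.append)
lazy.ensure_ready("pod-0")
lazy.ensure_ready("pod-0")
lazy.ensure_ready("pod-1")
print(calls)
```

The per-pod lock lets concurrent connections to different pods proceed in parallel while guaranteeing each pod is set up once.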

@romilbhardwaj
Collaborator Author

Closed with #4393.

@Michaelvll Michaelvll added the OSS label Dec 19, 2024 — with Linear