
multithreaded ssh setup #4158

Merged: 5 commits, merged on Oct 24, 2024


@asaiacai (Contributor) commented Oct 23, 2024

Runs SSH initialization and setup asynchronously, in multiple threads, for faster cluster startup. Closes #4156
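The PR description does not include code, so here is a minimal Python sketch of the general idea: wait for SSH on every node concurrently instead of one node after another. It assumes a plain TCP connect to port 22 is an adequate stand-in for the real SSH readiness check. `wait_for_ssh` and `wait_for_all_nodes` are hypothetical names, not SkyPilot's API; the actual change (reviewed in `sky/provision/provisioner.py`) may be structured quite differently.

```python
import concurrent.futures
import socket
import time
from typing import Callable, Dict, List


def wait_for_ssh(host: str, port: int = 22, timeout: float = 60.0,
                 interval: float = 1.0) -> bool:
    """Hypothetical helper: poll until a TCP connect to host:port succeeds.

    Returns True once the port accepts a connection, False if `timeout`
    seconds pass first. A TCP connect is only an assumed proxy for
    "sshd is ready"; a real provisioner would likely run an SSH command.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused, timed out, or unreachable: back off, then retry.
            time.sleep(interval)
    return False


def wait_for_all_nodes(hosts: List[str],
                       check: Callable[[str], bool] = wait_for_ssh,
                       max_workers: int = 32) -> Dict[str, bool]:
    """Hypothetical helper: run `check` for every host in a thread pool.

    Returns a mapping host -> readiness flag. Total wall-clock time is
    roughly that of the slowest node rather than the sum over all nodes.
    """
    results: Dict[str, bool] = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(check, host): host for host in hosts}
        for future in concurrent.futures.as_completed(futures):
            results[futures[future]] = future.result()
    return results


if __name__ == "__main__":
    def fake_check(host: str) -> bool:
        time.sleep(0.2)  # stand-in for a slow SSH handshake on one node
        return True

    start = time.monotonic()
    ready = wait_for_all_nodes([f"10.0.0.{i}" for i in range(20)], check=fake_check)
    print(all(ready.values()), f"{time.monotonic() - start:.2f}s")
```

Because each probe spends nearly all of its time blocked on network I/O, threads (rather than processes) are sufficient here; with 20 simulated nodes the run above finishes in about one probe's duration instead of twenty.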

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • Any manual or new tests for this PR (please specify below)
  • All smoke tests: pytest tests/test_smoke.py

@Michaelvll (Collaborator) left a comment


Thanks for adding this @asaiacai! This is a very important improvement. Some quick thoughts :)

Review thread on sky/provision/provisioner.py (outdated, resolved)
@asaiacai (Contributor, Author) commented Oct 24, 2024

These are some preliminary numbers; I still need to test k8s more and run the smoke tests.
before PR

(base) ubuntu@ip-172-31-16-12:~/skypilot$ time sky launch --cloud aws --num-nodes 20 --cpus 2+ -y -c scale --region us-east-1

real    1m13.987s
user    0m13.215s
sys     0m1.214s

(base) ubuntu@ip-172-31-16-12:~/skypilot$ time sky launch --cloud aws --num-nodes 40 --cpus 2+ -y -c scale --region us-east-1
real    1m39.759s
user    0m16.282s
sys     0m2.386s

# sky local kubernetes
(base) ubuntu@ip-172-31-16-12:~/skypilot$ time sky launch --cloud kubernetes --num-nodes 30 --cpus 2+ -y -c scale
real    1m58.966s
user    0m39.549s
sys     0m7.922s

after PR

(base) ubuntu@ip-172-31-16-12:~/skypilot$ time sky launch --cloud aws --num-nodes 20 --cpus 2+ -y -c scale --region us-east-1

real    0m58.341s
user    0m11.981s
sys     0m1.223s

(base) ubuntu@ip-172-31-16-12:~/skypilot$ time sky launch --cloud aws --num-nodes 40 --cpus 2+ -y -c scale --region us-east-1

real    1m27.969s
user    0m17.599s
sys     0m2.208s

# sky local kubernetes
(base) ubuntu@ip-172-31-16-12:~/skypilot$ time sky launch --cloud kubernetes --num-nodes 30 --cpus 2+ -y -c scale
real    1m28.728s
user    0m40.810s
sys     0m7.921s
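To compare the wall-clock (`real`) times side by side, here is a small Python sketch that parses the bash `time` format copied from the runs above and reports the relative reduction. The values are single runs, so the percentages are indicative only; `to_seconds`, `RUNS`, and `summarize` are illustrative names.

```python
from typing import List, Tuple


def to_seconds(t: str) -> float:
    """Parse a bash `time` value such as '1m13.987s' into seconds."""
    minutes, seconds = t.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


# (label, before PR, after PR) "real" times copied from the runs above.
RUNS = [
    ("aws, 20 nodes", "1m13.987s", "0m58.341s"),
    ("aws, 40 nodes", "1m39.759s", "1m27.969s"),
    ("kubernetes, 30 nodes", "1m58.966s", "1m28.728s"),
]


def summarize(runs) -> List[Tuple[str, float, float, float]]:
    """Return (label, before_s, after_s, fractional reduction) per run."""
    rows = []
    for label, before, after in runs:
        b, a = to_seconds(before), to_seconds(after)
        rows.append((label, b, a, (b - a) / b))
    return rows


for label, b, a, frac in summarize(RUNS):
    print(f"{label}: {b:.1f}s -> {a:.1f}s ({frac:.1%} less wall-clock time)")
```

On these numbers the reduction is roughly 21% (AWS, 20 nodes), 12% (AWS, 40 nodes), and 25% (Kubernetes, 30 nodes).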

@asaiacai (Contributor, Author) commented:

Reran the smoke tests, but I'm getting pretty limited capacity in AWS, which is interfering with them. Could I get some help running the smoke tests on this as well?

@asaiacai asaiacai marked this pull request as ready for review October 24, 2024 03:34
@Michaelvll (Collaborator) left a comment


LGTM! Thanks for making this work @asaiacai!

@Michaelvll Michaelvll added this pull request to the merge queue Oct 24, 2024
Merged via the queue into skypilot-org:master with commit d6d339d Oct 24, 2024
20 checks passed
@asaiacai asaiacai deleted the ssh_multithread branch October 24, 2024 19:27
Linked issue (closed by merging this pull request): [Provisioner] Parallelize wait for SSH in provisioner
2 participants