Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Core] Auto mapping for cluster name #2403

Merged
merged 71 commits into from
Aug 26, 2023
Merged

Conversation

Michaelvll
Copy link
Collaborator

@Michaelvll Michaelvll commented Aug 14, 2023

Closes #1853 and Closes #963

The backward compatibility is guaranteed by the restore of ray yaml in the backend_utils.write_cluster_config

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • Any manual or new tests for this PR (please specify below)
    • sky launch -c i-am-a-very-long-cluster-name-with-more-than-35-characters --cloud gcp --cpus 2
    • sky status -r
    • sky exec i-am-a-very-long-cluster-name-with-more-than-35-characters echo hi
    • sky autostop -i 1 i-am-a-very-long-cluster-name-with-more-than-35-characters
    • sky status -r
  • All smoke tests: pytest tests/test_smoke.py
  • All smoke tests: pytest tests/test_smoke.py --aws
  • Backward compatibility tests: bash tests/backward_comaptibility_tests.sh

@Michaelvll Michaelvll changed the title Auto mapping for cluster name [Core] Auto mapping for cluster name Aug 14, 2023
@Michaelvll
Copy link
Collaborator Author

Logging/UX: Tried sky spot launch --cpus 2+ -n spot-maskgit-minerl-maskgit-interp --cloud gcp 'for i in $(seq 10000); do echo $i; sleep 1; done' which is a long name a user told us before. In controller logs

Hey @concretevitamin, thanks for pointing this out! We intentionally truncate the cluster name earlier to 30 chars to avoid the cluster name being too long after appending the job id, which will cause another truncation in the underlying sky.launch, hiding the job_id in the cluster name. I think it should be fine, as a normal user is not expected to use sky status in the spot controller, and the cluster name in the controller log is not critical as there will be a single cluster name for each spot job. If we check the sky spot queue, the spot job should be the same as the user specified. Wdyt?

@Michaelvll
Copy link
Collaborator Author

Michaelvll commented Aug 23, 2023

Thanks for the detailed review @concretevitamin @cblmemo! I fixed most of the comments.

I just realized that we should not tag clusters with original cluster names, as for most of the clouds, tags have the same length limit as cluster names, so I removed the tags.

TODO:

  • Add a readme for the design of cluster name mapping
  • Test again.
    • manual test
    • pytest tests/test_smoke.py
    • pytest tests/test_smoke.py --aws

@cblmemo
Copy link
Collaborator

cblmemo commented Aug 23, 2023

Thanks for the detailed review @concretevitamin @cblmemo! I fixed most of the comments.

I just realized that we should not tag clusters with original cluster names, as for most of the clouds, tags have the same length limit as cluster names, so I removed the tags.

TODO:

  • Add a readme for design of cluster name mapping
  • Test again.

Just a quick thought - then maybe we could have a column for cluster_name_on_cloud for sky status -a..?

sky/templates/gcp-ray.yml.j2 Outdated Show resolved Hide resolved
sky/templates/azure-ray.yml.j2 Outdated Show resolved Hide resolved
sky/templates/aws-ray.yml.j2 Show resolved Hide resolved
sky/backends/backend_utils.py Show resolved Hide resolved
sky/skylet/constants.py Outdated Show resolved Hide resolved
sky/design_docs/cluster_name.md Outdated Show resolved Hide resolved
sky/backends/cloud_vm_ray_backend.py Outdated Show resolved Hide resolved
sky/backends/cloud_vm_ray_backend.py Show resolved Hide resolved
sky/backends/cloud_vm_ray_backend.py Show resolved Hide resolved
sky/backends/backend_utils.py Show resolved Hide resolved
Copy link
Member

@concretevitamin concretevitamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM @Michaelvll - thanks for this major improvement!

sky/skylet/constants.py Outdated Show resolved Hide resolved
sky/design_docs/cluster_name.md Outdated Show resolved Hide resolved
@Michaelvll Michaelvll merged commit 1555648 into master Aug 26, 2023
@Michaelvll Michaelvll deleted the auto-mapping-for-cluster-name branch August 26, 2023 00:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants