-
Notifications
You must be signed in to change notification settings - Fork 559
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[UX/show-gpus] Improve sorting. #3492
Conversation
# - Group by cloud | ||
# - Sort within each group by prices | ||
# - Sort groups by each cloud's (min price, min spot price) | ||
new_result = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! It is great that we can have additional sorting ability for our sky show-gpus
. A long existing issue I was wondering was that if we can group by the (GPU, gpu_count, cloud)
and sort by price, instead of group by cloud.
Most of the time I looked at the table, I tried to directly compare those machines with the same config across clouds, and our current way mixes different count makes it quite hard to compare. Wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC the reason we did it this way was to optimize for single-cloud users. For them, it'd be messy to group by (GPU, gpu_count, cloud) as they need to skim across many irrelevant clouds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a single cloud user, they can always specify --cloud
in show-gpus
to get the results for that cloud only, so it might not be a blocker for the change.
I think we didn't do the (GPU, gpu_count, cloud)
because the information retrieved from the catalogs is for each cloud, and we did not have the regrouping and resorting code path in CLI
before. Since we have the "frontend" sorting enabled, we could probably switch to the tuple.
IMO, in most cases, comparing different configs of GPUs may not be as useful as comparing the same config but with different pricing.
That said, I am fine with having the current version merged first and have another issue for the proposed change for the different key for grouping.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we want to optimize for single-cloud only users. For them it's cumbersome to type an additional --cloud argument because it should be inferred as unique. Given these, I believe the current treatment is good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I think I can now think of a benefit for grouping by cloud:
It will be easier to see all the available number of GPUs offered by a specific cloud.
While for the proposed key for grouping, it will be easier to find which cloud offers the specific number of GPU requested.
I am still half-half for this though, since most of the time a user knows how many GPUs they want for their workloads, e.g., 7B model requires a single L4, while 70B requires at least 4 40GB GPUs.
It should be fine to merge this PR first, as it is definitely better than the current unsorted version in master branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be easier to see all the available number of GPUs offered by a specific cloud.
Yep. Also, if one's single-cloud only, easier to show that cloud's offerings in a consecutive group.
While for the proposed key for grouping, it will be easier to find which cloud offers the specific number of GPU requested.
For that, if users pass the count in the argument, both ways of grouping are the same (for most cases, except some where the same cloud has different X:1 offerings).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding the sort for the GPU pricing @concretevitamin! LGTM. Please run the format.sh
before merging.
Also, it would be nice to add some unit tests for the behavior of show-gpus.
Another issue, it seems we have A100 and A100-SXM separated for RunPod while combined for lambda labs. We probably need to separate them for all the clouds.
The above issues can be fixed in another PR.
Just a kind bump for this @concretevitamin. Are we going to merge it? |
Oops, slipped through the cracks. Just opened two issues for the above items. Merging this now. |
Previous, prices across clouds + within cloud are not sorted properly.
This PR:
Comparison: two text files in https://gist.github.com/concretevitamin/1fda666bfa129aedd3a3ced485e8f05c
Speed seems the same:
Tested (run the relevant ones):
bash format.sh
pytest tests/test_smoke.py
pytest tests/test_smoke.py::test_fill_in_the_name
bash tests/backward_comaptibility_tests.sh