Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[UX/show-gpus] Improve sorting. #3492

Merged
merged 5 commits into from
May 10, 2024
Merged

[UX/show-gpus] Improve sorting. #3492

merged 5 commits into from
May 10, 2024

Conversation

concretevitamin
Copy link
Member

@concretevitamin concretevitamin commented Apr 27, 2024

Previous, prices across clouds + within cloud are not sorted properly.

This PR:

        # For each gpu name (count not included):
        #   - Group by cloud
        #   - Sort within each group by prices
        #   - Sort groups by each cloud's (min price, min spot price)

Comparison: two text files in https://gist.github.com/concretevitamin/1fda666bfa129aedd3a3ced485e8f05c

Speed seems the same:

#!/bin/bash
set -ex

export BASH_XTRACEFD=1

sky show-gpus L4
sky show-gpus L4:1
sky show-gpus L4:1 --all-regions
sky show-gpus L4 --cloud aws

sky show-gpus A100
sky show-gpus A100:8
sky show-gpus A100:8 --all-regions
sky show-gpus A100:8 --cloud aws
master: median 15.454 total

15.454 total
14.471 total
16.712 total

PR: median 15.391 total
15.391 total
15.497 total
14.612 total

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • Any manual or new tests for this PR (please specify below)
  • All smoke tests: pytest tests/test_smoke.py
  • Relevant individual smoke tests: pytest tests/test_smoke.py::test_fill_in_the_name
  • Backward compatibility tests: bash tests/backward_comaptibility_tests.sh

# - Group by cloud
# - Sort within each group by prices
# - Sort groups by each cloud's (min price, min spot price)
new_result = {}
Copy link
Collaborator

@Michaelvll Michaelvll Apr 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! It is great that we can have additional sorting ability for our sky show-gpus. A long existing issue I was wondering was that if we can group by the (GPU, gpu_count, cloud) and sort by price, instead of group by cloud.
Most of the time I looked at the table, I tried to directly compare those machines with the same config across clouds, and our current way mixes different count makes it quite hard to compare. Wdyt?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC the reason we did it this way was to optimize for single-cloud users. For them, it'd be messy to group by (GPU, gpu_count, cloud) as they need to skim across many irrelevant clouds.

Copy link
Collaborator

@Michaelvll Michaelvll Apr 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a single cloud user, they can always specify --cloud in show-gpus to get the results for that cloud only, so it might not be a blocker for the change.

I think we didn't do the (GPU, gpu_count, cloud) because the information retrieved from the catalogs is for each cloud, and we did not have the regrouping and resorting code path in CLI before. Since we have the "frontend" sorting enabled, we could probably switch to the tuple.
IMO, in most cases, comparing different configs of GPUs may not be as useful as comparing the same config but with different pricing.

That said, I am fine with having the current version merged first and have another issue for the proposed change for the different key for grouping.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want to optimize for single-cloud only users. For them it's cumbersome to type an additional --cloud argument because it should be inferred as unique. Given these, I believe the current treatment is good.

Copy link
Collaborator

@Michaelvll Michaelvll Apr 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I think I can now think of a benefit for grouping by cloud:
It will be easier to see all the available number of GPUs offered by a specific cloud.

While for the proposed key for grouping, it will be easier to find which cloud offers the specific number of GPU requested.

I am still half-half for this though, since most of the time a user knows how many GPUs they want for their workloads, e.g., 7B model requires a single L4, while 70B requires at least 4 40GB GPUs.
It should be fine to merge this PR first, as it is definitely better than the current unsorted version in master branch.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be easier to see all the available number of GPUs offered by a specific cloud.

Yep. Also, if one's single-cloud only, easier to show that cloud's offerings in a consecutive group.

While for the proposed key for grouping, it will be easier to find which cloud offers the specific number of GPU requested.

For that, if users pass the count in the argument, both ways of grouping are the same (for most cases, except some where the same cloud has different X:1 offerings).

Copy link
Collaborator

@Michaelvll Michaelvll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the sort for the GPU pricing @concretevitamin! LGTM. Please run the format.sh before merging.

Also, it would be nice to add some unit tests for the behavior of show-gpus.

Another issue, it seems we have A100 and A100-SXM separated for RunPod while combined for lambda labs. We probably need to separate them for all the clouds.

The above issues can be fixed in another PR.

@Michaelvll
Copy link
Collaborator

Just a kind bump for this @concretevitamin. Are we going to merge it?

@concretevitamin
Copy link
Member Author

Oops, slipped through the cracks. Just opened two issues for the above items. Merging this now.

@concretevitamin concretevitamin merged commit 05aafb8 into master May 10, 2024
20 checks passed
@concretevitamin concretevitamin deleted the fix-show-gpus branch May 10, 2024 18:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants