Azure: update fetch_azure to support two H100 families. #2844

Merged
concretevitamin merged 2 commits into master from az-h100
Dec 7, 2023

Conversation

concretevitamin
Member

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • Any manual or new tests for this PR (please specify below)
  • All smoke tests: pytest tests/test_smoke.py
  • Relevant individual smoke tests: pytest tests/test_smoke.py::test_fill_in_the_name
  • Backward compatibility tests: bash tests/backward_compatibility_tests.sh

Collaborator

@Michaelvll Michaelvll left a comment

Thanks for fixing this @concretevitamin! LGTM.

'standardNDSFamily': 'P40',
'StandardNVADSA10v5Family': 'A10',
'StandardNCadsH100v5Family': 'H100',
'standardNDSH100v5Family': 'H100',
Collaborator

Weirdly, I don't see this family in az vm list-skus --all --resource-type virtualMachines -l southcentralus | grep v5. How do we get this family?

Member Author

@concretevitamin concretevitamin Dec 7, 2023

Running this on my end got

...
    "family": "StandardNCadsH100v5Family",
    "name": "Standard_NC40ads_H100_v5",
    "size": "NC40ads_H100_v5",
    "family": "StandardNCadsH100v5Family",
    "name": "Standard_NC80adis_H100_v5",
    "size": "NC80adis_H100_v5",
...

My az account subscription list shows subscriptions *7 and *a.

Collaborator

Oh, I was wondering where we get the family standardNDSH100v5Family. Seems it is not included in the output you sent either?

Member Author

It was obtained by inspecting an intermediate dataframe. Also seen in:

» az vm list-skus --all --resource-type virtualMachines  | grep -i h100

i.e., with the location removed.

Collaborator

Oops, sorry. I forgot about the location argument. After removing it, I can see the family. Thanks for the explanation.
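For anyone reading this thread later, here is a rough sketch of why the location filter hid the second family. It is not the code in fetch_azure; the sample JSON is a trimmed, hypothetical stand-in for the `az vm list-skus` output, the ND SKU name and its region are placeholders, and `h100_families` is a made-up helper. Unlike `grep -i h100`, it matches only on the `family` field.

```python
import json

# Hypothetical, trimmed stand-in for the JSON printed by
# `az vm list-skus --all --resource-type virtualMachines`.
# The ND entry's name and location are placeholders, not real values.
SAMPLE_SKUS_JSON = """
[
  {"family": "StandardNCadsH100v5Family",
   "name": "Standard_NC40ads_H100_v5",
   "locations": ["southcentralus"]},
  {"family": "StandardNCadsH100v5Family",
   "name": "Standard_NC80adis_H100_v5",
   "locations": ["southcentralus"]},
  {"family": "standardNDSH100v5Family",
   "name": "Standard_ND_placeholder_H100_v5",
   "locations": ["some-other-region"]}
]
"""

# The two family entries added in this PR's diff.
FAMILY_TO_GPU = {
    'StandardNCadsH100v5Family': 'H100',
    'standardNDSH100v5Family': 'H100',
}


def h100_families(skus, location=None):
    """Return the set of family names containing 'h100' (case-insensitive).

    If `location` is given, only SKUs offered in that location are
    considered, which mimics passing `-l <location>` to the CLI.
    """
    families = set()
    for sku in skus:
        if location is not None and location not in sku["locations"]:
            continue
        if "h100" in sku["family"].lower():
            families.add(sku["family"])
    return families


skus = json.loads(SAMPLE_SKUS_JSON)
# With a location filter, only families offered in that region show up.
print(sorted(h100_families(skus, location="southcentralus")))
# Without it, both families appear, and both map to the same GPU name.
for family in sorted(h100_families(skus)):
    print(family, "->", FAMILY_TO_GPU[family])
```

Note that the two family names differ in the casing of their first letter, so any lookup keyed on the family string has to use the exact spelling Azure returns.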

@@ -42,6 +47,7 @@
'Radeon MI25',
'P4',
'L4',
'H100',
Collaborator

It seems GKE uses H100-80GB as the name. Not sure if we want to align with that.
Pro: keeps the name consistent with GKE's and other clouds' native GPU names.
Con: it complicates the name, and since H100 only has 80GB and 188GB versions, we may only want H100 and H100-188GB.

Member Author

Yeah, it seems we've been using H100 for AWS and Lambda already (added in catalog files, not in this registry). Should be fine to use it for now.
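If we later want to accept a cloud-native spelling such as H100-80GB while keeping H100 as the canonical name, one option is an alias table in front of the registry lookup. The sketch below only illustrates that idea and is not the existing registry code: `CANONICAL_NAMES` copies the entries visible in this diff hunk, while `ALIASES` and `canonicalize` are hypothetical names.

```python
# Canonical names visible in this PR's diff hunk (the real list is longer).
CANONICAL_NAMES = ['Radeon MI25', 'P4', 'L4', 'H100']

# Hypothetical alias table: cloud-native spelling -> canonical name.
ALIASES = {'H100-80GB': 'H100'}

# Lowercased lookups so that matching ignores case.
_ALIASES_LOWER = {alias.lower(): name for alias, name in ALIASES.items()}
_CANONICAL_LOWER = {name.lower(): name for name in CANONICAL_NAMES}


def canonicalize(name):
    """Map a user-supplied GPU name to its canonical spelling.

    Aliases are resolved first, then case is normalized against the
    canonical list. Unknown names are returned unchanged.
    """
    resolved = _ALIASES_LOWER.get(name.lower(), name)
    return _CANONICAL_LOWER.get(resolved.lower(), resolved)


print(canonicalize('h100'))
print(canonicalize('H100-80GB'))
print(canonicalize('H100-188GB'))
```

This keeps the short name in the registry and pushes the per-cloud spelling into a single table, at the cost of one more place to update when a new variant shows up.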

@concretevitamin concretevitamin merged commit 66b8635 into master Dec 7, 2023
19 checks passed
@concretevitamin concretevitamin deleted the az-h100 branch December 7, 2023 00:25
remyleone pushed a commit to remyleone/skypilot that referenced this pull request Dec 26, 2023
…#2844)

* Azure: update fetch_azure to support two H100 families.

* format