Azure: update fetch_azure to support two H100 families. #2844
Thanks for fixing this @concretevitamin! LGTM.
'standardNDSFamily': 'P40',
'StandardNVADSA10v5Family': 'A10',
'StandardNCadsH100v5Family': 'H100',
'standardNDSH100v5Family': 'H100',
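The entries above map Azure VM family names to GPU names. Below is a minimal sketch of how such a table might be consulted. The name `gpu_name_for_family` is hypothetical and is not taken from fetch_azure, whose actual code may differ. The sketch shows one thing: the lookup is exact, so each key must keep the casing shown in the diff. Two of these keys start with a lowercase `standard`.

```python
from typing import Optional

# Entries copied from the diff: Azure VM family -> GPU name.
# Keys keep the exact casing shown there (some start with lowercase 'standard').
FAMILY_TO_GPU = {
    'standardNDSFamily': 'P40',
    'StandardNVADSA10v5Family': 'A10',
    'StandardNCadsH100v5Family': 'H100',
    'standardNDSH100v5Family': 'H100',
}


def gpu_name_for_family(family: str) -> Optional[str]:
    """Hypothetical helper: return the GPU name for a VM family, or None.

    The lookup is case-sensitive, so a family spelled with different
    casing than the key above will not match.
    """
    return FAMILY_TO_GPU.get(family)
```

Both H100 families resolve to the same name, `H100`. That is the point of this PR: two distinct Azure families back the same accelerator.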
Weirdly, I don't see this family in the output of az vm list-skus --all --resource-type virtualMachines -l southcentralus | grep v5. How do we get this family?
Running this on my end got
...
"family": "StandardNCadsH100v5Family",
"name": "Standard_NC40ads_H100_v5",
"size": "NC40ads_H100_v5",
"family": "StandardNCadsH100v5Family",
"name": "Standard_NC80adis_H100_v5",
"size": "NC80adis_H100_v5",
...
My az account subscription list shows subscriptions *7 and *a.
Oh, I was wondering where we get the family standardNDSH100v5Family. It seems it is not included in the output you sent either?
It was obtained by inspecting an intermediate dataframe. Also seen in:
» az vm list-skus --all --resource-type virtualMachines | grep -i h100
i.e., with the location removed.
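The check above can also be done programmatically. Below is a minimal sketch that runs the same `az vm list-skus` query with no `-l` location filter. It parses the JSON output and collects every family whose SKU name mentions H100, which mirrors `grep -i h100`. The helper names are hypothetical. The sketch assumes each SKU object carries `family` and `name` fields, as in the output pasted earlier in this thread. Running the CLI helper requires an installed, logged-in Azure CLI.

```python
import json
import subprocess
from typing import Iterable, List, Set


def h100_families(skus: Iterable[dict]) -> Set[str]:
    """Hypothetical helper: families whose SKU name contains 'h100'
    (case-insensitive), mirroring `grep -i h100` over the CLI output."""
    return {
        sku['family']
        for sku in skus
        if sku.get('family') and 'h100' in sku.get('name', '').lower()
    }


def list_skus_all_locations() -> List[dict]:
    """Run the query without `-l` so families from every region are included."""
    out = subprocess.run(
        ['az', 'vm', 'list-skus', '--all',
         '--resource-type', 'virtualMachines', '--output', 'json'],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


# Usage (needs the Azure CLI):
#   print(sorted(h100_families(list_skus_all_locations())))
```

Dropping the location is the key step. A family only offered in some regions will not appear when the query is restricted to one region.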
Oops, sorry. I forgot that location argument. I just removed the location and now see the family. Thanks for the explanation.
@@ -42,6 +47,7 @@
'Radeon MI25',
'P4',
'L4',
'H100',
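The hunk above adds `H100` to a list of accelerator names. Below is a hypothetical sketch of one way such a list might be used: resolving a user-typed name case-insensitively to its registered spelling. `REGISTRY` and `canonicalize` are illustrative names, not the project's actual API. The sketch also makes the naming question below concrete. With only `H100` registered, a name like `H100-80GB` does not resolve.

```python
from typing import Optional

# Hypothetical subset of the registry list from the hunk above.
REGISTRY = ['Radeon MI25', 'P4', 'L4', 'H100']


def canonicalize(name: str) -> Optional[str]:
    """Map a user-typed accelerator name to its registered spelling,
    ignoring case; return None if the name is not registered."""
    by_lower = {n.lower(): n for n in REGISTRY}
    return by_lower.get(name.lower())
```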
It seems GKE uses H100-80GB as the name. Not sure if we want to align with that.
Pro: keeps the name consistent between GKE and the other clouds' native GPU names.
Con: it complicates the name. H100 only has 80GB and 188GB versions, so we may only want H100 and H100-188GB.
Yeah, it seems we've been using H100 for AWS and Lambda already (added in the catalog files, not in this registry). It should be fine to use it for now.
…#2844)
* Azure: update fetch_azure to support two H100 families.
* format
Tested (run the relevant ones):
bash format.sh
pytest tests/test_smoke.py
pytest tests/test_smoke.py::test_fill_in_the_name
bash tests/backward_comaptibility_tests.sh