Azure: update fetch_azure to support two H100 families. #2844
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -6,17 +6,22 @@ | |
# NOTE: Must include accelerators supported for local clusters. | ||
# | ||
# 1. What if a name is in this list, but not in any catalog? | ||
# | ||
# The name will be canonicalized, but the accelerator will not be supported. | ||
# Optimizer will print an error message. | ||
# | ||
# 2. What if a name is not in this list, but in a catalog? | ||
# | ||
# The list is simply an optimization to short-circuit the search in the catalog. | ||
# If the name is not found in the list, it will be searched in the catalog | ||
# with its case being ignored. If a match is found, the name will be | ||
# canonicalized to that in the catalog. Note that this lookup can be an | ||
# expensive operation, as it requires reading the catalog or making external | ||
# API calls (such as for Kubernetes). Thus it is desirable to keep this list | ||
# up-to-date with commonly used accelerators. | ||
|
||
# 3. (For SkyPilot dev) What to do if I want to add a new accelerator? | ||
# | ||
# Append its case-sensitive canonical name to this list. The name must match | ||
# `AcceleratorName` in the service catalog, or what we define in | ||
# `onprem_utils.get_local_cluster_accelerators`. | ||
|
@@ -42,6 +47,7 @@ | |
'Radeon MI25', | ||
'P4', | ||
'L4', | ||
'H100', | ||
It seems GKE is using |
Yeah, it seems we've been using |
||
] | ||
|
||
|
||
|
@@ -72,11 +78,11 @@ def canonicalize_accelerator_name(accelerator: str) -> str: | |
if len(names) == 1: | ||
return names[0] | ||
|
||
- # Do not print an error meessage here. Optimizer will handle it. | ||
+ # Do not print an error message here. Optimizer will handle it. | ||
if len(names) == 0: | ||
return accelerator | ||
|
||
- # Currenlty unreachable. | ||
+ # Currently unreachable. | ||
# This can happen if catalogs have the same accelerator with | ||
# different names (e.g., A10g and A10G). | ||
assert len(names) > 1 | ||
|
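The lookup order described in the comment block and the three branches of `canonicalize_accelerator_name` can be sketched as follows. This is a minimal illustration, not the project's implementation: the short list, the `search_catalog` callback, `fake_catalog`, and the `ValueError` in the ambiguous branch are all assumptions made for the example.

```python
from typing import Callable, List

# Hypothetical short list; the real list in the diff is longer.
_ACCELERATORS = ['P4', 'L4', 'H100']


def canonicalize(accelerator: str,
                 search_catalog: Callable[[str], List[str]]) -> str:
    """Return the canonical spelling of `accelerator` if one can be found."""
    # Fast path: case-insensitive match against the short list, which
    # avoids touching the catalog at all.
    mapping = {name.lower(): name for name in _ACCELERATORS}
    if accelerator.lower() in mapping:
        return mapping[accelerator.lower()]

    # Slow path: case-insensitive catalog search (assumed to be expensive).
    names = search_catalog(accelerator)
    if len(names) == 1:
        return names[0]
    if len(names) == 0:
        # Unknown name: return it unchanged and let the caller report it.
        return accelerator
    # Several spellings of the same accelerator (e.g., A10g and A10G).
    raise ValueError(f'Ambiguous accelerator {accelerator!r}: {names}')


def fake_catalog(name: str) -> List[str]:
    """Stand-in for a catalog lookup; matches names ignoring case."""
    catalog = ['A10G', 'Radeon MI25']
    return [n for n in catalog if n.lower() == name.lower()]
```

With this sketch, a name on the short list never reaches `search_catalog`, which is why keeping the list current saves catalog reads.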
Weirdly, I don't see this family in `az vm list-skus --all --resource-type virtualMachines -l southcentralus | grep v5`. How do we get this family?
Running this on my end got
My `az account subscription list` shows subscriptions `*7` and `*a`.
Oh, I was wondering where we get the family `standardNDSH100v5Family`. Seems it is not included in the output you sent either?
It was obtained by inspecting an intermediate dataframe. Also seen in:
i.e., with the location removed.
Oops, sorry. I forgot about the location argument. I just removed the location and now see the family. Thanks for the explanation.
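For anyone reproducing the check in this thread, here is a rough sketch of building the `az vm list-skus` command with or without `-l` and searching the parsed JSON for a family name. The helper names and the sample records are hypothetical; the sketch assumes each SKU record carries `family` and `locations` fields, and the sample regions and the second family are made up for illustration.

```python
import json
from typing import Dict, List, Optional


def build_list_skus_cmd(location: Optional[str] = None) -> List[str]:
    """Build the `az vm list-skus` call, optionally scoped to one region."""
    cmd = ['az', 'vm', 'list-skus', '--all',
           '--resource-type', 'virtualMachines', '--output', 'json']
    if location is not None:
        cmd += ['-l', location]
    return cmd


def in_location(skus: List[Dict], location: str) -> List[Dict]:
    """Keep only SKUs offered in `location` (case-insensitive)."""
    wanted = location.lower()
    return [s for s in skus
            if wanted in (loc.lower() for loc in s.get('locations', []))]


def families_matching(skus: List[Dict], needle: str) -> List[str]:
    """Return sorted, de-duplicated family names containing `needle`."""
    found = {s.get('family') or '' for s in skus}
    return sorted(f for f in found if needle.lower() in f.lower())


# Hypothetical sample of parsed output; names and regions are illustrative.
SAMPLE = json.loads('''[
  {"name": "sku-a", "family": "standardNDSH100v5Family",
   "locations": ["region-x"]},
  {"name": "sku-b", "family": "standardExampleV3Family",
   "locations": ["region-y"]}
]''')
```

On real data, the list passed to `families_matching` would come from running the built command (for example with `subprocess.run`) and parsing its stdout with `json.loads`. Filtering to a region where the family is not offered returns an empty list, which matches what happened above when `-l southcentralus` was left in.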