Update Azure catalog (azure/vms.csv) to add H100. #50

Merged
concretevitamin merged 2 commits into master from az-update-h100 on Dec 8, 2023

Conversation

concretevitamin
Member

The new catalog is obtained by checking out skypilot-org/skypilot#2844 and running

python <sky repo>/sky/clouds/service_catalog/data_fetchers/fetch_azure.py

New catalog contains H100:

» grep H100 *csv                                                                                                      
Standard_NC40ads_H100_v5,H100,1,40.0,320,H100,,,eastus,V2
Standard_NC40ads_H100_v5,H100,1,40.0,320,H100,,,southcentralus,V2
Standard_NC80adis_H100_v5,H100,2,80.0,640,H100,,,eastus,V2
Standard_NC80adis_H100_v5,H100,2,80.0,640,H100,,,southcentralus,V2
Standard_ND48s_H100_v5,H100,6,48.0,950,H100,,,northcentralus,V2
Standard_ND96isr_H100_v5,H100,12,96.0,1900,H100,117.984,29.496,northcentralus,V2
Standard_ND96is_H100_v5,H100,12,96.0,1900,H100,106.186,26.5465,northcentralus,V2

Known problems

  • The first 5 rows above have empty Price and SpotPrice fields (NaN when loaded): Azure's pricing URLs currently return no pricing records for them, likely because these instance types are in preview
  • As a result, a command such as sky launch -t Standard_NC40ads_H100_v5 fails, because no price is available
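
A quick way to list the affected rows is to look for an empty Price field. This is a minimal sketch rather than anything SkyPilot ships: the column names in `HEADER` are an assumption about the azure/vms.csv header, and `rows_missing_price` is a hypothetical helper. The sample rows are copied from the grep output above.

```python
import csv
import io

# Assumed column names for azure/vms.csv; check them against your own copy.
HEADER = ["InstanceType", "AcceleratorName", "AcceleratorCount", "vCPUs",
          "MemoryGiB", "GpuInfo", "Price", "SpotPrice", "Region", "Generation"]

# Rows copied from the grep output in this PR description.
ROWS = """\
Standard_NC40ads_H100_v5,H100,1,40.0,320,H100,,,eastus,V2
Standard_NC40ads_H100_v5,H100,1,40.0,320,H100,,,southcentralus,V2
Standard_NC80adis_H100_v5,H100,2,80.0,640,H100,,,eastus,V2
Standard_NC80adis_H100_v5,H100,2,80.0,640,H100,,,southcentralus,V2
Standard_ND48s_H100_v5,H100,6,48.0,950,H100,,,northcentralus,V2
Standard_ND96isr_H100_v5,H100,12,96.0,1900,H100,117.984,29.496,northcentralus,V2
Standard_ND96is_H100_v5,H100,12,96.0,1900,H100,106.186,26.5465,northcentralus,V2
"""


def rows_missing_price(csv_text):
    """Return the rows whose Price field is empty (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text), fieldnames=HEADER)
    return [row for row in reader if row["Price"].strip() == ""]


missing = rows_missing_price(ROWS)
for row in missing:
    print(row["InstanceType"], row["Region"])
```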

Workaround

  • If users want to use these instance types, they need to manually set the prices to 0 for these rows, in their azure/vms.csv
    • Tested: manually setting to 0 and running the above launch works
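
The manual edit can also be scripted. The sketch below is one possible approach, not part of SkyPilot: `fill_missing_prices` is a hypothetical helper, and the column positions of Price and SpotPrice are assumed from the rows shown above. A header row, if present, passes through unchanged because its price cells are not empty.

```python
import csv
import io

# Assumed zero-based positions of Price and SpotPrice, inferred from the rows above.
PRICE_COL, SPOT_COL = 6, 7


def fill_missing_prices(csv_text, placeholder="0"):
    """Replace empty Price/SpotPrice cells with a placeholder value."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) > SPOT_COL:  # leave short or malformed rows untouched
            for col in (PRICE_COL, SPOT_COL):
                if row[col].strip() == "":
                    row[col] = placeholder
        writer.writerow(row)
    return out.getvalue()


sample = (
    "Standard_NC40ads_H100_v5,H100,1,40.0,320,H100,,,eastus,V2\n"
    "Standard_ND96is_H100_v5,H100,12,96.0,1900,H100,106.186,26.5465,northcentralus,V2\n"
)
patched = fill_missing_prices(sample)
print(patched, end="")
```

To apply this to a real file, read the local azure/vms.csv into a string, pass it through the function, and write the result back. Keep a backup, since a later catalog update may overwrite the file.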

Note

  • Last two lines have non-NaN prices, which means Standard_ND96isr_H100_v5 and Standard_ND96is_H100_v5 should be launchable by people with quotas
  • On my laptop, SkyPilot recognizes them and proceeds to launch, then runs into quota issues:
» sky launch --gpus H100:12                                                                                             
I 12-05 22:13:05 optimizer.py:694] == Optimizer ==
I 12-05 22:13:05 optimizer.py:717] Estimated cost: $106.2 / hour
I 12-05 22:13:05 optimizer.py:717]
I 12-05 22:13:05 optimizer.py:841] Considered resources (1 node):
I 12-05 22:13:05 optimizer.py:910] ---------------------------------------------------------------------------------------------------------
I 12-05 22:13:05 optimizer.py:910]  CLOUD   INSTANCE                  vCPUs   Mem(GB)   ACCELERATORS   REGION/ZONE      COST ($)   CHOSEN
I 12-05 22:13:05 optimizer.py:910] ---------------------------------------------------------------------------------------------------------
I 12-05 22:13:05 optimizer.py:910]  Azure   Standard_ND96is_H100_v5   96      1900      H100:12        northcentralus   106.19        ✔
I 12-05 22:13:05 optimizer.py:910] ---------------------------------------------------------------------------------------------------------
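
The choice in the log above is consistent with picking the cheapest priced row that offers H100:12. The sketch below illustrates that selection under simplifying assumptions. It is not SkyPilot's optimizer: `cheapest` is a hypothetical helper, and rows without a price are simply skipped.

```python
# (instance_type, accelerator, count, on-demand price in $/hour or None, region)
# Values copied from the grep output in this PR description.
CATALOG = [
    ("Standard_NC40ads_H100_v5", "H100", 1, None, "eastus"),
    ("Standard_NC80adis_H100_v5", "H100", 2, None, "eastus"),
    ("Standard_ND48s_H100_v5", "H100", 6, None, "northcentralus"),
    ("Standard_ND96isr_H100_v5", "H100", 12, 117.984, "northcentralus"),
    ("Standard_ND96is_H100_v5", "H100", 12, 106.186, "northcentralus"),
]


def cheapest(catalog, accelerator, count):
    """Return the lowest-priced row matching the request, or None."""
    candidates = [
        row for row in catalog
        if row[1] == accelerator and row[2] == count and row[3] is not None
    ]
    return min(candidates, key=lambda row: row[3]) if candidates else None


best = cheapest(CATALOG, "H100", 12)
print(f"{best[0]}: ${best[3]:.2f} / hour")
```

With these rows, the request for H100:12 resolves to Standard_ND96is_H100_v5, while a request for H100:1 finds no candidate because the only matching row has no price.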

@Michaelvll
Collaborator

Michaelvll commented Dec 7, 2023

Would it be good to give the preview H100 NC family the same price as ND, or just $0? Otherwise, a normal user may have to change the catalog manually, which is a bit demanding.

@concretevitamin
Member Author

> Would it be good to give the preview H100 NC family the same price as ND, or just $0? Otherwise, a normal user may have to change the catalog manually, which is a bit demanding.

Good call, I think a fake price, like $0 or $9999, is better to signal that it's temporary. Going with $0 for now since most users won't have quotas anyway.

@Michaelvll
Collaborator

> > Would it be good to give the preview H100 NC family the same price as ND, or just $0? Otherwise, a normal user may have to change the catalog manually, which is a bit demanding.
>
> Good call, I think a fake price, like $0 or $9999, is better to signal that it's temporary. Going with $0 for now since most users won't have quotas anyway.

$0 sounds good to me! I am wondering whether we could also enable the GitHub Action that fetches the Azure catalog automatically, considering that more people are going to use Azure.

@concretevitamin
Member Author

Updated, PTAL.

RE: Action. Maybe still ok to keep commit history smaller for now?

@concretevitamin concretevitamin merged commit c2f2f01 into master Dec 8, 2023
@concretevitamin concretevitamin deleted the az-update-h100 branch December 8, 2023 23:13