Add support for Lambda Labs #1136
Closed
ewzeng added a commit to ewzeng/skypilot that referenced this pull request Dec 9, 2022
ewzeng added a commit to ewzeng/skypilot that referenced this pull request Dec 22, 2022
ewzeng added a commit that referenced this pull request Jan 30, 2023
* Apply gmittal's lambda lab PR (#1136) on top of commit ad37a47
* Basic working Lambda Labs support
* Add error handling for Lambda Labs API and small lambda-ray.yml bugfix
* Add automatic key generation, improve sky check, and resolve import bug
* Improve Lambda Labs launch code and error handling
* Remove bootstrap_config, change metadata file design, and resolve provisioning bug
* Make autodown work on Lambda Labs
* Add basic tests and improve lambda-ray.yml.j2 bugfix
* Add sky cancel test and do not allow Lambda nodes to stop
* Polish provider code and change local metadata path to avoid clutter
* Update and move catalog out of repo
* Clean up code
* Cleanup and add CLI logs test
* Disallow --num-nodes > 1 and rename some variables
* Do not let optimizer consider Lambda Labs when launching spot
* Fix issues arising from merge
* Address Michaelvll comments: nits, improve error handling for autostop and --num-nodes > 1, regions_with_offering bugfix
* Address infwinston comments: nits, lambda_keys format, improve error handling for autostop and --use-spot
* Update Lambda Labs help string
* Move Lambda Lab tests into smoke tests and change local tag file location
* Improve remote node detection
* Change tag file scheme
* Add comments and change region_zone lookup
* Use same tag file path for local and remote
* Remove is_remote file
* Clean up imports in Lambda Labs node_provider
* Make optimizer skip clouds that do not implement requested_features
* Rename Lambda Labs client functions, nits
* Improve requested_features implementation, nits
* Add type annotations, nits
* Improve pytest serialization logic
* Improve requested_features, introduce CloudImplementationFeatures enums
* Update lambda_utils.Metadata, address nits
* Fix conftest.py bug introduced in previous commit
* Update test comment
* Rename Lambda Labs -> Lambda Cloud
* Fix tag file reuse bug
* Testing nit
* Fix auth bug and address nits
* Address final nits
* Fix typing issues from merge
* Provide basic support for cpus in resource specification
* Improve 'cpu' resource specification for Lambda Cloud
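Several of the commits above restrict what the optimizer may pick for Lambda: nodes cannot be stopped, --num-nodes > 1 is disallowed, spot launches skip Lambda, and clouds that do not implement the requested_features are skipped. The snippet below is a minimal, self-contained sketch of that filtering idea, assuming a per-cloud set of unsupported features. The enum members, the UNSUPPORTED table and candidate_clouds() are hypothetical illustrations, not SkyPilot's actual CloudImplementationFeatures API.

```python
import enum
from typing import Dict, List, Set


class CloudImplementationFeatures(enum.Enum):
    # Hypothetical feature flags; names are illustrative, not SkyPilot's.
    STOP = 'stop'
    MULTI_NODE = 'multi-node'
    SPOT_INSTANCE = 'spot'


# Hypothetical table: features each cloud does NOT implement.
# The Lambda entries mirror the restrictions listed in the commits above.
UNSUPPORTED: Dict[str, Set[CloudImplementationFeatures]] = {
    'aws': set(),
    'lambda': {
        CloudImplementationFeatures.STOP,
        CloudImplementationFeatures.MULTI_NODE,
        CloudImplementationFeatures.SPOT_INSTANCE,
    },
}


def candidate_clouds(requested: Set[CloudImplementationFeatures]) -> List[str]:
    """Return the clouds that implement every requested feature.

    A cloud is skipped if any requested feature is in its unsupported set.
    """
    return [cloud for cloud, missing in UNSUPPORTED.items()
            if not (missing & requested)]
```

For example, candidate_clouds({CloudImplementationFeatures.SPOT_INSTANCE}) leaves only 'aws', while an empty request keeps both clouds. Expressing each cloud's gaps as a set keeps the check to a single intersection per cloud.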
sumanthgenz pushed a commit to sumanthgenz/skypilot that referenced this pull request Feb 22, 2023
Morphed into #1557.
sumanthgenz pushed a commit to sumanthgenz/skypilot that referenced this pull request Mar 15, 2023
This is a (very rough) first pass at implementing a node provider for Lambda Labs, which offers some of the cheapest cloud GPUs available.
TODO
* sky gpunode works e2e (with failover)
* sky check setup

All suggestions and feedback welcome!