Add support for Lambda Labs #1557

Merged: 47 commits, Jan 30, 2023

Conversation

ewzeng
Collaborator

@ewzeng ewzeng commented Dec 26, 2022

This PR adds support for Lambda Labs GPU Cloud.

How to try out [updated 1/27/2023]:

Setup

  1. Go to https://cloud.lambdalabs.com/api-keys to generate an API key and add the line api_key = [YOUR_API_KEY] to ~/.lambda_cloud/lambda_keys.
  2. Run sky check.
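
A minimal sketch of how a key file in the `api_key = [YOUR_API_KEY]` format from step 1 could be read. The helper name `read_lambda_keys` is hypothetical and not taken from this PR; it assumes only that the file holds one `name = value` pair per line.

```python
import os
from typing import Dict


def read_lambda_keys(
        path: str = '~/.lambda_cloud/lambda_keys') -> Dict[str, str]:
    """Parses `name = value` lines into a dict (hypothetical helper)."""
    keys: Dict[str, str] = {}
    with open(os.path.expanduser(path), encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if not line or '=' not in line:
                continue  # Skip blank or malformed lines.
            # Split on the first '=' only, so values may contain '='.
            name, value = line.split('=', 1)
            keys[name.strip()] = value.strip()
    return keys
```

With a file containing `api_key = abc123`, `read_lambda_keys()['api_key']` would give `'abc123'`.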

Launch

Some things you can run:

  • sky gpunode --instance-type gpu_1x_a100_sxm4
  • sky launch --cloud lambda examples/minimal.yaml --down
  • sky launch --cloud lambda --gpus A100 examples/huggingface_glue_imdb_app.yaml

Features and Limitations

Some limitations are:

  • Only 1-node clusters are supported (multi-node support coming soon!)
  • Spot instances are not supported (Lambda Cloud does not provide this feature).
  • Stopping and autostopping are not supported (Lambda Cloud does not provide this feature).
  • --image-id is not supported (Lambda Cloud does not provide this feature)
  • Lambda Labs rate limits its API (~1 launch every 10 seconds), so launching multiple clusters at once may fail.

Everything else should work. If you find a feature that is not supported, please let me know!
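
The rate limit in the last bullet (~1 launch every 10 seconds) could be respected on the client side by spacing out launch calls. The sketch below is illustrative and not part of this PR: `LaunchThrottle` is a hypothetical name, the 10-second default is taken from the approximate figure above, and it only coordinates launches made from a single process.

```python
import time
from typing import Callable, Optional


class LaunchThrottle:
    """Keeps successive launches at least `min_interval` seconds apart."""

    def __init__(self,
                 min_interval: float = 10.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_call: Optional[float] = None

    def wait(self) -> float:
        """Blocks until a launch is allowed; returns the seconds waited."""
        waited = 0.0
        if self._last_call is not None:
            elapsed = self._clock() - self._last_call
            remaining = self._min_interval - elapsed
            if remaining > 0:
                self._sleep(remaining)
                waited = remaining
        self._last_call = self._clock()
        return waited
```

Calling `throttle.wait()` right before each launch request would serialize launches in one process; launching clusters from several terminals at once would still need retries on the API error.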

Acknowledgements

A large part of this PR is based on @gmittal's earlier work (#1136).

All suggestions and feedback are welcome! @concretevitamin @Michaelvll @infwinston @romilbhardwaj @gmittal

Member

@concretevitamin concretevitamin left a comment

This is awesome @ewzeng. Some observations while trying it out. To stress test, I didn't exactly follow the steps ;)

  1. Switched to this PR, immediately ran sky check, saw GCP and Lambda disabled (expected). Then,
» sky launch --cloud lambda

Enabling Compute Engine API (free of charge; this may take a minute)...
Failed. Detailed output:
ERROR: (gcloud) The project property must be set to a valid project ID, not the project name [None]
To set your project, run:

  $ gcloud config set project PROJECT_ID

or to unset it, run:

  $ gcloud config unset project

sky.exceptions.ResourcesUnavailableError: Task sky-cmd requires Lambda which is not enabled. To enable access, run sky check , or change the cloud requirement

The GCP output is unexpected, while the last line is expected. Is this reproducible on your end?

  2. RE the step

api_key=[YOUR_API_KEY] to ~/.lambda/lambda_keys.
Is it possible to make it so that users can simply place [YOUR_API_KEY] in the file?

I'd also propose changing it to ~/.lambda_labs/api_key (former = to be more precise; latter = to use their terminology).

  3. (For discussion) I feel ambivalent about the code name Lambda (which is less precise and can be confused with AWS Lambda, but is easier to type) vs. the longer name Lambda Labs, in --cloud and in the catalog folder name. It may be worth polling the dev team once the PR settles. (Personally I think the shorter name is OK.)

  4. (Still not following the steps)

» sky launch
⠋ Updating Lambda catalog: lambda/vms.csv
E 12-27 10:34:07 common.py:120] Failed to fetch Lambda catalog lambda/vms.csv. Please check your internet connection.
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://raw.githubusercontent.com/skypilot-org/skypilot-catalog/master/catalogs/v5/lambda/vms.csv

Since I have not placed the catalog file manually, this aborts the launch even for other clouds. I think the desired behavior is to keep going with the clouds that are OK.

Is this behavior also on master (e.g., if we manually remove + change AWS catalog's URL)? If so, it's okay and perhaps add a TODO.

  5. (For discussion) Now I have the catalog file. I'm slightly surprised that other clouds' default VM type is CPU-based, while Lambda's is GPU-based:
» sky cpunode
I 12-27 10:37:09 optimizer.py:606] == Optimizer ==
I 12-27 10:37:09 optimizer.py:618] Target: minimizing cost
I 12-27 10:37:09 optimizer.py:629] Estimated cost: $0.4 / hour
I 12-27 10:37:09 optimizer.py:629]
I 12-27 10:37:09 optimizer.py:686] Considered resources (1 node):
I 12-27 10:37:09 optimizer.py:714] ------------------------------------------------------------------------
I 12-27 10:37:09 optimizer.py:714]  CLOUD    INSTANCE           vCPUs   ACCELERATORS   COST ($)   CHOSEN
I 12-27 10:37:09 optimizer.py:714] ------------------------------------------------------------------------
I 12-27 10:37:09 optimizer.py:714]  AWS      m6i.2xlarge        8       -              0.38          ✔
I 12-27 10:37:09 optimizer.py:714]  Azure    Standard_D8_v4     8       -              0.38
I 12-27 10:37:09 optimizer.py:714]  Lambda   gpu_1x_a100_sxm4   30      A100:1         1.10
I 12-27 10:37:09 optimizer.py:714] ------------------------------------------------------------------------

Similar surprise when I typed sky launch and saw the table.

I think we can discuss / poll whether this output is okay or too surprising.

  6. With sky launch --cloud lambda -i1 there's a long stack trace. Maybe use
        with ux_utils.print_exception_no_traceback():
            raise ...
  7. Tried V100:8

I 12-27 10:47:06 cloud_vm_ray_backend.py:1311] Launching on Lambda europe-central-1 ()
W 12-27 10:47:09 cloud_vm_ray_backend.py:762] Got error(s) in europe-central-1:
W 12-27 10:47:09 cloud_vm_ray_backend.py:764]   LambdaError: instance-operations/launch/insufficient-capacity: Not enough capacity to fulfill launch request.

Nits

  • Can we remove the () after the region name?
  • LambdaError -> LambdaLabsError?
  8. sky launch --cloud lambda -i1 --down --num-nodes 2 seems to proceed without an error saying that >1 node is currently not supported.
I 12-27 11:36:07 cloud_vm_ray_backend.py:1311] Launching on Lambda us-east-1 ()
I 12-27 11:38:43 log_utils.py:45] Head node is up.
I 12-27 11:39:50 cloud_vm_ray_backend.py:1421] Successfully provisioned or found existing head VM. Waiting for workers.
E 12-27 11:42:52 backend_utils.py:1015] Timed out: waited for more than 90 seconds for new workers to be provisioned, but no progress.
E 12-27 11:42:52 cloud_vm_ray_backend.py:1181] *** Failed provisioning the cluster. ***
E 12-27 11:42:52 cloud_vm_ray_backend.py:1183] *** Terminating the failed cluster. ***
I 12-27 11:43:18 cloud_vm_ray_backend.py:1311] Launching on Lambda us-west-2 ()
I 12-27 11:45:50 log_utils.py:45] Head node is up.
...

At some point I ctrl-c'd this and saw 2 instances in the console, one in us-east-1 (Virginia) and one in India (asia-south-1). Shouldn't the former have been terminated, according to the log above?
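
The behavior proposed in the catalog item above (a missing Lambda catalog should not stop launches on other clouds) can be sketched as follows. This is an illustration only, not SkyPilot's catalog code: `load_catalogs` and the per-cloud fetcher callables are hypothetical names.

```python
import logging
from typing import Callable, Dict, List

logger = logging.getLogger(__name__)


def load_catalogs(
    fetchers: Dict[str, Callable[[], List[dict]]]
) -> Dict[str, List[dict]]:
    """Loads each cloud's catalog, skipping clouds whose fetch fails."""
    catalogs: Dict[str, List[dict]] = {}
    for cloud, fetch in fetchers.items():
        try:
            catalogs[cloud] = fetch()
        except Exception as e:  # pylint: disable=broad-except
            # Hypothetical policy: warn and keep going with other clouds.
            logger.warning('Skipping %s: failed to fetch catalog (%s).',
                           cloud, e)
    if not catalogs:
        # Nothing usable at all is still a hard error.
        raise RuntimeError('No cloud catalog could be loaded.')
    return catalogs
```

The trade-off is that a silently skipped cloud may surprise users who expected it to be considered, so the warning should name the cloud and the cause.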

@ewzeng
Collaborator Author

ewzeng commented Dec 27, 2022

Thanks for the detailed comments @concretevitamin!

  1. I do not get the GCP output when GCP is enabled (but Lambda Labs is disabled). I think it is an issue in our GCP code.

  2. I've changed ~/.lambda to ~/.lambda_labs. I am more reluctant to rename lambda_keys because the generated ssh key name is stored there too.

  3. Agreed, we can poll the dev team about it.

  4. The behavior on master is the same; if you remove the AWS catalog and change the catalog link, SkyPilot will error.

  5. Lambda Labs does not provide CPUs. I think it makes more sense if Lambda Labs is not listed in sky cpunode.

  6. I've removed the long stack trace.

  7. I've changed LambdaError to LambdaLabsError. The () after the region name is because the zone string is empty. Azure has the same problem.

  8. Good catch; I've disallowed multi-node clusters.
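
For reference, one way a context manager in the spirit of the suggested ux_utils.print_exception_no_traceback() can hide a stack trace is by lowering `sys.tracebacklimit`. This is a sketch under that assumption, with the hypothetical name `no_traceback`; the actual SkyPilot helper may be implemented differently.

```python
import contextlib
import sys
from typing import Iterator


@contextlib.contextmanager
def no_traceback() -> Iterator[None]:
    """Makes an uncaught exception from the block print without a traceback."""
    original = getattr(sys, 'tracebacklimit', 1000)
    sys.tracebacklimit = 0
    yield
    # Reached only if the block did not raise. When it does raise, the
    # limit deliberately stays at 0, because the interpreter prints the
    # uncaught exception after this context manager has already exited.
    sys.tracebacklimit = original
```

Note that restoring the limit in a `finally` clause would defeat the purpose for uncaught exceptions, since the traceback is printed after the `with` block unwinds.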

@Michaelvll
Collaborator

Thanks for the fantastic work @ewzeng! I tried out the PR following the instructions given above:
Launching, terminating, autodown, and sky status -r all work well for the cluster, and stop/autostop correctly fail for a cluster on Lambda. I have several comments below and will go through the code soon:

  1. It would be good to remove the non-sxm4 version of A100 from the catalog so that users are not confused about which instance type to specify. In the future, we may want to distinguish A100 from A100-nvlink across all clouds (which can represent the difference between the sxm and non-sxm versions).
  2. Seems sky launch --gpus A100 --use-spot fails with the following error, though it is available for GCP.
  File "/home/ubuntu/sky-lambda/sky/optimizer.py", line 281, in _estimate_nodes_cost_or_time
    cost_per_node = resources.get_cost(estimated_runtime)
  File "/home/ubuntu/sky-lambda/sky/resources.py", line 456, in get_cost
    hourly_cost = self.cloud.instance_type_to_hourly_cost(
  File "/home/ubuntu/sky-lambda/sky/clouds/lambda_labs.py", line 58, in instance_type_to_hourly_cost
    return service_catalog.get_hourly_cost(instance_type,
  File "/home/ubuntu/sky-lambda/sky/clouds/service_catalog/__init__.py", line 136, in get_hourly_cost
    return _map_clouds_catalog(clouds, 'get_hourly_cost', instance_type, region,
  File "/home/ubuntu/sky-lambda/sky/clouds/service_catalog/__init__.py", line 43, in _map_clouds_catalog
    results.append(method(*args, **kwargs))
  File "/home/ubuntu/sky-lambda/sky/clouds/service_catalog/lambda_catalog.py", line 38, in get_hourly_cost
    assert not use_spot, 'Lambda Labs does not support spot.'
AssertionError: Lambda Labs does not support spot.
  3. Why do we disallow multi-node clusters? It seems we can launch multiple nodes in the console or with multiple sky launch calls.

@ewzeng
Collaborator Author

ewzeng commented Dec 29, 2022

Thanks for the comments @Michaelvll!

  1. Wouldn't removing the non-sxm4 version from the catalog mean that SkyPilot users won't get to use them?

  2. Good catch. I will push a fix.

  3. I don't think it will be difficult to add multi-node support (apart from a potential tag file synchronization issue). I plan to do so in my next PR. (Do you think I should try to add it here?)

@Michaelvll
Collaborator

Wouldn't removing the non-sxm4 version from the catalog mean that SkyPilot users won't get to use them?

Is a regular Lambda user able to use the non-sxm4 A100? I think it would be better if a Lambda Labs user could use SkyPilot for A100 out of the box by specifying gpus: A100 or accelerators: A100, instead of having to specify the instance type as well.

I don't think it will be difficult to add multi-node support (apart from a potential tag file synchronization issue). I plan to do so in my next PR. (Do you think I should try to add it here?)

Sounds good! Let's do it in a future PR.

@ewzeng
Collaborator Author

ewzeng commented Jan 1, 2023

Yes, a regular Lambda user should be able to use the non-sxm4 A100 (if there is availability).

@gmittal
Collaborator

gmittal commented Jan 2, 2023

Thanks for picking this up @ewzeng! The progress is super exciting.

I just tried launching/tearing down a gpunode. The VM got spun up and removed successfully, however the autogenerated SSH key is still there in the dashboard. Shouldn't the SSH key that was generated in the Lambda console also be removed?

@ewzeng
Collaborator Author

ewzeng commented Jan 2, 2023

Thanks for the review @gmittal!

I made the SSH key per-user (not per-cluster), so we don't need to remove it from the console. (The SSH key is actually just ~/.ssh/sky-key.pub.)

@ewzeng
Collaborator Author

ewzeng commented Jan 3, 2023

@Michaelvll I pushed a fix for the sky launch --gpus A100 --use-spot error that you found.

@ewzeng
Collaborator Author

ewzeng commented Jan 4, 2023

I reordered the Lambda catalog so that gpu_1x_a100_sxm4 comes before gpu_1x_a100. If you use this new catalog, then gpu_1x_a100_sxm4 should have priority over gpu_1x_a100.
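
One plausible reason catalog order can decide priority is a first-match tie-break when two instance types cost the same. The sketch below only illustrates that mechanism: the `Offering` type, the `cheapest` helper, and the equal prices are assumptions, not SkyPilot's actual selection code or Lambda's real pricing.

```python
from typing import List, NamedTuple, Optional


class Offering(NamedTuple):
    instance_type: str
    accelerator: str
    price: float  # USD per hour; example values are made up.


def cheapest(catalog: List[Offering],
             accelerator: str) -> Optional[Offering]:
    """Returns the cheapest match; ties go to the earlier catalog row."""
    matches = [o for o in catalog if o.accelerator == accelerator]
    # min() returns the first minimal element it encounters, so catalog
    # order acts as the tie-breaker between equally priced offerings.
    return min(matches, key=lambda o: o.price, default=None)
```

Under these assumptions, listing gpu_1x_a100_sxm4 before gpu_1x_a100 makes the former win whenever their prices are equal.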

Collaborator

@Michaelvll Michaelvll left a comment

Many thanks for the excellent work @ewzeng! Just did a quick pass (will do a more thorough one later). The --use-spot and --gpus A100 options work well now.

Please remember to submit a PR for the catalog to this repo https://github.com/skypilot-org/skypilot-catalog, so that the user can automatically download the catalog from our repo.

I ran into the following issue:
sky launch -c test-lambda --gpus A100 echo hi fails

> sky launch -c test-lambda --gpus A100 echo hi
Task from command: echo hi
I 01-04 00:30:29 optimizer.py:606] == Optimizer ==
I 01-04 00:30:29 optimizer.py:617] Target: minimizing cost
I 01-04 00:30:29 optimizer.py:629] Estimated cost: $1.1 / hour
I 01-04 00:30:29 optimizer.py:629] 
I 01-04 00:30:29 optimizer.py:685] Considered resources (1 node):
I 01-04 00:30:29 optimizer.py:714] ------------------------------------------------------------------------
I 01-04 00:30:29 optimizer.py:714]  CLOUD    INSTANCE           vCPUs   ACCELERATORS   COST ($)   CHOSEN   
I 01-04 00:30:29 optimizer.py:714] ------------------------------------------------------------------------
I 01-04 00:30:29 optimizer.py:714]  Lambda   gpu_1x_a100_sxm4   30      A100:1         1.10          ✔     
I 01-04 00:30:29 optimizer.py:714]  GCP      a2-highgpu-1g      12      A100:1         3.67                
I 01-04 00:30:29 optimizer.py:714] ------------------------------------------------------------------------
I 01-04 00:30:29 optimizer.py:714] 
I 01-04 00:30:29 optimizer.py:729] Multiple Lambda instances satisfy A100:1. The cheapest Lambda(gpu_1x_a100_sxm4, {'A100': 1}) is considered among:
I 01-04 00:30:29 optimizer.py:729] ['gpu_1x_a100_sxm4', 'gpu_1x_a100'].
I 01-04 00:30:29 optimizer.py:729] 
I 01-04 00:30:29 optimizer.py:735] To list more details, run 'sky show-gpus A100'.
Launching a new cluster 'test-lambda'. Proceed? [Y/n]: 
Traceback (most recent call last):
  File "/Users/zhwu/miniconda3/envs/sky-dev/bin/sky", line 33, in <module>
    sys.exit(load_entry_point('skypilot', 'console_scripts', 'sky')())
  File "/Users/zhwu/miniconda3/envs/sky-dev/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/Users/zhwu/miniconda3/envs/sky-dev/lib/python3.8/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/Users/zhwu/Library/CloudStorage/OneDrive-Personal/AResource/PhD/Research/sky-computing/code/skypilot-lambda/sky/utils/common_utils.py", line 214, in _record
    return f(*args, **kwargs)
  File "/Users/zhwu/Library/CloudStorage/OneDrive-Personal/AResource/PhD/Research/sky-computing/code/skypilot-lambda/sky/cli.py", line 1009, in invoke
    return super().invoke(ctx)
  File "/Users/zhwu/miniconda3/envs/sky-dev/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/zhwu/miniconda3/envs/sky-dev/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/zhwu/miniconda3/envs/sky-dev/lib/python3.8/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/Users/zhwu/Library/CloudStorage/OneDrive-Personal/AResource/PhD/Research/sky-computing/code/skypilot-lambda/sky/utils/common_utils.py", line 235, in _record
    return f(*args, **kwargs)
  File "/Users/zhwu/Library/CloudStorage/OneDrive-Personal/AResource/PhD/Research/sky-computing/code/skypilot-lambda/sky/cli.py", line 1223, in launch
    _launch_with_confirm(
  File "/Users/zhwu/Library/CloudStorage/OneDrive-Personal/AResource/PhD/Research/sky-computing/code/skypilot-lambda/sky/cli.py", line 717, in _launch_with_confirm
    if resource.cloud.is_same_cloud(sky.Lambda()):
AttributeError: 'NoneType' object has no attribute 'is_same_cloud'
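
The traceback above suggests that `resource.cloud` can be None when no --cloud is given, so the call to `is_same_cloud` needs a guard. A minimal sketch of such a guard follows; the `Cloud`, `Lambda`, and `GCP` classes here are simplified stand-ins and `is_lambda` is a hypothetical helper, not the fix that was actually pushed.

```python
from typing import Optional


class Cloud:
    """Simplified stand-in for a cloud class."""

    def is_same_cloud(self, other: 'Cloud') -> bool:
        return isinstance(other, type(self))


class Lambda(Cloud):
    pass


class GCP(Cloud):
    pass


def is_lambda(cloud: Optional[Cloud]) -> bool:
    """Returns False when no cloud was specified, instead of raising."""
    return cloud is not None and cloud.is_same_cloud(Lambda())
```

Here `is_lambda(None)` evaluates to False rather than raising AttributeError.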

Resolved review threads: sky/authentication.py (2), sky/cli.py, sky/clouds/lambda_labs.py, sky/skylet/providers/lambda_labs/lambda_utils.py (2), sky/skylet/providers/lambda_labs/node_provider.py (2), tests/test_lambda.py, sky/backends/cloud_vm_ray_backend.py
@ewzeng
Collaborator Author

ewzeng commented Jan 25, 2023

@Michaelvll I am once again asking for your review :)

@ewzeng ewzeng requested a review from Michaelvll January 25, 2023 06:49
Collaborator

@Michaelvll Michaelvll left a comment

Thanks for the quick fix @ewzeng! The combined smoke_tests look excellent. Left several comments, mostly for readability. : )

Resolved review threads: sky/clouds/cloud.py (3), sky/optimizer.py (2), sky/core.py (2), tests/test_smoke.py (5)
@ewzeng
Collaborator Author

ewzeng commented Jan 27, 2023

Thanks for the comments @Michaelvll @concretevitamin! I tried to address them all.

Important updates:

  • I made requested_features include multi-node and use enums.
  • I renamed Lambda Labs -> Lambda Cloud (the Lambda Labs people told us they prefer their cloud to be called Lambda / Lambda Cloud).

In particular, please rename ~/.lambda_labs to ~/.lambda_cloud.
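
The enum-based feature check mentioned above can be sketched roughly as follows. The commit list names the real enum CloudImplementationFeatures, but everything in this block (`Feature`, `Cloud.check_features`, `feasible_clouds`) is a hypothetical simplification, and the unsupported set for Lambda Cloud is taken from the limitations listed in the PR description.

```python
import enum
from typing import FrozenSet, Iterable, List, Type


class Feature(enum.Enum):
    STOP = 'stop'
    AUTOSTOP = 'autostop'
    MULTI_NODE = 'multi-node'
    SPOT = 'spot'


class FeatureNotSupportedError(Exception):
    pass


class Cloud:
    NAME = 'cloud'
    UNSUPPORTED: FrozenSet[Feature] = frozenset()

    @classmethod
    def check_features(cls, requested: Iterable[Feature]) -> None:
        """Raises if any requested feature is unsupported by this cloud."""
        missing = sorted(f.value for f in set(requested) & cls.UNSUPPORTED)
        if missing:
            raise FeatureNotSupportedError(
                f'{cls.NAME} does not support: {", ".join(missing)}')


class LambdaCloud(Cloud):
    NAME = 'Lambda Cloud'
    UNSUPPORTED = frozenset({
        Feature.STOP, Feature.AUTOSTOP, Feature.MULTI_NODE, Feature.SPOT
    })


def feasible_clouds(clouds: List[Type[Cloud]],
                    requested: Iterable[Feature]) -> List[Type[Cloud]]:
    """Keeps only the clouds that support every requested feature."""
    requested = list(requested)
    feasible = []
    for cloud in clouds:
        try:
            cloud.check_features(requested)
        except FeatureNotSupportedError:
            continue  # An optimizer could skip this cloud.
        feasible.append(cloud)
    return feasible
```

Declaring unsupported features per cloud keeps the check in one place, so the optimizer can skip a cloud up front instead of hitting an assertion deep inside cost estimation.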

Potentially unfinished:

Collaborator

@Michaelvll Michaelvll left a comment

Thanks for the quick fix @ewzeng! Several final comments.

An issue:

  1. It seems autodown does not work for me: sky launch -c test-lambda -i 1 --down

Resolved review threads: sky/skylet/providers/lambda_cloud/lambda_utils.py, sky/clouds/lambda_cloud.py, sky/execution.py, tests/conftest.py, sky/backends/cloud_vm_ray_backend.py
@ewzeng
Collaborator Author

ewzeng commented Jan 28, 2023

Thanks for the review (once again) @Michaelvll!

Hmm, I ran autodown a few times and they worked each time. Are you sure you are launching on Lambda Cloud? Can you reproduce this bug?

I pushed some updates. There are two things I wasn't sure about, so I left the conversations unresolved (internal ip and task.num_nodes)

Collaborator

@Michaelvll Michaelvll left a comment

Thanks again for the excellent PR for Lambda Labs @ewzeng! It looks good to me now. Just tried the autodown again and it works. It is probably because of some issues with my environment.

Resolved review threads: sky/backends/cloud_vm_ray_backend.py, sky/clouds/lambda_cloud.py (2)
@ewzeng
Collaborator Author

ewzeng commented Jan 30, 2023

Thanks for the review @Michaelvll (sorry for giving you so much work).

I just merged from master to make things work with commit 76eed62. Once I finish running the Lambda tests one final time, I will squash & merge this PR.

@ewzeng ewzeng merged commit fc6e164 into skypilot-org:master Jan 30, 2023
sumanthgenz pushed a commit to sumanthgenz/skypilot that referenced this pull request Feb 22, 2023
* Apply gmittal's lambda lab PR (skypilot-org#1136) on top of commit ad37a47

* Basic working Lambda Labs support

* Add error handling for Lambda Labs API and small lambda-ray.yml bugfix

* Add automatic key generation, improve sky check, and resolve import bug

* Improve Lambda Labs launch code and error handling

* Remove bootstrap_config, change metadata file design, and resolve
provisioning bug

* Make autodown work on Lambda Labs

* Add basic tests and improve lambda-ray.yml.j2 bugfix

* Add sky cancel test and do not allow Lambda nodes to stop

* Polish provider code and change local metadata path to avoid clutter

* Update and move catalog out of repo

* Clean up code

* Cleanup and add CLI logs test

* Disallow --num-nodes > 1 and rename some variables

* Do not let optimizer consider Lambda Labs when launching spot

* Fix issues arising from merge

* Address Michaelvll comments

Nits, improve error handling for autostop and --num-nodes > 1,
regions_with_offering bugfix

* Address infwinston comments

Nits, lambda_keys format, improve error handling for autostop and
--use-spot

* Update Lambda Labs help string

* Move Lambda Lab tests into smoke tests and change local tag file
location

* Improve remote node detection

* Change tag file scheme

* Add comments and change region_zone lookup

* Use same tag file path for local and remote

* Remove is_remote file

* Clean up imports in Lambda Labs node_provider

* Make optimizer skip clouds that do not implement requested_features

* Rename Lambda Labs client functions, nits

* Improve requested_features implementation, nits

* Add type annotations, nits

* Improve pytest serialization logic

* Improve requested_features, introduce CloudImplementationFeatures enums

* Update lambda_utils.Metadata, address nits

* Fix conftest.py bug introduced in previous commit

* Update test comment

* Rename Lambda Labs -> Lambda Cloud

* Fix tag file reuse bug

* Testing nit

* Fix auth bug and address nits

* Address final nits

* Fix typing issues from merge

* Provide basic support for cpus in resource specification

* Improve 'cpu' resource specification for Lambda Cloud
@concretevitamin concretevitamin mentioned this pull request Mar 2, 2023
sumanthgenz pushed a commit to sumanthgenz/skypilot that referenced this pull request Mar 15, 2023