[Core] Add HOST_CONTROLLERS check for clouds #3407

Merged: 3 commits into master on Apr 2, 2024

Conversation

romilbhardwaj
Collaborator

Adds CloudImplementationFeatures.HOST_CONTROLLERS to check if a cloud supports hosting controllers. cc #3377 #3363
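
For readers unfamiliar with this kind of check, here is a minimal, self-contained sketch of how a per-cloud feature gate could work. It is not the code from this PR: the class layout, the `UNSUPPORTED_FEATURES` attribute, the `check_features` method, and the `GCP` stub are assumptions made for illustration. Only the feature names and reason strings are taken from the example output in this description.

```python
import enum
from typing import Dict, Set


class CloudImplementationFeatures(enum.Enum):
    """Illustrative subset of features a cloud may or may not implement."""
    STOP = 'stop'
    HOST_CONTROLLERS = 'host_controllers'


class NotSupportedError(Exception):
    """Stand-in for the NotSupportedError seen in the log output."""


class Cloud:
    """Hypothetical base class; each cloud overrides the mapping below."""
    NAME = 'Cloud'
    # Assumed layout: unsupported feature -> human-readable reason.
    UNSUPPORTED_FEATURES: Dict[CloudImplementationFeatures, str] = {}

    @classmethod
    def check_features(cls,
                       requested: Set[CloudImplementationFeatures]) -> None:
        """Raise NotSupportedError if any requested feature is unsupported."""
        unsupported = {
            feature: reason
            for feature, reason in cls.UNSUPPORTED_FEATURES.items()
            if feature in requested
        }
        if not unsupported:
            return
        # Sort by feature name so the message is deterministic.
        rows = [
            f'{feature.value}: {reason}' for feature, reason in sorted(
                unsupported.items(), key=lambda item: item[0].value)
        ]
        raise NotSupportedError(
            f'The following features are not supported by {cls.NAME}:\n' +
            '\n'.join(rows))


class Kubernetes(Cloud):
    NAME = 'Kubernetes'
    UNSUPPORTED_FEATURES = {
        CloudImplementationFeatures.STOP:
            'Kubernetes does not support stopping VMs.',
        CloudImplementationFeatures.HOST_CONTROLLERS:
            'Kubernetes cannot host controllers.',
    }


class GCP(Cloud):
    """Stub cloud with no unsupported features in this sketch."""
    NAME = 'GCP'
```

In this sketch, a controller launch would request `HOST_CONTROLLERS` (and possibly `STOP`) before provisioning, so a cloud that lists the feature as unsupported fails fast with a reason instead of failing later during provisioning.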

Example output with spot.controller.resources.cloud: Kubernetes in config.yaml:

(base) ➜  sky-experiments git:(core_host_controllers) ✗ sky spot launch --cloud gcp -- echo hi
Task from command: echo hi
Managed spot job 'sky-cmd' will be launched on (estimated):
I 04-02 09:18:00 optimizer.py:690] == Optimizer ==
I 04-02 09:18:00 optimizer.py:701] Target: minimizing cost
I 04-02 09:18:00 optimizer.py:713] Estimated cost: $0.1 / hour
I 04-02 09:18:00 optimizer.py:713] 
I 04-02 09:18:00 optimizer.py:836] Considered resources (1 node):
I 04-02 09:18:00 optimizer.py:906] --------------------------------------------------------------------------------------------------
I 04-02 09:18:00 optimizer.py:906]  CLOUD   INSTANCE              vCPUs   Mem(GB)   ACCELERATORS   REGION/ZONE   COST ($)   CHOSEN   
I 04-02 09:18:00 optimizer.py:906] --------------------------------------------------------------------------------------------------
I 04-02 09:18:00 optimizer.py:906]  GCP     n2-standard-8[Spot]   8       32        -              us-east4-a    0.09          ✔     
I 04-02 09:18:00 optimizer.py:906] --------------------------------------------------------------------------------------------------
I 04-02 09:18:00 optimizer.py:906] 
Launching the spot job 'sky-cmd'. Proceed? [Y/n]: Y
Launching managed spot job 'sky-cmd' from spot controller...
Launching spot controller...
I 04-02 09:18:05 optimizer.py:690] == Optimizer ==
I 04-02 09:18:05 optimizer.py:713] Estimated cost: $0.0 / hour
I 04-02 09:18:05 optimizer.py:713] 
I 04-02 09:18:05 optimizer.py:836] Considered resources (1 node):
I 04-02 09:18:05 optimizer.py:906] ----------------------------------------------------------------------------------------------
I 04-02 09:18:05 optimizer.py:906]  CLOUD        INSTANCE     vCPUs   Mem(GB)   ACCELERATORS   REGION/ZONE   COST ($)   CHOSEN   
I 04-02 09:18:05 optimizer.py:906] ----------------------------------------------------------------------------------------------
I 04-02 09:18:05 optimizer.py:906]  Kubernetes   8CPU--24GB   8       24        -              kubernetes    0.00          ✔     
I 04-02 09:18:05 optimizer.py:906] ----------------------------------------------------------------------------------------------
I 04-02 09:18:05 optimizer.py:906] 
I 04-02 09:18:05 cloud_vm_ray_backend.py:4238] Creating a new cluster: 'sky-spot-controller-2ea485ea' [1x Kubernetes(8CPU--24GB, cpus=8+, mem=3x, disk_size=50)].
I 04-02 09:18:05 cloud_vm_ray_backend.py:4238] Tip: to reuse an existing cluster, specify --cluster (-c). Run `sky status` to see existing clusters.
W 04-02 09:18:05 cloud_vm_ray_backend.py:2013] sky.exceptions.NotSupportedError: The following features are not supported by Kubernetes:
W 04-02 09:18:05 cloud_vm_ray_backend.py:2013]  Feature           Reason                                     
W 04-02 09:18:05 cloud_vm_ray_backend.py:2013]  host_controllers  Kubernetes cannot host controllers.        
W 04-02 09:18:05 cloud_vm_ray_backend.py:2013]  stop              Kubernetes does not support stopping VMs.  
W 04-02 09:18:05 cloud_vm_ray_backend.py:2039] 
W 04-02 09:18:05 cloud_vm_ray_backend.py:2039] Provision failed for 1x Kubernetes(8CPU--24GB, cpus=8+, mem=3x, disk_size=50) in kubernetes. Trying other locations (if any).
E 04-02 09:18:05 cloud_vm_ray_backend.py:2690] Failed to provision all possible launchable resources. Relax the task's resource requirements: 1x Kubernetes(cpus=8+, mem=3x, disk_size=50)
I 04-02 09:18:05 cloud_vm_ray_backend.py:2694] === Retry until up ===
I 04-02 09:18:05 cloud_vm_ray_backend.py:2694] Retrying provisioning after 32s (backoff with random jittering). Already tried 1 attempt.

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • Manual tests: sky spot launch --cloud gcp -- echo hi, with Kubernetes and AWS controllers.

Comment on lines +67 to +69
clouds.CloudImplementationFeatures.HOST_CONTROLLERS: 'Kubernetes can '
'not host '
'controllers.',
Collaborator Author

Will be removed after #3377
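
The commented lines rely on Python's implicit concatenation of adjacent string literals, which is why the reason text can be split across three lines. A small sketch of that behavior, where the variable name `reason` is only for illustration:

```python
# Adjacent string literals are joined at compile time into one string.
reason = ('Kubernetes can '
          'not host '
          'controllers.')

# The three fragments above form a single str object.
assert isinstance(reason, str)
```

As quoted in the diff, the fragments join into the single string 'Kubernetes can not host controllers.'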

@romilbhardwaj romilbhardwaj requested a review from cblmemo April 2, 2024 16:25
Collaborator

@Michaelvll Michaelvll left a comment

Thanks for separating this @romilbhardwaj! LGTM.

sky/clouds/kubernetes.py
@@ -64,6 +64,9 @@ class Kubernetes(clouds.Cloud):
'tiers are not '
'supported in '
'Kubernetes.',
clouds.CloudImplementationFeatures.HOST_CONTROLLERS: 'Kubernetes can '
Collaborator

cc'ing @cblmemo to add the clouds that should be blacklisted from hosting the controllers.

@romilbhardwaj romilbhardwaj merged commit dd17f98 into master Apr 2, 2024
20 checks passed
@romilbhardwaj romilbhardwaj deleted the core_host_controllers branch April 2, 2024 21:06
@romilbhardwaj romilbhardwaj mentioned this pull request Apr 9, 2024
8 tasks