-
Notifications
You must be signed in to change notification settings - Fork 539
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
IBM cloud provider integration #1598
Conversation
Thanks for your hard work @Cohen-J-Omer! Are there any instructions for us to try out this PR (e.g. create an account on IBM cloud, save IBM credentials to a certain file, etc.)? |
Hey @ewzeng,
|
@ewzeng great you assigned to this PR... Do you plan to test it with IBM Cloud account? Do you have one? The account need to have permissions to setup VPc.. |
@Cohen-J-Omer Thanks! I agree -- we can update the docs in a separate PR. @gilv Yes I was planning to make an IBM cloud account to test it out (I don't have one yet). Is there anything special I should do to get the necessary account permissions? Edit: it looks like the Sky Lab already has an IBM cloud account. I'll use that one once I get access; if I run into any account permission issues I'll report back. |
@ewzeng just regular account will do the trick... but if you use all kind of academic accounts or free trials, then I think VPC is blocked there |
@gilv Okay, got it. This is good to know. Thanks! |
@Cohen-J-Omer Here are some comments from trying out this pr! (I will go through the code soon)
Overall, awesome work! I tested |
Hey @ewzeng,
|
Thanks for the updates @Cohen-J-Omer! I will test this out. |
@Cohen-J-Omer I tried running
and waited for about 10 minutes, but the cluster was still up. Are you able to replicate this? |
@ewzeng thanks for noticing! |
I re-pulled! But now when I run
the instance still stays up (waited for ~10 minutes). Regarding your previous comments:
|
(1) I could add it until a script is made, since it will locate the resource group id and will suggest applying optional fields. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are some comments! I'll take a break, and then I will review node_provider
and vpc_provider
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some more comments on node_provider
and vpc_provider
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A comment on hybrid ips
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did another pass through node_provider
and added a few more comments!
Hey @ewzeng, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think your changes look good. I added one comment on race conditions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Cohen-J-Omer for the incredible work, and @ewzeng for reviewing this PR! It looks pretty nice to me and should be almost ready to go in. Left several comments, most of which are nits. Once the master branch is merged and the tests are passed, the PR should be ready to go in.
Thanks again for the awesome work! Looking forward to seeing IBM cloud supported by SkyPilot!
sky/clouds/ibm.py
Outdated
_regions: List[clouds.Region] = [] | ||
|
||
@classmethod | ||
def regions(cls): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is no longer needed, as all the requests will go to the catalog, instead of this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
o.k, good to know. Then i believe you should get rid of this method , right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I think the right way is to be opposite. We should get rid of the method this comment quoted. Instead, using the method you linked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't it eventually tries to call a regions
method on the CLOUD_PROIVDER_catalog.py
module, due to this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function does not? It only generates the regions you hardcoded here. I am saying this def regions(cls)
is not called anywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understood the part where the regions()
class method on clouds/ibm.py
have become obsolete and should be removed, along with the class method _get_default_region()
i've created.
Unrelated, I think this method should also be removed, since _map_clouds_catalog(clouds, 'regions')
tries to invoke a function that i haven't seen implemented on other providers' catalogs, e.g. there's no regions()
method on aws_catalog.py
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, it is used in the following place, but we can definitely make it more general in a different PR.
https://github.com/concretevitamin/sky-experiments/blob/508be85e4a2a0df1fa98f7fac63980ff6e24da3e/sky/backends/cloud_vm_ray_backend.py#L840
sky/clouds/ibm.py
Outdated
|
||
@classmethod | ||
def _get_default_region(cls) -> clouds.Region: | ||
region_name = get_cred_file_field('region', 'us_south') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function should be able to be removed, as it will never be reached, due to the comment for if region is None
above.
Hey @Michaelvll,
If the PR is in a satisfactory condition I will rebase it (or merge if the process will prove too difficult due to numerous conflicts). |
Hey @Michaelvll, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the excellent effort to support IBM cloud in SkyPilot @Cohen-J-Omer! The PR now looks good to me. Once we fix the CI errors, we are good to go!
Tested:
-
pytest tests/test_smoke.py --ibm
TODOs for future PRs (please file issues for these):
- Add instructions for setting up the IBM credentials in the doc.
- Add catalog fetcher for IBM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A nit: when terminating with sky down
or checking the status with sky status -r
, there will be a bunch of logger info shows up corrupting the progress bar, can it be changed to DEBUG
or somehow work well with the progress bar?
Another issue:
When I am trying to sky down
the VM, it shows the following error, and it went away when I tried again. Can we add a retry for this error?
File "/home/gcpuser/sky-ibm/sky/backends/cloud_vm_ray_backend.py", line 3110, in teardown_no_lock
vpc_provider.delete_vpc(vpc_id, region)
File "/home/gcpuser/sky-ibm/sky/skylet/providers/ibm/vpc_provider.py", line 283, in delete_vpc
self.delete_subnets(tmp_vpc_client, vpc_data, region)
File "/home/gcpuser/sky-ibm/sky/skylet/providers/ibm/vpc_provider.py", line 352, in delete_subnets
vpc_client.delete_subnet(subnet_id).get_result()
File "/opt/conda/envs/sky-ibm/lib/python3.10/site-packages/ibm_vpc/vpc_v1.py", line 2081, in delete_subnet
response = self.send(request, **kwargs)
File "/opt/conda/envs/sky-ibm/lib/python3.10/site-packages/ibm_cloud_sdk_core/base_service.py", line 313, in send
response = self.http_client.request(**request, cookies=self.jar, **kwargs)
File "/opt/conda/envs/sky-ibm/lib/python3.10/site-packages/requests/sessions.py", line 587, in request
resp = self.send(prep, **send_kwargs)
File "/opt/conda/envs/sky-ibm/lib/python3.10/site-packages/requests/sessions.py", line 701, in send
r = adapter.send(request, **kwargs)
File "/opt/conda/envs/sky-ibm/lib/python3.10/site-packages/requests/adapters.py", line 578, in send
raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='us-east.iaas.cloud.ibm.com', port=443): Read timed out. (read timeout=60)
Hey @Michaelvll,
|
Thanks @Cohen-J-Omer!
It is ok we don't do the retry for now. We can leave it for a future PR.
This is the file we describe the installation of cloud related stuffs.
https://github.com/skypilot-org/skypilot/blob/master/docs/source/getting-started/installation.rst
It will show up in our doc website, so it would be easier for the new user to find it.
https://skypilot.readthedocs.io/en/latest/getting-started/installation.html
…________________________________
From: Omer Joshua Cohen ***@***.***>
Sent: Wednesday, April 26, 2023 8:56:07 AM
To: skypilot-org/skypilot ***@***.***>
Cc: Zhanghao Wu ***@***.***>; Mention ***@***.***>
Subject: Re: [skypilot-org/skypilot] IBM cloud provider integration (PR #1598)
Hey @Michaelvll<https://github.com/Michaelvll>,
Thanks for the thorough review.
* Messages will no longer be displayed when deleting an IBM cluster.
* Regarding the timeout error: I cannot add a retry for it, as:
* It rarely happens (yet to see it personally).
* it's an error that implies that either IBMs servers were down, or the user experienced a connectivity issue.
* it's caused by an internal REST API request shared between all methods, invoked by any IBM client used in this project, thus i'll have to wrap quite a lot.
* With regard to the github issues:
* No problem, i'll open the issues.
* sky check will describe to users how and where they should create the credentials file, if any issue arises (e.g. credentials file missing, or some of the mandatory fields are missing). On which documentation file would you like me to add said instructions?
—
Reply to this email directly, view it on GitHub<#1598 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/ABTQXJLEJL6KHS2V3RQYLVDXDFARPANCNFSM6AAAAAAT4ZZZHA>.
You are receiving this because you were mentioned.Message ID: ***@***.***>
|
PR Topic - integration of IBM as a new cloud provider.
Successfully ran the following tests:
Notes:
I plan to:
~/.ibm
, similarly toaws configure
.Setup:
Until the above mentioned script will be created, authenticate your IBM account by adding the following data in
~/.ibm/credentials.yaml
:For the complete guide visit this link.