[Core]Read working_dir zip from private s3 #34708

Open
loleek opened this issue Apr 24, 2023 · 8 comments
Assignees
Labels
  • core: Issues that should be addressed in Ray Core
  • core-runtime-env: Issues related to Ray environment dependencies
  • enhancement: Request for new feature and/or capability
  • good first issue: Great starter issue for someone just starting to contribute to Ray
  • P1: Issue that should be fixed within a few weeks

Comments

@loleek

loleek commented Apr 24, 2023

Description

Currently, boto3 cannot pick up endpoint_url from ~/.aws/config or from environment variables.
As a result, I cannot submit a job whose working_dir is an s3:// URI pointing at my private S3-like storage.
I see that packaging.py creates the boto3 client with no options at all:

    if protocol == Protocol.S3:
        try:
            import boto3
            from smart_open import open as open_file
        except ImportError:
            raise ImportError(
                "You must `pip install smart_open` and "
                "`pip install boto3` to fetch URIs in s3 "
                "bucket. " + install_warning
            )
        tp = {"client": boto3.client("s3")}

Can Ray expose kwargs here so I can inject endpoint_url, access_key_id, access_secret, or other configuration?
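
For illustration, here is a minimal sketch of what forwarding caller-supplied options to the client could look like. The helper name make_s3_transport_params and its client_factory parameter are hypothetical and are not part of Ray; endpoint_url, aws_access_key_id, and aws_secret_access_key are existing keyword arguments of boto3.client.

```python
from typing import Any, Callable, Dict, Optional


def make_s3_transport_params(
    client_kwargs: Optional[Dict[str, Any]] = None,
    client_factory: Optional[Callable[..., Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build smart_open transport params for S3.

    client_kwargs is forwarded verbatim to boto3.client, so a caller could
    pass endpoint_url, aws_access_key_id, aws_secret_access_key, and so on.
    client_factory exists only so the sketch can be exercised without boto3.
    """
    if client_factory is None:
        # Imported lazily, mirroring the snippet quoted above.
        import boto3

        client_factory = boto3.client
    return {"client": client_factory("s3", **(client_kwargs or {}))}
```

With no client_kwargs this reduces to the current behavior, boto3.client("s3").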

Use case

No response

@loleek loleek added enhancement Request for new feature and/or capability triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Apr 24, 2023
@loleek loleek changed the title [<Ray component: Core>]Read working_dir zip from private s3 [Core]Read working_dir zip from private s3 Apr 24, 2023
@rkooo567 rkooo567 added the core Issues that should be addressed in Ray Core label Apr 25, 2023
@rkooo567
Contributor

Hmm this seems to be a pretty reasonable request. cc @architkulkarni for thoughts?

@rkooo567 rkooo567 added the good first issue Great starter issue for someone just starting to contribute to Ray label Apr 25, 2023
@architkulkarni
Contributor

I agree, it seems reasonable. It may take some care to get the API right.

@rkooo567 rkooo567 added P1.5 Issues that will be fixed in a couple releases. It will be bumped once all P1s are cleared and removed triage Needs triage (eg: priority, bug/not-bug, and owning component) labels May 10, 2023
@rkooo567
Contributor

This is a similar issue to #31122.

@rkooo567 rkooo567 added P1 Issue that should be fixed within a few weeks and removed P1.5 Issues that will be fixed in a couple releases. It will be bumped once all P1s are cleared labels May 10, 2023
@jjyao jjyao added the core-runtime-env Issues related to Ray environment dependencies label Sep 25, 2023
@ketangangal

ketangangal commented Oct 2, 2023

Hello @rkooo567 & @architkulkarni

I can work on this issue!

@ketangangal

Requirements:

  1. If endpoint_url, access_key_id, access_secret, or other configuration is passed via kwargs, use it.
  2. Otherwise, if it is set in environment variables, use that.
  3. Otherwise, fall back to the default boto3 client.
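
The precedence above could be sketched roughly as follows. The function name resolve_s3_client_kwargs and the per-option fallback are assumptions made for illustration; the environment variable names are the ones boto3 itself recognizes (AWS_ENDPOINT_URL only in recent releases).

```python
import os
from typing import Dict, Mapping, Optional

# Mapping from boto3.client keyword argument to environment variable name.
_ENV_NAMES = {
    "endpoint_url": "AWS_ENDPOINT_URL",
    "aws_access_key_id": "AWS_ACCESS_KEY_ID",
    "aws_secret_access_key": "AWS_SECRET_ACCESS_KEY",
}


def resolve_s3_client_kwargs(
    explicit: Optional[Mapping[str, str]] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Hypothetical resolver: explicit kwargs win, then env vars, then nothing.

    Options that resolve to nothing are omitted, so passing the result to
    boto3.client("s3", **result) leaves boto3's own defaults in place.
    """
    explicit = explicit or {}
    environ = os.environ if environ is None else environ
    resolved: Dict[str, str] = {}
    for option, env_name in _ENV_NAMES.items():
        if explicit.get(option):
            resolved[option] = explicit[option]
        elif environ.get(env_name):
            resolved[option] = environ[env_name]
    return resolved
```

An empty result corresponds to case 3, the default boto3 client.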

@rkooo567
Contributor

rkooo567 commented Oct 3, 2023

that sounds great! @rynewang can shepherd the contribution!

@rynewang
Contributor

rynewang commented Oct 5, 2023

The same mechanism is used by java_jars, py_modules and working_dir. We can find a common way to model these.

We can either type the working_dir value to (pseudocode) str | Dict["uri" | **kwargs, str], accepting

  • runtime_env={"working_dir":{"uri":"s3://my_private_bucket/dir", "aws_access_key_id":"my_key", "aws_secret_access_key":"my_secret"}}
  • runtime_env={"working_dir":"s3://public_bucket/dir"}

Or we can add a special field, _runtime_env_context, to runtime_env, like

  • runtime_env={"working_dir":"s3://my_private_bucket/dir", "_runtime_env_context":{"aws_access_key_id":"my_key", "aws_secret_access_key":"my_secret"}}

and let different runtime env plugins read them.

I personally prefer the former one, but I'd like to hear from you. cc @jjyao @rkooo567
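
As a rough sketch of the first option, a value typed as either a plain URI string or a dict with a "uri" key could be normalized like this. The function name split_uri_and_options is hypothetical; nothing here reflects an existing Ray API.

```python
from typing import Dict, Tuple, Union


def split_uri_and_options(
    value: Union[str, Dict[str, str]]
) -> Tuple[str, Dict[str, str]]:
    """Hypothetical normalizer for a working_dir / py_modules / java_jars entry.

    A bare string is a URI with no extra options. A dict must carry "uri";
    every other key is treated as an option for the storage client.
    """
    if isinstance(value, str):
        return value, {}
    if isinstance(value, dict):
        if "uri" not in value:
            raise ValueError("dict form requires a 'uri' key")
        options = {k: v for k, v in value.items() if k != "uri"}
        return value["uri"], options
    raise TypeError(f"expected str or dict, got {type(value).__name__}")


uri, options = split_uri_and_options(
    {
        "uri": "s3://my_private_bucket/dir",
        "aws_access_key_id": "my_key",
        "aws_secret_access_key": "my_secret",
    }
)
```

Because the same function accepts both shapes, existing string-valued configurations would keep working unchanged.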

@architkulkarni
Contributor

I believe boto3 now supports specifying the endpoint URL via environment variable (which was the initial reason for creating this issue). See boto/boto3#2099 (comment)

If we can verify that this workflow can succeed by setting the relevant variables in the runtime_env "env_vars" field, then perhaps we don't even need to add a new API here. Instead we can just add a quick section for this workflow in the docs.
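
A sketch of the runtime_env that such a verification might start from is shown below. The helper name, bucket, and endpoint values are placeholders. Whether the process that downloads working_dir actually sees these env_vars is the unverified part, and AWS_ENDPOINT_URL is only honored by recent boto3 releases.

```python
from typing import Any, Dict


def private_s3_runtime_env(
    working_dir_uri: str,
    endpoint_url: str,
    access_key_id: str,
    secret_access_key: str,
) -> Dict[str, Any]:
    """Hypothetical helper: a runtime_env that passes S3 settings via env_vars."""
    return {
        "working_dir": working_dir_uri,
        "env_vars": {
            # Environment variables that boto3 reads on its own.
            "AWS_ENDPOINT_URL": endpoint_url,
            "AWS_ACCESS_KEY_ID": access_key_id,
            "AWS_SECRET_ACCESS_KEY": secret_access_key,
        },
    }


runtime_env = private_s3_runtime_env(
    "s3://my_private_bucket/dir.zip",  # placeholder URI
    "https://s3.example.internal",  # placeholder endpoint
    "my_key",
    "my_secret",
)
# The dict would then be passed along as usual, e.g. ray.init(runtime_env=runtime_env).
```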

Projects
None yet
Development

No branches or pull requests

6 participants