Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set endpoint_url from an env variable #432

Closed
f4hy opened this issue Feb 26, 2021 · 12 comments
Closed

Set endpoint_url from an env variable #432

f4hy opened this issue Feb 26, 2021 · 12 comments

Comments

@f4hy
Copy link

f4hy commented Feb 26, 2021

Many tools use s3fs as a backend for remote file handling (e.g. fsspec.) These tools may not always expose ways to pass in all the parameters to the underlaying s3fs instance.

It would be easier if things like the endpoint_url could optionally be read from the environment so a user could specify something like S3_ENDPOINT_URL=$MYURL which would get passed into client_kwargs={endpoint_url=$MYURL} without needing code changes when using a custom url.

Its likely other values in client_kwargs would also benefit form this. If client_kwargs could default to reading from some set of env var if not specified that would be a huge help to many teams which need to always specify them.

@martindurant
Copy link
Member

martindurant commented Feb 26, 2021

The fsspec configuration system allows you to encode such things into a JSON file. You can also use environment variables, but those must be string, and there is no current way to encode values that should live within a deeper level directory. I'd rather not add a whole load of extra kwargs to S3FileSystem, if possible. Can you suggest a nice way to make environment variables work for the us case you propose?

@tgaddair
Copy link

Hey @martindurant, I think the issue here is that providing an environment variable override of the endpoint URL has been a common request to boto3 / aws for years, but for whatever reason they have never implemented such a feature:

boto/boto3#2099
boto/boto3#1375
aws/aws-cli#1270

A lot of people, myself included, would like to be able to read/write to S3 compatible storage like MinIO without needing to plumb custom kwargs through the various function calls. The simplest way to achieve this is to support something like AWS_ENDPOINT_URL . It seems like this could be inserted into s3fs here without a lot of hassle like so:

init_kwargs = dict(
    endpoint_url=os.environ.get('AWS_ENDPOINT_URL'),
    aws_access_key_id=self.key,
    aws_secret_access_key=self.secret,
    aws_session_token=self.token,
)

I agree that this would make the most sense to implement in boto3, but as I said, it doesn't look like this is ever going to happen. So what do you think about doing this in s3fs instead?

@martindurant
Copy link
Member

Have you tried using the configuration route? It would look like

file: ~/.config/fsspec/s3.json (if on posix-like)

{
  "s3": {
    "client_kwargs": {"endpoint_url": "..."}
  }
}

@tgaddair
Copy link

Thanks @martindurant, after upgrading to the latest fsspec, got this to work!

I noticed there is also an environment variable approach as well through fsspec here, which achieves something similar to what this issue proposes.

@martindurant
Copy link
Member

martindurant commented Mar 18, 2021

Yes, that's mentioned in the documentation too. However, each kwarg only can get a string value from the environment variable method. I this case, the value of client_kwargs is a dictionary, and we want the inner value set, which doesn't work with the current implementation.

One could imagine extending the environment variable to descend into dictionaries

S3_PROFILE=myprofile -> filesystems("s3", profile="myprofile")  # works now
S3_CLIENT_KWARGS__ENDPOINT_URL="https://..." -> filesystems("s3", client_kwargs={"endpoint_url": "http://...")

This would need careful coding, considering that the user could want to override client_kwargs or any sub-part. In fact that's an issue now - try supplying client_kwargs with any other keys than "endpoint_url", and you will find that the defaults are gone.

@tgaddair
Copy link

I see, thanks for clarifying. Happy to stick with the conf.json approach then.

@dvaldivia
Copy link

I think one of the strongest arguments to be able to pass the endpoint via environment variable is that inside workload orchestrators such as kubernetes, it's desirable to be able to specify the S3 endpoint via environment variable as opposed as to a file or explicitly in code.

Additionally using an environment variable would allow users of libraries that use s3fs such as pandas to go from this

import pandas
import s3fs
s3 = s3fs.S3FileSystem(key='minioadmin', secret='minioadmin', client_kwargs={
    'endpoint_url': 'http://localhost:9000'
})
df = pd.read_csv(s3.open("ml-data/test.csv", mode='rb'))

to simply writing

import pandas
df = pd.read_csv("s3://ml-data/test.csv")

and letting the devops inject the proper credentials and endpoint via the environment

@martindurant
Copy link
Member

We could make a special exception for really important arguments; but I would very much welcome any attempt to have fsspec's config system allow for nested data structures, such that you could pass "s3:client_kwargs:endpoint_url:..." (this as YAML) or have an environment variable S3__CLIENT_KWARGS__ENDPOINT_URL which does the same thing.

@dvaldivia
Copy link

do you want to allow for any client kwarg to be configured via environment variable? I can send a PR for injecting the endpoint right away and any client kwarg, or send them as separate PRs

@martindurant
Copy link
Member

To be clear: you can already set any argument in any fsspec backend via configuration files or environment variables (where the latter must be strings, of course). The difference here, is that the value to be set is itself within a dict-type argument.

https://filesystem-spec.readthedocs.io/en/latest/features.html#configuration

@mauvilsa
Copy link

mauvilsa commented Dec 5, 2022

Created a feature request for fsspec that would support endpoint_url from environment variable, see fsspec/filesystem_spec#1130.

@deepyaman
Copy link

This request is also addressed as of 2023.3.0 by #704. Just sharing, because I was about to try the (more flexible) solution from fsspec/filesystem_spec#1194, but happened across this when reading the source code to understand how exactly that worked.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants