-
Notifications
You must be signed in to change notification settings - Fork 274
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Set endpoint_url
from an env variable
#432
Comments
The fsspec configuration system allows you to encode such things into a JSON file. You can also use environment variables, but those must be string, and there is no current way to encode values that should live within a deeper level directory. I'd rather not add a whole load of extra kwargs to S3FileSystem, if possible. Can you suggest a nice way to make environment variables work for the us case you propose? |
Hey @martindurant, I think the issue here is that providing an environment variable override of the endpoint URL has been a common request to boto3 / aws for years, but for whatever reason they have never implemented such a feature: boto/boto3#2099 A lot of people, myself included, would like to be able to read/write to S3 compatible storage like MinIO without needing to plumb custom kwargs through the various function calls. The simplest way to achieve this is to support something like
I agree that this would make the most sense to implement in boto3, but as I said, it doesn't look like this is ever going to happen. So what do you think about doing this in s3fs instead? |
Have you tried using the configuration route? It would look like file: ~/.config/fsspec/s3.json (if on posix-like) {
"s3": {
"client_kwargs": {"endpoint_url": "..."}
}
} |
Thanks @martindurant, after upgrading to the latest I noticed there is also an environment variable approach as well through |
Yes, that's mentioned in the documentation too. However, each kwarg only can get a string value from the environment variable method. I this case, the value of One could imagine extending the environment variable to descend into dictionaries
This would need careful coding, considering that the user could want to override |
I see, thanks for clarifying. Happy to stick with the |
I think one of the strongest arguments to be able to pass the endpoint via environment variable is that inside workload orchestrators such as Additionally using an environment variable would allow users of libraries that use import pandas
import s3fs
s3 = s3fs.S3FileSystem(key='minioadmin', secret='minioadmin', client_kwargs={
'endpoint_url': 'http://localhost:9000'
})
df = pd.read_csv(s3.open("ml-data/test.csv", mode='rb')) to simply writing import pandas
df = pd.read_csv("s3://ml-data/test.csv") and letting the devops inject the proper credentials and endpoint via the environment |
We could make a special exception for really important arguments; but I would very much welcome any attempt to have fsspec's config system allow for nested data structures, such that you could pass "s3:client_kwargs:endpoint_url:..." (this as YAML) or have an environment variable S3__CLIENT_KWARGS__ENDPOINT_URL which does the same thing. |
do you want to allow for any client kwarg to be configured via environment variable? I can send a PR for injecting the endpoint right away and any client kwarg, or send them as separate PRs |
To be clear: you can already set any argument in any fsspec backend via configuration files or environment variables (where the latter must be strings, of course). The difference here, is that the value to be set is itself within a dict-type argument. https://filesystem-spec.readthedocs.io/en/latest/features.html#configuration |
Created a feature request for fsspec that would support endpoint_url from environment variable, see fsspec/filesystem_spec#1130. |
This request is also addressed as of 2023.3.0 by #704. Just sharing, because I was about to try the (more flexible) solution from fsspec/filesystem_spec#1194, but happened across this when reading the source code to understand how exactly that worked. |
Many tools use s3fs as a backend for remote file handling (e.g. fsspec.) These tools may not always expose ways to pass in all the parameters to the underlaying s3fs instance.
It would be easier if things like the endpoint_url could optionally be read from the environment so a user could specify something like
S3_ENDPOINT_URL=$MYURL
which would get passed intoclient_kwargs={endpoint_url=$MYURL}
without needing code changes when using a custom url.Its likely other values in client_kwargs would also benefit form this. If client_kwargs could default to reading from some set of env var if not specified that would be a huge help to many teams which need to always specify them.
The text was updated successfully, but these errors were encountered: