Implement _get_kwargs_from_urls #273
Comments
What kind of URL do you have in mind, and what information would you get from it?
Using query arguments, something like
Do such URL patterns exist elsewhere? It seems to me like a bit of an overuse of the one string, especially when many parameters are not themselves expressed as strings.
Maybe I'm assuming too much about the intended use. I can implement this on top of fsspec somehow if fsspec's URLs are not meant to be used this way. It's just a shame to have all those different filesystems unified under the same API, and still have to switch-case on the ones I want to support in order to pass the right options.
The kind of thing we were targeting comes from SSH, also used in the Hadoop world, like "hdfs://user:pw@host:port/path". I suppose it could be extended to query parameters where we know that the path itself doesn't contain a query (as opposed, for example, to real HTTP URLs). Let's leave this open a little while and see if the functionality would be useful to anyone else.
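That style of URL can be unpacked with the standard library along these lines (a minimal sketch, not fsspec's actual `_get_kwargs_from_urls` implementation; the kwarg names `username`, `password`, `host`, and `port` are illustrative):

```python
from urllib.parse import urlsplit

def kwargs_from_url(url):
    """Pull connection kwargs out of an ssh/hdfs-style URL such as
    hdfs://user:pw@host:port/path. Illustrative sketch only."""
    parts = urlsplit(url)
    kwargs = {}
    if parts.username:
        kwargs["username"] = parts.username
    if parts.password:
        kwargs["password"] = parts.password
    if parts.hostname:
        kwargs["host"] = parts.hostname
    if parts.port:
        kwargs["port"] = parts.port
    return kwargs
```

Note that query parameters are untouched here; extending this to them raises exactly the ambiguity mentioned above for URLs whose paths can legitimately contain `?`.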
I think this is a great idea. I bumped into this thread while trying to find out if I can specify these options. Moreover, I found a recent PR in pandas implementing this via environment variables (pandas-dev/pandas#29050), and it was closed by the pandas devs arguing that this should be implemented in the filesystem layer. I think making these options configurable via the URL string would be the perfect solution.
Note that following pandas-dev/pandas#34266, it should become possible to pass parameters to the filesystem backend sometime in the future.
The above happened, so this is less of a priority. Please reopen if you think it important.
I'd vote for any of a with-context, an environment variable, or URL parsing to be able to pass S3 options down to clients that I don't have control of. Call graph:
and the intermediate two layers wouldn't currently (and don't) adhere to the same "storage_options" behavior.
You can control the "default" s3fs instance with configuration, in files or environment variables https://filesystem-spec.readthedocs.io/en/latest/features.html#configuration |
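For concreteness, the file-based route might look something like the following (a sketch only; check the linked docs for the exact file location and supported keys — by default fsspec looks in `~/.config/fsspec/`, and `client_kwargs` here mirrors how botocore-level options such as the endpoint are typically passed to s3fs):

```json
{
  "s3": {
    "client_kwargs": {
      "endpoint_url": "http://localhost:9000",
      "region_name": "us-east-1"
    },
    "requester_pays": true
  }
}
```

The advantage over environment variables is that values keep their JSON types (booleans, numbers) rather than arriving as strings.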
Thanks, @martindurant. For what I'm currently attempting (pytest-driven benchmarking), it'll be fairly simple to mock out
You can also edit the config dictionary directly, if you wish. As with the conversation above, I have trouble imagining how to encode everything you might want to pass into a single URL.
I imagine AWS did as well, hence the situation we're in (or they actively wanted to make it difficult to use custom endpoints). Take home: it'd be valuable but is obviously a prickly issue. Of course, if fsspec did find a way, others might follow suit. ;)
For anyone who tries to go down the environment variable route, see also #432 and stick to the file method.
👍
Hm, right, https://filesystem-spec.readthedocs.io/en/latest/features.html#configuration only mentions that INI values are limited to strings; I suppose I thought that was obvious for env-vars. We could, of course, do a
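One way the strings-only limitation could be relaxed (my assumption about a possible direction, not current fsspec behavior) is to attempt JSON decoding on each environment value, falling back to the raw string:

```python
import json

def coerce_env_value(raw):
    """Interpret an environment-variable string as JSON where possible,
    so "true" becomes True and "8080" becomes 8080, while anything that
    isn't valid JSON stays a plain string. Sketch only."""
    try:
        return json.loads(raw)
    except ValueError:
        return raw
```

This keeps simple string values working unchanged while letting booleans and numbers round-trip through the environment.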
Using URLs is very convenient: it allows configuring the storage via a single environment variable. However, `S3FileSystem` does not implement `_get_kwargs_from_urls`, so it is impossible to specify anything via the URL except the bucket. It would be good to have `S3FileSystem` parse query arguments such as `endpoint_url` (useful for testing locally with Minio, for instance), `region_name`, `requester_pays`, `signature_version`, and timeouts. Right now I have to do my own parsing on the URL to pass the results to `S3FileSystem`, which duplicates the effort in fsspec.
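A hypothetical `_get_kwargs_from_urls` along the lines requested could be sketched like this (the key names and coercion rules are assumptions for illustration, not s3fs's API, and as noted in the discussion this only works when the path itself cannot contain a query string):

```python
from urllib.parse import parse_qsl, urlsplit

# Assumed classification of keys: booleans get coerced, known
# botocore-level options get nested under client_kwargs.
_BOOL_KEYS = {"requester_pays"}
_CLIENT_KEYS = {"endpoint_url", "region_name"}

def get_kwargs_from_url(url):
    """Parse filesystem kwargs from an s3:// URL's query string.
    Hypothetical sketch of what _get_kwargs_from_urls might do."""
    kwargs = {}
    client_kwargs = {}
    for key, value in parse_qsl(urlsplit(url).query):
        if key in _BOOL_KEYS:
            kwargs[key] = value.lower() in ("1", "true", "yes")
        elif key in _CLIENT_KEYS:
            client_kwargs[key] = value
        else:
            kwargs[key] = value
    if client_kwargs:
        kwargs["client_kwargs"] = client_kwargs
    return kwargs
```

For example, `s3://bucket/key?endpoint_url=http://localhost:9000&requester_pays=true` would yield kwargs pointing at a local Minio endpoint with requester-pays enabled, with no per-backend parsing needed in the calling code.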