object_store: Instantiate object store from provided url with store options #4047
Comments
@houqp what are your thoughts on this ? |
There was some prior discussion on #2304 FYI @roeap I would not object to providing a
I'm not sure what the advantage of providing |
This is just an example of "an object store". As the pseudocode I added to the OP was taken from delta, where they return |
From my perspective, centralising this logic makes a lot of sense. We actually have some preparatory work already done within object store. Specifically, extracting relevant information from the URLs is available on the builders via respective One thing we discussed back then was the option to go as far as moving the There is one more kind of related thing I recently looked into, but haven't converged on a good solution yet. Essentially using My current thinking was to check storage options if they contain a full credential (this would have to be built) and if not, then initialise from the environment. In Azure, the default credential allows for explicitly configuring to omit certain options as an additional guard against picking the wrong credential, which may be an option as well. All in all though I think this is a great addition to object store. Update: One more consideration would be to not return an |
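The "check storage options first, otherwise fall back to the environment" idea from the comment above could be sketched roughly as follows. This is only an illustration under assumptions: `CredentialSource`, `resolve_credential` and the option keys `access_key_id` / `secret_access_key` are hypothetical names chosen for the example, not part of object_store.

```rust
use std::collections::HashMap;

/// Where the credential should come from (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum CredentialSource {
    /// A complete credential was supplied through the storage options.
    Explicit { key_id: String, secret: String },
    /// Nothing complete was supplied; defer to environment-based configuration.
    Environment,
}

/// Use the storage options only if they hold a *full* credential.
/// A partial credential (only one of the two keys) falls back to the
/// environment instead of being mixed with environment values.
fn resolve_credential(options: &HashMap<String, String>) -> CredentialSource {
    // The key names below are assumptions made for illustration.
    match (options.get("access_key_id"), options.get("secret_access_key")) {
        (Some(id), Some(secret)) => CredentialSource::Explicit {
            key_id: id.clone(),
            secret: secret.clone(),
        },
        _ => CredentialSource::Environment,
    }
}
```

Treating a partial credential as "not provided" is one possible guard against picking up half of the wrong credential; whether that is the right default is exactly the open question in this thread.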
We could return
Perhaps we don't call from_env and punt this to the user to decide, I don't have a good solution for this... |
doesn't 'from_env' imply that most creds, options will come from env and that env vars should be preferred? I think an |
It might :) - but from what I see there is no way to tell the user intent inside the proposed I guess that's why it might be best to just let the caller figure that out, to keep things simpler for now. If we see a solution emerging in several projects, we can still move that upstream? |
Or wait, my bad, I didn't see the from_env in the snippet. For this PR, I would like to keep it very simple: given a URL and storage options, a user should be able to instantiate an object store. So this PR can be separated from the registry part and the env part - nothing is to be read from the env for the registries - I think it should still be a part of the projects, because they all may want to handle this their own way. |
@roeap @tustvold I found some time today to work on this part. A prototype of how it looks - master...chitralverma:arrow-rs:parse_url Please let me know what are your initial thoughts, based on it I can open a PR. |
From my end this looks like a good starting point. On first glance we may have to put a bit more effort into handling "https" schemes, since at least for azure and aws we can also parse the respective http based urls. Also just realized, since a recent fix to allow "." in bucket names, we may right now not properly recognize aws urls in the virtual-hosted style, as we split on a static number of "." to dissect the url. |
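To illustrate the concern about dots in bucket names: splitting the host on a fixed number of "." breaks as soon as the bucket itself contains one. A minimal sketch of an alternative, anchoring on the known suffix instead, might look like this. The function names are hypothetical, and the sketch assumes only regional virtual-hosted hosts of the form `<bucket>.s3.<region>.amazonaws.com`; it is not how object_store parses URLs.

```rust
/// Naive approach: treat the first dot-separated label as the bucket.
/// This truncates bucket names that contain dots.
fn naive_bucket(host: &str) -> Option<&str> {
    host.split('.').next().filter(|label| !label.is_empty())
}

/// Anchor on the fixed suffix instead of counting dots.
/// Returns (bucket, region) for hosts like
/// `my.bucket.s3.us-east-1.amazonaws.com`, and `None` otherwise.
fn parse_virtual_hosted(host: &str) -> Option<(&str, &str)> {
    // Drop the fixed domain suffix first.
    let rest = host.strip_suffix(".amazonaws.com")?;
    // The last ".s3." separates the bucket (which may contain dots)
    // from the region (which may not).
    let marker = ".s3.";
    let idx = rest.rfind(marker)?;
    let bucket = &rest[..idx];
    let region = &rest[idx + marker.len()..];
    if bucket.is_empty() || region.is_empty() || region.contains('.') {
        return None;
    }
    Some((bucket, region))
}
```

With the host `my.bucket.s3.us-east-1.amazonaws.com`, the naive version yields only `my`, while the suffix-anchored version recovers `my.bucket` and `us-east-1`. Hosts without a region label, such as `bucket.s3.amazonaws.com`, are deliberately rejected by this sketch.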
Yes, this I have already done, not pushed yet.
Can you please point me to some existing MR or issue for this, I will cross check the behaviours. |
turns out i was wrong, while |
alright, so I'll raise a PR now and we can do formal review there |
|
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Currently, in the projects that are using object_store (datafusion, delta-rs, pola-rs, etc.), a dyn ObjectStore has to be created manually by parsing the provided URL, checking the scheme and providing the options.
It would be great to have this capability directly provided by the crate.
Describe the solution you'd like
My proposal is to standardize this implementation and bring it into the crate itself, exposed through a simple function like the one below, as it would make things significantly simpler for developers using the crate.
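As a rough illustration of the shape such a function could take, here is a minimal standard-library-only sketch. It is not the snippet from the original proposal and not the crate's API: `store_from_url`, `StoreConfig`, `parse_url` and the scheme-to-backend mapping are assumptions made for the example, and only the names `ObjectStoreKind` and `into_impl` are taken from this issue. A real implementation would return a boxed dyn ObjectStore rather than the stand-in struct used here.

```rust
use std::collections::HashMap;

/// Supported backends, selected from the URL scheme (illustrative set).
#[derive(Debug, PartialEq)]
enum ObjectStoreKind {
    Local,
    S3,
    Azure,
    Gcs,
}

/// Stand-in for a configured store; hypothetical, used instead of a
/// real `Box<dyn ObjectStore>` so the sketch has no dependencies.
#[derive(Debug, PartialEq)]
struct StoreConfig {
    kind: ObjectStoreKind,
    container: String,
    options: HashMap<String, String>,
}

impl ObjectStoreKind {
    /// Determine the backend and the bucket/container from the URL.
    fn parse_url(url: &str) -> Result<(Self, String), String> {
        let (scheme, rest) = url
            .split_once("://")
            .ok_or_else(|| format!("missing scheme in {url}"))?;
        // The scheme mapping below is an assumption for illustration.
        let kind = match scheme {
            "file" => Self::Local,
            "s3" | "s3a" => Self::S3,
            "az" | "abfs" | "abfss" => Self::Azure,
            "gs" => Self::Gcs,
            other => return Err(format!("unsupported scheme: {other}")),
        };
        // Everything before the first '/' is the bucket or container.
        let container = rest.split('/').next().unwrap_or("").to_string();
        Ok((kind, container))
    }

    /// Build the backend-specific store from the parsed parts.
    fn into_impl(self, container: String, options: HashMap<String, String>) -> StoreConfig {
        StoreConfig { kind: self, container, options }
    }
}

/// Single entry point: URL plus storage options in, configured store out.
fn store_from_url(url: &str, options: HashMap<String, String>) -> Result<StoreConfig, String> {
    let (kind, container) = ObjectStoreKind::parse_url(url)?;
    Ok(kind.into_impl(container, options))
}
```

A caller would then only write something like `store_from_url("s3://my-bucket/data", options)?` and never match on the scheme itself.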
For any new storage backends that may come up in the future, they can be added to the ObjectStoreKind along with a small implementation in into_impl. Users of the crate will only have to bump up the crate version.
Describe alternatives you've considered
Without this, each lib using object_store has to implement its own parsing.
Examples:
Also, without this, each time the crate adds a new backend, users of the crate have to bump the version and implement support for the new backend themselves.
Additional context
This idea is also implemented by,