ObjectStore with_url Should Handle Path #4199
Comments
I actually decided against this, in favor of returning the remaining path from parse_url in #4200
I stumbled over this today. While Being able to get the path out of a URL is quite messy, I'd rather not reimplement this on my own. Of course I could use I'd much prefer
All from_env does is look in the environment for configuration keys; you could do likewise and pass what you found to parse_url
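A minimal sketch of that suggestion, using only the standard library: gather the variables that share a prefix and turn them into lower-cased key/value options. The function name collect_options and the choice to lower-case keys are assumptions for illustration, not the crate's actual from_env implementation.

```rust
/// Collect configuration options from `(key, value)` pairs whose key starts
/// with `prefix`, lower-casing each key.
///
/// Hypothetical helper: it takes an iterator instead of reading
/// `std::env::vars()` directly so that it is easy to test.
fn collect_options<I>(vars: I, prefix: &str) -> Vec<(String, String)>
where
    I: IntoIterator<Item = (String, String)>,
{
    let mut options: Vec<(String, String)> = vars
        .into_iter()
        // Keep only the variables that belong to this provider.
        .filter(|(key, _)| key.starts_with(prefix))
        // Normalise the key; the value is passed through untouched.
        .map(|(key, value)| (key.to_ascii_lowercase(), value))
        .collect();
    // Sort so the result does not depend on environment iteration order.
    options.sort();
    options
}
```

In an application this could be called as collect_options(std::env::vars(), "AWS_"), with the result handed to whichever URL-based constructor accepts explicit options.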
But then what's the point of having the more composable cloud-specific builders in the first place? They contain path parsing logic; being able to extract the leftover path is all that'd be needed to be able to use them for this use case.
Perhaps we could make https://github.com/apache/arrow-rs/blob/master/object_store%2Fsrc%2Fparse.rs#L71 public?
For the use-cases where stores aren't configured by a URL?
That'd help. We might still parse the URL twice, but that's less of a deal than constructing and throwing away the entire
Hmmh, I think in most applications you almost definitely want to have a combination of both. However, "how to get there" usually is very specific to the environment - a local developer on their local machine might have a static access key pair, or some AWS SSO config, while production workloads might use k8s and IAM roles for service accounts (env vars), or IAM roles for EC2 provided by the instance metadata server (implicit). IMHO, there's a lot of value in Leaving it up to the crate users to manually start parsing env vars into options passed to
Whilst I agree that this is something people exchange, the challenge is when people then start creating object stores per path, instead of per bucket. This has been a frequent source of throughput issues people have run into, as connection pooling, credential caching, etc. are at the store level.
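To illustrate the per-bucket point, here is a rough sketch of a registry that hands out one shared store per bucket and returns the path separately. Store and StoreRegistry are hypothetical stand-ins written for this example, not types from the object_store crate.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Stand-in for a real object store client (hypothetical type).
/// In practice this is where connection pools and cached credentials live.
#[derive(Debug)]
struct Store {
    bucket: String,
}

/// Caches one store per bucket so that store-level state is shared
/// by every path inside that bucket.
#[derive(Default)]
struct StoreRegistry {
    stores: HashMap<String, Arc<Store>>,
}

impl StoreRegistry {
    /// Returns the shared store for `bucket`, creating it on first use,
    /// together with the normalised object path inside that bucket.
    fn resolve(&mut self, bucket: &str, path: &str) -> (Arc<Store>, String) {
        let store = self
            .stores
            .entry(bucket.to_string())
            .or_insert_with(|| Arc::new(Store { bucket: bucket.to_string() }));
        // Cloning the Arc is cheap and keeps a single underlying store.
        (Arc::clone(store), path.trim_matches('/').to_string())
    }
}
```

Two paths in the same bucket resolve to the same Arc, so only one store is ever constructed per bucket; the path stays a plain value that is passed alongside it.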
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Various builders such as AmazonS3Builder, MicrosoftAzureBuilder, etc. provide a with_url method.
However, with the exception of URL patterns such as https://s3.region.amazonaws.com/bucket, which encode the bucket name in the URL, they ignore the path. This is surprising and inconsistent with stores such as HttpStore and LocalFileSystem, which have a built-in notion of a prefix.
This can to a certain extent be worked around with PrefixStore, but implementing this logic correctly requires duplicating the logic to understand what parts of a given URL are the prefix.
Describe the solution you'd like
I would like the cloud stores to have a with_prefix option, and to populate this within with_url.
Describe alternatives you've considered
Additional context
Relates to #4047
delta-rs has some logic here to handle this, although this will misbehave for URLs where the bucket name is encoded in the path.
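The difficulty described above, that the prefix starts at a different place depending on whether the bucket lives in the host or in the path, can be sketched as follows. This is a simplified, standard-library-only illustration: the function name split_bucket_and_prefix is hypothetical, only two URL shapes are recognised, and query strings and percent-encoding are ignored. It is not how the crate itself parses URLs.

```rust
/// Split a URL into `(bucket, prefix)`.
///
/// Two simplified shapes are handled:
/// - `s3://bucket/some/prefix`: the bucket is the host.
/// - `https://s3.<region>.amazonaws.com/bucket/some/prefix` (path-style):
///   the bucket is the first path segment.
/// Anything else yields `None`.
fn split_bucket_and_prefix(url: &str) -> Option<(String, String)> {
    let (scheme, rest) = url.split_once("://")?;
    // Separate the host from the path; a URL without a path has an empty one.
    let (host, path) = rest.split_once('/').unwrap_or((rest, ""));
    let path = path.trim_matches('/');
    match scheme {
        // Bucket in the host: the whole path is the prefix.
        "s3" if !host.is_empty() => Some((host.to_string(), path.to_string())),
        // Bucket in the path: the prefix is whatever follows the first segment.
        "https" if host.starts_with("s3.") && host.ends_with(".amazonaws.com") => {
            let (bucket, prefix) = path.split_once('/').unwrap_or((path, ""));
            if bucket.is_empty() {
                return None;
            }
            Some((bucket.to_string(), prefix.to_string()))
        }
        _ => None,
    }
}
```

Treating the full path as the prefix for every URL would be wrong for the second shape, which is the kind of misbehaviour noted above for URLs where the bucket name is encoded in the path.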