Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ObjectStoreScheme (#4047) #4184

Closed
wants to merge 2 commits into from

Conversation

tustvold
Copy link
Contributor

@tustvold tustvold commented May 9, 2023

Which issue does this PR close?

Closes #4047

Rationale for this change

Inspired by the work in #4077 I started work on a generic ObjectStoreBuilder, however, this ran into a couple of challenges

  • Different stores have different configuration APIs, e.g. S3 has ClientOptions, whereas LocalFilesystem has a prefix
  • Some downstreams will want to use environment variables, and some will not
  • Some downstreams will want to filter config to just match those expected by the store, others will want to error
  • There are various valid ways LocalFileSystem and Memory URLs could be interpreted
  • The feature flag plumbing was extremely fiddly, the fact we separate the different stores is a historical artifact from when they brought in different SDKs, downstreams shouldn't need to do this

Whilst this PR does to a certain extent punt the issue onto downstreams, I think this standardises the non-trivial URL recognition logic, whilst allowing downstreams to make their own opinionated decision w.r.t the above.

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the object-store Object Store Interface label May 9, 2023
@tustvold tustvold force-pushed the object-store-scheme branch from 1a6e3d8 to 28e7ccd Compare May 9, 2023 14:29
/// This can be combined with the [with_url](crate::aws::AmazonS3Builder::with_url) methods
/// on the corresponding builder to construct the relevant type of store
#[derive(Debug, Copy, Clone, Eq, PartialEq, Ord, PartialOrd, Hash)]
pub enum ObjectStoreScheme {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't fully understand why you are proposing to add this to the object store crate.

Users of object_store would still have to match on the resulting scheme and instantiate a builder / configuration appropriate to whatever they wanted. The extra value to having a hard coded list of url prefixes seems relatively minimal.

Maybe this is just a first step.

If I were a user I would want something that took a url like s3://foo-bucket or https://andrew:[email protected]/path and returned an Arc<dyn ObjectStore> .

For convenience the object_store crate could have default interpretations of these urls, but also some way to extend the API;

Basically I think the API here makes a lot of sense https://docs.rs/datafusion/latest/datafusion/datasource/object_store/trait.ObjectStoreRegistry.html

Copy link
Contributor Author

@tustvold tustvold May 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the API here makes a lot of sense

This API is just a trait? It doesn't contain any parsing logic, nor any logic to interpret schemes directly

The extra value to having a hard coded list of url prefixes seems relatively minimal

It isn't just schemes FWIW

why you are proposing to add this to the object store crate
If I were a user I would want something that took a url like s3://foo-bucket or https://andrew:[email protected]/path and returned an Arc

As explained in the description, because there isn't a way I can see to make such an API that is both coherent and not extremely opinionated...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API is just a trait? It doesn't contain any parsing logic, nor any logic to interpret schemes directly

The API is a trait but DataFusion provides default parsing logic / scheme interpretation in https://docs.rs/datafusion/latest/datafusion/datasource/object_store/struct.DefaultObjectStoreRegistry.html

As explained in the description, because there isn't a way I can see to make such an API that is both coherent and not extremely opinionated...

I wast trying to suggest an API that let users implement their own opinions while also providing a default implementation that worked for simple cases (with whatever opinions you wanted)

@tustvold tustvold closed this May 9, 2023
@tustvold tustvold reopened this May 9, 2023
@tustvold tustvold marked this pull request as draft May 9, 2023 17:02
@tustvold
Copy link
Contributor Author

tustvold commented May 9, 2023

I will add a build_with_options or simliar method to this

@tustvold tustvold closed this May 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
object-store Object Store Interface
Projects
None yet
Development

Successfully merging this pull request may close these issues.

object_store: Instantiate object store from provided url with store options
2 participants