Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(object_store): Instantiate object store from provided url #4077

Closed
wants to merge 26 commits into from

Conversation

chitralverma
Copy link

@chitralverma chitralverma commented Apr 13, 2023

Which issue does this PR close?

Closes #4047.
Closes #2304

Rationale for this change

This PR proposes a standardized implementation to create an object store from provided URL and options. It would make things significantly simple for developers using the crate.

What changes are included in this PR?

Check list

  • Add a parse_url( ... )
  • Add a from_env to parse_url
  • Tests for parse_url( ... )
    • Local FS
    • S3
    • Azure
    • GCS
    • HTTP/s
  • Examples
  • Documentation
  • Update with main

Are there any user-facing changes?

Yes, parse_url( ... ) is user-facing.
No breaking changes.

@github-actions github-actions bot added the object-store Object Store Interface label Apr 13, 2023
@chitralverma chitralverma changed the title parse_url prototype (object_store): Instantiate object store from provided url Apr 13, 2023
Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, mostly just some comments on error handling

object_store/src/lib.rs Outdated Show resolved Hide resolved
object_store/src/lib.rs Outdated Show resolved Hide resolved
object_store/src/lib.rs Show resolved Hide resolved
object_store/src/lib.rs Outdated Show resolved Hide resolved
@chitralverma
Copy link
Author

@tustvold one question I had regarding the tests - I want to test parse_url for each supported store. so shall I put them separately in each module or create 1 big test in the same file?

@tustvold
Copy link
Contributor

so shall I put them separately in each module

I suspect this will make the feature flags easier

tustvold
tustvold previously approved these changes Apr 15, 2023
Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks very nice to me, perhaps @roeap may want to give it a look over

object_store/src/options.rs Outdated Show resolved Hide resolved
object_store/src/options.rs Outdated Show resolved Hide resolved
/// # Examples
///
/// ```
///
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Brilliant example 😄

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still a WIP 😂

there are some open items in my checklist

object_store/src/options.rs Outdated Show resolved Hide resolved
.iter()
.map(|(key, value)| {
let conf_key =
AzureConfigKey::from_str(&key.to_ascii_lowercase()).unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we either return an error or omit the keys that will panic here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@roeap this is how it was done before, but based on @tustvold's comments I changed it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant making the function fallible and return the error, rather then unwrapping and thus panicing ...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@roeap Thanks for the review.
I'm not sure what exactly you meant here, but based on my understanding and a little searching, I found something called error propagation and have made some changes for this.

Can you please check if this is what you meant and if this is on the right track?

@tustvold tustvold marked this pull request as draft April 26, 2023 10:49
@tustvold
Copy link
Contributor

I've marked this as a draft as it doesn't appear to be ready for review

@tustvold tustvold dismissed their stale review April 26, 2023 10:50

Incomplete

@chitralverma
Copy link
Author

chitralverma commented Apr 27, 2023

@tustvold @roeap The implementation and the test cases are ready for review.
If there arent any other changes then I will add examples and docs and then mark this PR as ready for final review

@chitralverma chitralverma requested review from roeap and tustvold April 27, 2023 20:41
@chitralverma
Copy link
Author

also one more question - Since each object store has its own parse_url already, do you think we should have a different name for the functionality in this PR ?

pub fn parse_url(
url: impl AsRef<str>,
store_options: Option<impl Into<StoreOptions>>,
_from_env: bool,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could instead use a builder pattern here 🤔 I'll have a play over the weekend and see what I can come up with

Copy link
Author

@chitralverma chitralverma Apr 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I thought about that as well. let me do something about this and then we can merge our ideas on this.

goals -

  • allow URL based obejct_store instantiation
  • store options are either explicitly passed or picked from env or both (preference given to explicit over env)
  • optionally allow internal ClientOptions to be passed
  • should work for all object stores like local, HTTP, AWS, GCS, Azure, mem
  • user facing API should be simple and natural

Copy link
Author

@chitralverma chitralverma May 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created a builder pattern for this.
let me know if you had something like this in mind?

https://github.com/apache/arrow-rs/pull/4077/files#r1182195434

/// .with_env_variables(true)
/// .build();
/// ```
pub struct ObjectStoreBuilder {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tustvold Based on your comment I have created a builder pattern for object store.

If this looks good, the we can get rid of the options.rs I add as it contains duplicate code.

Also added better examples :)

@chitralverma chitralverma requested a review from tustvold May 2, 2023 08:30
@tustvold
Copy link
Contributor

tustvold commented May 2, 2023

Apologies I have a bit of an interrupted week, and want to give some thought into how this would integrate with things like DataFusion's ObjectStoreRegistry or downstream ObjectStore implementations like HDFS. Thank you for sticking with this, I'll get back to you as soon as I am able

@tustvold
Copy link
Contributor

tustvold commented May 9, 2023

Thank you for your work on this, I've raised #4184 with a proposal inspired by this. PTAL and let me know what you think

@tustvold
Copy link
Contributor

Closing this as #4200 has now been merged, thank you for your work on this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
object-store Object Store Interface
Projects
None yet
3 participants