object-store: Expose an async reader API for object store #4762

Closed
chitralverma opened this issue Sep 1, 2023 · 5 comments · Fixed by #4857
Labels: enhancement (any new improvement worthy of an entry in the changelog), object-store (Object Store Interface)

@chitralverma

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
While this feature request is similar to #1803, it is not specific to Parquet or any other format.

object_store lacks a reader-like API.

Describe the solution you'd like
The put_multipart API returns a Box<dyn AsyncWrite + Unpin + Send>. I would suggest exposing a new API, get_reader or get_async_reader, that returns something like Box<dyn AsyncRead + AsyncSeek + Send + Unpin>.

The signature(s) could look like:

async fn get_reader(
    &self,
    location: &Path,
) -> Result<Box<dyn AsyncRead + AsyncSeek + Send + Unpin>>;

// Variant taking ObjectMeta. Rust has no function overloading, so in practice
// this would need a distinct name.
async fn get_reader(
    &self,
    location: &ObjectMeta,
) -> Result<Box<dyn AsyncRead + AsyncSeek + Send + Unpin>>;

End users could then use it like this:

let location = Path::from("..."); // an ObjectMeta variant could be offered as well
let reader = store.get_reader(&location).await?;

Describe alternatives you've considered
The alternative requires users to implement such a reader themselves.
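For illustration, a hand-rolled reader of the kind mentioned above might look roughly like the following. This is a minimal sketch, not the object_store API: it is synchronous (std `Read` + `Seek` rather than `AsyncRead` + `AsyncSeek`) so that it needs no external crates, and `RangeStore`, `MemStore`, and `RangeReader` are hypothetical names introduced only for this example.

```rust
use std::io::{self, Read, Seek, SeekFrom};
use std::ops::Range;

/// Hypothetical stand-in for a store that only supports ranged GETs.
trait RangeStore {
    fn get_range(&self, range: Range<u64>) -> io::Result<Vec<u8>>;
}

/// In-memory store, used only to exercise the reader.
struct MemStore(Vec<u8>);

impl RangeStore for MemStore {
    fn get_range(&self, range: Range<u64>) -> io::Result<Vec<u8>> {
        let (start, end) = (range.start as usize, range.end as usize);
        self.0
            .get(start..end)
            .map(|bytes| bytes.to_vec())
            .ok_or_else(|| io::Error::new(io::ErrorKind::UnexpectedEof, "range out of bounds"))
    }
}

/// Reader that turns every `read` call into one ranged request.
struct RangeReader<S> {
    store: S,
    size: u64, // total object size, e.g. taken from the object's metadata
    pos: u64,  // current read offset
}

impl<S: RangeStore> RangeReader<S> {
    fn new(store: S, size: u64) -> Self {
        Self { store, size, pos: 0 }
    }
}

impl<S: RangeStore> Read for RangeReader<S> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let remaining = self.size.saturating_sub(self.pos);
        let n = remaining.min(buf.len() as u64);
        if n == 0 {
            return Ok(0); // end of object, or empty destination buffer
        }
        let data = self.store.get_range(self.pos..self.pos + n)?;
        let len = data.len().min(buf.len());
        buf[..len].copy_from_slice(&data[..len]);
        self.pos += len as u64;
        Ok(len)
    }
}

impl<S: RangeStore> Seek for RangeReader<S> {
    fn seek(&mut self, from: SeekFrom) -> io::Result<u64> {
        // Seeking only moves the offset; no request is issued.
        let target = match from {
            SeekFrom::Start(p) => Some(p),
            SeekFrom::End(d) => self.size.checked_add_signed(d),
            SeekFrom::Current(d) => self.pos.checked_add_signed(d),
        };
        self.pos = target
            .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "seek out of range"))?;
        Ok(self.pos)
    }
}
```

In this sketch every `read` call becomes one request to the store, so many small reads would be slow against a remote backend; a practical reader would likely need some form of buffering on top.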

Additional context
This would also make object_store more versatile and general, for example for interop with the async parts of arrow2, like here.

@chitralverma chitralverma added the enhancement Any new improvement worthy of a entry in the changelog label Sep 1, 2023
@chitralverma
Author

@tustvold @alamb @roeap any suggestions on this?

@chitralverma chitralverma changed the title object_store: Expose an async reader API for object store object-store: Expose an async reader API for object store Sep 1, 2023
@tustvold
Contributor

tustvold commented Sep 1, 2023

The challenge is that this requires pre-fetching heuristics in order to perform well, which is one of the things I wanted to discourage when creating this crate, as they are catastrophic for performance. There is further discussion of this in #1473. Ultimately, object stores have very high latencies, on the order of 100 ms; they fundamentally are not filesystems...

FWIW the arrow-rs file readers are specifically designed to work with the exposed abstractions and may serve as inspiration / work for your use-case
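To make the latency point above concrete, here is a rough sketch that counts round trips when an object is consumed through a filesystem-style `read` loop. It assumes one request per underlying `read` call and uses only std; `CountingSource` and `round_trips` are hypothetical names for this illustration and are not part of object_store.

```rust
use std::cell::Cell;
use std::io::{self, BufReader, Read};
use std::rc::Rc;

/// Hypothetical remote object: every `read` call counts as one round trip.
struct CountingSource {
    data: Vec<u8>,
    pos: usize,
    requests: Rc<Cell<u64>>,
}

impl Read for CountingSource {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.requests.set(self.requests.get() + 1);
        let n = buf.len().min(self.data.len() - self.pos);
        buf[..n].copy_from_slice(&self.data[self.pos..self.pos + n]);
        self.pos += n;
        Ok(n)
    }
}

/// Reads an object of `object_size` bytes in `read_size` chunks, optionally
/// through a buffer of the given capacity, and returns the round-trip count.
fn round_trips(object_size: usize, read_size: usize, buffer: Option<usize>) -> u64 {
    let requests = Rc::new(Cell::new(0));
    let source = CountingSource {
        data: vec![0; object_size],
        pos: 0,
        requests: Rc::clone(&requests),
    };
    let mut reader: Box<dyn Read> = match buffer {
        Some(capacity) => Box::new(BufReader::with_capacity(capacity, source)),
        None => Box::new(source),
    };
    let mut chunk = vec![0u8; read_size];
    // Keep reading until the source reports end of data.
    while reader.read(&mut chunk).unwrap() > 0 {}
    requests.get()
}
```

With a 1 MiB object and 4 KiB reads, the unbuffered loop issues 257 requests (256 data reads plus the final empty one), which at roughly 100 ms each is over 25 seconds of pure latency. Passing `Some(1 << 20)` reduces that to a handful of requests, but only because the buffer size happens to match the access pattern; a reader that needs a few bytes from a large object would over-fetch instead, which is the heuristic trade-off described in the comment.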

@chitralverma
Author

chitralverma commented Sep 1, 2023

OK, I will check the discussion. I'm not sure whether it also applies to the arrow2 async readers.

Actually, I was trying to add certain things to polars, which uses arrow2, and wanted to avoid the arrow-rs crate specifically for this use case.

@tustvold
Contributor

tustvold commented Sep 1, 2023

I do still hope that polars will eventually switch away from arrow2 instead of reinventing the wheel, but I'm not sure what has happened with that effort. Last time @ritchie46 and I spoke it was primarily lack of time.

I personally think the IO functionality is an ideal place to start any migration, as it is fairly self-contained.

tustvold added a commit to tustvold/arrow-rs that referenced this issue Sep 25, 2023
tustvold added a commit that referenced this issue Sep 25, 2023
* Add ObjectStore BufReader (#4762)

* Clippy

* More Clippy

* Fix MSRV

* Fix doc
@tustvold tustvold added the object-store Object Store Interface label Oct 18, 2023
@tustvold
Contributor

label_issue.py automatically added labels {'object-store'} from #4857

ryanaston pushed a commit to segmentio/arrow-rs that referenced this issue Nov 6, 2023
* Add ObjectStore BufReader (apache#4762)

* Clippy

* More Clippy

* Fix MSRV

* Fix doc