object-store: Expose an async reader API for object store #4762
Comments
The challenge is that this requires pre-fetching heuristics in order to perform well, which is one of the things I wanted to discourage when creating this crate, as they are catastrophic for performance. There is further discussion of this on #1473. Ultimately, object stores have very high latencies, on the order of 100 ms; they fundamentally are not filesystems... FWIW, the arrow-rs file readers are specifically designed to work with the exposed abstractions and may serve as inspiration / work for your use-case.
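To put rough numbers on the latency point, here is a back-of-the-envelope sketch. It assumes a flat 100 ms per request and strictly sequential requests, both simplifications; `requests_needed` and `sequential_latency` are illustrative names, not part of object_store.

```rust
use std::time::Duration;

/// Number of sequential range requests needed to read `object_len` bytes
/// when each request fetches at most `fetch_size` bytes.
fn requests_needed(object_len: u64, fetch_size: u64) -> u64 {
    assert!(fetch_size > 0, "fetch_size must be non-zero");
    object_len.div_ceil(fetch_size)
}

/// Lower bound on wall-clock time when requests are issued one after another
/// and each costs `per_request` (an assumed, constant latency).
fn sequential_latency(object_len: u64, fetch_size: u64, per_request: Duration) -> Duration {
    let n = u32::try_from(requests_needed(object_len, fetch_size)).unwrap_or(u32::MAX);
    per_request * n
}

fn main() {
    let object_len: u64 = 64 * 1024 * 1024; // a 64 MiB object
    let latency = Duration::from_millis(100); // assumed per-request latency

    for fetch_size in [8 * 1024u64, 1024 * 1024, 16 * 1024 * 1024] {
        let n = requests_needed(object_len, fetch_size);
        let t = sequential_latency(object_len, fetch_size, latency);
        println!(
            "{fetch_size:>10} bytes per request -> {n:>5} requests, ~{:.1} s",
            t.as_secs_f64()
        );
    }
}
```

Under these assumptions, 8 KiB fetches need 8192 requests (roughly 819 s of latency alone), while 16 MiB fetches need 4 (roughly 0.4 s). That gap is why a naive `read`-style API over an object store only performs well if something decides how much to fetch ahead.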
Ok, I will check the discussion; I'm not sure whether it also applies to the arrow2 async readers. I was actually trying to add certain things to polars, which uses arrow2, and wanted to avoid the arrow-rs crate specifically for this use case.
I do still hope that polars will eventually switch away from arrow2 instead of reinventing the wheel, but I'm not sure what has happened with that effort. Last time @ritchie46 and I spoke it was primarily lack of time. I personally think the IO functionality is an ideal place to start any migration, as it is fairly self-contained.
* Add ObjectStore BufReader (apache#4762) * Clippy * More Clippy * Fix MSRV * Fix doc
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
While this FR is similar to #1803, it is not specific to parquet or any other format.
There is no reader-like API for object_store.
Describe the solution you'd like
If you look at the `put_multipart` API, it returns a `Box<dyn AsyncWrite + Unpin + Send>`. I would suggest exposing a new API, `get_reader` or `get_async_reader`, which returns something like `Box<dyn AsyncRead + AsyncSeek + Send + Unpin>`.
The signature(s) can look like,
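A minimal sketch of the shape such an API might take follows; it is not the issue author's original snippet. To stay dependency-free it uses the synchronous `std::io::Read` and `std::io::Seek` as stand-ins for `AsyncRead` and `AsyncSeek`. `RangeStore`, `ReadSeek`, `RangeReader` and `InMemory` are hypothetical names and do not exist in object_store. Since a Rust trait object may name only one non-auto trait, the sketch combines the two behind a helper supertrait; a boxed `AsyncRead + AsyncSeek` would need the same treatment.

```rust
use std::collections::HashMap;
use std::io::{self, Read, Seek, SeekFrom};
use std::ops::Range;
use std::sync::Arc;

/// Hypothetical synchronous stand-in for a store that serves byte ranges.
trait RangeStore: Send + Sync {
    fn size(&self, path: &str) -> io::Result<u64>;
    fn get_range(&self, path: &str, range: Range<u64>) -> io::Result<Vec<u8>>;
}

/// Helper supertrait: `dyn Read + Seek` is not a valid trait object on its own.
trait ReadSeek: Read + Seek + Send {}
impl<T: Read + Seek + Send> ReadSeek for T {}

/// Hypothetical analogue of the proposed `get_reader`.
fn get_reader(store: Arc<dyn RangeStore>, path: &str) -> io::Result<Box<dyn ReadSeek>> {
    let len = store.size(path)?;
    Ok(Box::new(RangeReader { store, path: path.to_owned(), len, pos: 0 }))
}

/// Unbuffered reader: every `read` call issues exactly one range request.
struct RangeReader {
    store: Arc<dyn RangeStore>,
    path: String,
    len: u64,
    pos: u64,
}

impl Read for RangeReader {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if self.pos >= self.len || buf.is_empty() {
            return Ok(0); // end of object, or nothing requested
        }
        let end = (self.pos + buf.len() as u64).min(self.len);
        let bytes = self.store.get_range(&self.path, self.pos..end)?;
        let n = bytes.len().min(buf.len());
        buf[..n].copy_from_slice(&bytes[..n]);
        self.pos += n as u64;
        Ok(n)
    }
}

impl Seek for RangeReader {
    fn seek(&mut self, pos: SeekFrom) -> io::Result<u64> {
        // Seeking only moves the cursor; no request is made until the next read.
        let target = match pos {
            SeekFrom::Start(n) => Some(n),
            SeekFrom::End(delta) => self.len.checked_add_signed(delta),
            SeekFrom::Current(delta) => self.pos.checked_add_signed(delta),
        };
        self.pos = target.ok_or(io::ErrorKind::InvalidInput)?;
        Ok(self.pos)
    }
}

/// In-memory store used only to exercise the sketch.
struct InMemory(HashMap<String, Vec<u8>>);

impl RangeStore for InMemory {
    fn size(&self, path: &str) -> io::Result<u64> {
        let data = self.0.get(path).ok_or(io::ErrorKind::NotFound)?;
        Ok(data.len() as u64)
    }

    fn get_range(&self, path: &str, range: Range<u64>) -> io::Result<Vec<u8>> {
        let data = self.0.get(path).ok_or(io::ErrorKind::NotFound)?;
        let slice = data
            .get(range.start as usize..range.end as usize)
            .ok_or(io::ErrorKind::InvalidInput)?;
        Ok(slice.to_vec())
    }
}

/// Seeks past the first six bytes and reads the remainder as text.
fn demo() -> io::Result<String> {
    let mut objects = HashMap::new();
    objects.insert("data/hello.txt".to_owned(), b"hello object store".to_vec());
    let store: Arc<dyn RangeStore> = Arc::new(InMemory(objects));

    let mut reader = get_reader(store, "data/hello.txt")?;
    reader.seek(SeekFrom::Start(6))?;
    let mut rest = String::new();
    reader.read_to_string(&mut rest)?;
    Ok(rest)
}

fn main() -> io::Result<()> {
    println!("{}", demo()?);
    Ok(())
}
```

The reader here is deliberately unbuffered, so each `read` maps to one range request. A usable implementation would have to choose a fetch size, which is the pre-fetching question raised in the comments.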
For end users, they can use it like,
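As an illustration of the kind of caller code a seekable reader enables (again, not the author's original snippet), the sketch below reads the trailing bytes of an input, the access pattern used by formats that keep metadata in a footer. It is written against the synchronous `std::io` traits, with `std::io::Cursor` standing in for a reader returned by the store; `read_tail` is a hypothetical helper. With an async reader, the equivalent calls would presumably go through async extension traits instead.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};

/// Reads the last `n` bytes of `reader`, or the whole input if it is shorter.
fn read_tail<R: Read + Seek>(reader: &mut R, n: u64) -> io::Result<Vec<u8>> {
    // Seeking to the end reports the total length without reading any data.
    let len = reader.seek(SeekFrom::End(0))?;
    let start = len.saturating_sub(n);
    reader.seek(SeekFrom::Start(start))?;

    let mut tail = Vec::new();
    // `take` bounds the read so we never pull more than the requested tail.
    reader.by_ref().take(len - start).read_to_end(&mut tail)?;
    Ok(tail)
}

fn main() -> io::Result<()> {
    // `Cursor` stands in for a reader handed out by an object store.
    let mut reader = Cursor::new(b"header|payload|FOOT".to_vec());
    let tail = read_tail(&mut reader, 4)?;
    println!("{}", String::from_utf8_lossy(&tail));
    Ok(())
}
```

Because `read_tail` is generic over `Read + Seek`, the same caller code works for local files, in-memory buffers, and any store-backed reader that implements those traits.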
Describe alternatives you've considered
The alternative requires users to implement such a reader themselves.
Additional context
This can also help object_store work in a more versatile / general manner, for example to interoperate with the async parts of arrow2, like here