-
Notifications
You must be signed in to change notification settings - Fork 841
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Revisit Design of ObjectStore::put_multipart #5458
Comments
One downside of moving to a multipart upload API is it would force unnecessary buffering in cases where the underlying store lacks a minimum part size, e.g. LocalFilesystem. More thought is needed 🤔 |
Maybe we could do something like The idea of implementing MultiPartStore for LocalFileSystem makes a lot of sense to me (it could be implemented very efficiently, as you point out). Tuning write buffer sizes (aka part sizes in a multi-part upload) is likely to be object store and system specific, so the buffering doesn't seem like a fundamental problem to me. |
So the major challenge with providing a multipart API for LocalFilesystem is that there is no obvious way to transport the part size in use. This presents a problem for determining the offset of the final part, which will likely be smaller than the file's chunk size. There are a few options here, but none of them particularly great:
Taking a step back, I think there are two users of multipart uploads:
Users in the second category are extremely unlikely to care about LocalFilesystem support, as they could just use the filesystem directly. As such I suspect they are adequately served by MultipartStore. I therefore think we can just focus our efforts on the first category of user, providing an efficient way to stream data, in-order to durable storage. I'm therefore leaning towards replacing
There are a few things worth highlighting here:
|
This design looks cool, but I have two concerns: Current design it requires users to call Also, I'm unclear about how |
In practice they have to do this anyway because of tokio-rs/tokio#4296 and #5366. In general the previous API was very difficult to actually use correctly, especially with the type of long-running synchronous operations that characterize arrow/parquet workloads.
The idea is if Upload has accumulated enough data to do so, it could upload multiple chunks in parallel, much like WriteMultipart does currently. Effectively aside from moving away from the problematic AsyncWrite abstraction this doesn't materially alter the way IO is performed, other than removing the async backpressure mechanism that makes AsyncWrite such a pain to integrate with predominantly sync workloads |
The design seems reasonable overall. In Lance, our write pattern at the moment looks like:
Because I am calling Also, is there a maximum buffer size enforced? Does reaching that make I'm thinking, if implemented, how I would use this API is put the writer on a background tokio task. Then I could run the IO calls in the background and implement backpressure over some channel. This brings up the question, it is safe to call |
Hmm... Good point. For that sort of workload something like put_stream would probably be the best option, it does have a certain elegant simplicity/symmetry to it 🤔 Something like
|
Had a good sync with @alamb and I think we've devised a way to support the original vision of exposing a MultiPart API for stores, including LocalFilesystem. Apologies for the noise |
To avoid leaving anyone in suspense, as I recall the basic idea is at first to require, for file backed object stores, that each multi-part upload is the same size (except for the last one). In this way, when writing multiple "parts" to a file, we can calculate apriori what offsets in the file each part will go. If a user tries to upload a part that is not the required size, the API will error with a clear message. We can eventually maybe extend the implementation to handle different sized chunks (with copying as part of the finalize) I may be misremembering this |
The design evolved a bit to accomodate reality, but it is largely inline with the spirit of the original proposal - #5500 PTAL and let me know what you think, I'm quite pleased with how it came out |
…tipartStore (#5458) (#5500) * Replace AsyncWrite with Upload trait (#5458) * Make BufWriter abortable * Flesh out cloud implementations * Review feedback * Misc tweaks and fixes * Format * Replace multi-part with multipart * More docs * Clippy * Rename to MultipartUpload * Rename ChunkedUpload to WriteMultipart * Doc tweaks * Apply suggestions from code review Co-authored-by: Andrew Lamb <[email protected]> * Docs * Format --------- Co-authored-by: Andrew Lamb <[email protected]>
|
|
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Currently streaming uploads are supported by
ObjectStore::put_multipart
. This returns aAsyncWrite
, which provides a push-based interface for writing data.However, this approach is not without issue:
Describe the solution you'd like
#4971 added a
MultipartStore
abstraction that more closely mirrors the APIs exposed by object stores, avoiding all of the above issues. If we could devise a way to implement this interface forLocalFileSystem
we could then "promote" it into theObjectStore
trait and deprecate put_multipart. This would provide the maximum flexibility to users, whilst being in keeping with the objectives of this crate to closely hew to the APIs of the stores themselves.The key observation that makes this possible, is that we already recommend
MultiPartStore
be used with fixed size chunks for compatibility with r2, we therefore could require this for LocalFilesystem, in turn allowing it to support out-of-order / parallel writes as the file offsets can be determined from the part index.#5431 and #4857 added
BufWriter
andBufReader
and these would be retained to preserve compatibility with the tokio ecosystem and provide a more idiomatic API on top of thisDescribe alternatives you've considered
I briefly considered a put_stream API, however, this doesn't resolve many of the above issues
We could also just implement MultipartStore for LocalFilesystem, whilst retaining the current
put_multipart
. This would allow downstreams to opt-in to the lower level API if they so wished.We could also modify put_multipart to return something other than AsyncWrite, possibly something closer to PutPart
Additional context
Many of the stores also support composing objects from others, this might be something to consider in this design - #4966
FYI @wjones127 @Xuanwo @alamb @roeap
The text was updated successfully, but these errors were encountered: