-
Notifications
You must be signed in to change notification settings - Fork 841
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add MultiPartStore (#4961) (#4608) #4971
Conversation
path: &Path, | ||
id: &MultipartId, | ||
part_idx: usize, | ||
data: Bytes, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
having Bytes here instead of Vec<u8>
is great, as it means we don't need to own the data. Can we also make that change to PutPart's put_part()
method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could do, but I would rather do that as a separate PR as it would be a breaking change, and given PutPart is largely intended to be used with WriteMultipart I'm less sure of the utility
&self, | ||
path: &Path, | ||
id: &MultipartId, | ||
part_idx: usize, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do I need a part IDX here if I need to provide a sorted list to complete_multipart
anyways?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So that you can upload chunks in parallel and out of order
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I have to order them at the upload time or completion time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At completion time you must provide them in the order you want them to be (although AWS doesn't actually care)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So part_idx
goes from 0..N
(N being the number of parts, are holes allowed?) and the list passed to the complete call must have the same order as if I would order the results of PUT calls by the part_id
arg?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will try to document this better
object_store/src/multipart.rs
Outdated
/// Most stores require that all parts apart from the final are at least 5 MiB. Additionally | ||
/// some stores require all parts excluding the last to be the same size, e.g. [R2] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could the store at least tell the user which size it expects? Currently this relies on expert knowledge of the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WriteMultipart currently always uses a fixed size of 10MiB, I will update the docs to reflect this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't see any tests for this new API -- can we please add coverage somehow?
data: Bytes, | ||
) -> Result<PartId>; | ||
|
||
/// Completes a multipart upload |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also document here what the expectations are for a successful completion? For example, add the note about "parts is the order you want the parts to appear in the final output"
Also can we say anything about what happens to PartId
that are not used (as in put_part is called and returns a PartId
but then that PartId is not supplied here)?
(BTW thank you very much for doing this @tustvold -- it is great to see the object_store crate's API evolve and that the underlying design accommodates these new features without major issue) |
I've tried to add some docs clarifying the API contract |
/// The `i`'th value of `parts` must be a [`PartId`] returned by a call to [`Self::put_part`] | ||
/// with a `part_idx` of `i`, and the same `path` and `id` as provided to this method. Calling | ||
/// this method with out of sequence or repeated [`PartId`], or [`PartId`] returned for other | ||
/// values of `path` or `id`, will result in implementation-defined behaviour |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For an example of implementation defined behaviour, Azure will produce a file with parts in the order passed to this function, regardless of if they are out of sequence, and if they were uploaded with different values of MultipartId
.
Other stores do various combinations of erroring, or producing files with parts in different orders. Provided people stick to the API contract though, the behaviour is consistent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the documentation. My only remaining question is about testing this API -- as it stands now I think it could be removed without anything breaking -- perhaps we could add a doc test somewhere that shows how it is used that would also serve to make it more discoverable?
Finally, if possible, it would be awesome to add a doc link from put_multipart to MultiPartStore
to help people who need more power to find it more easily
I've added an integration test, a doc test would be tricky as it requires a running service |
@@ -263,7 +263,6 @@ pub use client::{ | |||
#[cfg(feature = "cloud")] | |||
mod config; | |||
|
|||
#[cfg(feature = "cloud")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is removed in part because it makes the docs sad otherwise, but also because the contained interfaces don't require any additional dependencies and are useful even when not using any of the first-party backends
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @tustvold -- this looks great.
Which issue does this PR close?
Closes #4961
Closes #4608
Closes #4965
Rationale for this change
Some use-cases wish to interact with the multipart upload machinery directly, whilst we can't support this for all stores, we can provide it for many. This therefore follows the example of #4876 and extracts a trait that can then be implemented by the various stores that have such a notion
What changes are included in this PR?
Are there any user-facing changes?