Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor writeObject to only use MultipartUpload when required #27877

Merged
merged 1 commit into from
Aug 20, 2021

Conversation

juliusknorr
Copy link
Member

@juliusknorr juliusknorr commented Jul 8, 2021

Ensure that multipart uploads are only used when needed. For cases where the file is smaller than 5242880. Since streams may not have a fixed size before reading them some contortions performed by @tsdicloud makes sure that we read that amount of bytes into a buffer to determine if a simple putObject can be used or if the multipart upload is triggered with an AppendStream of the buffer and the rest of the original one.

Furthermore there is additional cleanup added to properly remove leftovers from failed multipart upload attempts.

I pulled out the changes of the branch with the s3 encryption and added some further test cases.

throw $e;
// if anything goes wrong with multipart, make sure that you don´t poison and
// slow down s3 bucket with orphaned fragments
$uploadInfo = $uploader->getState()->getId();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd vote for adding this to the baseline, since while the abstract declaration is considered protected the implementation is a public method.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found it weird that they did not think of that, so I dug a little and:

can't we use that $e->getState()->getId(); ?

This is from https://docs.aws.amazon.com/sdk-for-php/v3/developer-guide/s3-multipart-upload.html which actually recommend using ObjectUploader to get this kind of optimization. But maybe I am not looking at the right thing ^^

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I'll check if that works as expected 👍

@juliusknorr juliusknorr marked this pull request as ready for review July 13, 2021 14:59
@juliusknorr juliusknorr added this to the Nextcloud 23 milestone Jul 13, 2021
Copy link
Member

@blizzz blizzz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🐘 tests 💪

@tsdicloud
Copy link
Contributor

Thanks Julius, to take the merge work over for me. Proud that my code will make it into product.

@icewind1991
Copy link
Member

What is the core problem being solved by not always using the multipart uploader?

I'm not a fan of having to buffer the first chunk for every upload

@tsdicloud
Copy link
Contributor

tsdicloud commented Aug 11, 2021

Besides the fact that the AWS classes themselves always use ObjectUploader for Objects smaller 5 MB,
you have ti understand that the Multipart uploader mechanism is more complex behind the scenes than Objectuploader.
Because of the Multipart nature, it creates so called fragmments in the background. And having too many open fragments in parallel heavily degrades S3 upload performance. Knowing this, you should avoid Multipart uploads for as many objects as possible.
And, additionally, if a Multipart upload is not properly cleaned up on a fail, a funnel of stalled fragments does also slowly degrade the performance of your installation.

BTW: the buffer is only 5 MB and the code is an improved version of the original AWS code in the PHP library,

@icewind1991
Copy link
Member

Fair enough, thanks for the explanation

@skjnldsv skjnldsv added 4. to release Ready to be released and/or waiting for tests to finish and removed 3. to review Waiting for reviews labels Aug 17, 2021
Copy link
Contributor

@kesselb kesselb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work 👍

Super nice to use multipart less often. I've added some nitpicks mostly about phpdoc and possible type hints. But not a blocker for merge.

Could we use a higher value for single put operations? Like 25 or 50 MB. I understand the buffer uses the memory hence we should not pick a value to high. However most pictures have more than 5 MB nowadays.

lib/private/Files/ObjectStore/S3ObjectTrait.php Outdated Show resolved Hide resolved
lib/private/Files/ObjectStore/S3ObjectTrait.php Outdated Show resolved Hide resolved
// buffer is fully seekable, so use it directly for the small upload
$this->writeSingle($urn, $buffer, $mimetype);
} else {
$loadStream = new Psr7\AppendStream([$buffer, $psrStream]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We append buffer and psrStream because the psrStream might be non-seekable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

lib/private/Files/ObjectStore/S3ObjectTrait.php Outdated Show resolved Hide resolved
lib/private/Files/ObjectStore/S3ObjectTrait.php Outdated Show resolved Hide resolved
Signed-off-by: Bernd Rederlechner <[email protected]>

Co-authored-by: Julius Härtl <[email protected]>
@juliusknorr
Copy link
Member Author

juliusknorr commented Aug 20, 2021

Could we use a higher value for single put operations? Like 25 or 50 MB. I understand the buffer uses the memory hence we should not pick a value to high. However most pictures have more than 5 MB nowadays.

Will do a follow up PR for that to check about the memory usage in detail with different values.

Edit: draft in #28542

@juliusknorr
Copy link
Member Author

All other addressed.

@kesselb kesselb merged commit e2116e2 into master Aug 20, 2021
@kesselb kesselb deleted the enh/s3-putObject branch August 20, 2021 16:54
tsdicloud added a commit to nextmcloud/server that referenced this pull request Sep 13, 2021
We need the fragment optimisation to avoid S3 Slowdown on OTC, as described in
nextcloud#27877

Signed-off-by: [email protected] <[email protected]>
sgyuris pushed a commit to nextmcloud/server that referenced this pull request Oct 20, 2021
We need the fragment optimisation to avoid S3 Slowdown on OTC, as described in
nextcloud#27877

Signed-off-by: [email protected] <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
4. to release Ready to be released and/or waiting for tests to finish bug feature: object storage
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants