-
Notifications
You must be signed in to change notification settings - Fork 115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
file content upload performance needs improvement -- currently about 5x slower than rsync #2007
Comments
From: daviddavis (daviddavis) There's something like a 30-40% overhead that's incurred by using the chunk upload API because it has to handle the upload chunk file/object, validate the chunk(s) as they are uploaded, and create/run a task to combine the chunks into an artifact. And this overhead exists even if the chunk size is bigger than the file. We could add functionality to the CLI to let users bypass chunked uploading and upload to the artifact creation endpoint directly. We could save another few seconds (10% or so) by not calculating the checksum digests on the client side for the file. We send this digest to pulp to ensure that the upload isn't corrupted. I'm not sure the savings are worthwhile but maybe we could add an option. Lastly, we could try to see if we can optimize the CLI/server code as well. |
From: daviddavis (daviddavis) I confirmed with @vkv that this command took 19s (or about half the time):
|
From: @bmbouter (bmbouter) Am I reading correctly that not using the chunking API, but a straight file-upload is faster (like in Comment 2)? Also when using the chunking API do the chunks upload in parallel? I think it's important to have checksum calculation on the client be default, but if an option to disable it existed I'd also be ok with that. Also how can "the chunk size is bigger than the file" ? |
From: daviddavis (daviddavis) bmbouter wrote:
Correct.
No but even in the case where the chunk size exceeds or is equal to the file size (and there is only one chunk), it's still rather slow.
Agreed
Chunk size is an option when uploading. You can set it to something greater than the size of the file in which case it uploads the entire file in one chunk. So like if you have a 1 kb file and you have a 1 mb chunk size, it'll just upload the file in one chunk but it still uses the chunk upload api. |
From: @bmbouter (bmbouter) Thanks for the info. One of the main reason for the chunking API is to get around the file-size maximum of nginx and apache. So given that if the file is < $webserver_size_max then the CLI should use the non-chunking API by default. If >= $webserver_size_max it should use the chunking API. Having an option to force one or the other, and/or to adjust the limit the CLI should use to think of $webserver_size_max I think would be good. For example nginx I believe defaults to 1mb but the installer can configure this see here. Also the CLI when using the chunking API should upload in parallel. Even if the network isn't allowing speed gains due to the rate limit, having the chunk sha256 calculation happen in parallel on the server side (and locally) would be faster. I'm not sure if chunk-size is exposed to the CLI, but I think it should not be if possible. |
From: daviddavis (daviddavis) bmbouter wrote:
I'm not sure I follow about why chunk size should not be exposed as an option in the CLI? If a user has a Pulp server set up that accepts a certain size of request (500kb), how do they inform the pulp cli? |
From: @bmbouter (bmbouter) I think we're both wanting the user to specify the max size the server can accept. If that's called chunk-size that's great. Will it default to the default of nginx and apache? The idea I'm trying to bring forward is a smarter cli where the user doesn't have to make a choice about chunking or not. Have the user either declare a different chunk size, or use the default, but in both cases the CLI knows when to chunk and when to not. Does this help answer it some? |
From: daviddavis (daviddavis) I've opened the following against the CLI: I think that should be most of what we can do in the CLI to speed up uploads there. The rest should of the work will need to happen in the pulpcore upload code. |
This issue has been marked 'stale' due to lack of recent activity. If there is no further activity, the issue will be closed in another 30 days. Thank you for your contribution! |
Well, some of the low-hanging fruit here is that we're using 1mb chunks by default. I don't know if perhaps too-large chunks would be dangerous in terms of memory consumption, but 1mb is probably too small. We could probably squeeze 10mb - 20mb safely, perhaps as high as 50mb. Not sure we need to go higher given the diminishing returns, but that would likely yield a significant improvement. One thing to note, Pulp generates a bunch of checksums on upload, and if Rsync does the same (I don't know) it probably only generates one checksum. So there's always going to be some additional overhead, but we should reduce it as much as possible. If the checksums are a significant contributor, #4726 might help. Or maybe we could make the checksum types configurable per-plugin, so that not checksum needs to be calculated for artifacts it is not directly relevant to. |
Some testing to see the affect of chunk size would be good. We should also keep in mind that our test environment will be low-latency and in-practice the client/server latency will be somewhat higher (not really high propbably just different). Note, the 1MB client body is the nginx default https://nginx.org/en/docs/http/ngx_http_core_module.html#client_max_body_size I don't think there's any issue going above that, but it means that for deployments that configure their own webservers they need to accommodate. |
Is there even a place in pulpcore, where we prescribe the upload chunk size? |
I don't know that we should close it before we do a bit of profiling and attribute a cause. The issue itself is a valid one and all we have are guesses as to the cause. Even after adjusting the chunk size the OP reports a significant penalty. You might be right that the problem primarily lies in pulp-cli, hammer, etc. though. |
https://httpd.apache.org/docs/2.4/mod/core.html#limitrequestbody do I read it right that the default in Apache is 1 GB? |
The original report was about hammer, so the path is: client -> Hammer -> Apache -> Foreman (Katello) -> Apache -> Pulp Looking at Katello, it defaults to 1 MB chunk size: It looks like it's implemented via a Task: That implies Foreman also stores it somewhere in between. Looks like that's on disk. |
It shows me default=0 (unlimited) and 2GB being the largest possible value. |
It looked at:
But now I see
And EL8 includes 2.4.37 where EL9 includes 2.4.57. |
Author: vk (vk)
Redmine Issue: 8839, https://pulp.plan.io/issues/8839
Following up from IRC #pulp, I've just recently started using pulp 3.12.2, and tried uploading a 1Gbyte file.
I created it with:
That took about 5 seconds (suggesting my local spinning disk based filesystem could handle about 200MB/s, assuming no bottlenecks from /dev/urandom).
Then I did a:
It took about 3 mins, which is quite a long time. This is on the same machine where I created the file, to pulp running locally.
I did an rsync over ssh of the same file to a second machine and it took a little over 9s.
On the second machine, I set up a replica repo with a remote set to the first pulp instance.
The sync took a little over 9s -- i.e. matched rsync.
On the first pulp instance, based on suggestions from IRC, I updated nginx.conf to set client_max_body_size to 1024m for /pulp/api/v3 (and /pulp/content -- though I don't know if the latter was needed).
I then used --chunk-size 1000000000 (1 billion bytes) with the pulp file content upload and got down to 43s. That's still 4.8x slower than rsync.
I realize there are a few seconds of overhead for database operations and checksums (I measured the latter at about 5-6 seconds by running sha256/384/512sum and totaling them). But still, it seems quite slow.
At the moment, this is just a prototype setup. My goal is to have pulp instances globally as replicas for our custom file artifacts. We need to keep the total time taken from uploading an artifact through availability to clients around the world to be as low as possible -- i..e. exploit the best that the underlying infrastructure is capable of. Most of the files I anticipate to be under 100Mbytes, in which case I expect other background operations to take up a greater percent of the total time. I haven't even counted time taken for updating the Publication and Distribution that clients will connect to.
What can be done to improve this?
The text was updated successfully, but these errors were encountered: