Slow snapshots to S3 #24123
Comments
@tlrx Can you take a look?
I believe this will be fixed by #24403
@jjfalling Do you think you could run your tests again but on a version that includes @abeyad's patch?
Sure, I'll try to test it this week.
@tlrx I've done some light testing and the snapshots are faster, but still much slower than the AWS CLI. I need to do more testing but I'm not sure when I will have time.
@jjfalling, any progress with this? S3 snapshot speed is still slower in comparison, for both backup and restore, on 5.6.
@vrizopoulos Yes, we identified a slowdown caused by an inefficient buffer initialization in the S3 repository and I'll create a fix soon. Also, I'm investigating using a more recent version of the AWS SDK that should speed up uploads (#26993). I'll update this issue once I have more.
I created #27280 to reduce memory allocation when snapshotting to S3 using Elasticsearch 6.x, and I saw a performance improvement. I plan to backport the change to 6.0 and 5.6.5. Note that AWS enabled throttling by default in March 2016, so one has to be careful when comparing results. Also, please note that the AWS CLI and the S3 repository plugin are two different beasts: the former can push files using zero-copy sendfile system calls, while the latter has a rate limit by default and does much more than just uploading files.
@tlrx, thanks for the updates and your work on this. Based on the AWS article you've linked, is it the number of retries that gets throttled, not necessarily the transfer rate itself? Regarding the 40 MB/s default limit, is that something that can be overridden when creating the snapshot repo using the "max_snapshot_bytes_per_sec" and "max_restore_bytes_per_sec" configuration keys? Finally, is there perhaps a way to allow multiple restores to execute in parallel, which would give us better aggregate performance? Thanks
Yes, I linked to this article because @abeyad mentioned the issue #24403 which involves many retries when the incompatible snapshot blob is missing.
Yes, you can set both of them to 0 or -1 to disable the rate limiter. But test this before changing it in production.
The current snapshot/restore feature does not allow multiple snapshots or restores to be executed at the same time. But for a given snapshot or restore, multiple shards are snapshotted/restored in parallel. If both index-a and index-b were snapshotted at the same time, both indices will be restored in parallel when restoring the snapshot. Does that help? Finally, I'd love to have more feedback from users about repository-s3 performance on versions 5.6.5/6.0.1/6.1.0 to see the impact of #27280.
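For readers who want to try the rate limiter settings discussed above, here is a minimal sketch of the request body for registering an S3 repository (`PUT _snapshot/<repository_name>`). The bucket name and the helper function `s3_repository_body` are hypothetical and only serve as illustration; the two setting keys are the ones quoted in the question, and the value "0" follows the suggestion above for disabling the limiter. The sketch only builds and prints the JSON, it does not contact a cluster.

```python
import json


def s3_repository_body(bucket, snapshot_rate, restore_rate):
    """Build the JSON body for registering an S3 snapshot repository.

    `bucket` is a placeholder name; the rates are passed through as
    strings such as "40mb", or "0" to disable the rate limiter
    (as suggested in the comment above).
    """
    return {
        "type": "s3",
        "settings": {
            "bucket": bucket,
            "max_snapshot_bytes_per_sec": snapshot_rate,
            "max_restore_bytes_per_sec": restore_rate,
        },
    }


# Hypothetical bucket name; both limiters disabled for a test run.
body = s3_repository_body("my-snapshot-bucket", "0", "0")
print(json.dumps(body, indent=2))
```

As noted above, it is worth testing such a change on a non-production cluster first, since an unthrottled snapshot competes with indexing and search for network and disk bandwidth.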
This was an issue that was affecting us on 5.6.4 and the upgrade to 6.1.0 seems to have helped a ton (snapshots are about 30% faster). Here's a redacted output from
@JD557 Thanks for the feedback!
Is it possible that this might affect deletes as well? I'm running 5.6.3 and snapshot deletes are quite slow:
I don't know exactly what version of ES was used to make these snapshots, but I don't think it was earlier than 5.x.
@eherot Some factors can affect the performance of a deletion, but I can't tell from your previous comment whether the deletion took more time than usual on your platform or whether you simply noticed that deletions were slow.
Many factors can slow down a snapshot deletion. The issue fixed by #27280 mostly impacts snapshot creation, but it can affect deletion too, as many files are updated during the deletion process. Most of the time the deletion is slow because the repository contains a lot of snapshots (the file
Recently, I've noticed a huge slowdown in snapshots. After looking at the elasticsearch logs, I've found a lot of errors like this:
Is it possible that the latest changes can output incorrect JSON somewhere?
I'd say no, the latest changes didn't impact the output of this file. The exception is triggered by #22073, which adds detection for duplicated fields in XContent (in your case, a SMILE file). Can you please send me the
We talked about this today with @ywelsch and we're going to close this issue. Some improvement has been made (#27280), and we cannot expect the same performance as the AWS CLI because the snapshot/restore process does more than just upload files. Anyway, feel free to add any comment on this issue, as we're always happy to have more feedback on performance changes between versions.
Elasticsearch version: 2.4.2
Plugins installed: [] Cloud AWS / Repository S3
JVM version: OpenJDK 1.8
OS version: Debian 8
Description of the problem including expected versus actual behavior:
Snapshots to S3 are significantly slower than transferring files to S3 using the AWS CLI. Uploading a test file using the AWS CLI gets ~230mb/s, but the snapshots average only around 25mb/s. Both of these tests were performed on the same EC2 instance. The instance was also in the same region as the S3 bucket. The test index was 10gb with 5 shards and force merged to one segment. The snapshot throttle settings were also set to 10GB.
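To put the two rates above in perspective, here is a rough back-of-the-envelope sketch of how long the 10gb test index would take at each rate. It assumes that "mb/s" means megabytes per second and that 1 GB = 1000 MB; both are assumptions, not something stated in the report, and the helper name `transfer_seconds` is hypothetical.

```python
def transfer_seconds(size_gb, rate_mb_per_s):
    """Estimate transfer time in seconds, assuming 1 GB = 1000 MB."""
    return size_gb * 1000 / rate_mb_per_s


# Figures taken from the description above; units are assumed (see lead-in).
snapshot_s = transfer_seconds(10, 25)    # snapshot rate
cli_s = transfer_seconds(10, 230)        # AWS CLI rate

print(f"snapshot: ~{snapshot_s:.0f} s, AWS CLI: ~{cli_s:.0f} s, "
      f"ratio: ~{snapshot_s / cli_s:.1f}x")
```

Under these assumptions the snapshot takes several minutes where the plain upload takes well under one, which is the gap this report is about.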
I also tested ES 5.2.2 and found the snapshot was significantly faster, but still much slower than using the AWS CLI.
I've already talked to @tlrx about this issue and we agreed I should open an issue.