Snapshot deletion and creation slow down as number of snapshots in repository grows #8958
Comments
Just wanted to chime in: this issue has affected us a great deal as well. It made "sense" once I thought through how ES snapshotting works, but it was an unpleasant surprise.
…th large number of snapshots Each shard repository consists of a snapshot file for each snapshot; this file contains a map between each original physical file that is snapshotted and its representation in the repository. This data includes the original filename, checksum, and length. When a new snapshot is created, Elasticsearch needs to read all these snapshot files to figure out which files are already present in the repository and which files still have to be copied there. This change adds a new index file that combines all this information into a single file. So, if a repository has 1000 snapshots with 1000 shards, Elasticsearch will only need to read 1000 blobs (one per shard) instead of 1,000,000 to delete a snapshot. This change should also improve snapshot creation speed on repositories with a large number of snapshots and high latency. Fixes elastic#8958
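For illustration only, here is a minimal Java sketch of the data layout that commit describes. The names (`ShardSnapshotIndex`, `FileInfo`, `isReferenced`) are hypothetical, not the actual Elasticsearch types; the point is that a single per-shard index blob carries the snapshot-to-file map, so deciding which files to copy or clean needs one blob read per shard instead of one per snapshot per shard.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the consolidation described above: instead of reading
// one snapshot file per (snapshot, shard) pair to learn which physical files
// already exist in the repository, one per-shard "index" blob holds the union.
class ShardSnapshotIndex {

    /** Metadata for one physical file stored in the repository. */
    record FileInfo(String originalName, String checksum, long length) {}

    /** snapshotId -> files that snapshot references (the combined index blob). */
    private final Map<String, List<FileInfo>> snapshots = new HashMap<>();

    void addSnapshot(String snapshotId, List<FileInfo> files) {
        snapshots.put(snapshotId, files);
    }

    void removeSnapshot(String snapshotId) {
        snapshots.remove(snapshotId);
    }

    /** A file may be cleaned up only if no remaining snapshot references it. */
    boolean isReferenced(String checksum) {
        return snapshots.values().stream()
                .flatMap(List::stream)
                .anyMatch(f -> f.checksum().equals(checksum));
    }
}
```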
I seem to be seeing this behavior with Azure blob storage after upgrading to 1.7.5.
@niemyjski It was fixed in #8969 in 2.0.0 and above. The fix wasn't backported to 1.7.5.
And if you've read this far and were wondering whether the fix for this might ever get backported to 1.x, the answer is apparently not; see imotov's comment on #8969.
In order to create a new snapshot or delete an existing snapshot, Elasticsearch has to load all existing shard-level snapshots to figure out which files need to be copied and which files can be cleaned. The number of files to be checked is equal to `number_of_shards * number_of_snapshots`, which on large clusters with frequent snapshots can lead to very long operation times, especially with non-filesystem repositories. See elastic/elasticsearch-cloud-aws#150 and this group post for examples of issues that this behavior is causing.
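A worked version of that `number_of_shards * number_of_snapshots` cost, using the 1000 × 1000 example from the commit message above. This is back-of-the-envelope arithmetic, not Elasticsearch code or a benchmark:

```java
// Read amplification before vs. after the fix, per the example numbers above.
public class SnapshotReadCost {
    public static void main(String[] args) {
        long shards = 1_000;
        long snapshots = 1_000;

        // Before: one shard-level snapshot file per (shard, snapshot) pair
        // must be read to decide which files to copy or clean.
        long before = shards * snapshots;   // 1,000,000 blob reads

        // After: one combined index blob per shard.
        long after = shards;                // 1,000 blob reads

        System.out.printf("before=%d after=%d%n", before, after);
    }
}
```

On a high-latency repository (S3, Azure blob storage) the difference is dominated by per-blob round trips, which is why the issue calls out non-filesystem repositories specifically.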