Batch Snapshot Finalizations #82824

Open
Tracked by #77466
original-brownbear opened this issue Jan 19, 2022 · 1 comment
Labels
:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >enhancement Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.

Comments

@original-brownbear
Member

Snapshot finalization happens snapshot-by-snapshot at the moment and involves a sequence of:

  1. Cluster state update
  2. Write to the repository
  3. Cluster state update
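To make the cost of this sequence concrete, the following minimal sketch models today's snapshot-by-snapshot flow. It is not the actual `SnapshotsService` code. `OpLog`, `SequentialFinalizer` and `finalizeAll` are hypothetical names, and the sketch only records which expensive operations each finalization performs so they can be counted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical log of the expensive operations a finalization performs.
final class OpLog {
    final List<String> ops = new ArrayList<>();

    void clusterStateUpdate(String reason) {
        ops.add("cs-update:" + reason);
    }

    void repositoryWrite(String what) {
        ops.add("repo-write:" + what);
    }

    // Counts logged operations whose label starts with the given prefix.
    long count(String prefix) {
        return ops.stream().filter(op -> op.startsWith(prefix)).count();
    }
}

// Hypothetical model of the current snapshot-by-snapshot finalization.
final class SequentialFinalizer {
    private final OpLog log;

    SequentialFinalizer(OpLog log) {
        this.log = log;
    }

    // Every snapshot pays for its own two cluster state updates and repository write.
    void finalizeAll(List<String> snapshots) {
        for (String snapshot : snapshots) {
            log.clusterStateUpdate("before-write:" + snapshot); // 1. cluster state update
            log.repositoryWrite("repository:" + snapshot);      // 2. write to the repository
            log.clusterStateUpdate("after-write:" + snapshot);  // 3. cluster state update
        }
    }
}
```

For three snapshots, `finalizeAll` logs six cluster state updates and three repository writes, executed strictly one snapshot after another.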

This means that finalizing a snapshot (even a single-index one) probably takes more than a second in the real world.
So far this has been a non-issue, but in the context of #77466 it is becoming one.
For one, setting up a benchmark cluster containing a large number of single-index snapshots takes a significant amount of time.
More importantly, it means that ILM policies that move indices to the frozen tier cannot efficiently move multiple indices at the same time. They can queue up many minutes of work finalizing single-index snapshots, which in turn delays SLM backups and snapshot delete jobs for a non-trivial period of time.
In extreme but conceivable cases, such as moving 1k snapshots to the frozen tier, this could mean running finalizations for an hour or more.

To fix this, we should batch multiple waiting finalizations into one in `SnapshotsService` and the repository. This will allow finalizing multiple snapshots within the same `RepositoryData` write and the same two cluster state updates for repository generation tracking. All global metadata writes, index metadata writes and `snap-$uuid` blob writes can still happen exactly as they do today.
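The proposed batching could look roughly like this minimal sketch. It is not the actual `SnapshotsService` or repository change: `FinalizationOps`, `BatchingFinalizer`, `enqueue` and `finalizeBatch` are hypothetical names, and each snapshot is assumed to write a single index metadata blob for simplicity. Waiting finalizations are drained into one batch; the per-snapshot metadata and `snap-$uuid` blobs are written as before, while the two cluster state updates and the `RepositoryData` write happen once for the whole batch.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical log of the expensive operations a finalization performs.
final class FinalizationOps {
    private final List<String> ops = new ArrayList<>();

    void clusterStateUpdate(String reason) {
        ops.add("cs-update:" + reason);
    }

    void repositoryWrite(String what) {
        ops.add("repo-write:" + what);
    }

    // Counts logged operations whose label starts with the given prefix.
    long count(String prefix) {
        return ops.stream().filter(op -> op.startsWith(prefix)).count();
    }
}

// Hypothetical batching finalizer: snapshots waiting to be finalized are queued
// and then finalized together.
final class BatchingFinalizer {
    private final Deque<String> waiting = new ArrayDeque<>();
    private final FinalizationOps ops;

    BatchingFinalizer(FinalizationOps ops) {
        this.ops = ops;
    }

    void enqueue(String snapshot) {
        waiting.addLast(snapshot);
    }

    // Drains every waiting finalization and finalizes them as one batch.
    List<String> finalizeBatch() {
        if (waiting.isEmpty()) {
            return List.of();
        }
        List<String> batch = new ArrayList<>(waiting);
        waiting.clear();
        // Per-snapshot blobs are still written one snapshot at a time, as today
        // (simplified here to one index metadata blob per snapshot).
        for (String snapshot : batch) {
            ops.repositoryWrite("global-metadata:" + snapshot);
            ops.repositoryWrite("index-metadata:" + snapshot);
            ops.repositoryWrite("snap-$uuid:" + snapshot);
        }
        // Generation tracking and the RepositoryData write happen once per batch.
        ops.clusterStateUpdate("before-write");
        ops.repositoryWrite("RepositoryData:" + batch.size() + "-snapshots");
        ops.clusterStateUpdate("after-write");
        return batch;
    }
}
```

With three snapshots queued, `finalizeBatch` issues two cluster state updates and one `RepositoryData` write, compared with six cluster state updates and three repository writes when each snapshot goes through the three steps on its own.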

@original-brownbear original-brownbear added >enhancement :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs labels Jan 19, 2022
@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Jan 19, 2022
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)
