Batch Snapshot Finalizations #82824
Labels: :Distributed Coordination/Snapshot/Restore, >enhancement, Team:Distributed (Obsolete)
Snapshot finalization happens snapshot-by-snapshot at the moment and involves a sequence of:
This means that finalizing a snapshot (even a single-index one) probably takes more than a second in the real world.
So far this was a non-issue but in the context of #77466 it's becoming one.
For one, setting up a benchmark cluster containing a large number of single-index snapshots takes a significant amount of time.
More importantly though, it means that ILM policies that move indices to the frozen tier cannot efficiently move multiple indices simultaneously. They could queue up many minutes of work finalising single-index snapshots, which means that SLM backups as well as snapshot delete jobs will be delayed for a non-trivial period of time as well.
In extreme but conceivable cases like moving 1k snapshots to the frozen tier this could mean running finalisations for an hour or more.
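As a rough back-of-the-envelope check of the numbers above, the sketch below multiplies a snapshot count by an assumed per-finalization latency. The function name and the 3.6 second figure are illustrative assumptions, not measurements; the 1 second figure is the lower bound mentioned earlier in this issue.

```python
def total_finalization_minutes(snapshot_count, seconds_per_finalization):
    """Total wall-clock minutes when finalizations run strictly one after another."""
    return snapshot_count * seconds_per_finalization / 60


# 1k snapshots at the "more than a second" lower bound: roughly 17 minutes.
print(round(total_finalization_minutes(1000, 1.0), 1))

# 1k snapshots at an assumed 3.6 seconds each: one hour.
print(round(total_finalization_minutes(1000, 3.6), 1))
```

Because the work is serialized, the total grows linearly with the number of queued snapshots, which is why even a modest per-finalization cost adds up.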
To fix this we should batch multiple waiting finalisations into one in `SnapshotsService` and the repository. This will allow finalising multiple snapshots within the same `RepositoryData` write as well as the same two cluster state updates for the repo generation tracking. All the global metadata writes and index metadata writes, as well as the `snap-$uuid` blob writes, can still happen exactly as they do today.
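To illustrate the intended saving, here is a minimal toy model, not Elasticsearch code. It only counts the shared operations named above (one `RepositoryData` write plus two cluster state updates per finalization) and compares finalizing one snapshot at a time against draining all waiting finalizations into a single batch. The `Repository` class and both function names are hypothetical, and the per-snapshot metadata and `snap-$uuid` writes are left out because they would be unchanged.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Repository:
    """Toy stand-in for a snapshot repository that counts shared operations."""
    repository_data_writes: int = 0
    cluster_state_updates: int = 0
    snapshots: list = field(default_factory=list)

    def write_repository_data(self, new_snapshots):
        # First of the two cluster state updates for repo generation tracking.
        self.cluster_state_updates += 1
        # A single RepositoryData write covering every snapshot passed in.
        self.snapshots.extend(new_snapshots)
        self.repository_data_writes += 1
        # Second of the two cluster state updates.
        self.cluster_state_updates += 1


def finalize_one_by_one(repo, pending):
    """Current behaviour in this model: one full write cycle per snapshot."""
    while pending:
        repo.write_repository_data([pending.popleft()])


def finalize_batched(repo, pending):
    """Proposed behaviour in this model: drain the queue into one write cycle."""
    batch = list(pending)
    pending.clear()
    if batch:
        repo.write_repository_data(batch)


if __name__ == "__main__":
    names = [f"snap-{i}" for i in range(1000)]

    sequential = Repository()
    finalize_one_by_one(sequential, deque(names))
    print(sequential.repository_data_writes, sequential.cluster_state_updates)  # 1000 2000

    batched = Repository()
    finalize_batched(batched, deque(names))
    print(batched.repository_data_writes, batched.cluster_state_updates)  # 1 2
```

In this model the cost of the shared steps no longer scales with the number of waiting snapshots, only with the number of batches.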