Force read repair via aae_fold #1793

Closed
martinsumner opened this issue Sep 2, 2021 · 1 comment
martinsumner commented Sep 2, 2021

The TictacAAE solution adds the aae_fold capability. However, when compared to legacy AAE it is slower to resolve deltas between vnodes - by default it is limited to fixing 256 deltas per exchange.

Generally, if a node goes down, most repair happens via hinted handoff, and most immediate issues are resolved through natural read repair. The max results limit can also be uplifted at run-time (though uplifting beyond 2048 tends to lead to skipped exchanges due to the time each exchange then takes).
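As a hedged sketch, such a run-time uplift might be applied from a riak attach session along these lines, assuming the limit is held in the riak_kv application environment under a key such as tictacaae_maxresults (the exact key name should be verified against the schema for the release in use):

```erlang
%% Sketch only: raise the per-exchange repair limit on every node.
%% The env key (tictacaae_maxresults) is an assumption here - check the
%% riak_kv schema for the release in use before relying on it.
rpc:multicall(application, set_env, [riak_kv, tictacaae_maxresults, 2048]).
```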

A bigger problem is when a node is recovered from a backup and then returned to the cluster - i.e. where there may be a big gap between the time of the backup and the time the node went down. Hinted handoff only covers the downtime itself (the period when fallbacks gathered PUTs to return), not that gap. The gap may take a long time to resolve via AAE, and there may be no natural application activity to resolve it through read repair.

The proposal is for a new aae_fold - repair_keys_range (similar in form to repl_keys_range). An administrator could then use this to accelerate recovery when there is a known gap. For example, if we know the time of the backup and the time of the node failure, an LMD (last-modified date) range between those times can be used in the aae_fold to prompt immediate read repair for everything within the range. As well as the usual filters, it would be beneficial to be able to specify a node (the recovering node), so that read repair is prompted only for objects where that node is in the preflist.
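As a minimal sketch of the shape this could take from a riak attach session - the repair_keys_range tuple, its argument order, and the node filter are all assumptions modelled on the existing repl_keys_range fold, not a settled API:

```erlang
%% Sketch only - query shape and node filter are assumptions modelled on
%% the existing repl_keys_range aae_fold.
{ok, C} = riak:local_client(),
BackupTS = 1630000000,                      %% epoch seconds of the backup
FailTS   = 1630500000,                      %% epoch seconds of the node failure
Query = {repair_keys_range,
         all,                               %% bucket filter (all buckets)
         all,                               %% key range
         {date, BackupTS, FailTS},          %% LMD range: backup -> failure
         {node, 'riak@recovered.example'}}, %% assumed filter: recovering node
riak_client:aae_fold(Query, C).
```

Everything returned within that range would then have read repair prompted immediately, rather than waiting for the deltas to surface 256 at a time through scheduled exchanges.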

Currently there is a test where it takes 24 hours for AAE to recover the gap between a backup timestamp and a node failure time (with > 10M PUTs between those times in an 8-node cluster). The aim here is to allow an administrator to accelerate this recovery to < 1 hour.

martinsumner commented
#1795
