Mas i1805 hintedrepair #1806
Conversation
Allow for range-based exchanges to be used when standard exchanges identify both a demand for further AAE and a localised AAE issue.
It should be possible to track what is happening via both stats and logs.
If not all nodes in the cluster are yet at this version, they may not support the aae_folds required by the range-based repairs.
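As an illustration only (this is not the mechanism in this PR), a mixed-version guard could look roughly like the sketch below, falling back to standard exchanges when the cluster has not negotiated the feature; the capability name `{riak_kv, range_repair}` is hypothetical.

```erlang
-module(range_repair_gate).
-export([maybe_range_repair/1]).

%% Only prompt a range-based exchange if every node in the cluster has
%% negotiated the (hypothetical) capability; otherwise skip, since older
%% nodes cannot serve the required aae_folds.
maybe_range_repair(Filter) ->
    case riak_core_capability:get({riak_kv, range_repair}, false) of
        true ->
            {run_range_exchange, Filter};
        false ->
            skip
    end.
```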
As the pause system exists to space out any repairs, the proposal in the issue to have a new read-repair-manager process has been dropped.
Test evidence of any performance/efficiency improvements is still required before the PR is merged.
For initial performance testing, this riak_test can be used for comparisons. With these settings (switching MAX_RESULTS back to 256 when testing the previous code), there is approximately a 60-70% reduction in repair time with this branch.
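For reference, a minimal sketch of how such a comparison could be pinned in test configuration; the application environment keys are assumed to correspond to the option names discussed in this PR, and the values are illustrative rather than those of the referenced riak_test.

```erlang
%% Illustrative only - not the referenced riak_test. Pins the settings the
%% comparison toggles between the two runs.
-define(MAX_RESULTS, 128).  %% switch back to 256 when benchmarking the previous code

aae_test_config() ->
    [{riak_kv,
        [{tictacaae_maxresults, ?MAX_RESULTS},
         {tictacaae_rangeboost, 2}]}].
```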
In a full environment volume test, the new code was able to repair deltas discovered via AAE at 3.5x the rate of the previous code. At this rate it was using more CPU, but only roughly 30-50% more, so the improvement is CPU-efficient. It could be de-tuned to gain the extra benefit at the equivalent speed (such as by reducing […]).

The volume test was done with a ring size of 512, on an 8-node cluster with about 180M keys. On one node a backup is taken, then more load is added, the node is stopped, and then more load is added; the node is then recovered from backup. When the node is recovered, there is a hinted handoff of recent writes (since the stop), but the earlier delta (between the backup and the stop) must be recovered via AAE. In this case, that delta amounts to about 1.8M keys.

In Riak 3.0.9, recovering the delta completely takes just over 24 hours, although the majority is recovered within 16 hours. The peak rate of recovery is just over 3K objects per minute. During the recovery, about 10% of available CPU is consumed.

[Chart: decline in mismatched segments over 24 hours]

With this branch, recovering the delta completely takes about 10 hours, and the majority is recovered within 4 hours. The peak rate of recovery is 11K objects per minute. During the recovery, about 15% of available CPU is consumed.

If the […] setting is applied instead, recovering the delta completely takes 17 hours, and the majority is recovered within 8 hours. The peak rate of recovery is 7K objects per minute, and during the recovery less than 10% of available CPU is consumed.
Note that there appear to be two "stripes" in the recovery. There are two forms of delta created by the test: […]
Following performance testing of the new feature, AAE throughput can be improved at this level on current settings without increasing CPU load. The configuration option is no longer hidden, as it may require tuning by users who wish for AAE deltas to be closed more rapidly.
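As a hedged illustration of what such tuning might look like in advanced.config (keys assumed to mirror the option names discussed in this PR, values illustrative only):

```erlang
%% advanced.config sketch: raising these should close AAE deltas more
%% rapidly, at the cost of some additional CPU during exchanges.
[
 {riak_kv,
  [
   {tictacaae_maxresults, 256},
   {tictacaae_rangeboost, 4}
  ]}
].
```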
not done yet with review... but want to save the state
In Tictac AAE, repairs are slow relative to the previous (and still default) kv_index_hashtree solution. Comparisons to discover broken AAE segments are fast, but then a fetch_clocks query has to be run.
The fetch_clocks query is a full keystore scan, but each slot in the store has a segment index, so whole slots can be skipped when they contain no interesting segments. The skipping improves speed, but there are issues: the higher the tictacaae_maxresults (the segment count), the less skipping can be done, and the slower the query.
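A minimal sketch of the slot-skipping idea, using an invented slot representation rather than the actual keystore structure: the more segments that are wanted (i.e. the higher the max results), the fewer slots are disjoint from the wanted set, and the less skipping is possible.

```erlang
-module(slot_skip_sketch).
-export([scan/2]).

%% Slot :: {SegmentIdList, [{SegmentId, Key, Clock}]}
%% Scan only the slots whose segment index overlaps the wanted segments,
%% skipping every other slot entirely.
scan(Slots, WantedSegments) ->
    Wanted = ordsets:from_list(WantedSegments),
    lists:foldl(
        fun({SlotSegments, KeyClocks}, Acc) ->
            case ordsets:is_disjoint(Wanted, ordsets:from_list(SlotSegments)) of
                true ->
                    %% no interesting segments in this slot - skip it
                    Acc;
                false ->
                    %% scan the slot, keeping only keys in wanted segments
                    Acc ++ [KC || {Seg, _Key, _Clock} = KC <- KeyClocks,
                                  ordsets:is_element(Seg, Wanted)]
            end
        end,
        [],
        Slots).
```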
What is implemented here is prompted repairs. There are, as before, scheduled exchanges; however, by default these now repair only 128 segments rather than 256 (to reduce overheads). These exchanges are otherwise unchanged.
If the exchanges highlight keys that need repairing, they are repaired as before. However, the riak_kv_tictcaaae_repairs:analyse_repairs/2 function examines those repaired keys, looking at the modified dates, buckets and (optionally) the key ranges. analyse_repairs/2 may then prompt a new exchange, but this time with a filter (either a bucket and modified time range, or just a modified time range) that reduces the time taken to run the resulting fetch_clocks query. These prompted exchanges should be able to find repairs (via fetch_clocks) an order of magnitude faster than standard exchanges.
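A sketch of the kind of analysis described above (this is not the analyse_repairs/2 implementation itself; the tuple shape and return values are invented for illustration): if the repaired keys are concentrated in a single bucket, filter on bucket plus last-modified range, otherwise on the last-modified range alone.

```erlang
-module(analyse_repairs_sketch).
-export([analyse/1]).

%% Repair :: {Bucket, Key, LastModifiedEpochSecs}
analyse([]) ->
    no_filter;
analyse(Repairs) ->
    Buckets = lists:usort([B || {B, _K, _LMD} <- Repairs]),
    LMDs = [LMD || {_B, _K, LMD} <- Repairs],
    Range = {lists:min(LMDs), lists:max(LMDs)},
    case Buckets of
        [SingleBucket] ->
            %% repairs concentrated in one bucket: bucket + modified range
            {bucket_and_modified_range, SingleBucket, Range};
        _ ->
            %% repairs spread across buckets: modified range only
            {modified_range, Range}
    end.
```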
There will be tictacaae_repairloops (default 4) run following an exchange, until the loop count is exhausted or a loop finds insufficient keys to warrant continuing (less than 50% of the requested max results). These loops will have the tictacaae_maxresults boosted by a tictacaae_rangeboost integer (default 2, i.e. boosting 128 to 256).
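A sketch of the loop-control arithmetic described above (names and return values are illustrative, not the actual riak_kv implementation): the max results are boosted by the range boost, and the loops stop early when an exchange finds fewer than half of what it asked for.

```erlang
-module(repair_loop_sketch).
-export([run/4]).

%% ExchangeFun(MaxResults) is assumed to run one prompted exchange and
%% return the number of keys it repaired.
run(MaxResults, RangeBoost, Loops, ExchangeFun) ->
    Boosted = MaxResults * RangeBoost,  %% e.g. 128 * 2 = 256
    loop(Loops, Boosted, ExchangeFun).

loop(0, _Boosted, _ExchangeFun) ->
    loops_exhausted;
loop(LoopsLeft, Boosted, ExchangeFun) ->
    Found = ExchangeFun(Boosted),
    case Found < Boosted div 2 of
        true ->
            %% fewer than half the requested results - not worth continuing
            insufficient_keys;
        false ->
            loop(LoopsLeft - 1, Boosted, ExchangeFun)
    end.
```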