raft: separate MaxCommittedSizePerReady config from MaxSizePerMsg #10258

ajwerner · 2018-11-14T15:05:01Z

Prior to this change, MaxSizePerMsg was used both to cap the total byte size of
entries in messages as well as the total byte size of entries passed through
CommittedEntries in the Ready struct. This change adds a new Config parameter
MaxCommittedSizePerReady which defaults to MaxSizePerMsg and contols the second
of above descibed settings.

This enables resolution of cockroachdb/cockroach#31511

Please read https://github.com/etcd-io/etcd/blob/master/CONTRIBUTING.md#contribution-flow.

Prior to this change, MaxSizePerMsg was used both to cap the total byte size of entries in messages as well as the total byte size of entries passed through CommittedEntries in the Ready struct. This change adds a new Config parameter MaxCommittedSizePerReady which defaults to MaxSizePerMsg and contols the second of above descibed settings.

codecov-io · 2018-11-14T16:15:52Z

Codecov Report

Merging #10258 into master will increase coverage by 0.02%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #10258      +/-   ##
==========================================
+ Coverage   71.67%   71.69%   +0.02%     
==========================================
  Files         390      390              
  Lines       36369    36371       +2     
==========================================
+ Hits        26067    26078      +11     
+ Misses       8488     8483       -5     
+ Partials     1814     1810       -4

Impacted Files	Coverage Δ
raft/log.go	`79.34% <100%> (ø)`	⬆️
raft/raft.go	`92.13% <100%> (+0.67%)`	⬆️
etcdctl/ctlv3/command/lease_command.go	`65.34% <0%> (-5.95%)`	⬇️
pkg/testutil/recorder.go	`77.77% <0%> (-3.71%)`	⬇️
proxy/grpcproxy/watch.go	`89.75% <0%> (-3.02%)`	⬇️
etcdserver/api/v3rpc/lease.go	`69.31% <0%> (-2.28%)`	⬇️
raft/storage.go	`87.96% <0%> (-1.86%)`	⬇️
lease/leasehttp/http.go	`63.97% <0%> (-1.48%)`	⬇️
etcdctl/ctlv3/command/printer_simple.go	`72.48% <0%> (-1.35%)`	⬇️
etcdserver/api/rafthttp/peer.go	`78.77% <0%> (-1.12%)`	⬇️
... and 15 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ee9dcbc...e4af2be. Read the comment docs.

xiang90 · 2018-11-15T13:59:56Z

lgtm

Pick up etcd-io/etcd#10258

Before this change, the size of log committed log entries which a replica could apply at a time was bound to the same configuration as the total size of log entries which could be sent in a message (MaxSizePerMsg) which is generally kilobytes. This limit had an impact on the throughput of writes to a replica, particularly when writing large amounts of data. A new raft configuration option MaxCommittedSizePerReady was adding to etcd/raft in (etcd-io/etcd#10258) which allows these two size parameters to be decoupled. This change adopts the configuration and sets it to a default of 64MB. On the below workload which is set up to always return exactly one entry per Ready wiht the old configuration we see a massive win in both throughput and latency. ``` ./workload run kv {pgurl:1-3} \ --init --splits=10 \ --duration 60s \ --read-percent=${READ_PERCENT} \ --min-block-bytes=8193 --max-block-bytes=16385 \ --concurrency=1024 ``` ``` name old ops/s new ops/s delta KV0 483 ± 3% 2025 ± 3% +319.32% (p=0.002 n=6+6) ``` Before: ``` _elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total 60.0s 0 29570 492.8 1981.2 2281.7 5100.3 5637.1 6442.5 write 60.0s 0 28405 473.4 2074.8 2281.7 5637.1 6710.9 7516.2 write 60.0s 0 28615 476.9 2074.3 2550.1 5905.6 6442.5 8321.5 write 60.0s 0 28718 478.6 2055.4 2550.1 5100.3 6442.5 7516.2 write 60.0s 0 28567 476.1 2079.8 2684.4 4831.8 5368.7 6442.5 write 60.0s 0 29981 499.7 1975.7 1811.9 5368.7 6174.0 6979.3 write ``` After: ``` _elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total 60.0s 0 119652 1994.0 510.9 486.5 1006.6 1409.3 4295.0 write 60.0s 0 125321 2088.4 488.5 469.8 906.0 1275.1 4563.4 write 60.0s 0 119644 1993.9 505.2 469.8 1006.6 1610.6 5637.1 write 60.0s 0 119027 1983.6 511.4 469.8 1073.7 1946.2 4295.0 write 60.0s 0 121723 2028.5 500.6 469.8 1040.2 1677.7 4160.7 write 60.0s 0 123697 2061.4 494.1 469.8 1006.6 1610.6 4295.0 write ``` Fixes cockroachdb#31511 Release note: None

Before this change, the size of log committed log entries which a replica could apply at a time was bound to the same configuration as the total size of log entries which could be sent in a message (MaxSizePerMsg) which is generally kilobytes. This limit had an impact on the throughput of writes to a replica, particularly when writing large amounts of data. A new raft configuration option MaxCommittedSizePerReady was adding to etcd/raft in (etcd-io/etcd#10258) which allows these two size parameters to be decoupled. This change adopts the configuration and sets it to a default of 64MB. On the below workload which is set up to always return exactly one entry per Ready with the old configuration we see a massive win in both throughput and latency. ``` ./workload run kv {pgurl:1-3} \ --init --splits=10 \ --duration 60s \ --read-percent=${READ_PERCENT} \ --min-block-bytes=8193 --max-block-bytes=16385 \ --concurrency=1024 ``` ``` name old ops/s new ops/s delta KV0 483 ± 3% 2025 ± 3% +319.32% (p=0.002 n=6+6) ``` Before: ``` _elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total 60.0s 0 29570 492.8 1981.2 2281.7 5100.3 5637.1 6442.5 write 60.0s 0 28405 473.4 2074.8 2281.7 5637.1 6710.9 7516.2 write 60.0s 0 28615 476.9 2074.3 2550.1 5905.6 6442.5 8321.5 write 60.0s 0 28718 478.6 2055.4 2550.1 5100.3 6442.5 7516.2 write 60.0s 0 28567 476.1 2079.8 2684.4 4831.8 5368.7 6442.5 write 60.0s 0 29981 499.7 1975.7 1811.9 5368.7 6174.0 6979.3 write ``` After: ``` _elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total 60.0s 0 119652 1994.0 510.9 486.5 1006.6 1409.3 4295.0 write 60.0s 0 125321 2088.4 488.5 469.8 906.0 1275.1 4563.4 write 60.0s 0 119644 1993.9 505.2 469.8 1006.6 1610.6 5637.1 write 60.0s 0 119027 1983.6 511.4 469.8 1073.7 1946.2 4295.0 write 60.0s 0 121723 2028.5 500.6 469.8 1040.2 1677.7 4160.7 write 60.0s 0 123697 2061.4 494.1 469.8 1006.6 1610.6 4295.0 write ``` Fixes cockroachdb#31511 Release note: None

32387: storage: adopt new raft MaxCommittedSizePerReady config parameter r=ajwerner a=ajwerner Before this change, the size of log committed log entries which a replica could apply at a time was bound to the same configuration as the total size of log entries which could be sent in a message (MaxSizePerMsg) which is generally kilobytes. This limit had an impact on the throughput of writes to a replica, particularly when writing large amounts of data. A new raft configuration option MaxCommittedSizePerReady was adding to etcd/raft in (etcd-io/etcd#10258) which allows these two size parameters to be decoupled. This change adopts the configuration and sets it to a default of 64MB. On the below workload which is set up to always return exactly one entry per Ready wiht the old configuration we see a massive win in both throughput and latency. ``` ./workload run kv {pgurl:1-3} \ --init --splits=10 \ --duration 60s \ --read-percent=${READ_PERCENT} \ --min-block-bytes=8193 --max-block-bytes=16385 \ --concurrency=1024 ``` ``` name old ops/s new ops/s delta KV0 483 ± 3% 2025 ± 3% +319.32% (p=0.002 n=6+6) ``` Before: ``` _elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total 60.0s 0 29570 492.8 1981.2 2281.7 5100.3 5637.1 6442.5 write 60.0s 0 28405 473.4 2074.8 2281.7 5637.1 6710.9 7516.2 write 60.0s 0 28615 476.9 2074.3 2550.1 5905.6 6442.5 8321.5 write 60.0s 0 28718 478.6 2055.4 2550.1 5100.3 6442.5 7516.2 write 60.0s 0 28567 476.1 2079.8 2684.4 4831.8 5368.7 6442.5 write 60.0s 0 29981 499.7 1975.7 1811.9 5368.7 6174.0 6979.3 write ``` After: ``` _elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total 60.0s 0 119652 1994.0 510.9 486.5 1006.6 1409.3 4295.0 write 60.0s 0 125321 2088.4 488.5 469.8 906.0 1275.1 4563.4 write 60.0s 0 119644 1993.9 505.2 469.8 1006.6 1610.6 5637.1 write 60.0s 0 119027 1983.6 511.4 469.8 1073.7 1946.2 4295.0 write 60.0s 0 121723 2028.5 500.6 469.8 1040.2 1677.7 4160.7 write 60.0s 0 123697 2061.4 494.1 469.8 1006.6 1610.6 4295.0 write ``` Fixes #31511 Release note: None Co-authored-by: Andrew Werner <[email protected]>

Before this change, the size of log committed log entries which a replica could apply at a time was bound to the same configuration as the total size of log entries which could be sent in a message (MaxSizePerMsg) which is generally kilobytes. This limit had an impact on the throughput of writes to a replica, particularly when writing large amounts of data. A new raft configuration option MaxCommittedSizePerReady was adding to etcd/raft in (etcd-io/etcd#10258) which allows these two size parameters to be decoupled. This change adopts the configuration and sets it to a default of 64MB. On the below workload which is set up to always return exactly one entry per Ready with the old configuration we see a massive win in both throughput and latency. ``` ./workload run kv {pgurl:1-3} \ --init --splits=10 \ --duration 60s \ --read-percent=${READ_PERCENT} \ --min-block-bytes=8193 --max-block-bytes=16385 \ --concurrency=1024 ``` ``` name old ops/s new ops/s delta KV0 483 ± 3% 2025 ± 3% +319.32% (p=0.002 n=6+6) ``` Before: ``` _elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total 60.0s 0 29570 492.8 1981.2 2281.7 5100.3 5637.1 6442.5 write 60.0s 0 28405 473.4 2074.8 2281.7 5637.1 6710.9 7516.2 write 60.0s 0 28615 476.9 2074.3 2550.1 5905.6 6442.5 8321.5 write 60.0s 0 28718 478.6 2055.4 2550.1 5100.3 6442.5 7516.2 write 60.0s 0 28567 476.1 2079.8 2684.4 4831.8 5368.7 6442.5 write 60.0s 0 29981 499.7 1975.7 1811.9 5368.7 6174.0 6979.3 write ``` After: ``` _elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total 60.0s 0 119652 1994.0 510.9 486.5 1006.6 1409.3 4295.0 write 60.0s 0 125321 2088.4 488.5 469.8 906.0 1275.1 4563.4 write 60.0s 0 119644 1993.9 505.2 469.8 1006.6 1610.6 5637.1 write 60.0s 0 119027 1983.6 511.4 469.8 1073.7 1946.2 4295.0 write 60.0s 0 121723 2028.5 500.6 469.8 1040.2 1677.7 4160.7 write 60.0s 0 123697 2061.4 494.1 469.8 1006.6 1610.6 4295.0 write ``` Fixes cockroachdb#31511 Release note: None

bdarnell requested review from bdarnell and xiang90 November 14, 2018 16:17

bdarnell approved these changes Nov 14, 2018

View reviewed changes

nvanbenschoten approved these changes Nov 14, 2018

View reviewed changes

xiang90 merged commit fa92397 into etcd-io:master Nov 15, 2018

ajwerner added a commit to cockroachdb/vendored that referenced this pull request Nov 15, 2018

Bump etcd

e52f25d

Pick up etcd-io/etcd#10258

ajwerner mentioned this pull request Nov 15, 2018

storage: adopt new raft MaxCommittedSizePerReady config parameter cockroachdb/cockroach#32387

Merged

absolute8511 mentioned this pull request Mar 13, 2019

Consider merge some optimize from etcd-raft youzan/ZanRedisDB#23

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

raft: separate MaxCommittedSizePerReady config from MaxSizePerMsg #10258

raft: separate MaxCommittedSizePerReady config from MaxSizePerMsg #10258

ajwerner commented Nov 14, 2018

codecov-io commented Nov 14, 2018

xiang90 commented Nov 15, 2018

raft: separate MaxCommittedSizePerReady config from MaxSizePerMsg #10258

raft: separate MaxCommittedSizePerReady config from MaxSizePerMsg #10258

Conversation

ajwerner commented Nov 14, 2018

codecov-io commented Nov 14, 2018

Codecov Report

xiang90 commented Nov 15, 2018