
perf: Update/upsert performance regression 1.0.4 -> master #17557

Closed
bdarnell opened this issue Aug 9, 2017 · 11 comments

@bdarnell
Contributor

bdarnell commented Aug 9, 2017

The test in #17507 showed that update and upsert performance got significantly slower (from ~30ms to ~60ms) since 1.0.4. Full details are in that issue, but I'm filing this issue to separately track investigation of the performance regression (as opposed to the difference between schemas with and without column families which is the main focus of #17507).

@petermattis petermattis added this to the 1.1 milestone Aug 9, 2017
@jordanlewis
Member

I've reproduced the regression using kv:

1.0.4:

[13:33:2]% ./kv --read-percent=0 --cycle-length=1                                                 
_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
      1s        0          285.6          285.6      3.7    268.4    385.9    453.0
      2s        0          262.2          273.9      3.8    453.0    704.6   1275.1
      3s        0          270.9          272.9      3.7    402.7    604.0    906.0
      4s        0          261.0          269.9      3.8    369.1    738.2    973.1
      5s        0          259.0          267.8      3.9    486.5   1073.7   1879.0
      6s        0          261.0          266.6      3.8    352.3    704.6   1040.2
      7s        0          245.0          263.5      4.2    335.5    738.2   1275.1
      8s        0          256.0          262.6      3.7    318.8    771.8   1476.4
      9s        0          260.0          262.3      3.7    335.5    671.1   1208.0
     10s        0          253.0          261.4      3.8    335.5   1040.2   1342.2
     11s        0          252.0          260.5      3.8    402.7    805.3   2013.3
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
   11.1s        0           2893          260.4     59.4      3.8    369.1    838.9   2013.3

BenchmarkBlocks     2893           3840472.8 ns/op

master:

[13:33]% ./kv --read-percent=0 --cycle-length=1                      
_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
      1s        0           73.9           73.9     22.0    604.0    838.9   1006.6
      2s        0           65.9           69.9     22.0    939.5   1879.0   1946.2
      3s        0           65.2           68.3     22.0   1140.9   1610.6   2281.7
      4s        0           65.0           67.5     23.1   1811.9   2415.9   3758.1
      5s        0           68.0           67.6     22.0    838.9   1610.6   3087.0
      6s        0           66.0           67.3     23.1    805.3   1543.5   2550.1
      7s        0           64.0           66.8     23.1    671.1   1476.4   2147.5
      8s        0           65.9           66.7     22.0   1476.4   2080.4   2147.5
      9s        0           69.1           67.0     23.1    939.5   1744.8   1879.0
     10s        0           65.0           66.8     22.0   1140.9   2147.5   2147.5
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
   10.2s        0            683           66.7    226.0     22.0   1140.9   2147.5   3758.1

BenchmarkBlocks      683          14991670.2 ns/op

Now to figure out what's going on.

@petermattis
Collaborator

This is a highly concurrent workload. The problem is likely in the guts of storage and is reminiscent of (or the same as) #15797.

Curious that master is so much slower than 1.0.4, though.

@petermattis
Collaborator

And by high concurrency, I mean concurrency on a single row.

@jordanlewis
Member

I thought the workload that @robert-s-lee produced to demonstrate the upsert/insert slowdown was pretty much the same as what the above kv command produces, but I was mistaken. In his workload, each thread repeatedly updates its own row, local to that thread, so there should be no contention.

I'll make a more accurate reproduction now.
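
For context, here is a minimal Go sketch of that kind of per-thread workload: each worker repeatedly upserts its own row, so workers never contend on the same key (unlike `./kv --read-percent=0 --cycle-length=1` above, which drives every worker at a single key). The table name, connection string, and worker/op counts are assumptions for illustration, not taken from the original script.

```go
package main

import (
	"database/sql"
	"fmt"
	"log"
	"sync"

	_ "github.com/lib/pq" // Postgres-wire driver; CockroachDB speaks the same protocol.
)

func main() {
	// Assumes a local insecure cluster and a simple two-column table:
	//   CREATE TABLE IF NOT EXISTS kv (k INT PRIMARY KEY, v INT);
	db, err := sql.Open("postgres", "postgresql://root@localhost:26257/test?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	const workers = 16
	const opsPerWorker = 1000

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			// Each worker writes only its own key, so there is no row
			// contention between workers.
			for i := 0; i < opsPerWorker; i++ {
				if _, err := db.Exec(`UPSERT INTO kv (k, v) VALUES ($1, $2)`, id, i); err != nil {
					log.Fatal(err)
				}
			}
		}(w)
	}
	wg.Wait()
	fmt.Println("done")
}
```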

@jordanlewis
Member

Bisecting the high-concurrency case.

@jordanlewis
Member

The regression seems to have been introduced in 82cbb49.

@petermattis
Collaborator

Can you try setting rocksdb.min_wal_sync_interval = 0ms?
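
In case it helps anyone reproducing this, the setting can be changed at runtime with `SET CLUSTER SETTING`, e.g. from the `cockroach sql` shell. A minimal Go sketch follows; the connection string (local insecure cluster) and the string form of the duration ('0ms') are assumptions for illustration.

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq" // Postgres-wire driver; CockroachDB speaks the same protocol.
)

func main() {
	db, err := sql.Open("postgres", "postgresql://root@localhost:26257/?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Remove the enforced minimum interval between RocksDB WAL syncs.
	if _, err := db.Exec(`SET CLUSTER SETTING rocksdb.min_wal_sync_interval = '0ms'`); err != nil {
		log.Fatal(err)
	}
}
```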

@jordanlewis
Member

That cluster setting removes the performance issue. Is that a setting that we can recommend? Is there a more balanced default that we can pick?

@petermattis
Collaborator

This is the second time this setting has come up. The effect on heavily loaded systems was present, but small. Can you send a PR to change the default to 0ms?

Thanks for tracking this down.

@irfansharif
Contributor

why does/did rocksdb.min_wal_sync_interval = 0ms work?

@petermattis
Collaborator

why does/did rocksdb.min_wal_sync_interval = 0ms work?

The setting specifies a minimum interval between WAL syncs and thus adds a minimum latency to write operations when there is no concurrency. In the test @jordanlewis was performing, there were concurrent workers, but they were all hitting the same key so there was no concurrency during syncing.
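
To make the mechanism concrete, here is a deliberately simplified Go sketch (a toy model, not CockroachDB's actual syncer). A background loop enforces a minimum interval between physical WAL syncs: concurrent writers on independent keys get batched into one sync and share the wait, while writers serialized behind a single key pay roughly the full interval on every write.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// syncer models a WAL syncer that enforces a minimum interval between
// physical syncs, in the spirit of rocksdb.min_wal_sync_interval. It is a
// toy: the "sync" itself is just closing the waiters' channels.
type syncer struct {
	minInterval time.Duration
	requests    chan chan struct{} // each pending write sends a "done" channel
}

func newSyncer(minInterval time.Duration) *syncer {
	s := &syncer{minInterval: minInterval, requests: make(chan chan struct{}, 128)}
	go s.loop()
	return s
}

func (s *syncer) loop() {
	var lastSync time.Time
	for {
		// Wait for at least one write that needs a sync.
		first := <-s.requests
		pending := []chan struct{}{first}

		// Enforce the minimum interval since the previous sync. While
		// waiting, absorb any other requests; they all ride on the same
		// upcoming sync.
		timer := time.After(time.Until(lastSync.Add(s.minInterval)))
	absorb:
		for {
			select {
			case r := <-s.requests:
				pending = append(pending, r)
			case <-timer:
				break absorb
			}
		}

		lastSync = time.Now() // a real implementation would fsync the WAL here
		for _, r := range pending {
			close(r)
		}
	}
}

// sync blocks the calling writer until its write has been "synced".
func (s *syncer) sync() {
	done := make(chan struct{})
	s.requests <- done
	<-done
}

func main() {
	s := newSyncer(10 * time.Millisecond)

	// Serialized writers (everyone blocked behind the same key): each write
	// after the first waits out roughly the full interval.
	start := time.Now()
	for i := 0; i < 5; i++ {
		s.sync()
	}
	fmt.Println("5 serial writes:    ", time.Since(start))

	// Concurrent writers on independent keys: their syncs are batched, so
	// they share one interval instead of paying it one after another.
	start = time.Now()
	var wg sync.WaitGroup
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s.sync()
		}()
	}
	wg.Wait()
	fmt.Println("5 concurrent writes:", time.Since(start))
}
```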
