
performance: roachperf shows tpc-c regression in march #36097

Closed
awoods187 opened this issue Mar 25, 2019 · 8 comments
Assignees
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. S-2 Medium-high impact: many users impacted, risks of availability and difficult-to-fix data errors

Comments

@awoods187 (Contributor)

[screenshot: roachperf tpccbench graph]
https://crdb.io/perf/?filter=&view=tpccbench%2Fnodes%3D3%2Fcpu%3D4&cloud=gce

We hit ~470 from mid-November through the end of 2018. January and February dropped us to 410–440, and March dropped us below 400.
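As a back-of-the-envelope read of the numbers above (the 470/440/400 figures are the warehouse counts quoted from the chart):

```python
# Rough size of the regression described above, as a percentage drop
# from the late-2018 baseline of ~470 warehouses.
def pct_drop(baseline: float, current: float) -> float:
    return 100.0 * (baseline - current) / baseline

print(round(pct_drop(470, 440), 1))  # 6.4  -- the Jan/Feb dip to ~440
print(round(pct_drop(470, 400), 1))  # 14.9 -- the March drop below 400
```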

@awoods187 added the C-bug and S-2 labels on Mar 25, 2019
@awoods187 (Contributor, Author)

This seems to be true for YCSB too:
[screenshot: YCSB throughput graph]

@bdarnell (Contributor)

For reference, automatic stats collection was enabled on Feb 4 (#34529). Then it was disabled on Feb 19 (#35019), throttling was added (#34928), and it was reenabled on Feb 28 (#35291). This was expected to cause some performance decrease (and for the w=max tpcc tests, it reduced the target number of warehouses for 3 GCE nodes from 1400 to 1350).

There's a lot of noise here and it's difficult to assign responsibility to specific changes. Feb 6 scored higher than Feb 3, even though it had the unthrottled stats enabled. Feb 22 wasn't much better than Feb 19. It looks like automatic stats are probably contributing here, but they're not the whole story.

@awoods187 (Contributor, Author)

With auto stats enabled first, and with them turned off second:
[screenshot: tpmC comparison graphs]

Auto stats seem to have an impact, but performance is still clearly degraded regardless.

@awoods187 (Contributor, Author)

A run with auto stats turned off before the import:
[screenshot: tpmC graph]

@awoods187 (Contributor, Author)

Similarly, I ran 10k warehouses on 15 nodes:

_elapsed_______tpmC____efc__avg(ms)__p50(ms)__p90(ms)__p95(ms)__p99(ms)_pMax(ms)
  900.0s    94682.0  73.6%   6698.6   4160.7  16643.0  21474.8  33286.0 103079.2

[screenshot: tpmC graph]

@awoods187 (Contributor, Author)

Okay, perhaps there was some sort of user error on yesterday's 10k run, as today I had no problems:

_elapsed_______tpmC____efc__avg(ms)__p50(ms)__p90(ms)__p95(ms)__p99(ms)_pMax(ms)
  900.0s   125143.6  97.3%    263.7    243.3    503.3    604.0    805.3   6979.3
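The `efc` column is TPC-C efficiency: actual tpmC as a fraction of the theoretical maximum, which the TPC-C spec caps at 12.86 tpmC per warehouse. A quick sanity check of the two 10k-warehouse runs above (the function name is mine, not from the workload tool):

```python
# TPC-C efficiency = actual tpmC / theoretical max tpmC, where the TPC-C
# spec caps throughput at 12.86 tpmC per warehouse.
MAX_TPMC_PER_WAREHOUSE = 12.86

def tpcc_efficiency(tpmc: float, warehouses: int) -> float:
    """Return efficiency as a percentage, matching the `efc` column above."""
    return 100.0 * tpmc / (warehouses * MAX_TPMC_PER_WAREHOUSE)

# The two 10k-warehouse runs reported above:
print(round(tpcc_efficiency(94682.0, 10_000), 1))   # 73.6 (the suspect run)
print(round(tpcc_efficiency(125143.6, 10_000), 1))  # 97.3 (the healthy run)
```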

@bdarnell (Contributor)

bdarnell commented Apr 1, 2019

A couple of data points from tpccbench (which is our script to discover the max supported warehouse count): I've run it twice on the same build (a master build from a couple of days ago) and got 1395 the first time and 1315 the second.

22:26:58 tpcc.go:717: --- FAIL: tpcc 1320 resulted in 12802.9 tpmC and failed due to efficiency value of 76.94691781122275 is below passing threshold of 85
22:26:58 tpcc.go:728: ------
MAX WAREHOUSES = 1315
------
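The failure message above shows the rule tpccbench's search applies: a warehouse count counts as passing only if the run clears an 85% efficiency floor. A minimal sketch of that check (not the actual roachtest code; the 85% threshold comes from the log above and 12.86 from the TPC-C spec — note the log's own efficiency accounting differs slightly from this naive formula):

```python
# Hypothetical sketch of tpccbench's pass/fail rule: a run passes only if it
# achieves at least 85% of the TPC-C theoretical max of 12.86 tpmC/warehouse.
PASSING_EFFICIENCY = 85.0
MAX_TPMC_PER_WAREHOUSE = 12.86

def passes(tpmc: float, warehouses: int) -> bool:
    efficiency = 100.0 * tpmc / (warehouses * MAX_TPMC_PER_WAREHOUSE)
    return efficiency >= PASSING_EFFICIENCY

# The failing search step from the log: 1320 warehouses at 12802.9 tpmC.
print(passes(12802.9, 1320))  # False: efficiency is far below the 85% floor
```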

bdarnell added a commit to bdarnell/cockroach that referenced this issue Apr 1, 2019
Make large cuts to deflake the test until we can get enough data to
put a tighter bound on it. The AWS case does not appear to have passed
since its introduction.

Closes cockroachdb#35337
Updates cockroachdb#36097

Release note: None
craig bot pushed a commit that referenced this issue Apr 2, 2019
36401: roachtest: Reduce tpcc/w=max targets r=tbg a=bdarnell

Co-authored-by: Ben Darnell <[email protected]>
bdarnell added a commit to bdarnell/cockroach that referenced this issue Apr 2, 2019
@awoods187 (Contributor, Author)

Closing this as it is no longer relevant.
