YCSB performance analysis #26137
I haven't looked into this in detail, but note that there's an open PR that is said to drastically improve YCSB performance: #25014. @nvanbenschoten, could you take a look at this?
Hi @matjazmav, thanks for performing this benchmarking! We've done previous testing with YCSB (see #20448) and found similar results. As Tobi mentioned, we do have one change in the pipeline that has a lot of promise to dramatically improve some of the contention-heavy YCSB workloads. If you're interested, you could try doing a comparison before and after with that change.
That assumption is correct.
This would be a safe assumption for a well-distributed workload. Unfortunately YCSB is the opposite of that. It uses a zipf distribution to create large hotspots in activity, which works against the effects of horizontal scalability.
I believe that this is the issue that #25014 is trying to fix. Specifically, contended writes are not handled as well as they could be in Cockroach.
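A rough illustration of the zipfian skew described above (this is not YCSB's actual generator, which uses a scrambled zipfian; the key count and exponent here are picked only for illustration):

```python
# Rough illustration of how a zipf distribution concentrates traffic on a few
# hot keys. Not YCSB's actual generator; key count and exponent are
# illustrative values only.

def zipf_weights(n_keys, theta=0.99):
    """Probability of accessing each key, ranked from hottest to coldest."""
    weights = [1.0 / (rank ** theta) for rank in range(1, n_keys + 1)]
    total = sum(weights)
    return [w / total for w in weights]

n_keys = 100_000
probs = zipf_weights(n_keys)                 # probs[0] is the hottest key
hot_share = sum(probs[: n_keys // 100])      # traffic hitting the hottest 1% of keys
print(f"hottest 1% of keys receive ~{hot_share:.0%} of all requests")
```

With skew like this, adding nodes mostly adds capacity for keys that are rarely touched, while the hot ranges stay concentrated on a few nodes.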
@nvanbenschoten Thank you for the explanation. Are the results of the YCSB testing you have done available somewhere? I would like to compare them with mine. Maybe I'll repeat this benchmark on the next version; I'm currently limited on time while writing my thesis :) I forgot to mention that the current testing was done on top of Docker Swarm, with each node running Docker version 18.03.0-ce and using the official Docker image cockroachdb/cockroach:v2.0.1.
@nvanbenschoten I believe you have nightly benchmarks to compare performance over time? Are those results publicly available?
@matjazmav unfortunately we do not have published YCSB numbers at the moment, as past benchmarking has not been rigorous enough to permit publication. This is in contrast to TPC-C, where we have published results with detailed reproduction steps. The best results we have are in the linked issues above, but none of these are "official". In terms of nightly benchmarks, we don't currently have a good method of visualizing results over time. This is being tracked in #24366.
@nvanbenschoten I'm looking at the first two charts for workload C. How can I explain that both have almost the same latency, but if I look at throughput, Postgres performs more than 2x better?
@matjazmav that indicates to me that Cockroach needs to perform more work and is, therefore, more resource hungry than Postgres, even though it is able to achieve almost the same latency. In some sense this is expected, because Cockroach needs to perform more work to maintain consistent replication across its three nodes. That said, we've made a number of performance improvements to CockroachDB for our upcoming 2.1 release. In particular, #25014 landed, which dramatically improves our performance on YCSB. I would be interested in how this affects our comparison to Postgres. We have a few other big changes on the horizon that should also have dramatic effects on YCSB. The most important of these is the ability to push partial-row update operations directly down to the data so that we can avoid the read-then-write we currently need to perform for workloads like YCSB. I'm going to close this for now since there's nothing actionable to do, but please feel free to continue the discussion.
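One way to read "similar latency, roughly half the throughput" is through Little's law (in-flight requests = throughput × latency). The numbers below are hypothetical, chosen only to mirror the shape of the charts, not taken from the linked spreadsheet:

```python
# Back-of-envelope reading of "similar latency, ~2x throughput" via Little's
# law: concurrency = throughput * latency. All numbers here are hypothetical.
latency_s = 0.005           # assumed average latency, similar for both systems
throughput = {
    "postgres": 20_000,     # hypothetical ops/sec at saturation
    "cockroach": 9_000,     # hypothetical ops/sec at saturation
}

for name, ops_per_sec in throughput.items():
    in_flight = ops_per_sec * latency_s
    print(f"{name}: ~{in_flight:.0f} requests in flight at saturation")
```

At the same latency, the lower-throughput system saturates while sustaining fewer in-flight requests, which is another way of saying each operation consumes more of the cluster's resources (replication, consensus), matching the explanation above.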
Is it normal that CRDB performs worse on a three-node cluster than on one node? I don't have the infrastructure to test performance on more nodes, but I'm assuming that this is because of range replication; by default, ranges are replicated 3 times. So I'm assuming that after adding more than 3 nodes (with the default configuration) I should see performance increase almost linearly?
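As a rough sanity check on that assumption, here is a naive capacity model (illustrative numbers only; it assumes a perfectly even key distribution and ignores the zipfian hotspots discussed in the comments above): with 3x replication, every write lands on three nodes, so aggregate write capacity grows roughly with node count divided by three once the cluster is larger than three nodes.

```python
# Naive capacity model behind the "almost linear after 3 nodes" assumption.
# Illustrative only: it assumes evenly spread load and ignores coordination
# overhead and the zipfian hotspots that YCSB actually generates.
replication_factor = 3
per_node_write_capacity = 1.0    # arbitrary units

for nodes in (1, 3, 6, 9, 12):
    effective_replicas = min(replication_factor, nodes)
    capacity = nodes * per_node_write_capacity / effective_replicas
    print(f"{nodes:>2} nodes -> ~{capacity:.1f}x single-node write capacity")
```

Note that even this optimistic model predicts no write gain at 3 nodes versus 1 (every node stores every write), and real replication adds coordination cost on top of that; beyond 3 nodes it grows roughly linearly only if the load is evenly spread.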
I also noticed that with an increasing number of connections, performance drops much faster than on a regular Postgres database. What are your thoughts on that?
The measurements were made using the brianfrankcooper/YCSB tool. To get more accurate results, each test was run 3 times; both databases ran on the same infrastructure with the default configuration. The infrastructure was built out of 4 old computers (i5, HDD, 4 GB RAM, 1 Gb/s NIC), each running Ubuntu Server 16.04 LTS, connected with a Gigabit Ethernet switch.
All results are located here: YCSB-Results - v2 - EN.xlsx.
NOTE: The number at the base of each series is the number of connections at which maximum throughput was reached.