core: profile IOPs usage #38398

Closed
awoods187 opened this issue Jun 25, 2019 · 11 comments
Labels
A-kv, C-investigation, no-issue-activity, X-stale

Comments

@awoods187
Contributor

Currently, the managed service on AWS uses EBS volumes. These volumes are priced based on IOPs usage, so IOPs are one of the biggest costs of providing this service.

We should profile our IOPs usage to make sure that it meets our expectations, and look for opportunities to reduce it.

@awoods187
Contributor Author

cc @petermattis and @kannanlakshmi

@petermattis
Collaborator

Per in-person discussion: RocksDB background operations are a significant source of IOPs. Improvements to compaction heuristics there, or in Pebble, have the potential to reduce IOPs, though such improvements are still on the drawing board.

#38322 might significantly reduce IOPs. Cc @nvanbenschoten

@kannanlakshmi
Contributor

Posting a breakdown of MSO costs (second tab). As you can see, storage, and IOPS specifically, is the highest driver of cost, and it will scale (I assume) as the cloud business grows. Any work in that direction will directly benefit the long-term profitability of the cloud business.

https://docs.google.com/spreadsheets/d/1OxSNnQQMvy3j7V8IbsHB-Et7NHwM5yLpo8tpXag7vyE/edit#gid=0

awoods187 added the A-kv label Jun 25, 2019
awoods187 added the C-investigation label Jun 25, 2019
@andreimatei
Contributor

I think issues like this (let's see if we're as good as we can be and if we're not let's become better) don't serve much purpose. At the very least it should start with someone's surprise at some data.
We have too many issues already... Mind if I close it? :)

@awoods187
Contributor Author

awoods187 commented Jun 25, 2019

I disagree--this issue is to make sure we do the investigation. Without an issue we don't have a central place to record that and put the motivation for it.

@ajwerner
Contributor

After talking with @bdarnell I did some investigation of the value of IOPs using io1 in AWS a couple of weeks ago with a specific focus on YCSB/A with a uniform distribution (roughly KV50 with larger values). The results indicate that we may be over-provisioning IOPs and likely warrant further investigation.

See the data here:
https://docs.google.com/spreadsheets/d/1VjVCZmnHGGpxtjbp3lIllZOpsCc7S29cSj_gZacWFF0/edit?usp=sharing (internal only)

@petermattis
Collaborator

@ajwerner Interesting data. How long did you run the tests for? For short runs, background compactions will be insignificant, which might allow lower IOPs to perform well.

@ajwerner
Contributor

Each test was run for 10 minutes. Also worth noting that each data point in the sheet is the average of 5 runs. #38161 has the logic for the roachtest.

@petermattis
Collaborator

@ajwerner I'd guess that isn't long enough to hit significant RocksDB background operations. Data is still interesting, but let's be careful about reading too much into it.

@awoods187
Contributor Author

I think we should consider adding information like this to TPC-C testing as a fail condition (once we establish a baseline).

To reiterate, I don't think this issue is about identifying the "correct" amount of IOPs usage for all workloads. It is about testing some common workloads, understanding where IOPs are consumed, and determining whether that consumption matches our expectations for the DB.

cc @andy-kimball
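
For illustration only, a fail condition of the kind suggested above could look roughly like the sketch below. It assumes IOPs are sampled per interval during a run and compared against a previously recorded baseline with some tolerance. The type `iopsSample`, the functions `meanIOPS` and `exceedsBaseline`, and all numbers are hypothetical; none of them come from roachtest or from the data in this thread.

```go
package main

import "fmt"

// iopsSample is a hypothetical per-interval measurement
// (not an existing CockroachDB or roachtest type).
type iopsSample struct {
	readOps, writeOps float64 // I/O operations completed during the interval
	seconds           float64 // length of the interval in seconds
}

// meanIOPS returns total operations divided by total elapsed seconds.
// It returns 0 when no time has elapsed.
func meanIOPS(samples []iopsSample) float64 {
	var ops, secs float64
	for _, s := range samples {
		ops += s.readOps + s.writeOps
		secs += s.seconds
	}
	if secs == 0 {
		return 0
	}
	return ops / secs
}

// exceedsBaseline reports whether observed IOPs are more than
// tolerance (0.05 means 5%) above the recorded baseline.
func exceedsBaseline(observed, baseline, tolerance float64) bool {
	return observed > baseline*(1+tolerance)
}

func main() {
	// Made-up samples; a real test would read these from disk stats.
	samples := []iopsSample{
		{readOps: 6000, writeOps: 4000, seconds: 10},
		{readOps: 7000, writeOps: 5000, seconds: 10},
	}
	observed := meanIOPS(samples)
	// Hypothetical baseline of 1000 IOPs with a 5% tolerance.
	if exceedsBaseline(observed, 1000, 0.05) {
		fmt.Printf("FAIL: %.0f IOPs is above the baseline\n", observed)
	} else {
		fmt.Printf("ok: %.0f IOPs is within the baseline\n", observed)
	}
}
```

Given the point about short runs above, a baseline like this would only be meaningful if it were recorded from runs long enough to include background compactions.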

@github-actions

github-actions bot commented Jun 4, 2021

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
5 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

