core: profile IOPs usage #38398
Currently, the managed service on AWS uses EBS volumes. These volumes are priced based on IOPs and, as such, IOPs are one of the biggest costs of providing this service.

We should profile our IOPs usage to make sure that it meets our expectations, and look for opportunities to reduce it.

Comments
cc @petermattis and @kannanlakshmi
Per in-person discussion: RocksDB background operations are a significant source of IOPs. Improvements to compaction heuristics there, or in Pebble, have the potential to reduce IOPs, though such improvements are still on the drawing board. #38322 might significantly reduce IOPs. cc @nvanbenschoten
Posting a breakdown of MSO costs (second tab): as you can see, storage, and IOPs specifically, is the highest driver of cost, which will (I assume) scale as the cloud business grows. Any work in that direction will directly benefit the long-term profitability of the cloud business. https://docs.google.com/spreadsheets/d/1OxSNnQQMvy3j7V8IbsHB-Et7NHwM5yLpo8tpXag7vyE/edit#gid=0
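For a rough sense of the pricing model behind that breakdown, here is a back-of-the-envelope sketch in Go. The per-GB and per-IOPS prices are assumed us-east-1 io1 list prices (they may be out of date), and the volume shape is invented purely for illustration:

```go
// ebscost.go: back-of-the-envelope monthly cost of an io1 EBS volume.
package main

import "fmt"

const (
	pricePerGBMonth   = 0.125 // USD per GB-month (assumed io1 list price)
	pricePerIOPSMonth = 0.065 // USD per provisioned IOPS-month (assumed)
)

func main() {
	// Hypothetical per-node volume shape, for illustration only.
	const sizeGB, provisionedIOPS = 500.0, 2500.0
	storage := sizeGB * pricePerGBMonth
	iops := provisionedIOPS * pricePerIOPSMonth
	fmt.Printf("storage: $%.2f/mo, IOPS: $%.2f/mo (IOPS = %.0f%% of total)\n",
		storage, iops, 100*iops/(storage+iops))
}
```

Even at this modest shape, the provisioned-IOPS charge dominates the per-volume bill, which is consistent with the breakdown above.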
I think issues like this (let's see if we're as good as we can be, and if we're not, let's become better) don't serve much purpose. At the very least, it should start with someone's surprise at some data.
I disagree: this issue is to make sure we do the investigation. Without an issue, we don't have a central place to record that work and lay out the motivation for it.
After talking with @bdarnell, I spent some time a couple of weeks ago investigating the value of provisioned IOPs using io1 volumes in AWS, with a specific focus on YCSB/A with a uniform distribution (roughly KV50 with larger values). The results indicate that we may be over-provisioning IOPs, and likely warrant further investigation. See the data here:
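As a point of reference for how such measurements can be taken, here is a minimal sketch (not necessarily the methodology used above) that samples per-device IOPs on Linux by diffing /proc/diskstats. The device name is a placeholder, and host-side operation counts only approximate what EBS bills, since EBS accounts for I/O at its own chunk granularity:

```go
// iopsample.go: sample read/write IOPS for one block device by
// diffing the cumulative counters in /proc/diskstats (Linux only).
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
	"time"
)

// completedOps returns the cumulative reads and writes completed for dev.
func completedOps(dev string) (reads, writes uint64, err error) {
	f, err := os.Open("/proc/diskstats")
	if err != nil {
		return 0, 0, err
	}
	defer f.Close()
	s := bufio.NewScanner(f)
	for s.Scan() {
		// Fields: major minor name reads-completed ... writes-completed ...
		fields := strings.Fields(s.Text())
		if len(fields) >= 8 && fields[2] == dev {
			reads, _ = strconv.ParseUint(fields[3], 10, 64)
			writes, _ = strconv.ParseUint(fields[7], 10, 64)
			return reads, writes, nil
		}
	}
	return 0, 0, fmt.Errorf("device %q not found in /proc/diskstats", dev)
}

func main() {
	const dev = "nvme1n1" // placeholder: the EBS-backed device under test
	const interval = 10 * time.Second
	r0, w0, err := completedOps(dev)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	time.Sleep(interval)
	r1, w1, err := completedOps(dev)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	secs := interval.Seconds()
	fmt.Printf("read IOPS: %.1f, write IOPS: %.1f\n",
		float64(r1-r0)/secs, float64(w1-w0)/secs)
}
```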
@ajwerner Interesting data. How long did you run the tests for? For short runs, background compactions will be insignificant, which might allow lower IOPs to perform well.
The tests were run for 10 minutes each. Also worth noting that the data in the sheet is the average of 5 runs each. #38161 has the logic for the roachtest.
@ajwerner I'd guess that isn't long enough to hit significant RocksDB background operations. The data is still interesting, but let's be careful about reading too much into it.
I think we should consider adding information like this to TPC-C testing as a fail condition (once we establish a baseline). To reiterate: I don't think this issue is about identifying the "correct" amount of IOPs usage for all workloads. It is about testing some common workloads, understanding where IOPs are consumed, and determining whether that consumption matches our expectations, as a measurement of the DB.
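If a baseline were established, one possible shape for such a fail condition is a simple budget check. Everything here is hypothetical, not existing roachtest API:

```go
// iopsgate.go: sketch of failing a perf run that exceeds an IOPS budget.
package main

import "log"

// checkIOPSBudget fails the run if measured IOPS exceeds the baseline
// by more than the given tolerance. Names and knobs are hypothetical.
func checkIOPSBudget(measuredIOPS, baselineIOPS, tolerance float64) {
	budget := baselineIOPS * (1 + tolerance)
	if measuredIOPS > budget {
		log.Fatalf("IOPS regression: measured %.1f > budget %.1f (baseline %.1f, tolerance %.0f%%)",
			measuredIOPS, budget, baselineIOPS, 100*tolerance)
	}
}

func main() {
	// e.g. fail a TPC-C run that consumes more than 10% over baseline.
	checkIOPSBudget(1180, 1000, 0.10) // 1180 > 1100, so this run fails
}
```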