schema: add a knob to control the speed of schema changes #36430
Comments
#36403 is the knob -- it limits, at the receiving node, the number of expensive SSTs we handle at once. We'll be merging and backporting ASAP, and then exploring tuning it in a later backported patch if needed. There's one case the knob doesn't cover: a follower of many ranges can still get swamped with files even if each of those ranges' leaders has the knob turned down, since each leader obeys the knob individually but the follower receives files from all of them. Fortunately, just receiving and applying SSTs as a follower isn't that expensive, so if we can keep RocksDB from freaking out about the number of files, we should be good. That's where #36424 (or #34258) comes in.
36403: storage: rate-limit AddSST requests r=lucy-zhang a=lucy-zhang

We've been seeing extremely high latency for foreground traffic during bulk index backfills, because AddSST requests into non-empty ranges can be expensive, and write requests that are queued behind an AddSST request for an overlapping span can get stuck waiting for multiple seconds. This PR limits the number of concurrent AddSST requests for a single store, determined by a new cluster setting, `kv.bulk_io_write.concurrent_addsstable_requests`, to decrease the impact of index backfills on foreground writes. (It also decreases the risk of writing too many L0 files to RocksDB at once, which causes stalls.)

Fixes #36430

Release note (general change): Add a new cluster setting, `kv.bulk_io_write.concurrent_addsstable_requests`, which limits the number of SSTables that can be added concurrently during bulk operations.

36436: roachtest: handle duplicates in cdc/schemareg r=nvanbenschoten a=danhhz

There are various internal races and retries in changefeeds that can produce duplicates. This test is really only to verify that the confluent schema registry works end-to-end, so do the simplest thing and sort + unique the output.

Closes #36409

Release note: None

Co-authored-by: Lucy Zhang <[email protected]>
Co-authored-by: Daniel Harrison <[email protected]>
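For context, a cluster setting like this is adjusted from a SQL session. A minimal sketch follows; the setting name comes from the PR above, but the value is purely illustrative and the right number depends on the cluster, as discussed later in this thread:

```sql
-- Limit each store to a small number of concurrent AddSSTable requests
-- during bulk operations such as index backfills. The value 1 is only an
-- example; tune it for your hardware and workload.
SET CLUSTER SETTING kv.bulk_io_write.concurrent_addsstable_requests = 1;

-- Check the current value.
SHOW CLUSTER SETTING kv.bulk_io_write.concurrent_addsstable_requests;
```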
I'm reopening this issue because it didn't address latency concerns from #34744
try
Still there: #34744 (comment)
36735: storage: limit number of AddSSTable requests per second r=lucy-zhang a=lucy-zhang

Add rate limiting on the number of AddSSTable requests per second to a store, to allow slowing down index backfills when they're impacting foreground traffic.

Closes #36430

Release note: None

Co-authored-by: Lucy Zhang <[email protected]>
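The per-second limit from #36735 is also exposed as a cluster setting. The PR text above doesn't name it, so the setting name and value below are assumptions for illustration only; check the documentation for your version before relying on them:

```sql
-- Assumed setting name for the per-store AddSSTable rate limit from #36735;
-- the value of 1 request per second is only an example.
SET CLUSTER SETTING kv.bulk_io_write.addsstable_max_rate = 1;
```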
I ran the same setup and created two indexes: CREATE INDEX, DROP INDEX, CREATE INDEX with the new setting. As you can see, index creation causes a spike in latency at the start, but it then levels off to a reasonable level. This issue is no longer a release blocker. The spike in latency will be addressed in a different issue.
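A rough sketch of that sequence, with a hypothetical table and index name standing in for whatever schema the test actually used:

```sql
-- Placeholder table/index names; not the ones used in the original test.
CREATE INDEX idx_kv_v ON kv (v);
DROP INDEX kv@idx_kv_v;

-- Turn the knob down before re-creating the index (example value only).
SET CLUSTER SETTING kv.bulk_io_write.concurrent_addsstable_requests = 1;
CREATE INDEX idx_kv_v ON kv (v);
```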
So which of those values are we going to use as the default? What do these rate limits do to the time taken to complete the schema changes?
Picking a good default value here is going to be hard. What this should be set to depends heavily on the cluster, the other traffic, and the data distribution being indexed.

The cluster itself is a big one: I think the reason we've seen this more in under-provisioned or nearer-the-limit clusters is available capacity -- in disk I/O and CPU -- for the compactor to keep up with the backfill.

How much compacting is required to keep up with the backfill also depends on the SSTs we add. That comes down to the specific data distribution being indexed and how it ends up being chunked: a single SST -- the unit this is limiting -- could be 100KB or it could be 16MB. Perhaps more importantly, it could span nothing -- and go straight to L6 -- or it could span lots of other backfill SSTs and recent online writes, and be forced to L0, trigger a memtable flush, and overlap everything.

Existing traffic is a factor too: the capacity it consumes matters, and its overlap with the backfill matters, both in how it affects the backfill as described above and in how much of the online traffic is affected by the backfill.

We could pick something we think is a conservative number, but if we picked something we knew was safe on some hardware/data, I could easily see that ending up being too harsh on small/cheap SSTs on different hardware. This is the same reason we haven't picked a max_write_rate default: the right value depends on many things, and absent something smart enough to figure it out, we need the operator to choose. Picking the slowest configuration we think we could ever deploy on -- which we rejected for that setting already -- is even harder here than it was when we rejected that idea the first time, since we also have to account for the unpredictability of the SSTs we need to ingest.
After talking with @lucy-zhang, I'm not sure the knob is doing what we hoped it would, as seen in #34744 (comment). I'm re-opening this issue until we can determine what exactly is going on.
I was able to replicate @vivekmenezes's success in running with
I'm removing this from SQL Schema because it seems to be in the same category of backfill-related performance concerns as, e.g., #47215 (and maybe this issue is obsolete anyway given the last few months of progress and planned work for 20.2), but if anyone disagrees then feel free to put it back.
We have marked this issue as stale because it has been inactive for
We should add a knob to control the speed of schema changes in 19.1.
We should also use this knob to slow down schema changes in 19.1, to preemptively address issues like #34744, #36385, and others.
In 19.2, we can turn the knob back up after addressing the CPU spike (#34744) and other stability concerns.
Without this knob, the only recourse for users who run into problems is to turn off this implementation entirely. Since we expect this implementation to be better than 2.1's overall, it would be a shame to have to direct users back to 2.1.
Epic CRDB-8816
Jira issue: CRDB-4499