
schema: add a knob to control the speed of schema changes #36430

Closed
awoods187 opened this issue Apr 2, 2019 · 11 comments
Labels
A-schema-changes · C-bug · no-issue-activity · S-2 · T-disaster-recovery · X-stale

Comments

@awoods187
Contributor

awoods187 commented Apr 2, 2019

We should add a knob to allow for control of schema change speed in 19.1.

We should also use this knob to slow down schema changes in 19.1 to address issues like #34744, #36385, and others preemptively.

In 19.2, we can turn the knob back up after addressing CPU spike (#34744) and other stability concerns.

Without this knob, users who run into problems can only turn the new implementation off entirely. Since we expect this implementation to be better than 2.1's overall, it would be a shame to have to direct users back to 2.1.

Epic CRDB-8816

Jira issue: CRDB-4499

@awoods187 added the C-enhancement, C-bug, and S-2 labels and removed the C-enhancement label Apr 2, 2019
@dt
Member

dt commented Apr 2, 2019

#36403 is the knob -- it limits, at the receiving node, the number of expensive SSTs we handle at once. We'll be merging and backporting ASAP, and then exploring tuning it in a later backported patch if needed.

While that is the knob, there's one case it doesn't cover: a follower of many ranges might get swamped with files even if each of those ranges' leaders has the knob turned down, since while each leader obeys the knob individually, as a follower you could receive files from all of them at once. That said, just receiving and applying SSTs as a follower isn't that expensive, so if we can keep RocksDB from freaking out about the number of files, we should be good. That's where #36424 (or #34258) comes in.
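
To make the shape of this receiver-side limit concrete, here is a minimal Go sketch of a per-store semaphore that caps how many expensive SST ingestions are evaluated at once, in the spirit of #36403. The names and structure are illustrative assumptions, not CockroachDB's actual code; the real limit is driven by the `kv.bulk_io_write.concurrent_addsstable_requests` cluster setting introduced in the PR below.

```go
// Hypothetical sketch of a per-store concurrency cap on AddSSTable
// evaluation. Illustrative only; not the actual CockroachDB code.
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// sstSemaphore caps the number of SST ingestions a store evaluates at once.
type sstSemaphore struct {
	slots chan struct{}
}

func newSSTSemaphore(maxConcurrent int) *sstSemaphore {
	return &sstSemaphore{slots: make(chan struct{}, maxConcurrent)}
}

// acquire blocks until a slot is free or the context is canceled.
func (s *sstSemaphore) acquire(ctx context.Context) error {
	select {
	case s.slots <- struct{}{}:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

func (s *sstSemaphore) release() { <-s.slots }

func main() {
	sem := newSSTSemaphore(2) // e.g. at most 2 concurrent ingestions per store
	var wg sync.WaitGroup
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			if err := sem.acquire(context.Background()); err != nil {
				return
			}
			defer sem.release()
			fmt.Printf("ingesting SST %d\n", id)
			time.Sleep(100 * time.Millisecond) // stand-in for the expensive ingest
		}(i)
	}
	wg.Wait()
}
```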

craig bot pushed a commit that referenced this issue Apr 2, 2019
36403: storage: rate-limit AddSST requests r=lucy-zhang a=lucy-zhang

We've been seeing extremely high latency for foreground traffic during bulk
index backfills, because AddSST requests into non-empty ranges can be
expensive, and write requests that are queued behind an AddSST request for an
overlapping span can get stuck waiting for multiple seconds. This PR limits the
number of concurrent AddSST requests for a single store, determined by a new
cluster setting, `kv.bulk_io_write.concurrent_addsstable_requests`, to decrease
the impact of index backfills on foreground writes. (It also decreases the risk
of writing too many L0 files to RocksDB at once, which causes stalls.)

Fixes #36430

Release note (general change): Add a new cluster setting,
`kv.bulk_io_write.concurrent_addsstable_requests`, which limits the number of
SSTables that can be added concurrently during bulk operations.

36436: roachtest: handle duplicates in cdc/schemareg r=nvanbenschoten a=danhhz

There are various internal races and retries in changefeeds that can
produce duplicates. This test is really only to verify that the
confluent schema registry works end-to-end, so do the simplest thing and
sort + unique the output.

Closes #36409

Release note: None

Co-authored-by: Lucy Zhang <[email protected]>
Co-authored-by: Daniel Harrison <[email protected]>
@craig craig bot closed this as completed in #36403 Apr 2, 2019
@awoods187
Contributor Author

I'm reopening this issue because it didn't address the latency concerns from #34744.

@awoods187 awoods187 reopened this Apr 4, 2019
@dt
Member

dt commented Apr 4, 2019

try SET CLUSTER SETTING kv.bulk_io_write.max_rate = '10mb'
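
`kv.bulk_io_write.max_rate` caps the byte throughput of bulk writes. As a rough sketch of how such a cap behaves (illustrative only, using Go's `golang.org/x/time/rate` token bucket; the SST sizes and structure here are assumptions, not the actual implementation):

```go
// Illustrative byte-rate limiting for SST writes using a token bucket.
// A sketch, not CockroachDB's implementation.
package main

import (
	"context"
	"fmt"

	"golang.org/x/time/rate"
)

func main() {
	const maxRate = 10 << 20 // 10 MB/s, mirroring max_rate = '10mb'
	limiter := rate.NewLimiter(rate.Limit(maxRate), maxRate)

	ctx := context.Background()
	ssts := []int{2 << 20, 8 << 20, 4 << 20} // hypothetical SST sizes in bytes
	for i, size := range ssts {
		// Block until the bucket has enough tokens to cover this SST's bytes.
		if err := limiter.WaitN(ctx, size); err != nil {
			panic(err)
		}
		fmt.Printf("writing SST %d (%d bytes)\n", i, size)
	}
}
```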

@awoods187
Contributor Author

Still there: #34744 (comment)

craig bot pushed a commit that referenced this issue Apr 11, 2019
36735: storage: limit number of AddSSTable requests per second r=lucy-zhang a=lucy-zhang

Add rate limiting on the number of AddSSTable requests per second to a store,
to allow slowing down index backfills when they're impacting foreground
traffic.

Closes #36430

Release note: None

Co-authored-by: Lucy Zhang <[email protected]>
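
Unlike the byte-rate cap above, this limiter works at the granularity of whole AddSSTable requests. Here is a minimal sketch of that idea, assuming a simple token bucket admitting requests at a configured per-second rate (illustrative names and structure, not the code from #36735):

```go
// Sketch of a per-store requests-per-second limit on AddSSTable.
// Illustrative only; not the real CockroachDB code.
package main

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/time/rate"
)

func main() {
	// e.g. addsstable_max_rate = 0.1: one AddSSTable request every 10s on
	// average (so this demo takes roughly 20 seconds to run).
	limiter := rate.NewLimiter(rate.Limit(0.1), 1)

	ctx := context.Background()
	for i := 0; i < 3; i++ {
		start := time.Now()
		if err := limiter.Wait(ctx); err != nil { // one token per request
			panic(err)
		}
		fmt.Printf("request %d admitted after %s\n", i, time.Since(start).Round(time.Second))
	}
}
```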
@craig craig bot closed this as completed in #36735 Apr 11, 2019
@vivekmenezes
Contributor

[Screenshot: Screen Shot 2019-04-15 at 4.08.26 PM]

I ran the same setup and created two indexes (CREATE INDEX, DROP INDEX, CREATE INDEX), using the new setting with SET CLUSTER SETTING kv.bulk_io_write.addsstable_max_rate=0.1 for the first index and SET CLUSTER SETTING kv.bulk_io_write.addsstable_max_rate=0.03 for the second index.

As you can see, index creation causes a spike in latency at the start, but it then settles down to a reasonable level.

This issue is no longer a release blocker. The spike in latency will be addressed in a different issue.
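
As a rough translation of these values (assuming, per #36735, that `kv.bulk_io_write.addsstable_max_rate` is in AddSSTable requests per second per store): 0.1 admits roughly one AddSSTable every 10 seconds, and 0.03 roughly one every 33 seconds.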

@bdarnell
Contributor

So which of those values are we going to use as the default? What do these rate limits do to the time taken to complete the schema changes?

@dt
Member

dt commented Apr 15, 2019

Picking a good default value here is going to be hard.

What this should be set to depends so heavily on the cluster, the other traffic and the data distribution being indexed.

The cluster itself is a big one: I think the reason we've seen this more in under-provisioned or near-the-limit clusters is the available capacity -- in disk I/O and CPU -- for the compactor to keep up with the backfill.

How much compacting is required to keep up with the backfill also depends on the SSTs we add. That comes down to the specific data distribution being indexed and how it ends up being chunked up: a single SST -- the unit this is limiting -- could be 100KB or it could be 16MB. Perhaps more importantly, it could span nothing -- and go straight to L6 -- or it could span lots of other backfill SSTs and recent online writes, and be forced to L0, trigger a memtable flush, and overlap everything.

Existing traffic is a factor too, in that both how much capacity it is using and how much it overlaps with the backfill matter: both in how it affects the backfill as above, and in how much of the online traffic is affected by the backfill.

We could pick something we think is a conservative number, but if we picked something we knew was safe on some hardware/data, I could easily see it ending up being too harsh for small/cheap SSTs on different hardware. This is the same reason we haven't picked a max_write_rate default: the right value depends on many things, and absent something smart enough to figure it out, we need the operator to choose. Picking the slowest we think we could ever deploy on -- which we rejected for that setting already -- is even harder here than it was when we rejected that idea the first time, since we have to consider the added unpredictability of the SSTs we need to ingest.
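
To put rough numbers on that spread (using the SST sizes above and an assumed limit of one SST per second, purely for illustration): ingest bandwidth could range from about 0.1 MB/s with 100KB SSTs to 16 MB/s with 16MB SSTs, a 160x difference, before even accounting for whether those SSTs drop straight into L6 or land in L0 and force compactions.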

@awoods187
Contributor Author

After talking with @lucy-zhang, I'm not sure the knob is doing what we hoped it would, as seen in #34744 (comment). I'm re-opening this issue until we can determine what exactly is going on.

@awoods187 awoods187 reopened this Apr 16, 2019
@awoods187
Contributor Author

I was able to replicate @vivekmenezes's success running with SET CLUSTER SETTING kv.bulk_io_write.addsstable_max_rate=.01. Based on my testing in #34744 (comment), I suspect this knob only really helps at the beginning of the schema change. However, the schema change is much slower; in fact, it's slower than the old index backfill implementation. I think this knob is therefore not as effective as we had hoped. It seems like the current flow for a customer experiencing a schema change concern will be to cancel the job and add capacity. For the problems the new index backfill runs into, the old index backfill isn't much better either.

@thoszhang
Contributor

I'm removing this from SQL Schema because it seems to be in the same category of backfill-related performance concerns as, e.g., #47215 (and this issue may be obsolete anyway given the last few months of progress and planned work for 20.2), but if anyone disagrees, feel free to put it back.

@thoszhang thoszhang removed their assignment Apr 28, 2020
@github-actions

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!
