sql: benchmark "offline backfills" #54955

thoszhang · 2020-09-29T16:51:32Z

See #36850. We know that creating an index has a performance impact, but we don't fully understand why or what will fix it. We can run some tests to figure out what is happening, what would help, and whether "offline backfills" would help much or the problem is something else.

Updated suggestions, Spring 2021:

One theory for what causes the SQL performance degradation is that it is a thundering herd of traffic to a single un-split index range that cannot handle that much traffic, as opposed to the backfill itself. We can test that theory! Create an index and immediately pause the create index job, so it doesn't backfill. Does the performance crater? If so, then it isn't the backfill, and we know it is indeed about making the SQL traffic hit this new index, not how we're filling it.
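A minimal sketch of that experiment, written as a Python helper that only builds the SQL text and never talks to a cluster. The table, column, and index names are hypothetical, and the `SHOW JOBS` filter (matching on `job_type`, `status`, and the index name in `description`) is an assumption about how to find the job; it may need adjusting for a given CockroachDB version. Pausing "immediately" is also a race, so some backfill work may start before the pause lands.

```python
def find_job_query(index, status):
    """Subquery that selects the schema change job for a (hypothetical) index."""
    # Assumption: the job description contains the index name.
    return (
        "SELECT job_id FROM [SHOW JOBS] "
        f"WHERE job_type = 'SCHEMA CHANGE' AND status = '{status}' "
        f"AND description LIKE '%{index}%'"
    )


def pause_experiment_statements(table, column, index):
    """Return the SQL for each session of the 'create, then pause' test."""
    return {
        # Session 1 blocks on CREATE INDEX until the job finishes or fails.
        "session_1": [f"CREATE INDEX {index} ON {table} ({column});"],
        # Session 2 pauses the job as soon as it appears, then resumes it
        # once latency with the paused index has been recorded.
        "session_2": [
            f"PAUSE JOBS ({find_job_query(index, 'running')});",
            f"RESUME JOBS ({find_job_query(index, 'paused')});",
        ],
    }


if __name__ == "__main__":
    plan = pause_experiment_statements("orders", "customer_id", "orders_customer_idx")
    for session, statements in plan.items():
        for statement in statements:
            print(f"{session}: {statement}")
```

Keeping the statements in two sessions makes the ordering explicit: the pause has to come from a second connection, because the first one is stuck waiting on the index build.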

If this is true, and we know it is about making the SQL traffic hit the new index and not the index backfill, we can keep narrowing down why that happens. We might then try splitting that (empty) index span by hand, if we can come up with some plausible index keys. Select some rows from the table, use those to call SPLIT AT, and see: does that make it happy again? If so, boom, we now have proof that a) SQL traffic load has to hit the new index span and b) that span is under-split for that load. Time to figure out how to split it (which may or may not be to pre-backfill it, which would cause size splitting to happen; while that might be one way to get that span split, there may be others as well). OTOH, if not-backfilling still shows impact but the splitting doesn't help, that would be very revealing too. Indeed, it would suggest that pre-backfill won't help at all.
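The manual split step could be sketched as follows: take values of the indexed column sampled from the table, pick evenly spaced boundaries, and emit an `ALTER INDEX ... SPLIT AT VALUES` statement. The table and index names are hypothetical, the helper assumes a single-column index over integers or strings, and evenly spaced boundaries are just one plausible choice.

```python
def sql_literal(value):
    """Render an int or str as a SQL literal (a deliberately small subset)."""
    if isinstance(value, int) and not isinstance(value, bool):
        return str(value)
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    raise TypeError(f"unsupported key type: {type(value).__name__}")


def pick_split_points(sampled_values, num_ranges):
    """Pick up to num_ranges - 1 evenly spaced boundaries from the samples."""
    ordered = sorted(set(sampled_values))
    if num_ranges < 2 or len(ordered) < 2:
        return []
    # Never ask for more boundaries than there are distinct candidates.
    num_points = min(num_ranges - 1, len(ordered) - 1)
    step = len(ordered) / (num_points + 1)
    return [ordered[int(step * i)] for i in range(1, num_points + 1)]


def split_at_statement(table, index, split_points):
    """Build the SPLIT AT statement for a single-column index."""
    rows = ", ".join(f"({sql_literal(point)})" for point in split_points)
    return f"ALTER INDEX {table}@{index} SPLIT AT VALUES {rows};"


if __name__ == "__main__":
    sampled = list(range(1, 101))  # stand-in for values selected from the table
    points = pick_split_points(sampled, num_ranges=4)
    print(split_at_statement("orders", "orders_customer_idx", points))
```

If latency recovers after running the generated statement against the paused index, that supports the under-split explanation; if it does not, the split points may simply be a poor match for the workload, so it is worth trying a second set of keys before drawing conclusions.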

Now if the performance doesn't crater just from having the new index in DELETE_AND_WRITE_ONLY, but then does when we unpause the backfill, well then, that is also enormously informative: now we know we need to look at the backfill itself (quotapool, addsstable, etc.). This would be where #36850 could potentially be the only solution, or at least the easiest (despite its size) solution. In this case, we could then partially validate that it truly is the interaction of the backfill and the SQL traffic by having two identical tables side by side: send SQL load to one, with a paused index backfill as described above, while backfilling the other copy, which is seeing no load. Does backfilling the unused second table affect the online table? If so, then #36850 may not have much impact, since this test is more or less modeling the same thing it will: doing the backfill to one span while SQL uses a different span.
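The branching in the three paragraphs above can be summarized as a small decision function. This is only one reading of the plan: the parameter names and returned strings are made up for illustration, `None` means "not measured yet", and the final branch (the idle-table backfill does not hurt) is inferred as the complement of the case described above rather than stated in it.

```python
def interpret(craters_while_paused, split_restores_perf=None,
              craters_after_unpause=None, idle_table_backfill_hurts=None):
    """Map experiment observations to the next step or tentative conclusion."""
    if craters_while_paused:
        # The index exists but is not being backfilled, so traffic is the cause.
        if split_restores_perf is None:
            return "traffic to the new index is the cause; try SPLIT AT next"
        if split_restores_perf:
            return "new index span is under-split for the load; find a way to split it"
        return "splitting does not help; pre-backfill is unlikely to help either"
    if craters_after_unpause is None:
        return "no impact while paused; unpause the backfill and measure again"
    if not craters_after_unpause:
        return "degradation not reproduced"
    # From here on the backfill itself is implicated.
    if idle_table_backfill_hurts is None:
        return "backfill is the cause; run the two-table test"
    if idle_table_backfill_hurts:
        return "backfill of an unrelated span still hurts; #36850 may not have much impact"
    return "backfill only hurts the table it runs on; consistent with #36850 helping"


if __name__ == "__main__":
    print(interpret(craters_while_paused=True, split_restores_perf=True))
    print(interpret(craters_while_paused=False, craters_after_unpause=True,
                    idle_table_backfill_hurts=True))
```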

Jira issue: CRDB-3701

thoszhang · 2020-10-09T20:05:00Z

Quoting @dt:

As a no-code-change benchmark, I’d just set up two tpcc copies, in separate DBs, one with load and the other without, and build indexes on one and then the other

And document build time plus observed latency during and after the backfill, both during the backfill for the one that is loaded and during the backfill of the other

That’ll tell us if backfill to unrelated table affects the loaded table or not/how much. And delta in build time would be interesting if there is one too.

From there I might also try just desc surgery-ing an index into a loaded table, without backfilling it, to measure in isolation the 1pc hit
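One way to record the results of that benchmark, as a rough sketch: collect latency samples from the loaded copy for each phase and compare a high percentile against the pre-backfill baseline. The phase names, the choice of p99, and the sample numbers below are illustrative assumptions, not measurements.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty sample list; pct is in (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_ratios(phases, baseline="before", pct=99):
    """Return each phase's percentile latency relative to the baseline phase."""
    base = percentile(phases[baseline], pct)
    return {name: percentile(values, pct) / base for name, values in phases.items()}


if __name__ == "__main__":
    # Made-up latencies (ms) observed on the loaded tpcc copy.
    phases = {
        "before": [10, 10, 10, 10],
        "during_own_backfill": [10, 10, 10, 40],
        "during_other_backfill": [10, 10, 10, 20],
        "after": [10, 10, 10, 10],
    }
    for name, ratio in latency_ratios(phases).items():
        print(f"{name}: {ratio:.1f}x baseline p99")
```

A ratio well above 1 during the other copy's backfill would indicate that backfilling an unrelated table affects the loaded one. Index build times for the two copies can be compared the same way.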

dt · 2020-10-09T21:00:48Z

Just to be explicit, I meant “separate DBs in the same cluster” there

thoszhang added the C-investigation and A-schema-changes labels Sep 29, 2020

thoszhang assigned arulajmani Oct 9, 2020

kenliu added the T-disaster-recovery label Dec 5, 2020

dt assigned aayushshah15 and unassigned arulajmani May 28, 2021

dt mentioned this issue May 28, 2021

kv/sql: index backfill and truncate temporarily crater write throughput on large tables #62672

Closed

aayushshah15 removed their assignment Dec 23, 2021

dt closed this as completed Jul 18, 2022
