Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: benchmark "offline backfills" #54955

Closed
thoszhang opened this issue Sep 29, 2020 · 2 comments
Closed

sql: benchmark "offline backfills" #54955

thoszhang opened this issue Sep 29, 2020 · 2 comments
Labels
A-schema-changes C-investigation Further steps needed to qualify. C-label will change. T-disaster-recovery

Comments

@thoszhang
Copy link
Contributor

thoszhang commented Sep 29, 2020

See #36850. We know creating an index has a performance impact but we don't fully understand why or what will solve it. We can run some tests to figure out what is happening and what will help, and how much "offline backfills" would help or if it is something else.

Updated suggestions, Spring 2021:

One theory for what causes SQL performance degradation is that it is thundering herd of traffic to a single un-split index range that cannot handle that much traffic, as opposed to the backfill itself. We can test that theory! Create an index and immediately pause the create index job, so it doesn't backfill. Does the performance crater? If so, then it isn't the backfill and we know it is indeed about making the SQL traffic hit this new index, not how we're filling it.

If this is true, once we know it is about making the SQL traffic hit the new index and not the index backfill, we can keep narrowing down why that happens. If we can, we might then try splitting that (empty) index span by hand, if we can come up with some plausible index keys. Select some rows from the table, use those to call SPLIT AT, and see: does that make it happy again? If so, boom, we now have proof it is a) that SQL traffic load has to hit the new index span that is b) under-split for that load. Time to figure out how to split it (which may or may not be to pre-backfill it, which would cause size splitting to happen. while that might be one way to get that span split, there may be others as well). OTOH, if not-backfilling still shows import but the splitting doesn't help, that would be very revealing too. And indeed, it suggests that pre-backfill won't help at all.

Now if the performance doesn't crater just from having the new index in DELETE_AND_WRITE_ONLY, but then does when we unpause the backfill, well then, that is also enormously informative: now we know we need to look at the backfill itself - quotapool, addsstable, etc. This would be where #36850 could potentially be the only solution or at least easiest (despite its size) solution. In this case, we could then partially validate that it truly is the interaction of the backfill and the SQL traffic by having two identical tables side-by-side, and send SQL load to one -- with a paused index backfill as described above -- while backfilling the other copy, which is seeing no load. Does backfilling the unused second table affect the online table? If so, then 36850 may not have much impact, since this test is more or less modeling the same thing it will: doing the backfill to one span while sql uses a different span.

Jira issue: CRDB-3701

@thoszhang thoszhang added C-investigation Further steps needed to qualify. C-label will change. A-schema-changes labels Sep 29, 2020
@thoszhang
Copy link
Contributor Author

Quoting @dt:

As a no-code-change benchmark, I’d just setup two tpcc copies, in separate DBs, one with load the other without and build indexes on one then the other

And doc build time plus observed latency during and after latency the backfill, both during backfill for the one that is loaded and backfill of the other

That’ll tell us if backfill to unrelated table affects the loaded table or not/how much. And delta in build time would be interesting if there is one too.

From there I might also try just desc surgery-ing and index into a loaded table, without backfilling it, to measure in isolation the 1pc hit

@dt
Copy link
Member

dt commented Oct 9, 2020

Just to be explicit, I meant “separate DBs in the same cluster” there

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-schema-changes C-investigation Further steps needed to qualify. C-label will change. T-disaster-recovery
Projects
None yet
Development

No branches or pull requests

5 participants