kvserver: avoid load based splits in middle of SQL row #103690

kvoli · 2023-05-20T01:26:53Z

It was possible for a SQL row to be torn across two ranges due to the
load-based splitter not rejecting potentially unsafe split keys. It is
impossible to determine with just the sampled request keys, whether a
key is certainly unsafe or safe, so a split key is returned regardless of error.

This PR side steps this problem by re-using the
adminSplitWithDescriptor command to find the first real key, after or
at the provided args.SplitKey. This ensures that the split key will
always be a real key whilst not requiring any checks in the splitter
itself.

The updated adminSplitWithDescriptor is local only and requires opting
into finding the first safe key by setting findFirstSafeKey to true.

As such, all safe split key checks are also removed from the split
pkg, with a warning added that the any split key returned is unsafe.

Note that the weighted load based split finder, used for CPU splits
did not suffer from returning potentially unsafe splits due to e4f003b.

However, it was possible that no load-based split key was ever found
when using the weighted finder. This was because we discard potentially
unsafe samples, which could have been safe split points.

This PR reverts commit e4f003b, as the
safe split key is enforced elsewhere, mentioned above.

Resolves: #103483
Resolves: #103672

Release note (bug fix): It was possible for a SQL row to be split across
two ranges. When this occurred, SQL queries could return unexpected
errors. This bug is resolved by these changes, as we now sample the real
keys, rather than just request keys to determine load-based split points.

cockroach-teamcity · 2023-05-20T01:27:04Z

This change is

miraradeva

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @kvoli, @nvanbenschoten, @renatolabs, and @smg260)

pkg/kv/kvserver/replica_split_load.go line 263 at r3 (raw file):

// NOTE: The returned split key CAN BE BETWEEN A SQL ROW, The split key
// returned should only be used to engage a split via adminSplitWithDescriptor
// where findNextSafeKey is set to true.

Is it possible/easy to reproduce this bug in a unit test?

Feel free to ignore if it's not relevant; I'm just lurking and learning more about splits.

Code quote:

// NOTE: The returned split key CAN BE BETWEEN A SQL ROW, The split key
// returned should only be used to engage a split via adminSplitWithDescriptor
// where findNextSafeKey is set to true.

kvoli

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @miraradeva, @nvanbenschoten, @renatolabs, and @smg260)

pkg/kv/kvserver/replica_split_load.go line 263 at r3 (raw file):

Previously, miraradeva (Mira Radeva) wrote…

Is it possible/easy to reproduce this bug in a unit test?

Feel free to ignore if it's not relevant; I'm just lurking and learning more about splits.

Certainly possible. I'm writing up a test like this now.

It was possible that no load-based split key was ever found when using the weighted finder. This was because we discard potentially unsafe samples, which could have been safe split points. This reverts commit e4f003b. Resolves: cockroachdb#103672

nvanbenschoten

Reviewed 3 of 3 files at r11, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz, @kvoli, @miraradeva, and @sumeerbhola)

pkg/kv/kvserver/replica_split_load.go line 286 at r6 (raw file):

Previously, kvoli (Austen) wrote…

Why would we want to start searching from the row prefix? [...] the request key may not exist in the range but other column families in the same row do exist, and that searching from the request key may lead to splitting after it, instead of before it?

It is solely for the reason you gave, where all the keys for the row are <= the sampled split key.

Could we update the comment to mention that?

pkg/storage/mvcc.go line 5963 at r10 (raw file):

Split at f, RHS [f,k] will have 100% of the load.

After this split, we will have carved off a cold range [a,f) from a hot range [f,k] and failed to bisect the load. However, we do expect that the range will attempt to load-based split again. When it does, it will fall into the "L is empty" case, and will split at g. Is that correct?

pkg/storage/mvcc.go line 5934 at r11 (raw file):

// the range which spans [startKey,endKey). The returned split key is safe,
// meaning it is guaranteed to (1) not be a column family key within a table
// row and be after the first table row in the range. For non-table ranges, the

I think you wanted "and (2) be" here.

sumeerbhola

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz, @kvoli, @miraradeva, and @nvanbenschoten)

pkg/kv/kvserver/replica_command.go line 355 at r10 (raw file):

Previously, kvoli (Austen) wrote…

On second thought, it may be better to not include the change in this PR. I already overloaded it with a few other things.

Ack

pkg/storage/mvcc.go line 5963 at r10 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Split at f, RHS [f,k] will have 100% of the load.

After this split, we will have carved off a cold range [a,f) from a hot range [f,k] and failed to bisect the load. However, we do expect that the range will attempt to load-based split again. When it does, it will fall into the "L is empty" case, and will split at g. Is that correct?

btw, I was not clear in my earlier example. I meant the initial f<i> to be keys that are not just unsafe, but also get an error when passed to EnsureSafeSplitKey. So replica.loadSplitKey would have to pass it through to the code here.

I don't think this fact changes the general problem, which you elaborated on with more examples. And it also seems we are in agreement that if we do end up splitting off a cold range, the next split will likely correct the behavior. So we'll end up with 3 ranges instead of 2.

kvoli

TYFTR, Sumeer mentioned over slack that his review should not block.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz, @miraradeva, @nvanbenschoten, and @sumeerbhola)

pkg/kv/kvserver/replica_split_load.go line 286 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Could we update the comment to mention that?

Done.

pkg/storage/mvcc.go line 5963 at r10 (raw file):

When it does, it will fall into the "L is empty" case, and will split at g. Is that correct?

Yes. There will be 2 splits (3 ranges) r0[a,z) => r0[a,f) r1[f, z) => r0[a,f) r1[f,g) r2[g,z) instead of 2. Assuming no changes in requests, this should occur after 10 seconds and the first split [a,f). It is a bit wasteful though, because we split the range's load stats evenly over the LHS and RHS - so it is probable the LHS [a,f) gets picked up for rebalancing and becomes a pain to merge (if [a,f) + [f,g) < 0.5 * split threshold), with wasted snapshots realigning the replicas.

I don't think this fact changes the general problem, which you elaborated on with more examples. And it also seems we are in agreement that if we do end up splitting off a cold range, the next split will likely correct the behavior. So we'll end up with 3 ranges instead of 2.

Agreed.

pkg/storage/mvcc.go line 5934 at r11 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I think you wanted "and (2) be" here.

Fixed and reworded.

It was possible for a SQL row to be torn across two ranges due to the load-based splitter not rejecting potentially unsafe split keys. It is impossible to determine with keys sampled from response spans, whether a key is certainly unsafe or safe. This commit side steps this problem by re-using the `adminSplitWithDescriptor` command to find the first real key, after or at the provided `args.SplitKey`. This ensures that the split key will always be a real key whilst not requiring any checks in the splitter itself. The updated `adminSplitWithDescriptor` is local only and requires opting into finding the first safe key by setting `findFirstSafeKey` to `true`. As such, all safe split key checks are also removed from the `split` pkg, with a warning added that the any split key returned is unsafe. Resolves: cockroachdb#103483 Release note (bug fix): It was possible for a SQL row to be split across two ranges. When this occurred, SQL queries could return unexpected errors. This bug is resolved by these changes, as we now inspect the real keys, rather than just request keys to determine load-based split points.

Non-blocking review, discussed over slack. Necessary to merge.

kvoli · 2023-05-25T06:09:25Z

bors r=nvanbenschoten

craig · 2023-05-25T07:05:23Z

Build succeeded:

Bazel Essential CI (Cockroach)

sumeerbhola

Reviewable status: complete! 0 of 0 LGTMs obtained

pkg/storage/mvcc.go line 5963 at r10 (raw file):

I think I missed something. In the case you mentioned

If the LB desired key is f<i>, i > 11 and R is empty:

f will end up being the last key in the LHS, and we will fail to split f. IIUC, we don't have any logic to step back from the last key the way we step forward from the fist key (in mvccMinSplitKey), so this failure may not be fixed by another split in the future.

I don't know if this is mostly a theoretical concern. #103672 mentions

Table key with one or more .Next() 0x00 appended

which could satisfy the case where it is past all the valid f keys, and can't be successfully passed through EnsureSafeSplitKey. Does this kind of thing occur due to reverse scans (which call SeekLT)?

kvoli

Reviewable status: complete! 0 of 0 LGTMs obtained

pkg/storage/mvcc.go line 5963 at r10 (raw file):

Previously, sumeerbhola wrote…

I think I missed something. In the case you mentioned

If the LB desired key is f<i>, i > 11 and R is empty:

f will end up being the last key in the LHS, and we will fail to split f. IIUC, we don't have any logic to step back from the last key the way we step forward from the fist key (in mvccMinSplitKey), so this failure may not be fixed by another split in the future.

I don't know if this is mostly a theoretical concern. #103672 mentions

Table key with one or more .Next() 0x00 appended

which could satisfy the case where it is past all the valid f keys, and can't be successfully passed through EnsureSafeSplitKey. Does this kind of thing occur due to reverse scans (which call SeekLT)?

I'm only aware of one support related issue where we spit inbetween a row, prompting #103672.

If the LB split key was indeed unsafe due to one or more .Next() 0x00 appended, then the split would be acted upon as is prior to this PR (ignoring the safe sample only retention).

This leads me to believe that type of sample has been rare in the past, given the original issue is from 2019.

On a related note, this class of issues where the R is empty can still exist without needing unsafe keys.

Consider the kv workload schema, which has 10 keys all belonging to a single table range [/Table/106/1, /Max].

I added additional logging for span sampling events, even if we didn't sample the key.

set tracing = on; select count(v) from [select v from kv order by k]; set tracing = off;
select message from [show trace for session] where message ~ 'executing (Get|Scan)' or message ~ 'sampling';
                                     message
---------------------------------------------------------------------------------
  executing Scan [/Table/106/1,/Table/106/2), [txn: 0c8c2532], [can-forward-ts]
  sampling /Table/106/{1-2}
(2 rows)

Time: 1ms total (execution 1ms / network 0ms)

If the workload contains only scans for /Table/106/1 and /Table/106/2, we will suggest a spliit at /Table/106/2 in the weighted finder and never suggest a split in the unweighted finder (it would amplify QPS).

The problem is, R is empty by definition, since /Table/106/2 must sort after all keys in the table (only 1 index). Note that we use the union of spans the request iterated over (ca76c28) - so if the batch hits a limit and has a resumeSpan, we are unlikely to run into this problem

Passing percentiles and/or sampling more than the union of spans iterated over seems necessary.

In any case, this was desired behavior pre 23.1. There are load based rebalancing tests, such as allocbench which use some mix of spanning requests and split reasonably well. For a workload with only full table scans on a range bytes < resume threshold, we will never split - evidenced by the small number of splits in #103872 for splits/load/spanning/nodes=3/obj=cpu - which had the above pattern when investigating.

The load based splitting behavior was changed in cockroachdb#103690 to always find a real key to split at, rather than use sampled response spans as split points. This behavior change affected the number of splits in `split/load` roachtests. Update the min and max boundaries for various `split/load` roachtests to conform to the new expected behavior. Fixes: cockroachdb#103691 Fixes: cockroachdb#103935 Resolves: cockroachdb#103872 Release note: None

103985: roachtest: update activerecord adapter version r=rafiss a=andyyang890 This patch updates the version of the CockroachDB activerecord adapter used for testing to v7.0.2. Informs #97283 Informs #99620 Release note: None 104084: roachtest: bump split boundaries to prevent flakes r=tbg a=kvoli The load based splitting behavior was changed in #103690 to always find a real key to split at, rather than use sampled response spans as split points. This behavior change affected the number of splits in `split/load` roachtests. Update the min and max boundaries for various `split/load` roachtests to conform to the new expected behavior. See #103872 for updated split distribution over 190 test runs. Fixes: #103691 Fixes: #103935 Resolves: #103872 Release note: None Co-authored-by: Andy Yang <[email protected]> Co-authored-by: Austen McClernon <[email protected]>

The load based splitting behavior was changed in #103690 to always find a real key to split at, rather than use sampled response spans as split points. This behavior change affected the number of splits in `split/load` roachtests. Update the min and max boundaries for various `split/load` roachtests to conform to the new expected behavior. Fixes: #103691 Fixes: #103935 Resolves: #103872 Release note: None

kvoli added the backport-23.1.x Flags PRs that need to be backported to 23.1 label May 20, 2023

kvoli requested a review from a team as a code owner May 20, 2023 01:26

kvoli self-assigned this May 20, 2023

kvoli force-pushed the 230517.unsafe-split branch from 8aa9339 to 5f3661f Compare May 20, 2023 01:36

kvoli requested a review from nvanbenschoten May 20, 2023 01:41

kvoli changed the title ~~split: add string methods for debugging/logging~~ kvserver: avoid load based splits which occur within a row May 20, 2023

kvoli changed the title ~~kvserver: avoid load based splits which occur within a row~~ kvserver: avoid load based splits in middle of SQL row May 20, 2023

kvoli force-pushed the 230517.unsafe-split branch 5 times, most recently from f3c1f70 to 788e5b3 Compare May 22, 2023 13:44

kvoli requested a review from a team as a code owner May 22, 2023 13:44

kvoli requested review from smg260 and renatolabs and removed request for a team May 22, 2023 13:44

kvoli force-pushed the 230517.unsafe-split branch from 788e5b3 to b341eb0 Compare May 22, 2023 13:55

miraradeva reviewed May 22, 2023

View reviewed changes

kvoli marked this pull request as draft May 22, 2023 16:51

kvoli requested a review from miraradeva May 22, 2023 16:52

kvoli commented May 22, 2023

View reviewed changes

kvoli force-pushed the 230517.unsafe-split branch from b341eb0 to f8848f9 Compare May 22, 2023 21:59

kvoli marked this pull request as ready for review May 22, 2023 22:01

kvoli force-pushed the 230517.unsafe-split branch from f8848f9 to 36e00b5 Compare May 22, 2023 22:09

kvoli mentioned this pull request May 22, 2023

split: [DNM] only retain safe split keys #103565

Closed

kvoli requested a review from sumeerbhola May 22, 2023 22:19

kvoli force-pushed the 230517.unsafe-split branch 3 times, most recently from 96e8f6c to 357fb7c Compare May 23, 2023 13:17

kvoli force-pushed the 230517.unsafe-split branch from cfc8b3d to dbfe79a Compare May 24, 2023 22:15

nvanbenschoten approved these changes May 24, 2023

View reviewed changes

sumeerbhola reviewed May 25, 2023

View reviewed changes

kvoli mentioned this pull request May 25, 2023

roachtest: adjust split/load boundaries to prevent flakes #103872

Closed

kvoli force-pushed the 230517.unsafe-split branch from dbfe79a to a145013 Compare May 25, 2023 03:05

kvoli requested review from nvanbenschoten and sumeerbhola May 25, 2023 03:07

kvoli commented May 25, 2023

View reviewed changes

kvoli force-pushed the 230517.unsafe-split branch from a145013 to bf2aa42 Compare May 25, 2023 03:09

craig bot merged commit 562f083 into cockroachdb:master May 25, 2023

blathers-crl bot mentioned this pull request May 25, 2023

release-23.1: kvserver: avoid load based splits in middle of SQL row #103876

Merged

sumeerbhola reviewed May 25, 2023

View reviewed changes

kvoli commented May 25, 2023

View reviewed changes

This was referenced May 26, 2023

roachtest: splits/load/spanning/nodes=3/obj=cpu failed #103934

Closed

roachtest: splits/load/ycsb/b/nodes=3/obj=cpu failed #103935

Closed

erikgrinaker mentioned this pull request May 27, 2023

kvserver: iterator panic during split #104007

Closed

kvoli mentioned this pull request May 30, 2023

roachtest: bump split boundaries to prevent flakes #104084

Merged

blathers-crl bot mentioned this pull request May 30, 2023

release-23.1: roachtest: bump split boundaries to prevent flakes #104107

Merged

This was referenced Jun 5, 2023

kv: load-based splitter permits splits between SQL rows backport tracker #104353

Closed

release-22.2: kvserver: avoid load based splits in middle of SQL row #104563

Merged

knz mentioned this pull request Jun 12, 2023

kvserver: v23.1.2-custom: failed to decode tenant prefix from key for replica: insufficient bytes to decode uvarint value #104606

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

kvserver: avoid load based splits in middle of SQL row #103690

kvserver: avoid load based splits in middle of SQL row #103690

kvoli commented May 20, 2023 •

edited

Loading

cockroach-teamcity commented May 20, 2023

miraradeva left a comment

kvoli left a comment

nvanbenschoten left a comment

sumeerbhola left a comment

kvoli left a comment

kvoli commented May 25, 2023

craig bot commented May 25, 2023

sumeerbhola left a comment

kvoli left a comment

kvserver: avoid load based splits in middle of SQL row #103690

kvserver: avoid load based splits in middle of SQL row #103690

Conversation

kvoli commented May 20, 2023 • edited Loading

cockroach-teamcity commented May 20, 2023

miraradeva left a comment

Choose a reason for hiding this comment

kvoli left a comment

Choose a reason for hiding this comment

nvanbenschoten left a comment

Choose a reason for hiding this comment

sumeerbhola left a comment

Choose a reason for hiding this comment

kvoli left a comment

Choose a reason for hiding this comment

kvoli commented May 25, 2023

craig bot commented May 25, 2023

sumeerbhola left a comment

Choose a reason for hiding this comment

kvoli left a comment

Choose a reason for hiding this comment

kvoli commented May 20, 2023 •

edited

Loading