
kvserver: avoid load based splits in middle of SQL row #103690

Merged · 3 commits · May 25, 2023

Conversation

kvoli
Collaborator

@kvoli kvoli commented May 20, 2023

It was possible for a SQL row to be torn across two ranges because the
load-based splitter did not reject potentially unsafe split keys. From the
sampled request keys alone, it is impossible to determine whether a key is
safe or unsafe, so a split key was returned regardless.

This PR sidesteps the problem by re-using the
adminSplitWithDescriptor command to find the first real key at or after
the provided args.SplitKey. This ensures that the split key will always
be a real key, while not requiring any checks in the splitter itself.

The updated adminSplitWithDescriptor is local only and requires opting
into finding the first safe key by setting findFirstSafeKey to true.

As such, all safe split key checks are also removed from the split
pkg, with a warning added that any split key returned may be unsafe.

Note that the weighted load-based split finder, used for CPU splits,
did not suffer from returning potentially unsafe splits, thanks to e4f003b.

However, it was possible that no load-based split key was ever found
when using the weighted finder. This was because we discard potentially
unsafe samples, which could have been safe split points.

This PR reverts commit e4f003b, as the
safe split key is enforced elsewhere, mentioned above.

Resolves: #103483
Resolves: #103672

Release note (bug fix): It was possible for a SQL row to be split across
two ranges. When this occurred, SQL queries could return unexpected
errors. This bug is resolved by these changes: we now sample the real
keys, rather than just the request keys, to determine load-based split points.

@kvoli kvoli added the backport-23.1.x Flags PRs that need to be backported to 23.1 label May 20, 2023
@kvoli kvoli requested a review from a team as a code owner May 20, 2023 01:26

@kvoli kvoli self-assigned this May 20, 2023
@kvoli kvoli force-pushed the 230517.unsafe-split branch from 8aa9339 to 5f3661f Compare May 20, 2023 01:36
@kvoli kvoli requested a review from nvanbenschoten May 20, 2023 01:41
@kvoli kvoli changed the title split: add string methods for debugging/logging kvserver: avoid load based splits which occur within a row May 20, 2023
@kvoli kvoli changed the title kvserver: avoid load based splits which occur within a row kvserver: avoid load based splits in middle of SQL row May 20, 2023
@kvoli kvoli force-pushed the 230517.unsafe-split branch 5 times, most recently from f3c1f70 to 788e5b3 Compare May 22, 2023 13:44
@kvoli kvoli requested a review from a team as a code owner May 22, 2023 13:44
@kvoli kvoli requested review from smg260 and renatolabs and removed request for a team May 22, 2023 13:44
@kvoli kvoli force-pushed the 230517.unsafe-split branch from 788e5b3 to b341eb0 Compare May 22, 2023 13:55
Contributor

@miraradeva miraradeva left a comment



pkg/kv/kvserver/replica_split_load.go line 263 at r3 (raw file):

// NOTE: The returned split key CAN BE WITHIN A SQL ROW. The split key
// returned should only be used to engage a split via adminSplitWithDescriptor
// where findFirstSafeKey is set to true.

Is it possible/easy to reproduce this bug in a unit test?

Feel free to ignore if it's not relevant; I'm just lurking and learning more about splits.


@kvoli kvoli marked this pull request as draft May 22, 2023 16:51
@kvoli kvoli requested a review from miraradeva May 22, 2023 16:52
Collaborator Author

@kvoli kvoli left a comment



pkg/kv/kvserver/replica_split_load.go line 263 at r3 (raw file):

Previously, miraradeva (Mira Radeva) wrote…

Is it possible/easy to reproduce this bug in a unit test?

Feel free to ignore if it's not relevant; I'm just lurking and learning more about splits.

Certainly possible. I'm writing up a test like this now.

@kvoli kvoli force-pushed the 230517.unsafe-split branch from b341eb0 to f8848f9 Compare May 22, 2023 21:59
@kvoli kvoli marked this pull request as ready for review May 22, 2023 22:01
@kvoli kvoli force-pushed the 230517.unsafe-split branch from f8848f9 to 36e00b5 Compare May 22, 2023 22:09
@kvoli kvoli requested a review from sumeerbhola May 22, 2023 22:19
@kvoli kvoli force-pushed the 230517.unsafe-split branch 3 times, most recently from 96e8f6c to 357fb7c Compare May 23, 2023 13:17
It was possible that no load-based split key was ever found when using
the weighted finder. This was because we discard potentially unsafe
samples, which could have been safe split points.

This reverts commit e4f003b.

Resolves: cockroachdb#103672
@kvoli kvoli force-pushed the 230517.unsafe-split branch from cfc8b3d to dbfe79a Compare May 24, 2023 22:15
Member

@nvanbenschoten nvanbenschoten left a comment


Reviewed 3 of 3 files at r11, all commit messages.


pkg/kv/kvserver/replica_split_load.go line 286 at r6 (raw file):

Previously, kvoli (Austen) wrote…

Why would we want to start searching from the row prefix? [...] the request key may not exist in the range but other column families in the same row do exist, and that searching from the request key may lead to splitting after it, instead of before it?

It is solely for the reason you gave, where all the keys for the row are <= the sampled split key.

Could we update the comment to mention that?


pkg/storage/mvcc.go line 5963 at r10 (raw file):

Split at f, RHS [f,k] will have 100% of the load.

After this split, we will have carved off a cold range [a,f) from a hot range [f,k] and failed to bisect the load. However, we do expect that the range will attempt to load-based split again. When it does, it will fall into the "L is empty" case, and will split at g. Is that correct?


pkg/storage/mvcc.go line 5934 at r11 (raw file):

// the range which spans [startKey,endKey). The returned split key is safe,
// meaning it is guaranteed to (1) not be a column family key within a table
// row and be after the first table row in the range. For non-table ranges, the

I think you wanted "and (2) be" here.

Collaborator

@sumeerbhola sumeerbhola left a comment



pkg/kv/kvserver/replica_command.go line 355 at r10 (raw file):

Previously, kvoli (Austen) wrote…

On second thought, it may be better to not include the change in this PR. I already overloaded it with a few other things.

Ack


pkg/storage/mvcc.go line 5963 at r10 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Split at f, RHS [f,k] will have 100% of the load.

After this split, we will have carved off a cold range [a,f) from a hot range [f,k] and failed to bisect the load. However, we do expect that the range will attempt to load-based split again. When it does, it will fall into the "L is empty" case, and will split at g. Is that correct?

btw, I was not clear in my earlier example. I meant the initial f<i> to be keys that are not just unsafe, but also get an error when passed to EnsureSafeSplitKey. So replica.loadSplitKey would have to pass it through to the code here.

I don't think this fact changes the general problem, which you elaborated on with more examples. And it also seems we are in agreement that if we do end up splitting off a cold range, the next split will likely correct the behavior. So we'll end up with 3 ranges instead of 2.

Collaborator Author

@kvoli kvoli left a comment


TYFTR. Sumeer mentioned over Slack that his review should not block.



pkg/kv/kvserver/replica_split_load.go line 286 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Could we update the comment to mention that?

Done.


pkg/storage/mvcc.go line 5963 at r10 (raw file):

When it does, it will fall into the "L is empty" case, and will split at g. Is that correct?

Yes. There will be 2 splits (3 ranges): r0[a,z) => r0[a,f) r1[f,z) => r0[a,f) r1[f,g) r2[g,z), instead of 1 split (2 ranges). Assuming no change in requests, the second split should occur 10 seconds after the first split [a,f). It is a bit wasteful though, because we split the range's load stats evenly over the LHS and RHS, so it is probable the LHS [a,f) gets picked up for rebalancing and becomes a pain to merge (if [a,f) + [f,g) < 0.5 * split threshold), with wasted snapshots realigning the replicas.

I don't think this fact changes the general problem, which you elaborated on with more examples. And it also seems we are in agreement that if we do end up splitting off a cold range, the next split will likely correct the behavior. So we'll end up with 3 ranges instead of 2.

Agreed.


pkg/storage/mvcc.go line 5934 at r11 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I think you wanted "and (2) be" here.

Fixed and reworded.

It was possible for a SQL row to be torn across two ranges because the
load-based splitter did not reject potentially unsafe split keys. From
keys sampled from response spans alone, it is impossible to determine
whether a key is safe or unsafe.

This commit sidesteps the problem by re-using the
`adminSplitWithDescriptor` command to find the first real key at or
after the provided `args.SplitKey`. This ensures that the split key will
always be a real key, while not requiring any checks in the splitter
itself.

The updated `adminSplitWithDescriptor` is local only and requires opting
into finding the first safe key by setting `findFirstSafeKey` to `true`.

As such, all safe split key checks are also removed from the `split`
pkg, with a warning added that any split key returned may be unsafe.

Resolves: cockroachdb#103483

Release note (bug fix): It was possible for a SQL row to be split across
two ranges. When this occurred, SQL queries could return unexpected
errors. This bug is resolved by these changes: we now inspect the real
keys, rather than just the request keys, to determine load-based split points.
@kvoli kvoli force-pushed the 230517.unsafe-split branch from a145013 to bf2aa42 Compare May 25, 2023 03:09
@kvoli kvoli dismissed sumeerbhola’s stale review May 25, 2023 06:09

Non-blocking review, discussed over Slack. Dismissal necessary to merge.

@kvoli
Collaborator Author

kvoli commented May 25, 2023

bors r=nvanbenschoten

@craig
Contributor

craig bot commented May 25, 2023

Build succeeded:

Collaborator

@sumeerbhola sumeerbhola left a comment



pkg/storage/mvcc.go line 5963 at r10 (raw file):

I think I missed something. In the case you mentioned

If the LB desired key is f<i>, i > 11 and R is empty:

f will end up being the last key in the LHS, and we will fail to split at f. IIUC, we don't have any logic to step back from the last key the way we step forward from the first key (in mvccMinSplitKey), so this failure may not be fixed by another split in the future.

I don't know if this is mostly a theoretical concern. #103672 mentions

Table key with one or more .Next() 0x00 appended

which could satisfy the case where it is past all the valid f keys, and can't be successfully passed through EnsureSafeSplitKey. Does this kind of thing occur due to reverse scans (which call SeekLT)?

Collaborator Author

@kvoli kvoli left a comment



pkg/storage/mvcc.go line 5963 at r10 (raw file):

Previously, sumeerbhola wrote…

I think I missed something. In the case you mentioned

If the LB desired key is f<i>, i > 11 and R is empty:

f will end up being the last key in the LHS, and we will fail to split at f. IIUC, we don't have any logic to step back from the last key the way we step forward from the first key (in mvccMinSplitKey), so this failure may not be fixed by another split in the future.

I don't know if this is mostly a theoretical concern. #103672 mentions

Table key with one or more .Next() 0x00 appended

which could satisfy the case where it is past all the valid f keys, and can't be successfully passed through EnsureSafeSplitKey. Does this kind of thing occur due to reverse scans (which call SeekLT)?

I'm only aware of one support-related issue where we split in the middle of a row, prompting #103672.

If the LB split key was indeed unsafe due to one or more .Next() 0x00 bytes appended, then prior to this PR the split would have been acted upon as is (ignoring the safe-sample-only retention).

This leads me to believe that type of sample has been rare in the past, given the original issue is from 2019.

On a related note, this class of issues where the R is empty can still exist without needing unsafe keys.

Consider the kv workload schema, which has 10 keys all belonging to a single table range [/Table/106/1, /Max].

I added additional logging for span sampling events, even if we didn't sample the key.

set tracing = on; select count(v) from [select v from kv order by k]; set tracing = off;
select message from [show trace for session] where message ~ 'executing (Get|Scan)' or message ~ 'sampling';
                                     message
---------------------------------------------------------------------------------
  executing Scan [/Table/106/1,/Table/106/2), [txn: 0c8c2532], [can-forward-ts]
  sampling /Table/106/{1-2}
(2 rows)

Time: 1ms total (execution 1ms / network 0ms)

If the workload contains only scans for /Table/106/1 and /Table/106/2, we will suggest a split at /Table/106/2 in the weighted finder and never suggest a split in the unweighted finder (it would amplify QPS).

The problem is, R is empty by definition, since /Table/106/2 must sort after all keys in the table (only 1 index). Note that we use the union of spans the request iterated over (ca76c28), so if the batch hits a limit and has a resumeSpan, we are unlikely to run into this problem.

Passing percentiles and/or sampling more than the union of spans iterated over seems necessary.

In any case, this was the desired behavior pre-23.1. There are load-based rebalancing tests, such as allocbench, which use some mix of spanning requests and split reasonably well. For a workload with only full-table scans on a range with bytes < resume threshold, we will never split, as evidenced by the small number of splits in #103872 for splits/load/spanning/nodes=3/obj=cpu, which had the above pattern when investigating.

kvoli added a commit to kvoli/cockroach that referenced this pull request May 30, 2023
The load based splitting behavior was changed in cockroachdb#103690 to always find
a real key to split at, rather than use sampled response spans as split
points. This behavior change affected the number of splits in
`split/load` roachtests.

Update the min and max boundaries for various `split/load` roachtests to
conform to the new expected behavior.

Fixes: cockroachdb#103691
Fixes: cockroachdb#103935
Resolves: cockroachdb#103872

Release note: None
craig bot pushed a commit that referenced this pull request May 30, 2023
103985: roachtest: update activerecord adapter version r=rafiss a=andyyang890

This patch updates the version of the CockroachDB activerecord adapter
used for testing to v7.0.2.

Informs #97283
Informs #99620

Release note: None

104084: roachtest: bump split boundaries to prevent flakes r=tbg a=kvoli

The load based splitting behavior was changed in #103690 to always find a real key to split at, rather than use sampled response spans as split points. This behavior change affected the number of splits in `split/load` roachtests.

Update the min and max boundaries for various `split/load` roachtests to conform to the new expected behavior.

See #103872 for updated split distribution over 190 test runs.

Fixes: #103691
Fixes: #103935
Resolves: #103872

Release note: None

Co-authored-by: Andy Yang <[email protected]>
Co-authored-by: Austen McClernon <[email protected]>
blathers-crl bot pushed a commit that referenced this pull request May 30, 2023
The load based splitting behavior was changed in #103690 to always find
a real key to split at, rather than use sampled response spans as split
points. This behavior change affected the number of splits in
`split/load` roachtests.

Update the min and max boundaries for various `split/load` roachtests to
conform to the new expected behavior.

Fixes: #103691
Fixes: #103935
Resolves: #103872

Release note: None