Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kv/bulk: chunk SSTs to row boundaries #79020

Merged
merged 1 commit into from
Mar 30, 2022
Merged

Conversation

dt
Copy link
Member

@dt dt commented Mar 30, 2022

Previously an sstable might end due to size at /table/i/rowX/col/Y, if
some, but not all, families for rowX fit in that file. This is OK as far
as KV and SQL are concerned, since after we add the next file which will
start with rowX/colZ, the row is complete from the point of view of any
scan. However it does mean that if, after adding this file we determine
that we need to split before adding the next file, that split, as it
must be at a row boundary, will be at rowX, not rowX/colZ. This too is
OK, but has the slight downside of meaning that when we scatter the new
RHS, starting at rowX, we have to move the colY family KV we just added
in the prior prior file. While it is typically a trivial amount of data,
it does make the RHS non-empty and thus require some cost to move.

This changes the size-based limit that triggers a file flush to wait for
the next row boundary after the size is exceeded, so that SST bounds now
also fall on row, and thus any future range split, bounds.

This is particularly relevant in conjunction with #78218.

Release note: none.

@dt dt requested review from nvanbenschoten and adityamaru March 30, 2022 04:08
@dt dt requested a review from a team as a code owner March 30, 2022 04:08
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Collaborator

@stevendanna stevendanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Perhaps an interesting project at some point would be something to make writing unit tests for the nuances in the splitting and chunking logic a bit easier, if for no other reason than to document the current behaviour we expect.

pkg/kv/bulk/sst_batcher.go Show resolved Hide resolved
Previously an sstable might end due to size at /table/i/rowX/col/Y, if
some, but not all, families for rowX fit in that file. This is OK as far
as KV and SQL are concerned, since after we add the next file which will
start with rowX/colZ, the row is complete from the point of view of any
scan. However it does mean that if, after adding this file we determine
that we need to split before adding the next file, that split, as it
must be at a row boundary, will be at rowX, not rowX/colZ. This too is
OK, but has the slight downside of meaning that when we scatter the new
RHS, starting at rowX, we have to move the colY family KV we just added
in the prior prior file. While it is typically a trivial amount of data,
it does make the RHS non-empty and thus require _some_ cost to move.

This changes the size-based limit that triggers a file flush to wait for
the next row boundary after the size is exceeded, so that SST bounds now
also fall on row, and thus any future range split, bounds.

This is particularly relevant in conjunction with cockroachdb#78218.

Release note: none.
@dt
Copy link
Member Author

dt commented Mar 30, 2022

TFTR!

bors r+

@craig
Copy link
Contributor

craig bot commented Mar 30, 2022

Build succeeded:

@craig craig bot merged commit 4bce84e into cockroachdb:master Mar 30, 2022
@dt dt deleted the split-row branch March 30, 2022 18:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants