kv/bulk: chunk SSTs to row boundaries #79020
Merged
Previously an sstable might end due to size at /table/i/rowX/col/Y if
some, but not all, families for rowX fit in that file. This is OK as far
as KV and SQL are concerned, since after we add the next file, which will
start with rowX/colZ, the row is complete from the point of view of any
scan. However, it does mean that if, after adding this file, we determine
that we need to split before adding the next file, that split, as it
must be at a row boundary, will be at rowX, not rowX/colZ. This too is
OK, but has the slight downside that when we scatter the new RHS,
starting at rowX, we have to move the colY family KV we just added in
the prior file. While this is typically a trivial amount of data, it
does make the RHS non-empty, so moving it has some cost.
This changes the size-based limit that triggers a file flush so that,
once the size is exceeded, the flush waits for the next row boundary.
SST bounds thus now also fall on row bounds, and therefore on any
future range split bounds.
This is particularly relevant in conjunction with #78218.
Release note: none.