-
Notifications
You must be signed in to change notification settings - Fork 477
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
db: support range keys in delete-only compactions #1656
Conversation
Best to review this one in commit order, as the second commit makes use changes made in the first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 6 files reviewed, 1 unresolved discussion (waiting on @itsbilal and @jbowens)
table_stats.go, line 596 at r2 (raw file):
} if iter != nil { // FIXME(travers): Do we also need the truncation here too?
Note to self: truncation isn't required here, as range keys will not be split across tables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Flushing comments on the first commit
Reviewable status: 0 of 6 files reviewed, 6 unresolved discussions (waiting on @itsbilal and @nicktrav)
-- commits, line 17 at r1:
nit: s/a tables/tables/
table_stats.go, line 535 at r1 (raw file):
// tableRangedDeletionIter is a keyspan.FragmentIterator that returns "ranged // deletion" spans for a single table, providing a combined view of both range // deletion and range key deletion spans. The tableRAngedDeletionIter is
nit: tableRangedDeletionIter
table_stats.go, line 642 at r1 (raw file):
// largest and smallest keys in the defragmented span. This maintains the // contract that the emitted slice is sorted by (SeqNum, Kind) descending. reducer := func(current, incoming []keyspan.Key) []keyspan.Key {
I think there's a complication here where the reducer removes knowledge that the equality function needs. I'm imagining a scenario like:
a-b:{(#10,RANGEDEL) (#5,RANGEKEYDEL) (#3,RANGEDEL)}
c-d:{(#10,RANGEDEL) (#5,RANGEKEYDEL) (#3,RANGEDEL)}
d-e:{(#10,RANGEDEL) (#3,RANGEDEL)}
during forward iteration, if I understand it correctly, this iterator would yield
a-e:{(#10,RANGEDEL) (#3,RANGEDEL)}
table_stats.go, line 674 at r1 (raw file):
delIter := &tableRangedDeletionIter{ FragmentIterator: dIter, closers: closers,
is the closers slice necessary? wondering if we can let DefragmentIter.Close
call MergingIter.Close
which should fan out and close all the leaf iterators.
table_stats_test.go, line 168 at r1 (raw file):
err = w.RangeKeyUnset(start, end, []byte(parts[3])) case "range-key-del": err = w.RangeKeyDelete(start, end)
how about taking the keyspan.ParseSpan
format in this test, which allows customizing the sequence numbers? We'd have have to use the (*sstable.Writer).AddRangeKey
method. We care quite a bit about this iterator surfacing the correct sequence numbers, since we might delete files with live keys if it doesn't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 7 files reviewed, 3 unresolved discussions (waiting on @itsbilal and @jbowens)
Previously, jbowens (Jackson Owens) wrote…
nit: s/a tables/tables/
Done.
table_stats.go, line 535 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
nit: tableRangedDeletionIter
Done.
table_stats.go, line 642 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
I think there's a complication here where the reducer removes knowledge that the equality function needs. I'm imagining a scenario like:
a-b:{(#10,RANGEDEL) (#5,RANGEKEYDEL) (#3,RANGEDEL)} c-d:{(#10,RANGEDEL) (#5,RANGEKEYDEL) (#3,RANGEDEL)} d-e:{(#10,RANGEDEL) (#3,RANGEDEL)}
during forward iteration, if I understand it correctly, this iterator would yield
a-e:{(#10,RANGEDEL) (#3,RANGEDEL)}
Good catch. I pushed a separate commit (which I will squash into the first before merge, provided we're happy with it).
We're now preserving just the first and last key for each emitted span, and modifying the key kinds to act as a "marker" in the case where there are mixed key kinds (i.e. range dels and range key dels). I believe this is fine as the key kinds are only used to infer the type of span and thus hint type (see the second commit).
Open to alternative approaches on encoding this range del only / range key only / mixed key span information.
table_stats.go, line 674 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
is the closers slice necessary? wondering if we can let
DefragmentIter.Close
callMergingIter.Close
which should fan out and close all the leaf iterators.
The problem I was running up against was that keyspan.Truncate
returns a new *keyspan.Iter
which has a no-op Close
method. If we just close the merging iter, the original range del iter isn't closed, and we panic due to the leak.
We seem to do a similar thing here (adding to a closers
slice before we lose the reference to Truncate
). It seems in this case we don't want to propagate the close, as we want to reuse the underlying iter.
table_stats_test.go, line 168 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
how about taking the
keyspan.ParseSpan
format in this test, which allows customizing the sequence numbers? We'd have have to use the(*sstable.Writer).AddRangeKey
method. We care quite a bit about this iterator surfacing the correct sequence numbers, since we might delete files with live keys if it doesn't.
Good call. Updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 4 files at r2, 1 of 4 files at r3, all commit messages.
Reviewable status: 2 of 7 files reviewed, 4 unresolved discussions (waiting on @itsbilal and @nicktrav)
table_stats.go, line 642 at r1 (raw file):
Previously, nicktrav (Nick Travers) wrote…
Good catch. I pushed a separate commit (which I will squash into the first before merge, provided we're happy with it).
We're now preserving just the first and last key for each emitted span, and modifying the key kinds to act as a "marker" in the case where there are mixed key kinds (i.e. range dels and range key dels). I believe this is fine as the key kinds are only used to infer the type of span and thus hint type (see the second commit).
Open to alternative approaches on encoding this range del only / range key only / mixed key span information.
Yeah, I think this mutation of the key kinds is fine for correctness, but I don't love the subtlety.
Throwing out another idea:
- Wrap the range deletion and range key deletion iterators separately in a
DefragmentingIter
, usingreduce
to only keep the first and last Keys of each. The range keyDefragmentIter
would need to ignore and dropRangeKeySets
,RangeKeyUnsets
. I think these could still share code if configured with a key kind to 'keep'. - Wrap the resulting two iterators in a merging iterator.
The result would be just as defragmented, because the only fragmentation points are at defragmented start/end bounds of each of the kind kind. The resulting span would have at most 4 keys, and we can scan the keys list to check for the existence of multiple kinds.
table_stats.go, line 674 at r1 (raw file):
Previously, nicktrav (Nick Travers) wrote…
The problem I was running up against was that
keyspan.Truncate
returns a new*keyspan.Iter
which has a no-opClose
method. If we just close the merging iter, the original range del iter isn't closed, and we panic due to the leak.We seem to do a similar thing here (adding to a
closers
slice before we lose the reference toTruncate
). It seems in this case we don't want to propagate the close, as we want to reuse the underlying iter.
Ah, yeah, that makes sense. Looking forward to when we can drop that keyspan.Truncate
.
table_stats.go, line 335 at r3 (raw file):
} estimate, hintSeqNum, err := d.estimateSizeBeneath(v, level, meta, start, end)
there's some awkwardness with estimateSizeBeneath
and the EstimateDiskUsage
call above. These functions estimate without knowledge of range keys. They include a file's entire size if the file's bounds fall within range of [start, end)
. Otherwise they use the point index blocks to calculate an estimate at block-level granularity.
I don't think we can ignore a little loss of precision from excluding range keys blocks when calculating estimates, since range keys are expected to be rare, but we should avoid including a large swath of the point keys if the span only contains RANGEKEYDEL
s
testdata/compaction_delete_only_hints, line 307 at r3 (raw file):
# +---+ +---+ +---+ # __________________________________________________________ # a b c d e f g h i j k l m n o p q r s t u v w x y z
Love this test.
1228395
to
061b700
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTRs so far, @jbowens. Thoughts on the approach second commit? Regardless, I'm going to hold off on this until we have a path forward around #1663.
Reviewable status: 0 of 7 files reviewed, 4 unresolved discussions (waiting on @itsbilal and @jbowens)
table_stats.go, line 642 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
Yeah, I think this mutation of the key kinds is fine for correctness, but I don't love the subtlety.
Throwing out another idea:
- Wrap the range deletion and range key deletion iterators separately in a
DefragmentingIter
, usingreduce
to only keep the first and last Keys of each. The range keyDefragmentIter
would need to ignore and dropRangeKeySets
,RangeKeyUnsets
. I think these could still share code if configured with a key kind to 'keep'.- Wrap the resulting two iterators in a merging iterator.
The result would be just as defragmented, because the only fragmentation points are at defragmented start/end bounds of each of the kind kind. The resulting span would have at most 4 keys, and we can scan the keys list to check for the existence of multiple kinds.
Latest commit implements this (which I shall also squash), building on the new keyspan.Filter
. Thanks for the suggestion!
There's one wrinkle that came up around key stability with the merging iter that's causing some metamorphic test failures. I opened #1663, and will wait for the fix for that to go in. This wasn't an issue previously as the merging iter was at the bottom of the stack, and its inputs were stable.
table_stats.go, line 335 at r3 (raw file):
Previously, jbowens (Jackson Owens) wrote…
there's some awkwardness with
estimateSizeBeneath
and theEstimateDiskUsage
call above. These functions estimate without knowledge of range keys. They include a file's entire size if the file's bounds fall within range of[start, end)
. Otherwise they use the point index blocks to calculate an estimate at block-level granularity.I don't think we can ignore a little loss of precision from excluding range keys blocks when calculating estimates, since range keys are expected to be rare, but we should avoid including a large swath of the point keys if the span only contains
RANGEKEYDEL
s
Good point. We have a TODO in the laundry list to update the size estimates. I'll look at that next. Depending on how it shakes out, I might put up a separate patch.
Add a new `keyspan.FragmentIterator` implementation that iterates over the range deletion and range key blocks of a single table. The iterator returns merged spans from the table corresponding to "ranged deletions". A "ranged deletion" span contains keys that fall into one of three cases: the span contains only range deletions; the span contains only range key deletions; the span contains a mixture of both range deletions and range key deletions. The new `tableRangedDeletionIter` will be used to construct spans necessary for calculating deletion hints on a table that takes range keys into account. Previously, only range deletion spans were considered. For a tables that do not contain range key deletes, the iterator returns the same set of spans that the existing logic would have returned.
Currently, when constructing delete-only compactions hints, only range delete spans are considered. With the introduction of range keys, the current hint construction logic is incorrect. Specifically, a hint constructed from a range delete may completely cover a table containing range keys. Currently, this table would be marked as eligible for deletion. However, the table should only be deleted if the range keys are also deleted. Make use of the `tableRangedDeletionIter` for construction of delete-only compaction hints. A new `deleteCompactionHintType` type is used to distinguish between hints generated by a range delete (which may not delete tables containing range keys), hints generated from a range key delete (which may not delete tables containing point keys), and hints generated from a span with both a range delete and range key delete (which may delete any covering table).
Fix the `tableRangedDeletionIter` implementation to preserve the spans "type" (i.e. range del keys only, range key del keys only, or mixed keys). The current implementation had a bug where a span with mixed key kinds could have its type switched when merged with an abutting span.
Defragment the range del and range key block separately. The `keyspan.Filter` is used to elide non-range key dels from the range key block. The two separate, defragmented spans are then merged wit the merging iter.
061b700
to
a20684e
Compare
Pushed a rebased version that picks up b9e970a. Still need to incorporate a tweak to the disk estimation to ensure we're not counting point key size under range key del deletion hints. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 4 files at r5, 2 of 4 files at r6, 2 of 2 files at r7, all commit messages.
Reviewable status: 6 of 7 files reviewed, 6 unresolved discussions (waiting on @itsbilal, @jbowens, and @nicktrav)
compaction.go
line 1683 at r7 (raw file):
// deleteCompactionHintType indicates whether the deleteCompactionHint was // generated from a span containing a range del (point key only), a range delete
nit: "a range key delete (range key only)"
We use the language "range delete" to refer to RANGEDEL
s in many places. I wish we had unambiguous names to differentiate these, but hard problems in computer science, amirite?
compaction.go
line 1717 at r7 (raw file):
switch k.Kind() { case base.InternalKeyKindRangeDelete: hintType |= 1 << 0
could this be hintType |= deleteCompactionHintTypePointKeyOnly
and down below hintType |= deleteCompactionHintTypeRangeKeyOnly
to make the relationship with the enum a little clearer?
compaction.go
line 1721 at r7 (raw file):
hintType |= 1 << 1 default: panic(fmt.Sprintf("unsupported hint hind: %s", k.Kind()))
nit: s/hint hind/key kind/
table_stats.go
line 583 at r4 (raw file):
// Truncate tombstones to the containing file's bounds if necessary. // See docs/range_deletions.md for why this is necessary. closers = append(closers, iter)
With your change to Truncate
, I think we could drop the closers
slice now.
internal/keyspan/span.go
line 73 at r7 (raw file):
func (k Key) Equal(b Key) bool { return k.Trailer == b.Trailer && bytes.Equal(k.Suffix, b.Suffix) &&
bytes.Equal
is okay for comparing values, but suffixes need to be compared with a user-provided base.Compare
or base.Equal
. At the moment, I believe bytes.Equal
works for CockroachDB's use, but with cockroachdb/cockroach#77342 equal suffixes are not necessarily byte equal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 3 of 7 files reviewed, 4 unresolved discussions (waiting on @itsbilal and @jbowens)
compaction.go
line 1683 at r7 (raw file):
Previously, jbowens (Jackson Owens) wrote…
nit: "a range key delete (range key only)"
We use the language "range delete" to refer to
RANGEDEL
s in many places. I wish we had unambiguous names to differentiate these, but hard problems in computer science, amirite?
Yeah - bit of a mouthful. Fixed!
compaction.go
line 1717 at r7 (raw file):
Previously, jbowens (Jackson Owens) wrote…
could this be
hintType |= deleteCompactionHintTypePointKeyOnly
and down belowhintType |= deleteCompactionHintTypeRangeKeyOnly
to make the relationship with the enum a little clearer?
Good idea. Done.
table_stats.go
line 583 at r4 (raw file):
Previously, jbowens (Jackson Owens) wrote…
With your change to
Truncate
, I think we could drop theclosers
slice now.
Yeah - that disappeared in a20684e. Reviewable is probably not doing it justice.
internal/keyspan/span.go
line 73 at r7 (raw file):
Previously, jbowens (Jackson Owens) wrote…
bytes.Equal
is okay for comparing values, but suffixes need to be compared with a user-providedbase.Compare
orbase.Equal
. At the moment, I believebytes.Equal
works for CockroachDB's use, but with cockroachdb/cockroach#77342 equal suffixes are not necessarily byte equal.
Ah, thanks. Looks like we have a base.Compare
wired up, so I'm plumbing that through now for the suffix comparisons.
In converting an existing function to a loop, a loop iteration could early `return`, which would skip subsequent hints. Use `continue` instead, to consider the remaining hints. Add a regression test. This commit will be squashed into the first commit.
Alter the existing test case to show that range del hints only take point key space into account, and range key del hints do not take point key space into account. This will be squashed into the second commit.
I'm going to close this out in favor of a new PR with all the squashing done and a rebase to pick up the latest changes from master. With all the changes that have happened in the last month (e.g. Happy to re-open this one if that ends up being simpler. |
Two commits in this patch.
TODOs:
db: add tableRangedDeletionIter for iteration of ranged deletion spans
Add a new
keyspan.FragmentIterator
implementation that iterates overthe range deletion and range key blocks of a single table. The iterator
returns merged spans from the table corresponding to "ranged deletions".
A "ranged deletion" span contains keys that fall into one of three
cases: the span contains only range deletions; the span contains only
range key deletions; the span contains a mixture of both range deletions
and range key deletions.
The new
tableRangedDeletionIter
will be used to construct spansnecessary for calculating deletion hints on a table that takes range
keys into account. Previously, only range deletion spans were
considered.
For a tables that do not contain range key deletes, the iterator returns
the same set of spans that the existing logic would have returned.
db: consider range keys in delete-only compactions
Currently, when constructing delete-only compactions hints, only range
delete spans are considered. With the introduction of range keys, the
current hint construction logic is incorrect. Specifically, a hint
constructed from a range delete may completely cover a table containing
range keys. Currently, this table would be marked as eligible for
deletion. However, the table should only be deleted if the range keys
are also deleted.
Make use of the
tableRangedDeletionIter
for construction ofdelete-only compaction hints. A new
deleteCompactionHintType
type isused to distinguish between hints generated by a range delete (which may
not delete tables containing range keys), hints generated from a range
key delete (which may not delete tables containing point keys), and
hints generated from a span with both a range delete and range key
delete (which may delete any covering table).