Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

db: support range keys in delete-only compactions #1750

Merged
merged 2 commits into from
Jun 16, 2022

Conversation

nicktrav
Copy link
Contributor

@nicktrav nicktrav commented Jun 9, 2022

NOTE: This PR depends on #1759.

Two commits in this patch - the first defines a new composite iterator, the second puts it to use.

This is a re-do of #1656, rebased on latest master to pick up various interfaces changes, and also squash all of the commits from that PR. All feedback from the original PR has been addressed.

TODOs:


db: add tableRangedDeletionIter for iteration of ranged deletion spans

Add a new keyspan.FragmentIterator implementation that iterates over
the range deletion and range key blocks of a single table. The iterator
returns merged spans from the table corresponding to "ranged deletions".
A "ranged deletion" span contains keys that fall into one of three
cases:

  • the span contains only range deletions,
  • the span contains only range key deletions, or
  • the span contains a mixture of both range deletions and range key
    deletions.

Under the hood, the iterator first defragments the range deletion and
range key deletion blocks separately, using the table's respective
keyspan.FragmentIterators. This defragmentation process also filters
out non-deletion key types (i.e. range key SET and UNSETS). The two
separate iterators are then combined into a keyspan.MergingIter.

This new tableRangedDeletionIter will be used to construct spans
necessary for calculating deletion hints on a table that takes range
keys into account. Previously, only range deletion spans were
considered.

For table that do not contain range key deletes, the iterator returns
the same set of spans that the existing logic would have returned.

db: consider range keys in delete-only compactions

Currently, when constructing delete-only compactions hints, only range
delete spans are considered. With the introduction of range keys, the
current hint construction logic is incorrect. Specifically, a hint
constructed from a range delete may completely cover a table containing
range keys. Currently, this table would be marked as eligible for
deletion. However, the table should only be deleted if the range keys
are also deleted.

Make use of the tableRangedDeletionIter for construction of
delete-only compaction hints. A new deleteCompactionHintType type is
used to distinguish between hints generated by a range delete (which may
not delete tables containing range keys), hints generated from a range
key delete (which may not delete tables containing point keys), and
hints generated from a span with both a range delete and range key
delete (which may delete any covering table).

@nicktrav nicktrav requested a review from jbowens June 9, 2022 23:06
@cockroach-teamcity
Copy link
Member

This change is Reviewable

table_stats.go Outdated Show resolved Hide resolved
@nicktrav nicktrav force-pushed the nickt.deletion-only-v2 branch 3 times, most recently from d2f8d24 to 5cce5f3 Compare June 10, 2022 03:19
@nicktrav nicktrav requested a review from itsbilal June 10, 2022 04:17
Copy link
Collaborator

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking good, thanks for doing this

Reviewed 1 of 6 files at r1, 8 of 8 files at r3, 4 of 5 files at r4, all commit messages.
Reviewable status: 8 of 9 files reviewed, 6 unresolved discussions (waiting on @itsbilal, @jbowens, and @nicktrav)


table_stats.go line 313 at r3 (raw file):

	defer iter.Close()
	var compactionHints []deleteCompactionHint
	// We iterate over the defragmented range tombstones, which ensures

nit: s/defragmented range tombstones/defragmented range tombstones and range key deletions/


table_stats.go line 338 at r3 (raw file):

			return nil, err
		}
		stats.RangeDeletionsBytesEstimate += estimate

here, and above in the level == numLevels - 1 conditional, I think we only want to populate RangeDeletionsBytesEstimate if s contains any keys of kind RANGEDEL.

RangeDeletionsBytesEstimate is the estimate of bytes within [start, end) in point key blocks, or if tables are wholly contained, the entirety of the sstable. We want to avoid totaling this if there's only a RANGEKEYDEL over the span.

We should also leave a comment that if s only contains RANGEDEL and the span wholly contains a table containing range keys, this estimate will be off, mistakenly including the size of the range key block. That's expected to be okay, because range keys are expected to be rare and not contribute significantly to sstable size.


table_stats.go line 584 at r3 (raw file):

// corresponding to the largest and smallest sequence numbers encountered across
// the range deletes and range keys deletes that comprised the merged spans.
type tableRangedDeletionIter struct {

nit: do we need this type now? or could we return the merging iter as an opaque FragmentIterator?


table_stats.go line 589 at r3 (raw file):

// newTableRangeDeletionIter returns a new tableRangedDeletionIter.
func newTableRangeDeletionIter(

nit: can we call this something that makes it a little clear that it's range deletions and range key deletions? newTableRangedDeletionIter is better, maybe newCombinedDeletionKeyspanIter or something is even better? The term 'range deletion' is just so ubiquitous referring specifically to RANGEDEL that anything close to it is likely to be confused.


table_stats.go line 674 at r3 (raw file):

	// The separate iters for the range dels and range keys are wrapped in a
	// merging iter to join the keyspaces into a single keyspace.
	mIter := &keyspan.MergingIter{}

nit: now that we have MergingIter.AddLevel, we can avoid the allocations of appending to the []FragmentIterator slice by Init-ing the MergingIter higher up, and directly calling AddLevel when we find a non-nil rangedel or range key iterator.

Copy link
Contributor Author

@nicktrav nicktrav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 5 of 9 files reviewed, 1 unresolved discussion (waiting on @itsbilal and @jbowens)


table_stats.go line 313 at r3 (raw file):

Previously, jbowens (Jackson Owens) wrote…

nit: s/defragmented range tombstones/defragmented range tombstones and range key deletions/

Done.


table_stats.go line 338 at r3 (raw file):

Previously, jbowens (Jackson Owens) wrote…

here, and above in the level == numLevels - 1 conditional, I think we only want to populate RangeDeletionsBytesEstimate if s contains any keys of kind RANGEDEL.

RangeDeletionsBytesEstimate is the estimate of bytes within [start, end) in point key blocks, or if tables are wholly contained, the entirety of the sstable. We want to avoid totaling this if there's only a RANGEKEYDEL over the span.

We should also leave a comment that if s only contains RANGEDEL and the span wholly contains a table containing range keys, this estimate will be off, mistakenly including the size of the range key block. That's expected to be okay, because range keys are expected to be rare and not contribute significantly to sstable size.

Turns out there was a bug here that wasn't calculating stats for tables with just range keys. Fixed and added an additional test case.


table_stats.go line 584 at r3 (raw file):

Previously, jbowens (Jackson Owens) wrote…

nit: do we need this type now? or could we return the merging iter as an opaque FragmentIterator?

Good call. The new function now returns the interface.


table_stats.go line 589 at r3 (raw file):

Previously, jbowens (Jackson Owens) wrote…

nit: can we call this something that makes it a little clear that it's range deletions and range key deletions? newTableRangedDeletionIter is better, maybe newCombinedDeletionKeyspanIter or something is even better? The term 'range deletion' is just so ubiquitous referring specifically to RANGEDEL that anything close to it is likely to be confused.

I think I'd just typo'd originally - intention was for newTableRangedDeletionIter. That highlights the problem, so let's go with newCombinedDeletionKeyspanIter.


table_stats.go line 674 at r3 (raw file):

Previously, jbowens (Jackson Owens) wrote…

nit: now that we have MergingIter.AddLevel, we can avoid the allocations of appending to the []FragmentIterator slice by Init-ing the MergingIter higher up, and directly calling AddLevel when we find a non-nil rangedel or range key iterator.

Nice! Updated.

Copy link
Member

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: great work!

Reviewable status: 5 of 9 files reviewed, 4 unresolved discussions (waiting on @jbowens and @nicktrav)


table_stats.go line 313 at r3 (raw file):

	defer iter.Close()
	var compactionHints []deleteCompactionHint
	// We iterate over the defragmented range tombstones, which ensures

This would be range key deletions too, right? Not just range tombstones?


testdata/table_stats line 264 at r3 (raw file):


# Ingest a table with range keys.
# TODO(travers): Once range keys in batches are flushed to tables, this testcase

This should be the case now, right?


testdata/table_stats_deletion_iter line 57 at r3 (raw file):


build
a-z:{(#0,RANGEKEYDEL) (#0,RANGEKEYSET,@1,foo) (#0,RANGEKEYUNSET,@2)}

Maybe we should have a case where the RANGEKEYDEL has a lower sequence number than the others to confirm this is the Filter working and not the deletion shadowing the others?

@nicktrav nicktrav force-pushed the nickt.deletion-only-v2 branch from 4c656db to 45aeb2c Compare June 15, 2022 21:36
Copy link
Contributor Author

@nicktrav nicktrav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rebased on master to pick up the recent compaction changes, and pushed a few more commits addressing feedback. I'm also doing a burn of the metamorphic tests.

I'll wait for one more LGTM before squashing, landing, and putting this one to bed ... finally.

Reviewable status: 1 of 9 files reviewed, 4 unresolved discussions (waiting on @itsbilal and @jbowens)


table_stats.go line 313 at r3 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

This would be range key deletions too, right? Not just range tombstones?

Updated.


testdata/table_stats line 264 at r3 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

This should be the case now, right?

Yeah - though I have a bunch of other test cases like this (that use the ingestion hack) that I'd like to clean up. I might address these in a follow-up, to avoid the extra noise.


testdata/table_stats_deletion_iter line 57 at r3 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Maybe we should have a case where the RANGEKEYDEL has a lower sequence number than the others to confirm this is the Filter working and not the deletion shadowing the others?

Good call. Added.

Copy link
Collaborator

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Looks great!

Reviewed 1 of 7 files at r10, 2 of 5 files at r11, 1 of 2 files at r14, 2 of 2 files at r16, 1 of 1 files at r17, 1 of 1 files at r18, all commit messages.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @itsbilal and @nicktrav)


compaction.go line 1849 at r18 (raw file):

}

// compactionHintFromKeys returns a deleteCompactionHintType give a slice of

nit: s/give/given/


table_stats.go line 477 at r18 (raw file):

	// the RangeDeletionsBytesEstimate statistic and can't populate table stats
	// from just the properties. The table stats collector goroutine will populate
	// the stats.

nit: mind adjusting this comment to explicitly call out the need for the table stats collector to trigger delete-only compactions? It seems like there's two separate reasons we can't compute stats here:

  1. If a table has range deletions we can't calculate RangeDeletionsBytesEstimate without scanning the LSM, and we need the table stats collector to do that.
  2. Additionally, if a table has range deletions or range key deletions, we need the table stats collector to run on the file so that it can look for opportunities for delete-only compactions.

nicktrav added 2 commits June 16, 2022 14:31
Add a new `keyspan.FragmentIterator` implementation that iterates over
the range deletion and range key blocks of a single table. The iterator
returns merged spans from the table corresponding to "ranged deletions".
A "ranged deletion" span contains keys that fall into one of three
cases:

- the span contains only range deletions,
- the span contains only range key deletions, or
- the span contains a mixture of both range deletions and range key
  deletions.

Under the hood, the iterator first defragments the range deletion and
range key deletion blocks separately, using the table's respective
`keyspan.FragmentIterator`s. This defragmentation process also filters
out non-deletion key types (i.e. range key SET and UNSETS). The two
separate iterators are then combined into a `keyspan.MergingIter`.

This new `tableRangedDeletionIter` will be used to construct spans
necessary for calculating deletion hints on a table that takes range
keys into account. Previously, only range deletion spans were
considered.

For table that do not contain range key deletes, the iterator returns
the same set of spans that the existing logic would have returned.
Currently, when constructing delete-only compactions hints, only range
delete spans are considered. With the introduction of range keys, the
current hint construction logic is incorrect. Specifically, a hint
constructed from a range delete may completely cover a table containing
range keys. Currently, this table would be marked as eligible for
deletion. However, the table should only be deleted if the range keys
are also deleted.

Make use of the `tableRangedDeletionIter` for construction of
delete-only compaction hints. A new `deleteCompactionHintType` type is
used to distinguish between hints generated by a range delete (which may
not delete tables containing range keys), hints generated from a range
key delete (which may not delete tables containing point keys), and
hints generated from a span with both a range delete and range key
delete (which may delete any covering table).
@nicktrav nicktrav force-pushed the nickt.deletion-only-v2 branch from 45aeb2c to 80efae1 Compare June 16, 2022 21:33
Copy link
Contributor Author

@nicktrav nicktrav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Squashed all the fixup commits. I'll address the test cases in a follow-up.

TFTRs!

Reviewable status: 4 of 9 files reviewed, 3 unresolved discussions (waiting on @itsbilal and @jbowens)


compaction.go line 1849 at r18 (raw file):

Previously, jbowens (Jackson Owens) wrote…

nit: s/give/given/

Done.


table_stats.go line 477 at r18 (raw file):

Previously, jbowens (Jackson Owens) wrote…

nit: mind adjusting this comment to explicitly call out the need for the table stats collector to trigger delete-only compactions? It seems like there's two separate reasons we can't compute stats here:

  1. If a table has range deletions we can't calculate RangeDeletionsBytesEstimate without scanning the LSM, and we need the table stats collector to do that.
  2. Additionally, if a table has range deletions or range key deletions, we need the table stats collector to run on the file so that it can look for opportunities for delete-only compactions.

Done!

@nicktrav nicktrav mentioned this pull request Jun 16, 2022
29 tasks
@nicktrav nicktrav merged commit 059c072 into cockroachdb:master Jun 16, 2022
@nicktrav nicktrav deleted the nickt.deletion-only-v2 branch June 16, 2022 21:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants