-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storage: Incrementally calculate range key stats in CheckSSTConflicts #85673
Conversation
More unfortunate code blowup here, but something that we can probably reduce over time. |
dfcbece
to
d10c6e9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this looks like a great start!
The stats calculations here should use the updateStatsOnRangeKey*
helpers, and add any new helpers if anything is missing.
It doesn't seem like we're doing any timestamp handling here, so GCBytesAge
calculations will be wrong. Is the plan to add this afterwards? The updateStats
helpers will account for this correctly, as long as the results are added to the stats via MVCCStats.Add()
.
Also, it doesn't seem like we're handling range key merges? The typical example would be when an ingestion job find a Raft range bound and splits a range key across two SSTs (one for each Raft range). But then the Raft ranges merge before the SSTs arrive, so the two abutting SSTs are now ingested next to each other in a single Raft range, and the range key will merge. This will also happen for any range key versions below them.
I also wonder if it'd be worth adding TestMVCCHistories
support for CheckSSTConflicts
and SST ingestion? We already have support for building SSTs and building Engine data there, so it should be straightforward to add a command that applies CheckSSTConflicts
to a built SST and ingests it into the engine. Those tests can automatically track MVCC stats changes and compare them to the real computation. I find those tests are very efficient at getting lots of test coverage with very little effort. I'm fine with punting this to stability though.
Reviewed 3 of 6 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal)
pkg/kv/kvserver/batcheval/cmd_add_sstable.go
line 209 at r1 (raw file):
} desc := cArgs.EvalCtx.Desc() _, rightPeekBound := rangeTombstonePeekBounds(
We need to plumb through leftPeekBound
too. The MVCC stats calculations need this to know how to handle range keys at the Raft range bounds. For example, let's say the SST has [c-d)@2
, but in Pebble there already exists a range key [a-f)@1
, but this is split into two Raft ranges [a-c)
and [c-f)
. If we don't plumb through a leftPeekBound
of c
in this case, the stats will incorrectly consider the c
bound of the new range key as fragmenting the range key below it -- that fragmentation has already been accounted for by the Raft range split. We need to provide the correct bounds to the iterator so that it doesn't look outside of the Raft range.
pkg/storage/bench_test.go
line 1616 at r1 (raw file):
b.ResetTimer() for i := 0; i < b.N; i++ { _, err := CheckSSTConflicts(context.Background(), sstFile.Data(), eng, sstStart, sstEnd, sstEnd.Key.Next(), false, hlc.Timestamp{}, math.MaxInt64, usePrefixSeek)
I'd just have this take nil
when we don't care about the bounds (i.e in tests). That's what we usually do elsewhere.
pkg/storage/sst.go
line 350 at r1 (raw file):
extTombstones := extRangeKeys.Versions.Clone() extTombstones.Trim(sstTombstone.Timestamp, hlc.MaxTimestamp) isIdempotent := len(extTombstones) == len(sstRangeKeys.Versions)
Huh, I though we already had Equal()
for MVCCRangeKeyStack
and MVCCRangeKeyVersions
. Mind adding them?
d10c6e9
to
0e0e646
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all those points! Updated with the helpers and with the merging cases (which were quite a pandora's box).
I was deferring the GCBytesAge part until later, just to make this more shippable. I have some in-progress stuff to add GCBytesAge testing to all of this, that'll make it clear if there are any remaining gaps and where those are.
Also agreed about TestMVCCHistories stuff being very useful, however I'd rather defer that too to keep this scope-limited enough to ship.
Unfortunately even with the limited scope this PR has blown up a lot, and I'm really not a fan of how this function is looking now. There are lots of intricate conditionals, all to satisfy some edge case somewhere that doesn't conform to the otherwise "just check the two range key stacks, one is usually to the right of the other" simplification that this function originally satisfied and largely still does for conflict checking. The good news is that the test cases do give us quite a bit of confidence in this working (once I fix the last remaining one about idempotent cases imminently, though I'm also inclined to just flip ContainsEstimates=1
if we see idempotent range keys because they add a fair bit of complexity).
I'm happy to think of ways to refactor this function and to implement it in a much cleaner way. Unfortunately that won't be possible in time for stability. Thoughts and suggestions welcome on how we should time this all.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker and @itsbilal)
pkg/kv/kvserver/batcheval/cmd_add_sstable.go
line 209 at r1 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
We need to plumb through
leftPeekBound
too. The MVCC stats calculations need this to know how to handle range keys at the Raft range bounds. For example, let's say the SST has[c-d)@2
, but in Pebble there already exists a range key[a-f)@1
, but this is split into two Raft ranges[a-c)
and[c-f)
. If we don't plumb through aleftPeekBound
ofc
in this case, the stats will incorrectly consider thec
bound of the new range key as fragmenting the range key below it -- that fragmentation has already been accounted for by the Raft range split. We need to provide the correct bounds to the iterator so that it doesn't look outside of the Raft range.
Good point. Done.
pkg/storage/sst.go
line 350 at r1 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
Huh, I though we already had
Equal()
forMVCCRangeKeyStack
andMVCCRangeKeyVersions
. Mind adding them?
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to think of ways to refactor this function and to implement it in a much cleaner way. Unfortunately that won't be possible in time for stability. Thoughts and suggestions welcome on how we should time this all.
Yeah, I see what you're saying. Thanks for the valiant effort here.
It might be possible to simplify this a lot by doing two passes: one to check for conflicts, and another range-key-only pass to adjust stats for splits/merges. But as you say, we're running out of time.
How do you feel about this? Are you fairly confident in this implementation, and happy to merge it as a temporary fix? Alternatively, since there aren't any current callers that will write range keys with CheckSSTConflicts, we could also punt this by erroring out on any range keys in the SSTs (but still take engine range keys into account for stats updates). But we'll likely have to add this back in at some point.
I'm also curious about the performance implications here. Are we taking a big hit in the typical no-range-key case?
I haven't reviewed the stats calculations in detail, since I assume the tests would do a much better job at that than me anyway.
Reviewed 6 of 6 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal)
pkg/storage/sst.go
line 68 at r2 (raw file):
} if leftPeekBound == nil { leftPeekBound = start.Key
I don't believe this will handle a LHS merge/split by default? Maybe just keys.MinKey
and keys.MaxKey
by default?
pkg/storage/sst.go
line 442 at r2 (raw file):
case -1: // sstRangeKey starts earlier than extRangeKey. Add a fragment statsDiff.RangeKeyBytes += int64(EncodedMVCCKeyPrefixLength(extRangeKeys.Bounds.Key))
Updating the stats directly like this is a bit iffy because it doesn't take into account time differences for e.g. GCBytesAge
, like Add
does. That's fine for now, since we don't handle that here anyway.
a24df63
to
d7f523a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do feel confident enough to merge this to finish the functionality, but I would want to address it / improve testing and simplify the code within stability period. It's unfortunate that a relatively large work item is getting punted into stability, but I think merging this now does put us further ahead than not doing so.
Maybe the two passes will make it easier, yes - and maybe we can more easily do the left/right peeks for each range key we encounter in the sstables on the second pass (while keeping the first pass as-is for conflict checking). I'll think of that as a future work item.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker)
pkg/storage/sst.go
line 68 at r2 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
I don't believe this will handle a LHS merge/split by default? Maybe just
keys.MinKey
andkeys.MaxKey
by default?
Done.
pkg/storage/sst.go
line 442 at r2 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
Updating the stats directly like this is a bit iffy because it doesn't take into account time differences for e.g.
GCBytesAge
, likeAdd
does. That's fine for now, since we don't handle that here anyway.
Yeah, I imagine these bound adjustments are going to mess with GCBytesAge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'd agree with that assessment. Let's just be mindful of any perf regressions here before we ship the release.
a8aaaf6
to
36f4bbe
Compare
This change updates CheckSSTConflicts to incrementally calculate stats in the presence of range keys in the SST being ingested. This avoids expensive stats recomputation after AddSSTable, as previously we were marking stats as estimates if an SST with range keys was added. Fixes cockroachdb#83405. Release note: None.
36f4bbe
to
219652e
Compare
TFTR! bors r=erikgrinaker |
Build failed (retrying...): |
Build succeeded: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained
pkg/storage/sst.go
line 675 at r4 (raw file):
sstOK, sstErr = sstIter.Valid() } else { extPrevKey := extIter.Key()
is this shadowing of the outer scoped extPrevKey
intentional?
This change updates CheckSSTConflicts to incrementally
calculate stats in the presence of range keys in the
SST being ingested. This avoids expensive stats recomputation
after AddSSTable, as previously we were marking stats as
estimates if an SST with range keys was added.
Fixes #83405.
Release note: None.