-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storage: add RevertRange #38477
storage: add RevertRange #38477
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 4 files at r1.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @dt, @nvanbenschoten, and @tbg)
pkg/roachpb/api.proto, line 290 at r1 (raw file):
RequestHeader header = 1 [(gogoproto.nullable) = false, (gogoproto.embed) = true]; util.hlc.Timestamp start_time = 2 [(gogoproto.nullable) = false];
Give this a comment. Make sure to be explicit about whether this is an inclusive or exclusive lower bound. IIRC we're not very consistent about that when it comes to time boundaries looking backwards.
pkg/storage/batcheval/cmd_clear_range.go, line 103 at r1 (raw file):
return result.Result{}, err } *cArgs.Stats = stats
cArgs.Stats
is a delta, not an absolute value. See cmd_recompute_stats.go
.
pkg/storage/batcheval/cmd_clear_range_test.go, line 170 at r1 (raw file):
t.Fatal(err) } return h.Sum([]byte{})
nit: h.Sum(nil)
?
pkg/storage/batcheval/cmd_clear_range_test.go, line 192 at r1 (raw file):
before := hlc.Timestamp{WallTime: 1000} // lay down some keys to be the starting point to which we'll return later.
nit: these comments aren't in accordance with our style guide because they're not sentence cased.
pkg/storage/batcheval/cmd_clear_range_test.go, line 246 at r1 (raw file):
} t.Run("revert to before", func(t *testing.T) {
revert to b?
pkg/storage/batcheval/cmd_clear_range_test.go, line 246 at r1 (raw file):
} t.Run("revert to before", func(t *testing.T) {
I think it would be easy to pull each of these subtests into a helper/closure with just a few params to reduce duplication.
pkg/storage/batcheval/cmd_clear_range_test.go, line 262 at r1 (raw file):
StartTime: cutoff, } cArgs.Stats = &enginepb.MVCCStats{}
Given that I think we're getting the stats wrong right now, I think it would be worth testing them in each of these subtests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for not reviewing this yet, but now that @nvanbenschoten is giving you things to do I should write my concerns down:
- this potentially creates very large batches, so it's an operation like
DeleteRange
that needs to be paginated explicitly by the caller (esp. since we want to bump the range size default up over time), or some magical resume key magic needs to happen. ClearRange doesn't have this problem because it just puts down a range tombstone. - the above gives some evidence that this should be its own RPC instead
- How does this interact with GCThreshold? There isn't an issue for ClearRange (or maybe it is) because that keyspace was never supposed to be touched again. But maybe it's not an issue here? I can't tell because there isn't enough detail on how this will be used. Clients won't touch this keyspace until the bulk ingestion is done, right? At which point it has better data at a newer timestamp so things should work out.
Also, this is a one-off question but since we're both here -- when I restore a backup, do I also restore a GCThreshold? That is necessary for correctness (or folks can run broken reads in the far past that was gc'ed before backup was taken).
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @dt and @tbg)
Based on my current expected client call, I'd rather not push pagination all the way back to the client (or maybe I need to rethink my usage). My expected call is basically to make a single time-bound
I'm not sure I see the interaction with Clear/ClearRange can and do rewrite history. In my current usage, I'll be doing this at a timestamp such that I assume SQL is has been locked out of reading, so I won't be changing the results of any served SQL reads, but I'm not seeing the direct interaction with GC yet. Where I think there is interaction with GC is in AddSSTable: if I add a key that shadows an existing one which is older than GC TTL, it could be GC'ed at any time. If we then used this RPC to try to clear the SST's shadowing key, there would be nothing remaining under it. That however is a problem with AddSSTable, not this RPC -- Ben pointed out that the new keys should carry a flag saying they are provisional / subject to rollback that suspends GC until some followup RPC or expiration, and that would then update some range state suspending GC. But that's all for AddSSTable to handle -- I don't see the interaction with GC in this one? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally the range would handle breaking that up as needed to execute it, so maybe a wrapper RPC (similar to how running an ImportRequest in turn runs some number of (usually local) AddSSTableRequests) that does the pagination locally on each range before returning?
This is a reasonable expectation. We can't do this in a general fashion for nontransactional requests (the contract is that a command is either all-or-nothing) so some wrapper request would have to be created as you suggested. But at the same time you'll need to introduce pagination into ClearRange
(so that the wrapper can paginate) which doesn't make sense for the original ClearRange, so it still seems better to make a new RPC that's closer to DeleteRange in that it allows paginating via the StartKey
, ResumeSpan
, and maximum number of affected entries (or rather a size quota, which makes more sense here). BTW the fanout in DistSender has been causing problems for very expensive requests so it's not appropriate for something like this. The wrapper RPC solves this too because the pagination introduces throttling.
Finding good names will be awkward, the wrapper can't be named the same as the base RPC (RevertRange)?
I'm not sure I see the interaction with GCThreshold. GCThreshold is still maintained by GC as before.
The GCThreshold prevents reads of keyspace that has been altered in a way that doesn't preserve MVCC semantics. If you had previously served reads on versions that you are reverting, that would create problems that would be solved by bumping the GCThreshold in the process. But I think the contract that you want doesn't need this? I.e. you know that nobody has ever seen any of the data you are reverting, that is, nobody has ever accessed that table with a timestamp >= start_timestamp. Is that all correct? (Please also add that into an appropriate comment somewhere).
Also, this is a one-off question but since we're both here -- when I restore a backup, do I also restore a GCThreshold?
Not sure if you saw this, I'm still curious. If we take a backup, we don't allow historical reads before the GC threshold at which the backup was taken, right? Sorry for the off-topic, just making sure.
I can't tell because there isn't enough detail on how this will be used.
Mind giving this PR (or better yet, also a comment on the analogue of ImportRequest)? I suspect you're looking into importing into existing tables in the following manner:
- lock the table to clients, procure a safe timestamp T which we know hasn't been observed by anyone
- import into the table
- on failure, use this PR to clean up
- on success, unlock the table again
This gives a lot of context about what the command should and should not do, so I'd appreciate it being explicitly verbalized.
@nvanbenschoten you reviewed this PR as if you didn't have any broader concerns, feel free to chime in here so @dt doesn't get wagged around between the two of us
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @dt)
pkg/storage/batcheval/cmd_clear_range.go, line 87 at r1 (raw file):
from, to, func(kv engine.MVCCKeyValue) (bool, error) { if !kv.Key.Timestamp.Less(args.StartTime) {
Is this going to handle intents correctly? I suspect not (also because it doesn't seem to be tested). I also think that you cannot support intents. This is because if you see an intent at timestamp 10 and say you're reverting everything >= 2, you don't know whether you have to remove the intent. This is because the intent could be committed at timestamp >= 12 even though at the moment it sits at 10. So maybe what you need to do here is to do the same thing that BACKUP does, which is bail when there's an intent (WriteIntentError
) similar to how a read would.
I'd rather see a specialized method for this that lives in engine
(
MVCCRevertRange?) and gets a solid amount of unit testing. Your TODO below would also be addressed there more naturally, see the various
updateStatsOnX` methods.
another thing that occurred to me on the train this morning: we might want a special case where if every key the iterator encounters meets the predicate, instead of sending the batch of all the individual clears to rocks we instead just send a regular range deletion over that key span. |
@tbg RESTORE does not change the gcthreshold on any ranges or mess with any range metadata: it just runs Currently the SSTs constructed by We've thrown around moving all the timestamps of the keys in the backup to the ingestion time since we rewrite all the keys anyway, so that then historical reads from some time X before the RESTORE would still return empty as they did before and the whole table would appear to spring into life all at once at the restore time... but so far we've decided that we couldn't find a compelling reason to do that given the assumptions about SQL (not) reading those keys, and leaving the original timestamps there could be nice if SQL ever surfaces some form of modified-time backed by mvcc. |
I like where we're going with this One option for ensuring that this works correctly with the GC is to introduce a new per-Range (or not) "rollback window" that parallels the current Imports into existing tables would set this window to an arbitrarily high value to effectively disable GC of its keyspace. Users of CRDB could also set this window to some reasonable value (1 hour) to ensure that they have the option to rollback any updates over some time period without risk of data loss. |
This sounds like a good core feature, but I think the scope is growing a bit beyond what Bulk-IO would usually send your way. Is there a smaller-scope version we could start with to make progress incremental IMPORT and document a known-limitation or other workaround here for now, and come back to make this fancier RevertRange RPC later? |
I think the next simplest version of this incremental IMPORT / RevertRange interaction is one that disables the GC on the entire table before beginning the IMPORT and reverts the GC change when the IMPORT is complete. Past that point, anything simpler will risk data loss in the case where an IMPORT is rolled back. @tbg do you see any other intermediate options here? |
v1 of |
Then it does seem reasonable to introduce PS I had to look around and eventually find #26834. Please mention in the commit message what the "v1" does and how this PR fits in. It took me some time to reverse engineer all of that. Also, for the v1 (in which incoming data never touches existing rows), how do you verify that it doesn't? You are planning to just throw SSTs into KV, right? Are you scanning the keys touched by the SST during evaluation to make sure nothing overlaps? |
^- I just found #38449, which answers my last question. |
Uh, what GitHub? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @tbg)
pkg/roachpb/api.proto, line 290 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Give this a comment. Make sure to be explicit about whether this is an inclusive or exclusive lower bound. IIRC we're not very consistent about that when it comes to time boundaries looking backwards.
Done.
pkg/storage/batcheval/cmd_clear_range.go, line 87 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Is this going to handle intents correctly? I suspect not (also because it doesn't seem to be tested). I also think that you cannot support intents. This is because if you see an intent at timestamp 10 and say you're reverting everything >= 2, you don't know whether you have to remove the intent. This is because the intent could be committed at timestamp >= 12 even though at the moment it sits at 10. So maybe what you need to do here is to do the same thing that BACKUP does, which is bail when there's an intent (
WriteIntentError
) similar to how a read would.I'd rather see a specialized method for this that lives in
engine
(
MVCCRevertRange?) and gets a solid amount of unit testing. Your TODO below would also be addressed there more naturally, see the various
updateStatsOnX` methods.
good catch -- added an error on intents.
I moved most of the logic to engine
(though doing the stats part is beyond what I think I can take on so that is still done in the batch eval).
pkg/storage/batcheval/cmd_clear_range.go, line 103 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
cArgs.Stats
is a delta, not an absolute value. Seecmd_recompute_stats.go
.
Done.
pkg/storage/batcheval/cmd_clear_range_test.go, line 170 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit:
h.Sum(nil)
?
Done.
pkg/storage/batcheval/cmd_clear_range_test.go, line 246 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
revert to b?
Done.
pkg/storage/batcheval/cmd_clear_range_test.go, line 262 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Given that I think we're getting the stats wrong right now, I think it would be worth testing them in each of these subtests.
Done.
1721a14
to
a82bed3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then it does seem reasonable to introduce RevertRange without any special provisions for GC. I'd prefer it that way too, introducing RevertRange alone is going to be a nice chunk of focused work at the MVCC layer, so if we can do that first and worry about the GC later, that's good.
I agree with this. It seems like we can introduce RevertRange now and polish off the request's interface even if the implementation still has a few gaps.
It sounds like the question of whether or not we paginate these requests is still an issue as well, right @dt? If we were to introduce a new request type (i.e. RevertRange
) then we could hook it up to the resume span infrastructure from the beginning, which would allow for pagination to be handled in a fairly straightforward manner on the client (see deleteRangeNode
for an example). This would prevent the request from being parallelized by DistSender, but it wouldn't prevent DistSender from fanning out like you're hoping. Is this insufficient?
Reviewed 4 of 7 files at r2.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @dt and @tbg)
pkg/storage/batcheval/cmd_clear_range.go, line 52 at r2 (raw file):
batch engine.ReadWriter, from, to engine.MVCCKey, since hlc.Timestamp, ) (bool, error) { // Use a TBI to check if there is anyting to delete.
I'd point out that the MinTimestampHint
is just a hint, so not all keys that we see here are actually >= since, however, we know that if we see any keys, that at least some are >= since, otherwise we would have ignored those entire SSTs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @tbg)
pkg/storage/batcheval/cmd_clear_range.go, line 52 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I'd point out that the
MinTimestampHint
is just a hint, so not all keys that we see here are actually >= since, however, we know that if we see any keys, that at least some are >= since, otherwise we would have ignored those entire SSTs.
Right -- I'm not using the timestamp on the first key Seek hits, but I'm assuming if Seek hit anything then TBI thinks there's something to look at so we might as well do our iteration. I just wanted this super-cheap check to bail early on no-ops before getting into stats and whatnot, since I think in many use-cases we'll hit the edge cases of entire ranges of no-new-keys or all-new-keys.
Okay, I've reworked this into a new, standalone |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 12 of 13 files at r3.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @dt, @lucy-zhang, and @tbg)
pkg/roachpb/api.proto, line 290 at r3 (raw file):
RequestHeader header = 1 [(gogoproto.nullable) = false, (gogoproto.embed) = true]; // StartTime specifies a lower-bound timestamp, which, if non-zero, causes
Do we still want this?
pkg/roachpb/api.proto, line 307 at r3 (raw file):
what the range since by
This sentence needs some love.
pkg/roachpb/api.proto, line 308 at r3 (raw file):
// it returns the MVCC-scanned values to TargetTime.
Does it?
pkg/roachpb/api.proto, line 316 at r3 (raw file):
// applicable to their usage and if so, take steps to avoid it. Some future work // could add the concept of a gc.revert_ttl which would prevent GC of a key that // otherwise meets the GC TTL if the key "shadowing" it is younger then the
younger than
pkg/roachpb/api.proto, line 318 at r3 (raw file):
// otherwise meets the GC TTL if the key "shadowing" it is younger then the // revert TTL -- i.e. ensuring keys remain revertable via this mechanism (by not // clearing their earlier revisions) for that duraton. When such a mechanism is
duration
pkg/storage/batcheval/cmd_clear_range.go, line 52 at r2 (raw file):
Previously, dt (David Taylor) wrote…
Right -- I'm not using the timestamp on the first key Seek hits, but I'm assuming if Seek hit anything then TBI thinks there's something to look at so we might as well do our iteration. I just wanted this super-cheap check to bail early on no-ops before getting into stats and whatnot, since I think in many use-cases we'll hit the edge cases of entire ranges of no-new-keys or all-new-keys.
Yeah, I'm just saying that you should leave that in a comment. It's not immediately obvious.
pkg/storage/batcheval/cmd_revert_range.go, line 53 at r3 (raw file):
// RevertRange wipes all MVCC versions more recent than TargetTime of the keys // covered by the specified span, adjusting the MVCC stats accordingly.
I can't find where it was brought up, but I think @tbg just mentioned something about ClearRange having issues with replays because it doesn't place an upper bound on the timestamp that it clears, so it it is replayed in the future, it can re-clear keys. Is that correct?
Since this is always scanning every key in the range and performing a timestamp check, do we lose anything by using the batch header's timestamp as an upper bound to revert?
EDIT: it actually looks like we are doing this. We should document it in a few places then.
pkg/storage/batcheval/cmd_revert_range.go, line 73 at r3 (raw file):
clearStartTime := args.TargetTime.Next() if empty, err := isEmptyKeyTimeRange(batch, from, to, clearStartTime); err != nil || empty {
I'd add a log.VEventf
here as well.
pkg/storage/batcheval/cmd_revert_range.go, line 77 at r3 (raw file):
} log.VEventf(ctx, 2, "clearing keys with timestamp > %v", args.TargetTime) statsBefore, err := computeStatsDelta(ctx, batch, cArgs, from, to)
I'm confused. Why are we calling this to get the stats before the modification instead of just using cArgs.EvalCtx.GetMVCCStats()
directly? computeStatsDelta
looks very tailored to ClearRange
, which expects to clear all keys in its key range.
pkg/storage/batcheval/cmd_revert_range.go, line 82 at r3 (raw file):
} err = engine.MVCCClearTimeRange(ctx, batch, from, to, clearStartTime, cArgs.Header.Timestamp)
Weren't we going to hook this up to cArgs.MaxKeys
? You can take a look at cmd_delete_range.go
to see how that's handled there.
pkg/storage/engine/mvcc.go, line 1848 at r3 (raw file):
// MVCCClearTimeRange clears all MVCC versions within the span [key, endKey) // which have timestamps in the span [startTime, endTime). This can have the
Why is endTime
exclusive? If a write exists at this time, we don't want to clear it?
pkg/storage/engine/mvcc.go, line 1857 at r3 (raw file):
// does or not not change the live and gc keys) so the caller is responsible for // recomputing stats over the resulting span if needed. func MVCCClearTimeRange(
This could use some targetted unit testing in mvcc_test.go
as well.
pkg/storage/engine/mvcc.go, line 1883 at r3 (raw file):
if meta.Txn != nil && !hlc.Timestamp(meta.Timestamp).Less(startTime) { err := &roachpb.WriteIntentError{ Intents: []roachpb.Intent{{Span: roachpb.Span{Key: append([]byte{}, kv.Key.Key...)},
Where did you find that MVCCKeyValue.Key
is unsafe after the batch.Iterate
continues? I'm having trouble finding an answer to that one way or another.
EDIT: I actually think it is safe. It's created using Iterator.Key
, not Iterator.UnsafeKey
.
pkg/storage/engine/mvcc.go, line 1901 at r3 (raw file):
return false, err } cleared++
Were you meaning to add in pagination?
1f3c975
to
7e23fc5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @lucy-zhang, @nvanbenschoten, and @tbg)
pkg/roachpb/api.proto, line 290 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Do we still want this?
Done.
pkg/roachpb/api.proto, line 307 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
what the range since by
This sentence needs some love.
Done.
pkg/roachpb/api.proto, line 308 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
// it returns the MVCC-scanned values to TargetTime.
Does it?
Done.
pkg/roachpb/api.proto, line 316 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
younger than
Done.
pkg/roachpb/api.proto, line 318 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
duration
Done.
pkg/storage/batcheval/cmd_clear_range.go, line 52 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Yeah, I'm just saying that you should leave that in a comment. It's not immediately obvious.
Done.
pkg/storage/batcheval/cmd_revert_range.go, line 53 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I can't find where it was brought up, but I think @tbg just mentioned something about ClearRange having issues with replays because it doesn't place an upper bound on the timestamp that it clears, so it it is replayed in the future, it can re-clear keys. Is that correct?
Since this is always scanning every key in the range and performing a timestamp check, do we lose anything by using the batch header's timestamp as an upper bound to revert?
EDIT: it actually looks like we are doing this. We should document it in a few places then.
Done.
pkg/storage/batcheval/cmd_revert_range.go, line 73 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I'd add a
log.VEventf
here as well.
Done.
pkg/storage/batcheval/cmd_revert_range.go, line 77 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I'm confused. Why are we calling this to get the stats before the modification instead of just using
cArgs.EvalCtx.GetMVCCStats()
directly?computeStatsDelta
looks very tailored toClearRange
, which expects to clear all keys in its key range.
I was using computeStatsDelta
here since it handles using cArgs.EvalCtx.GetMVCCStats()
iff the span is the whole range and computing the stats for the span if not, which is basically exactly what I think I need. I'm fine with it returning stats for all keys in that range, because that's what I subtract, and then I add back in the evaluated stats for the post-time-bound-clear'ed span. I added a couple comments.
pkg/storage/batcheval/cmd_revert_range.go, line 82 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Weren't we going to hook this up to
cArgs.MaxKeys
? You can take a look atcmd_delete_range.go
to see how that's handled there.
I thought we were okay with punting pagination to a follow up?
pkg/storage/engine/mvcc.go, line 1848 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Why is
endTime
exclusive? If a write exists at this time, we don't want to clear it?
Hm. Good point. It just seemed like one of them wanted to be exclusive but yeah, endTime makes that a bit awkward. I switched startTime being exclusive, since I was just .Next()ing it in the current call anyway.
pkg/storage/engine/mvcc.go, line 1857 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
This could use some targetted unit testing in
mvcc_test.go
as well.
Done.
pkg/storage/engine/mvcc.go, line 1883 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Where did you find that
MVCCKeyValue.Key
is unsafe after thebatch.Iterate
continues? I'm having trouble finding an answer to that one way or another.EDIT: I actually think it is safe. It's created using
Iterator.Key
, notIterator.UnsafeKey
.
TIL. I just assumed it was unsafe since the alternative sounded too expensive to be true. I'll add a TODO to switch to my own iterator instead so I can avoid all that allocating when I'm just skipping over large swaths of keys.
pkg/storage/engine/mvcc.go, line 1901 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Were you meaning to add in pagination?
I thought we were punting it to a follow up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is almost there. Thanks for sticking through with the back and forth.
Reviewed 6 of 7 files at r4.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @dt, @lucy-zhang, and @tbg)
pkg/storage/batcheval/cmd_revert_range.go, line 77 at r3 (raw file):
Previously, dt (David Taylor) wrote…
I was using
computeStatsDelta
here since it handles usingcArgs.EvalCtx.GetMVCCStats()
iff the span is the whole range and computing the stats for the span if not, which is basically exactly what I think I need. I'm fine with it returning stats for all keys in that range, because that's what I subtract, and then I add back in the evaluated stats for the post-time-bound-clear'ed span. I added a couple comments.
Oh I see, because you then scan the same key range down below. Ok, carry on.
pkg/storage/batcheval/cmd_revert_range.go, line 82 at r3 (raw file):
Previously, dt (David Taylor) wrote…
I thought we were okay with punting pagination to a follow up?
To be honest, I think that's going to create more work than if you just add support for it now. You'll then need to deal with a migration and version checks. I'm fine leaving it out of this PR, but I'd suggest we add support for it before any release (alpha or otherwise) goes out and forces us to worry about incompatibility.
pkg/storage/batcheval/cmd_revert_range.go, line 43 at r4 (raw file):
) (bool, error) { // Use a TBI to check if there is anyting to delete -- the first key Seek hits // may likely not be in the time range but the fact the TBI returned indicates
s/returned/returned any key/
pkg/storage/batcheval/cmd_revert_range.go, line 48 at r4 (raw file):
// key and acts accordingly. iter := batch.NewIterator(engine.IterOptions{ LowerBound: from, UpperBound: to, MinTimestampHint: since.Next(),
Add a comment about why you're calling since.Next()
.
pkg/storage/batcheval/cmd_revert_range.go, line 48 at r4 (raw file):
// key and acts accordingly. iter := batch.NewIterator(engine.IterOptions{ LowerBound: from, UpperBound: to, MinTimestampHint: since.Next(),
We might as well pass a MaxTimestampHint
here as well, right?
pkg/storage/engine/mvcc.go, line 1883 at r3 (raw file):
Previously, dt (David Taylor) wrote…
TIL. I just assumed it was unsafe since the alternative sounded too expensive to be true. I'll add a TODO to switch to my own iterator instead so I can avoid all that allocating when I'm just skipping over large swaths of keys.
👍
pkg/storage/engine/mvcc_test.go, line 2408 at r4 (raw file):
} ts5 := hlc.Timestamp{WallTime: 50}
Do you mind diagramming in a comment the keys that we have laid down? That will make this test much easier to read. Something like:
// Keyspace:
//
// 50 -
// |
// 40 - v4 v4
// |
// 30 - v3
// time |
// 20 - v2 v2
// |
// 10 - v1
// |
// 0 -----------------------
// k1 k2 k3 k4 k5
// keys
pkg/storage/engine/mvcc_test.go, line 2465 at r4 (raw file):
assertKVs(t, e, ts4, ts4Content) assertKVs(t, e, ts5, ts4Content) })
We're missing tests where we don't scan from [keyMin, keyMax)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @lucy-zhang, @nvanbenschoten, and @tbg)
pkg/storage/batcheval/cmd_revert_range.go, line 82 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
To be honest, I think that's going to create more work than if you just add support for it now. You'll then need to deal with a migration and version checks. I'm fine leaving it out of this PR, but I'd suggest we add support for it before any release (alpha or otherwise) goes out and forces us to worry about incompatibility.
I think I'd prefer to fast-follow with it rather than let this PR grow more, since I think we'll also need some bigger tests to test the interaction with contagious span clears (which presumably only add 1 to the batch size).
pkg/storage/batcheval/cmd_revert_range.go, line 43 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
s/returned/returned any key/
Done.
pkg/storage/batcheval/cmd_revert_range.go, line 48 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Add a comment about why you're calling
since.Next()
.
Done.
pkg/storage/batcheval/cmd_revert_range.go, line 48 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
We might as well pass a
MaxTimestampHint
here as well, right?
Done.
pkg/storage/engine/mvcc.go, line 1877 at r5 (raw file):
MVCCKey{Key: key}, MVCCKey{Key: endKey}, func(kv MVCCKeyValue) (bool, error) { if skippingIntent {
@tbg made a comment on slack that got me worried here, asking if when we push if the timestamp is updated in both kvs or just in the intent. I went ahead and added state to the loop is when we decide not to error on an intent (by deciding it does not fall within our time-bounds) we just skip the following kv without looking at it, instead of assuming that looking at it would come to the same conclusion about the time-bounds.
pkg/storage/engine/mvcc_test.go, line 2408 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Do you mind diagramming in a comment the keys that we have laid down? That will make this test much easier to read. Something like:
// Keyspace: // // 50 - // | // 40 - v4 v4 // | // 30 - v3 // time | // 20 - v2 v2 // | // 10 - v1 // | // 0 ----------------------- // k1 k2 k3 k4 k5 // keys
Nice! Done (moved v3 over to the k1 col -- k3 and k4 never get writes).
pkg/storage/engine/mvcc_test.go, line 2465 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
We're missing tests where we don't scan from
[keyMin, keyMax)
.
Done. Also added a few around intents here, since that was only being tested at the command level before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @lucy-zhang, @nvanbenschoten, and @tbg)
pkg/storage/engine/mvcc.go, line 1877 at r5 (raw file):
Previously, dt (David Taylor) wrote…
@tbg made a comment on slack that got me worried here, asking if when we push if the timestamp is updated in both kvs or just in the intent. I went ahead and added state to the loop is when we decide not to error on an intent (by deciding it does not fall within our time-bounds) we just skip the following kv without looking at it, instead of assuming that looking at it would come to the same conclusion about the time-bounds.
Based on slack convo, maybe this wasn't an issue after all since meta.Timestamp is bumped to match the kv ts so it was fine before letting the per-kv check skip it? I'm getting a little lost to be honest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't look at MVCCClearTimeRange
closely, since I haven't been following the discussion on intents, but otherwise LGTM
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @dt, @lucy-zhang, @nvanbenschoten, and @tbg)
pkg/storage/batcheval/cmd_revert_range.go, line 77 at r6 (raw file):
batch, args.Key, args.EndKey, args.TargetTime, cArgs.Header.Timestamp, ); err != nil || empty { log.VEventf(ctx, 2, "no keys to clear in specified time range")
Should we log something else if isEmptyTimeRange
returns an error?
pkg/storage/batcheval/cmd_revert_range_test.go, line 156 at r6 (raw file):
sumCIntent := hashRange(t, eng, startKey, endKey) // Lay down more revisions (skipping even keys to avoid out intent on 0012).
our
?
pkg/storage/engine/mvcc.go, line 1899 at r6 (raw file):
} } // TODO(dt): buffer keys and flush large runs a single ClearRange. In many
nit: is this missing a word?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 2 files at r6.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @dt, @nvanbenschoten, and @tbg)
pkg/storage/batcheval/cmd_revert_range.go, line 82 at r3 (raw file):
Previously, dt (David Taylor) wrote…
I think I'd prefer to fast-follow with it rather than let this PR grow more, since I think we'll also need some bigger tests to test the interaction with contagious span clears (which presumably only add 1 to the batch size).
👍
pkg/storage/engine/mvcc.go, line 1877 at r5 (raw file):
Previously, dt (David Taylor) wrote…
Based on slack convo, maybe this wasn't an issue after all since meta.Timestamp is bumped to match the kv ts so it was fine before letting the per-kv check skip it? I'm getting a little lost to be honest.
I don't think we need this logic at all, because if the meta.Timestamp
is above our search timestamp, the intent's version key must also be.
pkg/storage/engine/mvcc_test.go, line 2465 at r4 (raw file):
Previously, dt (David Taylor) wrote…
Done. Also added a few around intents here, since that was only being tested at the command level before.
Thanks for adding intent cases. Shouldn't we be getting a WriteIntentError
in some of them though?
3a1a985
to
4b49510
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @lucy-zhang, @nvanbenschoten, and @tbg)
pkg/storage/batcheval/cmd_revert_range.go, line 77 at r6 (raw file):
Previously, lucy-zhang (Lucy Zhang) wrote…
Should we log something else if
isEmptyTimeRange
returns an error?
Sure. I was thinking it wasn't worth it since we're returning the actual error, but I can make the log message only occur in the nil error case.
pkg/storage/batcheval/cmd_revert_range_test.go, line 156 at r6 (raw file):
Previously, lucy-zhang (Lucy Zhang) wrote…
our
?
Done.
pkg/storage/engine/mvcc.go, line 1877 at r5 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I don't think we need this logic at all, because if the
meta.Timestamp
is above our search timestamp, the intent's version key must also be.
Ok, put it back the way it was (check for and error on an intent, then just check ts on every versioned key -- potentially including the one owned by the intent. If/when we switch to our own iterator, we could consider seeking past the intent's time but I doubt it is worth it given how much slower Seek is?
pkg/storage/engine/mvcc.go, line 1899 at r6 (raw file):
Previously, lucy-zhang (Lucy Zhang) wrote…
nit: is this missing a word?
Done.
pkg/storage/engine/mvcc_test.go, line 2465 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Thanks for adding intent cases. Shouldn't we be getting a
WriteIntentError
in some of them though?
we do: the first two both are require.Error
.
This adds a new RPC, similar to ClearRange except with the addition of a time-range filter, for clearing revisions of keys within the passed span that fall within the passed time range, effectively "reverting" those writes. Note however that this isn't a quite a true, generalized "revert" op yet since older revisions of some keys might have been cleared by GC in the meantime, meaning that using this to clear the newer revisions does not cause a scan to observe the older values (but instead that scan would now see no value). A future change could introduce a "revert window" setting like the GC TTL that controls when a key can be GC'ed, in this case only allowing GC of a key when the key covering it is older than the revert window (in addition to the key itself being older than the existing GC TTL). This could then ensure that this RPC would always be able to revert the range to any time in the revert window at which point it could include a check that the requested time was valid. However that is left to a future change. The immediate need for this RPC is rolling back a failed partial bulk ingestion during IMPORT INTO, during which only new keys are added and might need to be reverted, so there are no prior revisions to worry about. Release note: none.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
once the last comment thread is resolved.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @lucy-zhang, @nvanbenschoten, and @tbg)
pkg/storage/engine/mvcc.go, line 1877 at r5 (raw file):
Previously, dt (David Taylor) wrote…
Ok, put it back the way it was (check for and error on an intent, then just check ts on every versioned key -- potentially including the one owned by the intent. If/when we switch to our own iterator, we could consider seeking past the intent's time but I doubt it is worth it given how much slower Seek is?
Yeah, it's almost certainly not worth it because there should never be any keys between the meta and the version key.
pkg/storage/engine/mvcc_test.go, line 2465 at r4 (raw file):
Previously, dt (David Taylor) wrote…
we do: the first two both are
require.Error
.
I'm not seeing that. I only see it for the MVCCScan. Am I missing something?
Regardless, I'd either use require.EqualError
or add require.IsType
calls to check that we're getting WriteIntentError
s back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @lucy-zhang, @nvanbenschoten, and @tbg)
pkg/storage/engine/mvcc_test.go, line 2465 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I'm not seeing that. I only see it for the MVCCScan. Am I missing something?
Regardless, I'd either use
require.EqualError
or addrequire.IsType
calls to check that we're gettingWriteIntentError
s back.
The first two intent subtests, clear everything hitting intent fails
and clear exactly hitting intent fails
just had require.Error(t, MVCCClearTim...)
with no Scans at all (since the Clear failed instead of doing anything). In any case, I changed them to require.EqualError
with the actual intent error now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @lucy-zhang, @nvanbenschoten, and @tbg)
pkg/storage/engine/mvcc_test.go, line 2465 at r4 (raw file):
Previously, dt (David Taylor) wrote…
The first two intent subtests,
clear everything hitting intent fails
andclear exactly hitting intent fails
just hadrequire.Error(t, MVCCClearTim...)
with no Scans at all (since the Clear failed instead of doing anything). In any case, I changed them torequire.EqualError
with the actual intent error now.
Thanks.
bors r=nvanbenschoten |
38477: storage: add RevertRange r=nvanbenschoten a=dt This adds support for passing a lower time-bound to ClearRange. When set, this causes ClearRange to clear KVs in the specified range only if they have a timestamp greater than or equal to the passed time bound, which essentially turns ClearRange into a revert-to-time operator, since it would clear any keys written since then. This can then be used to rollback a failed bulk ingestion that occured at a given time (assuming no other writes were happening at that time). Release note: none. Co-authored-by: David Taylor <[email protected]>
Build succeeded |
This adds support for passing a lower time-bound to ClearRange.
When set, this causes ClearRange to clear KVs in the specified range only if they have a timestamp greater than or equal to the passed time bound, which essentially turns ClearRange into a revert-to-time operator, since it would clear any keys written since then.
This can then be used to rollback a failed bulk ingestion that occured at a given time (assuming no other writes were happening at that time).
Release note: none.