
storage: add RevertRange #38477

Merged (1 commit on Jul 22, 2019)

Conversation

@dt dt (Member) commented Jun 26, 2019

This adds support for passing a lower time-bound to ClearRange.

When set, this causes ClearRange to clear KVs in the specified range only if they have a timestamp greater than or equal to the passed time bound, which essentially turns ClearRange into a revert-to-time operator, since it would clear any keys written since then.

This can then be used to roll back a failed bulk ingestion that occurred at a given time (assuming no other writes were happening at that time).

Release note: none.
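As a reading aid, here is a minimal, hedged Go sketch of the filter described above, modeled on the `!kv.Key.Timestamp.Less(args.StartTime)` check quoted later in the review; the helper name and standalone packaging are illustrative, not code from this PR.

```go
package sketch

import "github.com/cockroachdb/cockroach/pkg/util/hlc"

// shouldClear reports whether a version written at ts would be removed by a
// ClearRange carrying an inclusive lower time bound startTime. A zero
// startTime preserves the original "clear everything" ClearRange behavior.
func shouldClear(ts, startTime hlc.Timestamp) bool {
	if startTime == (hlc.Timestamp{}) {
		return true
	}
	// !ts.Less(startTime) means ts >= startTime, i.e. the key was written at
	// or after the bound and should be reverted.
	return !ts.Less(startTime)
}
```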

@dt dt requested review from tbg, nvanbenschoten and a team June 26, 2019 22:18

@nvanbenschoten nvanbenschoten (Member) left a comment

Reviewed 2 of 4 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dt, @nvanbenschoten, and @tbg)


pkg/roachpb/api.proto, line 290 at r1 (raw file):

  RequestHeader header = 1 [(gogoproto.nullable) = false, (gogoproto.embed) = true];

  util.hlc.Timestamp start_time = 2 [(gogoproto.nullable) = false];

Give this a comment. Make sure to be explicit about whether this is an inclusive or exclusive lower bound. IIRC we're not very consistent about that when it comes to time boundaries looking backwards.


pkg/storage/batcheval/cmd_clear_range.go, line 103 at r1 (raw file):

				return result.Result{}, err
			}
			*cArgs.Stats = stats

cArgs.Stats is a delta, not an absolute value. See cmd_recompute_stats.go.
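For context, a hedged sketch of the delta convention being requested here; the helper and its parameters are illustrative, and the before/after values are assumed to be computed over the affected span elsewhere.

```go
package sketch

import "github.com/cockroachdb/cockroach/pkg/storage/engine/enginepb"

// applyStatsDelta records only the change caused by the command: subtract the
// stats computed over the affected span before the clear and add the stats
// recomputed afterwards, rather than assigning an absolute value to cmdStats.
func applyStatsDelta(cmdStats *enginepb.MVCCStats, before, after enginepb.MVCCStats) {
	cmdStats.Subtract(before)
	cmdStats.Add(after)
}
```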


pkg/storage/batcheval/cmd_clear_range_test.go, line 170 at r1 (raw file):

		t.Fatal(err)
	}
	return h.Sum([]byte{})

nit: h.Sum(nil)?


pkg/storage/batcheval/cmd_clear_range_test.go, line 192 at r1 (raw file):

	before := hlc.Timestamp{WallTime: 1000}

	// lay down some keys to be the starting point to which we'll return later.

nit: these comments aren't in accordance with our style guide because they're not sentence cased.


pkg/storage/batcheval/cmd_clear_range_test.go, line 246 at r1 (raw file):

	}

	t.Run("revert to before", func(t *testing.T) {

revert to b?


pkg/storage/batcheval/cmd_clear_range_test.go, line 246 at r1 (raw file):

	}

	t.Run("revert to before", func(t *testing.T) {

I think it would be easy to pull each of these subtests into a helper/closure with just a few params to reduce duplication.


pkg/storage/batcheval/cmd_clear_range_test.go, line 262 at r1 (raw file):

			StartTime: cutoff,
		}
		cArgs.Stats = &enginepb.MVCCStats{}

Given that I think we're getting the stats wrong right now, I think it would be worth testing them in each of these subtests.

@tbg tbg (Member) left a comment

Sorry for not reviewing this yet, but now that @nvanbenschoten is giving you things to do I should write my concerns down:

  1. this potentially creates very large batches, so it's an operation like DeleteRange that needs to be paginated explicitly by the caller (esp. since we want to bump the range size default up over time), or some magical resume key magic needs to happen. ClearRange doesn't have this problem because it just puts down a range tombstone.
  2. the above gives some evidence that this should be its own RPC instead
  3. How does this interact with GCThreshold? There isn't an issue for ClearRange (or maybe there is) because that keyspace was never supposed to be touched again. But maybe it's not an issue here? I can't tell because there isn't enough detail on how this will be used. Clients won't touch this keyspace until the bulk ingestion is done, right? At which point it has better data at a newer timestamp so things should work out.

Also, this is a one-off question but since we're both here -- when I restore a backup, do I also restore a GCThreshold? That is necessary for correctness (or folks could run broken reads far in the past, on data that was GC'ed before the backup was taken).

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dt and @tbg)

@dt dt (Member, Author) commented Jul 1, 2019

this potentially creates very large batches, so it's an operation like DeleteRange that needs to be paginated explicitly by the caller (esp. since we want to bump the range size default up over time), or some magical resume key magic needs to happen. ClearRange doesn't have this problem because it just puts down a range tombstone.

Based on my current expected client call, I'd rather not push pagination all the way back to the client (or maybe I need to rethink my usage). My expected call is basically to make a single time-bound ClearRange call for the whole table prefix and let distsender handle fanning that out to every range in the table. Ideally the range would handle breaking that up as needed to execute it, so maybe a wrapper RPC (similar to how running an ImportRequest in turn runs some number of (usually local) AddSSTableRequests) that does the pagination locally on each range before returning?

How does this interact with GCThreshold? There isn't an issue for ClearRange (or maybe there is) because that keyspace was never supposed to be touched again. But maybe it's not an issue here? I can't tell because there isn't enough detail on how this will be used. Clients won't touch this keyspace until the bulk ingestion is done, right? At which point it has better data at a newer timestamp so things should work out.

I'm not sure I see the interaction with GCThreshold. GCThreshold is still maintained by GC as before.

Clear/ClearRange can and do rewrite history. In my current usage, I'll be doing this at a timestamp such that I assume SQL has been locked out of reading, so I won't be changing the results of any served SQL reads, but I'm not seeing the direct interaction with GC yet.

Where I think there is interaction with GC is in AddSSTable: if I add a key that shadows an existing one which is older than GC TTL, it could be GC'ed at any time. If we then used this RPC to try to clear the SST's shadowing key, there would be nothing remaining under it. That however is a problem with AddSSTable, not this RPC -- Ben pointed out that the new keys should carry a flag saying they are provisional / subject to rollback that suspends GC until some followup RPC or expiration, and that would then update some range state suspending GC. But that's all for AddSSTable to handle -- I don't see the interaction with GC in this one?

@tbg tbg (Member) left a comment

Ideally the range would handle breaking that up as needed to execute it, so maybe a wrapper RPC (similar to how running an ImportRequest in turn runs some number of (usually local) AddSSTableRequests) that does the pagination locally on each range before returning?

This is a reasonable expectation. We can't do this in a general fashion for nontransactional requests (the contract is that a command is either all-or-nothing) so some wrapper request would have to be created as you suggested. But at the same time you'll need to introduce pagination into ClearRange (so that the wrapper can paginate) which doesn't make sense for the original ClearRange, so it still seems better to make a new RPC that's closer to DeleteRange in that it allows paginating via the StartKey, ResumeSpan, and maximum number of affected entries (or rather a size quota, which makes more sense here). BTW the fanout in DistSender has been causing problems for very expensive requests so it's not appropriate for something like this. The wrapper RPC solves this too because the pagination introduces throttling.

Finding good names will be awkward; the wrapper can't be named the same as the base RPC (RevertRange)?

I'm not sure I see the interaction with GCThreshold. GCThreshold is still maintained by GC as before.

The GCThreshold prevents reads of keyspace that has been altered in a way that doesn't preserve MVCC semantics. If you had previously served reads on versions that you are reverting, that would create problems that would be solved by bumping the GCThreshold in the process. But I think the contract that you want doesn't need this? I.e. you know that nobody has ever seen any of the data you are reverting, that is, nobody has ever accessed that table with a timestamp >= start_timestamp. Is that all correct? (Please also add that into an appropriate comment somewhere).

Also, this is a one-off question but since we're both here -- when I restore a backup, do I also restore a GCThreshold?

Not sure if you saw this, I'm still curious. If we take a backup, we don't allow historical reads before the GC threshold at which the backup was taken, right? Sorry for the off-topic, just making sure.

I can't tell because there isn't enough detail on how this will be used.

Mind adding that detail to this PR (or better yet, also as a comment on the analogue of ImportRequest)? I suspect you're looking into importing into existing tables in the following manner:

  1. lock the table to clients, procure a safe timestamp T which we know hasn't been observed by anyone
  2. import into the table
  3. on failure, use this PR to clean up
  4. on success, unlock the table again

This gives a lot of context about what the command should and should not do, so I'd appreciate it being explicitly verbalized.

@nvanbenschoten you reviewed this PR as if you didn't have any broader concerns, feel free to chime in here so @dt doesn't get wagged around between the two of us

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dt)


pkg/storage/batcheval/cmd_clear_range.go, line 87 at r1 (raw file):

			from, to,
			func(kv engine.MVCCKeyValue) (bool, error) {
				if !kv.Key.Timestamp.Less(args.StartTime) {

Is this going to handle intents correctly? I suspect not (also because it doesn't seem to be tested). I also think that you cannot support intents. This is because if you see an intent at timestamp 10 and say you're reverting everything >= 2, you don't know whether you have to remove the intent. This is because the intent could be committed at timestamp >= 12 even though at the moment it sits at 10. So maybe what you need to do here is to do the same thing that BACKUP does, which is bail when there's an intent (WriteIntentError) similar to how a read would.

I'd rather see a specialized method for this that lives in engine (MVCCRevertRange?) and gets a solid amount of unit testing. Your TODO below would also be addressed there more naturally; see the various `updateStatsOnX` methods.
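A hedged sketch of the bail-on-intent behavior suggested here, assembled from the MVCCMetadata and WriteIntentError fragments quoted elsewhere in this review; the helper name and the Txn field wiring are assumptions rather than the PR's exact code.

```go
package sketch

import (
	"github.com/cockroachdb/cockroach/pkg/roachpb"
	"github.com/cockroachdb/cockroach/pkg/storage/engine/enginepb"
	"github.com/cockroachdb/cockroach/pkg/util/hlc"
)

// checkIntent returns a WriteIntentError when meta describes an intent whose
// provisional timestamp is at or above startTime. We cannot know the timestamp
// the intent will eventually commit at (it may be pushed higher), so we cannot
// decide whether it falls in the reverted window and must fail instead, the
// same way a read or BACKUP would.
func checkIntent(key roachpb.Key, meta *enginepb.MVCCMetadata, startTime hlc.Timestamp) error {
	if meta.Txn != nil && !hlc.Timestamp(meta.Timestamp).Less(startTime) {
		return &roachpb.WriteIntentError{
			Intents: []roachpb.Intent{{
				Span: roachpb.Span{Key: append(roachpb.Key(nil), key...)},
				Txn:  *meta.Txn, // assumed: attach the intent's txn metadata
			}},
		}
	}
	return nil
}
```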

@dt dt (Member, Author) commented Jul 1, 2019

Another thing that occurred to me on the train this morning: we might want a special case where, if every key the iterator encounters meets the predicate, instead of sending the batch of individual clears to rocks we just send a regular range deletion over that key span.

@dt dt (Member, Author) commented Jul 2, 2019

@tbg RESTORE does not change the GCThreshold on any ranges or mess with any range metadata: it just runs AdminSplit to make a range, sends it one or more ImportRequests, and those in turn send some AddSSTableRequests.

Currently the SSTs constructed by ImportRequest keep the timestamps on the keys in the backup SSTs (though @danhhz and I have discussed changing that every so often for years now). I think this means the GCThreshold depends on the ranges off of which we split the first range? I think that means if you're restoring an old BACKUP you usually can't time-travel on the restored data. If you restore a very new backup you might be able to. RESTORE'ing will absolutely change the result returned by some scan request of the target key range at some time X before the restore, but that's considered OK given that we say it operates on dead key space.

We've thrown around moving all the timestamps of the keys in the backup to the ingestion time since we rewrite all the keys anyway, so that then historical reads from some time X before the RESTORE would still return empty as they did before and the whole table would appear to spring into life all at once at the restore time... but so far we've decided that we couldn't find a compelling reason to do that given the assumptions about SQL (not) reading those keys, and leaving the original timestamps there could be nice if SQL ever surfaces some form of modified-time backed by mvcc.

@nvanbenschoten nvanbenschoten (Member) commented

I like where we're going with this RevertRange request idea. It sounds to me like reverting a keyspace to a snapshot from a previous MVCC timestamp is exactly the semantics that we want here. Clarifying that might help answer a few questions about the interaction of this request with GC. It also reveals that an RPC like this would be more generally useful if done correctly. For instance, I think it would be what we need to implement functionality like https://aws.amazon.com/blogs/aws/amazon-aurora-backtrack-turn-back-time/.

One option for ensuring that this works correctly with the GC is to introduce a new per-Range (or not) "rollback window" that parallels the current GCPolicy.TTLSeconds. The GC would then be instructed to avoid using any values in this rollback window when deciding whether to collect historic values. In other words, the GC would always ensure that if all values later than now() - rollback_window were suddenly removed, its GC decisions up to that point would still be valid. We would then enforce that RevertRange could only be run with a timestamp in this time window.

Imports into existing tables would set this window to an arbitrarily high value to effectively disable GC of its keyspace. Users of CRDB could also set this window to some reasonable value (1 hour) to ensure that they have the option to rollback any updates over some time period without risk of data loss.
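To make the idea concrete, a small self-contained sketch of the proposed decision rule; the names (canGC, rollbackWindow) are hypothetical, since no such setting exists in this PR.

```go
package sketch

import "time"

// canGC illustrates the proposed rule: a shadowed version may be collected
// only if it is older than the GC TTL *and* the newest version shadowing it
// falls outside the rollback window, so that a revert targeting any time
// inside the window still finds the older value underneath it.
func canGC(now, versionTime, shadowTime time.Time, gcTTL, rollbackWindow time.Duration) bool {
	olderThanTTL := now.Sub(versionTime) > gcTTL
	shadowOutsideWindow := now.Sub(shadowTime) > rollbackWindow
	return olderThanTTL && shadowOutsideWindow
}
```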

@dt dt (Member, Author) commented Jul 8, 2019

This sounds like a good core feature, but I think the scope is growing a bit beyond what Bulk-IO would usually send your way. Is there a smaller-scope version we could start with to make progress on incremental IMPORT, documenting a known limitation or other workaround here for now, and come back to make this fancier RevertRange RPC later?

@nvanbenschoten nvanbenschoten (Member) commented

I think the next simplest version of this incremental IMPORT / RevertRange interaction is one that disables the GC on the entire table before beginning the IMPORT and reverts the GC change when the IMPORT is complete. Past that point, anything simpler will risk data loss in the case where an IMPORT is rolled back.

@tbg do you see any other intermediate options here?

@dt dt (Member, Author) commented Jul 8, 2019

v1 of IMPORT INTO is not allowed to replace an existing key, so for now there's no concern about GC having nuked an earlier, shadowed revision where the shadower is then reverted, so I don't think that we currently need to address the general revert-ability preservation. We will if/when we use RevertRange or whatever on arbitrary data, or if we allow IMPORT INTO to replace rows, but right now, for the current intended usage, I don't think we need it.

@tbg tbg (Member) commented Jul 9, 2019

v1 of IMPORT INTO is not allowed to replace an existing key

Then it does seem reasonable to introduce RevertRange without any special provisions for GC. I'd prefer it that way too, introducing RevertRange alone is going to be a nice chunk of focused work at the MVCC layer, so if we can do that first and worry about the GC later, that's good.

PS I had to look around and eventually find #26834. Please mention in the commit message what the "v1" does and how this PR fits in. It took me some time to reverse engineer all of that.

Also, for the v1 (in which incoming data never touches existing rows), how do you verify that it doesn't? You are planning to just throw SSTs into KV, right? Are you scanning the keys touched by the SST during evaluation to make sure nothing overlaps?

@tbg tbg (Member) commented Jul 9, 2019

^- I just found #38449, which answers my last question.

@dt dt (Member, Author) commented Jul 11, 2019

Uh, what GitHub?

@dt dt reopened this Jul 11, 2019
@dt dt (Member, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @tbg)


pkg/roachpb/api.proto, line 290 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Give this a comment. Make sure to be explicit about whether this is an inclusive or exclusive lower bound. IIRC we're not very consistent about that when it comes to time boundaries looking backwards.

Done.


pkg/storage/batcheval/cmd_clear_range.go, line 87 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Is this going to handle intents correctly? I suspect not (also because it doesn't seem to be tested). I also think that you cannot support intents. This is because if you see an intent at timestamp 10 and say you're reverting everything >= 2, you don't know whether you have to remove the intent. This is because the intent could be committed at timestamp >= 12 even though at the moment it sits at 10. So maybe what you need to do here is to do the same thing that BACKUP does, which is bail when there's an intent (WriteIntentError) similar to how a read would.

I'd rather see a specialized method for this that lives in engine (MVCCRevertRange?) and gets a solid amount of unit testing. Your TODO below would also be addressed there more naturally; see the various `updateStatsOnX` methods.

Good catch -- added an error on intents.

I moved most of the logic to engine (though doing the stats part is beyond what I think I can take on, so that is still done in the batch eval).


pkg/storage/batcheval/cmd_clear_range.go, line 103 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

cArgs.Stats is a delta, not an absolute value. See cmd_recompute_stats.go.

Done.


pkg/storage/batcheval/cmd_clear_range_test.go, line 170 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: h.Sum(nil)?

Done.


pkg/storage/batcheval/cmd_clear_range_test.go, line 246 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

revert to b?

Done.


pkg/storage/batcheval/cmd_clear_range_test.go, line 262 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Given that I think we're getting the stats wrong right now, I think it would be worth testing them in each of these subtests.

Done.

@dt dt force-pushed the revert branch 2 times, most recently from 1721a14 to a82bed3 Compare July 15, 2019 20:27
@nvanbenschoten nvanbenschoten (Member) left a comment

Then it does seem reasonable to introduce RevertRange without any special provisions for GC. I'd prefer it that way too, introducing RevertRange alone is going to be a nice chunk of focused work at the MVCC layer, so if we can do that first and worry about the GC later, that's good.

I agree with this. It seems like we can introduce RevertRange now and polish off the request's interface even if the implementation still has a few gaps.

It sounds like the question of whether or not we paginate these requests is still an issue as well, right @dt? If we were to introduce a new request type (i.e. RevertRange) then we could hook it up to the resume span infrastructure from the beginning, which would allow for pagination to be handled in a fairly straightforward manner on the client (see deleteRangeNode for an example). This would prevent the request from being parallelized by DistSender, but it wouldn't prevent DistSender from fanning out like you're hoping. Is this insufficient?

Reviewed 4 of 7 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dt and @tbg)


pkg/storage/batcheval/cmd_clear_range.go, line 52 at r2 (raw file):

	batch engine.ReadWriter, from, to engine.MVCCKey, since hlc.Timestamp,
) (bool, error) {
	// Use a TBI to check if there is anything to delete.

I'd point out that the MinTimestampHint is just a hint, so not all keys that we see here are actually >= since. However, we know that if we see any keys, at least some are >= since; otherwise we would have ignored those entire SSTs.

@dt dt changed the title storage: add time-bound ClearRange storage: add RevertRange Jul 16, 2019
@dt dt (Member, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @tbg)


pkg/storage/batcheval/cmd_clear_range.go, line 52 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I'd point out that the MinTimestampHint is just a hint, so not all keys that we see here are actually >= since. However, we know that if we see any keys, at least some are >= since; otherwise we would have ignored those entire SSTs.

Right -- I'm not using the timestamp on the first key Seek hits, but I'm assuming if Seek hit anything then TBI thinks there's something to look at so we might as well do our iteration. I just wanted this super-cheap check to bail early on no-ops before getting into stats and whatnot, since I think in many use-cases we'll hit the edge cases of entire ranges of no-new-keys or all-new-keys.
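For readers skimming the thread, a hedged reconstruction of this fast-path check, pieced together from the fragments quoted above and the call site quoted further down (batch, args.Key, args.EndKey, args.TargetTime, cArgs.Header.Timestamp); exact parameter types and naming are assumptions.

```go
package sketch

import (
	"github.com/cockroachdb/cockroach/pkg/roachpb"
	"github.com/cockroachdb/cockroach/pkg/storage/engine"
	"github.com/cockroachdb/cockroach/pkg/util/hlc"
)

// isEmptyKeyTimeRange uses a time-bound iterator (TBI) to cheaply decide
// whether [from, to) contains any key that might have been written after
// since. The TBI can skip whole SSTs whose timestamps fall outside the hinted
// window, so if seeking to the start of the span yields nothing at all, the
// command can return early as a no-op before computing any stats.
func isEmptyKeyTimeRange(
	batch engine.ReadWriter, from, to roachpb.Key, since, until hlc.Timestamp,
) (bool, error) {
	iter := batch.NewIterator(engine.IterOptions{
		LowerBound:       from,
		UpperBound:       to,
		MinTimestampHint: since.Next(), // hints are inclusive; `since` itself is not reverted
		MaxTimestampHint: until,
	})
	defer iter.Close()
	iter.Seek(engine.MakeMVCCMetadataKey(from))
	ok, err := iter.Valid()
	// Any key the TBI does return is only evidence that some key in the span
	// may be in the time range -- the full iteration still checks timestamps.
	return !ok, err
}
```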

@dt dt (Member, Author) commented Jul 16, 2019

Okay, I've reworked this into a new, standalone RevertRange RPC, but I'm bordering on some unfamiliar territory here, so I've probably missed some things.

@dt dt requested a review from thoszhang July 17, 2019 19:33
@nvanbenschoten nvanbenschoten (Member) left a comment

Reviewed 12 of 13 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dt, @lucy-zhang, and @tbg)


pkg/roachpb/api.proto, line 290 at r3 (raw file):

  RequestHeader header = 1 [(gogoproto.nullable) = false, (gogoproto.embed) = true];

  // StartTime specifies a lower-bound timestamp, which, if non-zero, causes

Do we still want this?


pkg/roachpb/api.proto, line 307 at r3 (raw file):

what the range since by

This sentence needs some love.


pkg/roachpb/api.proto, line 308 at r3 (raw file):

// it returns the MVCC-scanned values to TargetTime.

Does it?


pkg/roachpb/api.proto, line 316 at r3 (raw file):

// applicable to their usage and if so, take steps to avoid it. Some future work
// could add the concept of a gc.revert_ttl which would prevent GC of a key that
// otherwise meets the GC TTL if the key "shadowing" it is younger then the

younger than


pkg/roachpb/api.proto, line 318 at r3 (raw file):

// otherwise meets the GC TTL if the key "shadowing" it is younger then the
// revert TTL -- i.e. ensuring keys remain revertable via this mechanism (by not
// clearing their earlier revisions) for that duraton. When such a mechanism is

duration


pkg/storage/batcheval/cmd_clear_range.go, line 52 at r2 (raw file):

Previously, dt (David Taylor) wrote…

Right -- I'm not using the timestamp on the first key Seek hits, but I'm assuming if Seek hit anything then TBI thinks there's something to look at so we might as well do our iteration. I just wanted this super-cheap check to bail early on no-ops before getting into stats and whatnot, since I think in many use-cases we'll hit the edge cases of entire ranges of no-new-keys or all-new-keys.

Yeah, I'm just saying that you should leave that in a comment. It's not immediately obvious.


pkg/storage/batcheval/cmd_revert_range.go, line 53 at r3 (raw file):

// RevertRange wipes all MVCC versions more recent than TargetTime of the keys
// covered by the specified span, adjusting the MVCC stats accordingly.

I can't find where it was brought up, but I think @tbg just mentioned something about ClearRange having issues with replays because it doesn't place an upper bound on the timestamp that it clears, so if it is replayed in the future, it can re-clear keys. Is that correct?

Since this is always scanning every key in the range and performing a timestamp check, do we lose anything by using the batch header's timestamp as an upper bound to revert?

EDIT: it actually looks like we are doing this. We should document it in a few places then.


pkg/storage/batcheval/cmd_revert_range.go, line 73 at r3 (raw file):

	clearStartTime := args.TargetTime.Next()

	if empty, err := isEmptyKeyTimeRange(batch, from, to, clearStartTime); err != nil || empty {

I'd add a log.VEventf here as well.


pkg/storage/batcheval/cmd_revert_range.go, line 77 at r3 (raw file):

	}
	log.VEventf(ctx, 2, "clearing keys with timestamp > %v", args.TargetTime)
	statsBefore, err := computeStatsDelta(ctx, batch, cArgs, from, to)

I'm confused. Why are we calling this to get the stats before the modification instead of just using cArgs.EvalCtx.GetMVCCStats() directly? computeStatsDelta looks very tailored to ClearRange, which expects to clear all keys in its key range.


pkg/storage/batcheval/cmd_revert_range.go, line 82 at r3 (raw file):

	}

	err = engine.MVCCClearTimeRange(ctx, batch, from, to, clearStartTime, cArgs.Header.Timestamp)

Weren't we going to hook this up to cArgs.MaxKeys? You can take a look at cmd_delete_range.go to see how that's handled there.


pkg/storage/engine/mvcc.go, line 1848 at r3 (raw file):

// MVCCClearTimeRange clears all MVCC versions within the span [key, endKey)
// which have timestamps in the span [startTime, endTime). This can have the

Why is endTime exclusive? If a write exists at this time, we don't want to clear it?


pkg/storage/engine/mvcc.go, line 1857 at r3 (raw file):

// does or not not change the live and gc keys) so the caller is responsible for
// recomputing stats over the resulting span if needed.
func MVCCClearTimeRange(

This could use some targeted unit testing in mvcc_test.go as well.


pkg/storage/engine/mvcc.go, line 1883 at r3 (raw file):

				if meta.Txn != nil && !hlc.Timestamp(meta.Timestamp).Less(startTime) {
					err := &roachpb.WriteIntentError{
						Intents: []roachpb.Intent{{Span: roachpb.Span{Key: append([]byte{}, kv.Key.Key...)},

Where did you find that MVCCKeyValue.Key is unsafe after the batch.Iterate continues? I'm having trouble finding an answer to that one way or another.

EDIT: I actually think it is safe. It's created using Iterator.Key, not Iterator.UnsafeKey.


pkg/storage/engine/mvcc.go, line 1901 at r3 (raw file):

					return false, err
				}
				cleared++

Were you meaning to add in pagination?

@dt dt force-pushed the revert branch 2 times, most recently from 1f3c975 to 7e23fc5 Compare July 18, 2019 18:59
@dt dt (Member, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @lucy-zhang, @nvanbenschoten, and @tbg)


pkg/roachpb/api.proto, line 290 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Do we still want this?

Done.


pkg/roachpb/api.proto, line 307 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
what the range since by

This sentence needs some love.

Done.


pkg/roachpb/api.proto, line 308 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
// it returns the MVCC-scanned values to TargetTime.

Does it?

Done.


pkg/roachpb/api.proto, line 316 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

younger than

Done.


pkg/roachpb/api.proto, line 318 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

duration

Done.


pkg/storage/batcheval/cmd_clear_range.go, line 52 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Yeah, I'm just saying that you should leave that in a comment. It's not immediately obvious.

Done.


pkg/storage/batcheval/cmd_revert_range.go, line 53 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I can't find where it was brought up, but I think @tbg just mentioned something about ClearRange having issues with replays because it doesn't place an upper bound on the timestamp that it clears, so if it is replayed in the future, it can re-clear keys. Is that correct?

Since this is always scanning every key in the range and performing a timestamp check, do we lose anything by using the batch header's timestamp as an upper bound to revert?

EDIT: it actually looks like we are doing this. We should document it in a few places then.

Done.


pkg/storage/batcheval/cmd_revert_range.go, line 73 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I'd add a log.VEventf here as well.

Done.


pkg/storage/batcheval/cmd_revert_range.go, line 77 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I'm confused. Why are we calling this to get the stats before the modification instead of just using cArgs.EvalCtx.GetMVCCStats() directly? computeStatsDelta looks very tailored to ClearRange, which expects to clear all keys in its key range.

I was using computeStatsDelta here since it handles using cArgs.EvalCtx.GetMVCCStats() iff the span is the whole range and computing the stats for the span if not, which is basically exactly what I think I need. I'm fine with it returning stats for all keys in that range, because that's what I subtract, and then I add back in the evaluated stats for the post-time-bound-clear'ed span. I added a couple comments.


pkg/storage/batcheval/cmd_revert_range.go, line 82 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Weren't we going to hook this up to cArgs.MaxKeys? You can take a look at cmd_delete_range.go to see how that's handled there.

I thought we were okay with punting pagination to a follow up?


pkg/storage/engine/mvcc.go, line 1848 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Why is endTime exclusive? If a write exists at this time, we don't want to clear it?

Hm. Good point. It just seemed like one of them wanted to be exclusive but yeah, endTime makes that a bit awkward. I switched to startTime being exclusive, since I was just .Next()-ing it in the current call anyway.
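To pin down the semantics settled on here, a one-function sketch of the resulting bounds check; the function name is illustrative (in the command, endTime is the batch header timestamp).

```go
package sketch

import "github.com/cockroachdb/cockroach/pkg/util/hlc"

// inRevertedWindow reports whether a version at ts falls in the window that
// gets cleared: startTime is exclusive (the state as of startTime is what we
// revert to) and endTime is inclusive (which also bounds what a replay could
// re-clear).
func inRevertedWindow(ts, startTime, endTime hlc.Timestamp) bool {
	// startTime < ts && ts <= endTime
	return startTime.Less(ts) && !endTime.Less(ts)
}
```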


pkg/storage/engine/mvcc.go, line 1857 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This could use some targeted unit testing in mvcc_test.go as well.

Done.


pkg/storage/engine/mvcc.go, line 1883 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Where did you find that MVCCKeyValue.Key is unsafe after the batch.Iterate continues? I'm having trouble finding an answer to that one way or another.

EDIT: I actually think it is safe. It's created using Iterator.Key, not Iterator.UnsafeKey.

TIL. I just assumed it was unsafe since the alternative sounded too expensive to be true. I'll add a TODO to switch to my own iterator instead so I can avoid all that allocating when I'm just skipping over large swaths of keys.


pkg/storage/engine/mvcc.go, line 1901 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Were you meaning to add in pagination?

I thought we were punting it to a follow up?

@nvanbenschoten nvanbenschoten (Member) left a comment

This is almost there. Thanks for sticking with it through all the back and forth.

Reviewed 6 of 7 files at r4.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dt, @lucy-zhang, and @tbg)


pkg/storage/batcheval/cmd_revert_range.go, line 77 at r3 (raw file):

Previously, dt (David Taylor) wrote…

I was using computeStatsDelta here since it handles using cArgs.EvalCtx.GetMVCCStats() iff the span is the whole range and computing the stats for the span if not, which is basically exactly what I think I need. I'm fine with it returning stats for all keys in that range, because that's what I subtract, and then I add back in the evaluated stats for the post-time-bound-clear'ed span. I added a couple comments.

Oh I see, because you then scan the same key range down below. Ok, carry on.


pkg/storage/batcheval/cmd_revert_range.go, line 82 at r3 (raw file):

Previously, dt (David Taylor) wrote…

I thought we were okay with punting pagination to a follow up?

To be honest, I think that's going to create more work than if you just add support for it now. You'll then need to deal with a migration and version checks. I'm fine leaving it out of this PR, but I'd suggest we add support for it before any release (alpha or otherwise) goes out and forces us to worry about incompatibility.


pkg/storage/batcheval/cmd_revert_range.go, line 43 at r4 (raw file):

) (bool, error) {
	// Use a TBI to check if there is anything to delete -- the first key Seek hits
	// may likely not be in the time range but the fact the TBI returned indicates

s/returned/returned any key/


pkg/storage/batcheval/cmd_revert_range.go, line 48 at r4 (raw file):

	// key and acts accordingly.
	iter := batch.NewIterator(engine.IterOptions{
		LowerBound: from, UpperBound: to, MinTimestampHint: since.Next(),

Add a comment about why you're calling since.Next().


pkg/storage/batcheval/cmd_revert_range.go, line 48 at r4 (raw file):

	// key and acts accordingly.
	iter := batch.NewIterator(engine.IterOptions{
		LowerBound: from, UpperBound: to, MinTimestampHint: since.Next(),

We might as well pass a MaxTimestampHint here as well, right?


pkg/storage/engine/mvcc.go, line 1883 at r3 (raw file):

Previously, dt (David Taylor) wrote…

TIL. I just assumed it was unsafe since the alternative sounded too expensive to be true. I'll add a TODO to switch to my own iterator instead so I can avoid all that allocating when I'm just skipping over large swaths of keys.

👍


pkg/storage/engine/mvcc_test.go, line 2408 at r4 (raw file):

	}

	ts5 := hlc.Timestamp{WallTime: 50}

Do you mind diagramming in a comment the keys that we have laid down? That will make this test much easier to read. Something like:

// Keyspace:
//
//    50 -
//       |
//    40 -      v4          v4
//       |
//    30 -         v3
// time  |
//    20 -      v2          v2
//       |
//    10 -      v1
//       |
//     0 -----------------------
//          k1  k2  k3  k4  k5
//                 keys 

pkg/storage/engine/mvcc_test.go, line 2465 at r4 (raw file):

		assertKVs(t, e, ts4, ts4Content)
		assertKVs(t, e, ts5, ts4Content)
	})

We're missing tests where we don't scan from [keyMin, keyMax).

@dt dt (Member, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @lucy-zhang, @nvanbenschoten, and @tbg)


pkg/storage/batcheval/cmd_revert_range.go, line 82 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

To be honest, I think that's going to create more work than if you just add support for it now. You'll then need to deal with a migration and version checks. I'm fine leaving it out of this PR, but I'd suggest we add support for it before any release (alpha or otherwise) goes out and forces us to worry about incompatibility.

I think I'd prefer to fast-follow with it rather than let this PR grow more, since I think we'll also need some bigger tests to cover the interaction with contiguous span clears (which presumably only add 1 to the batch size).


pkg/storage/batcheval/cmd_revert_range.go, line 43 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

s/returned/returned any key/

Done.


pkg/storage/batcheval/cmd_revert_range.go, line 48 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Add a comment about why you're calling since.Next().

Done.


pkg/storage/batcheval/cmd_revert_range.go, line 48 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

We might as well pass a MaxTimestampHint here as well, right?

Done.


pkg/storage/engine/mvcc.go, line 1877 at r5 (raw file):

		MVCCKey{Key: key}, MVCCKey{Key: endKey},
		func(kv MVCCKeyValue) (bool, error) {
			if skippingIntent {

@tbg made a comment on Slack that got me worried here, asking whether, when we push, the timestamp is updated in both KVs or just in the intent. I went ahead and added state to the loop so that when we decide not to error on an intent (because it does not fall within our time bounds), we just skip the following KV without looking at it, instead of assuming that looking at it would come to the same conclusion about the time bounds.


pkg/storage/engine/mvcc_test.go, line 2408 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Do you mind diagramming in a comment the keys that we have laid down? That will make this test much easier to read. Something like:

// Keyspace:
//
//    50 -
//       |
//    40 -      v4          v4
//       |
//    30 -         v3
// time  |
//    20 -      v2          v2
//       |
//    10 -      v1
//       |
//     0 -----------------------
//          k1  k2  k3  k4  k5
//                 keys 

Nice! Done (moved v3 over to the k1 col -- k3 and k4 never get writes).


pkg/storage/engine/mvcc_test.go, line 2465 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

We're missing tests where we don't scan from [keyMin, keyMax).

Done. Also added a few around intents here, since that was only being tested at the command level before.

@dt dt (Member, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @lucy-zhang, @nvanbenschoten, and @tbg)


pkg/storage/engine/mvcc.go, line 1877 at r5 (raw file):

Previously, dt (David Taylor) wrote…

@tbg made a comment on Slack that got me worried here, asking whether, when we push, the timestamp is updated in both KVs or just in the intent. I went ahead and added state to the loop so that when we decide not to error on an intent (because it does not fall within our time bounds), we just skip the following KV without looking at it, instead of assuming that looking at it would come to the same conclusion about the time bounds.

Based on the Slack convo, maybe this wasn't an issue after all, since meta.Timestamp is bumped to match the KV's timestamp, so it was fine before to let the per-KV check skip it? I'm getting a little lost, to be honest.

@thoszhang thoszhang (Contributor) left a comment

I didn't look at MVCCClearTimeRange closely, since I haven't been following the discussion on intents, but otherwise LGTM

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dt, @lucy-zhang, @nvanbenschoten, and @tbg)


pkg/storage/batcheval/cmd_revert_range.go, line 77 at r6 (raw file):

		batch, args.Key, args.EndKey, args.TargetTime, cArgs.Header.Timestamp,
	); err != nil || empty {
		log.VEventf(ctx, 2, "no keys to clear in specified time range")

Should we log something else if isEmptyTimeRange returns an error?


pkg/storage/batcheval/cmd_revert_range_test.go, line 156 at r6 (raw file):

	sumCIntent := hashRange(t, eng, startKey, endKey)

	// Lay down more revisions (skipping even keys to avoid out intent on 0012).

our?


pkg/storage/engine/mvcc.go, line 1899 at r6 (raw file):

				}
			}
			// TODO(dt): buffer keys and flush large runs a single ClearRange. In many

nit: is this missing a word?

@nvanbenschoten nvanbenschoten (Member) left a comment

Reviewed 1 of 2 files at r6.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dt, @nvanbenschoten, and @tbg)


pkg/storage/batcheval/cmd_revert_range.go, line 82 at r3 (raw file):

Previously, dt (David Taylor) wrote…

I think I'd prefer to fast-follow with it rather than let this PR grow more, since I think we'll also need some bigger tests to cover the interaction with contiguous span clears (which presumably only add 1 to the batch size).

👍


pkg/storage/engine/mvcc.go, line 1877 at r5 (raw file):

Previously, dt (David Taylor) wrote…

Based on the Slack convo, maybe this wasn't an issue after all, since meta.Timestamp is bumped to match the KV's timestamp, so it was fine before to let the per-KV check skip it? I'm getting a little lost, to be honest.

I don't think we need this logic at all, because if the meta.Timestamp is above our search timestamp, the intent's version key must also be.


pkg/storage/engine/mvcc_test.go, line 2465 at r4 (raw file):

Previously, dt (David Taylor) wrote…

Done. Also added a few around intents here, since that was only being tested at the command level before.

Thanks for adding intent cases. Shouldn't we be getting a WriteIntentError in some of them though?

@dt dt force-pushed the revert branch 2 times, most recently from 3a1a985 to 4b49510 Compare July 22, 2019 19:31
@dt dt (Member, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @lucy-zhang, @nvanbenschoten, and @tbg)


pkg/storage/batcheval/cmd_revert_range.go, line 77 at r6 (raw file):

Previously, lucy-zhang (Lucy Zhang) wrote…

Should we log something else if isEmptyTimeRange returns an error?

Sure. I was thinking it wasn't worth it since we're returning the actual error, but I can make the log message only occur in the nil error case.


pkg/storage/batcheval/cmd_revert_range_test.go, line 156 at r6 (raw file):

Previously, lucy-zhang (Lucy Zhang) wrote…

our?

Done.


pkg/storage/engine/mvcc.go, line 1877 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I don't think we need this logic at all, because if the meta.Timestamp is above our search timestamp, the intent's version key must also be.

Ok, put it back the way it was (check for and error on an intent, then just check the timestamp on every versioned key -- potentially including the one owned by the intent). If/when we switch to our own iterator, we could consider seeking past the intent's timestamp, but I doubt it is worth it given how much slower Seek is?


pkg/storage/engine/mvcc.go, line 1899 at r6 (raw file):

Previously, lucy-zhang (Lucy Zhang) wrote…

nit: is this missing a word?

Done.


pkg/storage/engine/mvcc_test.go, line 2465 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Thanks for adding intent cases. Shouldn't we be getting a WriteIntentError in some of them though?

We do: the first two are both require.Error.

This adds a new RPC, similar to ClearRange except with the addition of
a time-range filter, for clearing revisions of keys within the passed
span that fall within the passed time range, effectively "reverting"
those writes.

Note however that this isn't quite a true, generalized "revert" op yet
since older revisions of some keys might have been cleared by GC in the
meantime, meaning that using this to clear the newer revisions does not
cause a scan to observe the older values (but instead that scan would
now see no value). A future change could introduce a "revert window"
setting like the GC TTL that controls when a key can be GC'ed, in this
case only allowing GC of a key when the key covering it is older than
the revert window (in addition to the key itself being older than the
existing GC TTL). This could then ensure that this RPC would always be
able to revert the range to any time in the revert window at which point
it could include a check that the requested time was valid.

However that is left to a future change. The immediate need for this RPC
is rolling back a failed partial bulk ingestion during IMPORT INTO,
during which only new keys are added and might need to be reverted, so
there are no prior revisions to worry about.

Release note: none.
@nvanbenschoten nvanbenschoten (Member) left a comment

:lgtm: once the last comment thread is resolved.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @lucy-zhang, @nvanbenschoten, and @tbg)


pkg/storage/engine/mvcc.go, line 1877 at r5 (raw file):

Previously, dt (David Taylor) wrote…

Ok, put it back the way it was (check for and error on an intent, then just check the timestamp on every versioned key -- potentially including the one owned by the intent). If/when we switch to our own iterator, we could consider seeking past the intent's timestamp, but I doubt it is worth it given how much slower Seek is?

Yeah, it's almost certainly not worth it because there should never be any keys between the meta and the version key.


pkg/storage/engine/mvcc_test.go, line 2465 at r4 (raw file):

Previously, dt (David Taylor) wrote…

We do: the first two are both require.Error.

I'm not seeing that. I only see it for the MVCCScan. Am I missing something?

Regardless, I'd either use require.EqualError or add require.IsType calls to check that we're getting WriteIntentErrors back.

@dt dt (Member, Author) left a comment

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @lucy-zhang, @nvanbenschoten, and @tbg)


pkg/storage/engine/mvcc_test.go, line 2465 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I'm not seeing that. I only see it for the MVCCScan. Am I missing something?

Regardless, I'd either use require.EqualError or add require.IsType calls to check that we're getting WriteIntentErrors back.

The first two intent subtests, `clear everything hitting intent fails` and `clear exactly hitting intent fails`, just had require.Error(t, MVCCClearTim...) with no Scans at all (since the Clear failed instead of doing anything). In any case, I changed them to require.EqualError with the actual intent error now.

@nvanbenschoten nvanbenschoten (Member) left a comment

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @lucy-zhang, @nvanbenschoten, and @tbg)


pkg/storage/engine/mvcc_test.go, line 2465 at r4 (raw file):

Previously, dt (David Taylor) wrote…

The first two intent subtests, `clear everything hitting intent fails` and `clear exactly hitting intent fails`, just had require.Error(t, MVCCClearTim...) with no Scans at all (since the Clear failed instead of doing anything). In any case, I changed them to require.EqualError with the actual intent error now.

Thanks.

@dt dt (Member, Author) commented Jul 22, 2019

bors r=nvanbenschoten

craig bot pushed a commit that referenced this pull request Jul 22, 2019
38477: storage: add RevertRange r=nvanbenschoten a=dt

This adds support for passing a lower time-bound to ClearRange.

When set, this causes ClearRange to clear KVs in the specified range only if they have a timestamp greater than or equal to the passed time bound, which essentially turns ClearRange into a revert-to-time operator, since it would clear any keys written since then.

This can then be used to roll back a failed bulk ingestion that occurred at a given time (assuming no other writes were happening at that time).

Release note: none.

Co-authored-by: David Taylor <[email protected]>
@craig craig bot (Contributor) commented Jul 22, 2019

Build succeeded

@craig craig bot merged commit d0d9b85 into cockroachdb:master Jul 22, 2019
@dt dt deleted the revert branch July 29, 2019 14:35