-
Notifications
You must be signed in to change notification settings - Fork 3.8k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
storage, server: don't reuse leases obtained before a restart or tran…
…sfer Before this patch, leases held before a restart could be used after the restart. This is incorrect, since the command queue has been wiped in the restart and so reads and writes are not properly sequenced with possible in-flight commands. There was also a problem before this patch, related to lease transfers: if a transfer command returned an error to the `replica.Send()` call that proposed it, we'd clear our "in transfer" state and allow the replica to use the existing lease. However, such errors are generally ambiguous - there's no guarantee that the transfer will still apply. In such cases, the replica was actually breaking its promise to not use the lease after initiating the tranfer. The current patch addresses both these problems by introducing a `replica.minLeaseProposedTS` and a `Lease.ProposedTS`. These fields are used to enforce the fact that only leases proposed after a particular time will be used *for proposing* new commands, or for serving reads. On a transfer, the field for the replica in question is set to the present time. On restart, the field is set for all the replicas to the current time, after the Server has waited such that it has a guarantee that the current HLC is above what it might have been on previous incarnations. This ensures that, in both the transfer and the restart case, leases that have been proposed before the transfer/restart and were in flight at the time when the transfer/restart happens are not eligible to be used. This patch also changes the way waiting on startup for a server's HLC to be guaranteed to be monotonic wrt to the HLC before the restart works. Before, we were waiting just before starting serving, but after all the stores had been started. The point of the waiting was to not serve reads thinking we have a lower timestamp than before the restart. I'm unclear about whether that was correct or not, considering that, for example, a store's queues were already running and potentially doing work. Now that we also rely on that waiting for not reusing old leases (we need to initialize replica.minLeaseProposedTS with a value higher than the node had before the restart), I've had to do the waiting *before* initializing the stores. I could have done the setting of that field late, but it seems even more dangerous than before to allow queues to do work with potentially bad leases. We lost the amortization of this wait time with the store creation process... But note that, for empty engines (e.g. in tests) we don't do any waiting. Fixes #7996 THIS PATCH NEEDS TO BE DEPLOYED THROUGH A "STOP THE WORLD" Because of the field being added to the Lease proto, we can't have leases proposed by new servers being applied by old servers - they're going to be serialized differently and the consistency checker will flag it. Note that a "freeze" is not required, though. The new field is nullable, so any lease requests that might be in flight at the time of the world restart will be serialized the same as they have been by the nodes that already applied them before the restart.
- Loading branch information
1 parent
0b4aad8
commit 3d508a1
Showing
16 changed files
with
682 additions
and
175 deletions.
There are no files selected for viewing
Large diffs are not rendered by default.
Oops, something went wrong.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
87 changes: 85 additions & 2 deletions
87
pkg/storage/engine/rocksdb/cockroach/pkg/roachpb/data.pb.cc
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Oops, something went wrong.
Oops, something went wrong.