-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storage: lease transfer must be persisted to disk #7996
Comments
andreimatei
added a commit
to andreimatei/cockroach
that referenced
this issue
Sep 24, 2016
... so we don't use those leases after a restart fixes cockroachdb#7996
andreimatei
added a commit
to andreimatei/cockroach
that referenced
this issue
Oct 11, 2016
... so we don't use those leases after a restart fixes cockroachdb#7996
andreimatei
added a commit
to andreimatei/cockroach
that referenced
this issue
Oct 25, 2016
Before this patch, leases held before a restart could be used after the restart. This is incorrect, since the command queue has been wiped in the restart and so reads and writes are not properly sequenced with possible in-flight commands. Another problem is that a lease that was in the process of being transferred away before the restart is used, and that's no good because it breaks the lease holder's promise to not use that lease. This patch adds a bit of in-memory state to a replica - whether the lease has changed since the process has been started. If it hasn't, a lease is not used _at propose time_ and instead the holder pretends it doesn't have a valid lease and tries to acquire one. Nodes other than the holder do not modify their behavior. No similar check is done at apply time (nor could it) since there we naturally rely on Raft to order writes; so, if the proposer is still the lease holder, there's no reason not to apply a command. fixes cockroachdb#7996
andreimatei
added a commit
to andreimatei/cockroach
that referenced
this issue
Oct 25, 2016
Before this patch, leases held before a restart could be used after the restart. This is incorrect, since the command queue has been wiped in the restart and so reads and writes are not properly sequenced with possible in-flight commands. Another problem is that a lease that was in the process of being transferred away before the restart is used, and that's no good because it breaks the lease holder's promise to not use that lease. This patch adds a bit of in-memory state to a replica - whether the lease has changed since the process has been started. If it hasn't, a lease is not used _at propose time_ and instead the holder pretends it doesn't have a valid lease and tries to acquire one. Nodes other than the holder do not modify their behavior. No similar check is done at apply time (nor could it) since there we naturally rely on Raft to order writes; so, if the proposer is still the lease holder, there's no reason not to apply a command. fixes cockroachdb#7996
andreimatei
added a commit
to andreimatei/cockroach
that referenced
this issue
Nov 2, 2016
…sfer Before this patch, leases held before a restart could be used after the restart. This is incorrect, since the command queue has been wiped in the restart and so reads and writes are not properly sequenced with possible in-flight commands. There was also a problem before this patch, related to lease transfers: if a transfer command returned an error to the `replica.Send()` call that proposed it, we'd clear our "in transfer" state and allow the replica to use the existing lease. However, such errors are generally ambiguous - there's no guarantee that the transfer will still apply. In such cases, the replica was actually breaking its promise to not use the lease after initiating the tranfer. The current patch addresses both these problems by introducing a `replica.minLeaseProposedTs` and a `Lease.ProposedTs`. These fields are used to enforce the fact that only leases proposed after a particular time will be used *for proposing* new commands, or for serving reads. On a transfer, the field for the replica in question is set to the present time. On restart, the field is set for all the replicas to the current time, after the Server has waited such that it has a guarantee that the current HLC is above what it might have been on previous incarnations. This ensures that, in both the transfer and the restart case, leases that have been proposed before the transfer/restart and were in flight at the time when the transfer/restart happens are not eligible to be used. This patch also changes the way waiting on startup for a server's HLC to be guaranteed to be monotonic wrt to the HLC before the restart works. Before, we were waiting just before starting serving, but after all the stores had been started. The point of the waiting was to not serve reads thinking we have a lower timestamp than before the restart. I'm unclear about whether that was correct or not, considering that, for example, a store's queues were already running and potentially doing work. Now that we also rely on that waiting for not reusing old leases (we need to initialize replica.minLeaseProposedTs with a value higher than the node had before the restart), I've had to do the waiting *before* initializing the stores. I could have done the setting of that field late, but it seems even more dangerous than before to allow queues to do work with potentially bad leases. We lost the amortization of this wait time with the store creation process... But note that, for empty engines (e.g. in tests) we don't do any waiting. Fixes cockroachdb#7996 THIS PATCH NEEDS TO BE DEPLOYED THROUGH A "STOP THE WORLD" Because of the field being added to the Lease proto, we can't have leases proposed by new servers being applied by old servers - they're going to be serialized differently and the consistency checker will flag it. Note that a "freeze" is not required, though. The new field is nullable, so any lease requests that might be in flight at the time of the world restart will be serialized the same as they have been by the nodes that already applied them before the restart.
andreimatei
added a commit
to andreimatei/cockroach
that referenced
this issue
Nov 3, 2016
…sfer Before this patch, leases held before a restart could be used after the restart. This is incorrect, since the command queue has been wiped in the restart and so reads and writes are not properly sequenced with possible in-flight commands. There was also a problem before this patch, related to lease transfers: if a transfer command returned an error to the `replica.Send()` call that proposed it, we'd clear our "in transfer" state and allow the replica to use the existing lease. However, such errors are generally ambiguous - there's no guarantee that the transfer will still apply. In such cases, the replica was actually breaking its promise to not use the lease after initiating the tranfer. The current patch addresses both these problems by introducing a `replica.minLeaseProposedTS` and a `Lease.ProposedTS`. These fields are used to enforce the fact that only leases proposed after a particular time will be used *for proposing* new commands, or for serving reads. On a transfer, the field for the replica in question is set to the present time. On restart, the field is set for all the replicas to the current time, after the Server has waited such that it has a guarantee that the current HLC is above what it might have been on previous incarnations. This ensures that, in both the transfer and the restart case, leases that have been proposed before the transfer/restart and were in flight at the time when the transfer/restart happens are not eligible to be used. This patch also changes the way waiting on startup for a server's HLC to be guaranteed to be monotonic wrt to the HLC before the restart works. Before, we were waiting just before starting serving, but after all the stores had been started. The point of the waiting was to not serve reads thinking we have a lower timestamp than before the restart. I'm unclear about whether that was correct or not, considering that, for example, a store's queues were already running and potentially doing work. Now that we also rely on that waiting for not reusing old leases (we need to initialize replica.minLeaseProposedTS with a value higher than the node had before the restart), I've had to do the waiting *before* initializing the stores. I could have done the setting of that field late, but it seems even more dangerous than before to allow queues to do work with potentially bad leases. We lost the amortization of this wait time with the store creation process... But note that, for empty engines (e.g. in tests) we don't do any waiting. Fixes cockroachdb#7996 THIS PATCH NEEDS TO BE DEPLOYED THROUGH A "STOP THE WORLD" Because of the field being added to the Lease proto, we can't have leases proposed by new servers being applied by old servers - they're going to be serialized differently and the consistency checker will flag it. Note that a "freeze" is not required, though. The new field is nullable, so any lease requests that might be in flight at the time of the world restart will be serialized the same as they have been by the nodes that already applied them before the restart.
andreimatei
added a commit
to andreimatei/cockroach
that referenced
this issue
Nov 4, 2016
…sfer Before this patch, leases held before a restart could be used after the restart. This is incorrect, since the command queue has been wiped in the restart and so reads and writes are not properly sequenced with possible in-flight commands. There was also a problem before this patch, related to lease transfers: if a transfer command returned an error to the `replica.Send()` call that proposed it, we'd clear our "in transfer" state and allow the replica to use the existing lease. However, such errors are generally ambiguous - there's no guarantee that the transfer will still apply. In such cases, the replica was actually breaking its promise to not use the lease after initiating the tranfer. The current patch addresses both these problems by introducing a `replica.minLeaseProposedTS` and a `Lease.ProposedTS`. These fields are used to enforce the fact that only leases proposed after a particular time will be used *for proposing* new commands, or for serving reads. On a transfer, the field for the replica in question is set to the present time. On restart, the field is set for all the replicas to the current time, after the Server has waited such that it has a guarantee that the current HLC is above what it might have been on previous incarnations. This ensures that, in both the transfer and the restart case, leases that have been proposed before the transfer/restart and were in flight at the time when the transfer/restart happens are not eligible to be used. This patch also changes the way waiting on startup for a server's HLC to be guaranteed to be monotonic wrt to the HLC before the restart works. Before, we were waiting just before starting serving, but after all the stores had been started. The point of the waiting was to not serve reads thinking we have a lower timestamp than before the restart. I'm unclear about whether that was correct or not, considering that, for example, a store's queues were already running and potentially doing work. Now that we also rely on that waiting for not reusing old leases (we need to initialize replica.minLeaseProposedTS with a value higher than the node had before the restart), I've had to do the waiting *before* initializing the stores. I could have done the setting of that field late, but it seems even more dangerous than before to allow queues to do work with potentially bad leases. We lost the amortization of this wait time with the store creation process... But note that, for empty engines (e.g. in tests) we don't do any waiting. Fixes cockroachdb#7996 THIS PATCH NEEDS TO BE DEPLOYED THROUGH A "STOP THE WORLD" Because of the field being added to the Lease proto, we can't have leases proposed by new servers being applied by old servers - they're going to be serialized differently and the consistency checker will flag it. Note that a "freeze" is not required, though. The new field is nullable, so any lease requests that might be in flight at the time of the world restart will be serialized the same as they have been by the nodes that already applied them before the restart.
andreimatei
added a commit
to andreimatei/cockroach
that referenced
this issue
Nov 7, 2016
…sfer Before this patch, leases held before a restart could be used after the restart. This is incorrect, since the command queue has been wiped in the restart and so reads and writes are not properly sequenced with possible in-flight commands. There was also a problem before this patch, related to lease transfers: if a transfer command returned an error to the `replica.Send()` call that proposed it, we'd clear our "in transfer" state and allow the replica to use the existing lease. However, such errors are generally ambiguous - there's no guarantee that the transfer will still apply. In such cases, the replica was actually breaking its promise to not use the lease after initiating the tranfer. The current patch addresses both these problems by introducing a `replica.minLeaseProposedTS` and a `Lease.ProposedTS`. These fields are used to enforce the fact that only leases proposed after a particular time will be used *for proposing* new commands, or for serving reads. On a transfer, the field for the replica in question is set to the present time. On restart, the field is set for all the replicas to the current time, after the Server has waited such that it has a guarantee that the current HLC is above what it might have been on previous incarnations. This ensures that, in both the transfer and the restart case, leases that have been proposed before the transfer/restart and were in flight at the time when the transfer/restart happens are not eligible to be used. This patch also changes the way waiting on startup for a server's HLC to be guaranteed to be monotonic wrt to the HLC before the restart works. Before, we were waiting just before starting serving, but after all the stores had been started. The point of the waiting was to not serve reads thinking we have a lower timestamp than before the restart. I'm unclear about whether that was correct or not, considering that, for example, a store's queues were already running and potentially doing work. Now that we also rely on that waiting for not reusing old leases (we need to initialize replica.minLeaseProposedTS with a value higher than the node had before the restart), I've had to do the waiting *before* initializing the stores. I could have done the setting of that field late, but it seems even more dangerous than before to allow queues to do work with potentially bad leases. We lost the amortization of this wait time with the store creation process... But note that, for empty engines (e.g. in tests) we don't do any waiting. Fixes cockroachdb#7996 THIS PATCH NEEDS TO BE DEPLOYED THROUGH A "STOP THE WORLD" Because of the field being added to the Lease proto, we can't have leases proposed by new servers being applied by old servers - they're going to be serialized differently and the consistency checker will flag it. Note that a "freeze" is not required, though. The new field is nullable, so any lease requests that might be in flight at the time of the world restart will be serialized the same as they have been by the nodes that already applied them before the restart.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Or a node which transfers its lease away might restart rapidly and use a tail end of its lease while its lease was really shortened.
The naive fix (which might well be the right one) is to persist the expiration date of the transferred lease to disk (keyed by RangeID) and, when starting the process, sitting ducks until
the local HLC clock - max_offset
reports a higher timestamp.There is also the somewhat related issue that the hlc clock itself may jump backwards. For that reason alone, we should already sleep
max_offset
when starting the process.The text was updated successfully, but these errors were encountered: