Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage, server: don't reuse leases obtained before a restart or transfer #10420

Merged
merged 2 commits into from
Nov 7, 2016

Conversation

andreimatei
Copy link
Contributor

@andreimatei andreimatei commented Nov 2, 2016

Before this patch, leases held before a restart could be used after the
restart. This is incorrect, since the command queue has been wiped in
the restart and so reads and writes are not properly sequenced with
possible in-flight commands.

There was also a problem before this patch, related to lease transfers:
if a transfer command returned an error to the replica.Send() call
that proposed it, we'd clear our "in transfer" state and allow the
replica to use the existing lease. However, such errors are generally
ambiguous - there's no guarantee that the transfer will still apply. In
such cases, the replica was actually breaking its promise to not use the
lease after initiating the transfer.

The current patch addresses both these problems by introducing a
replica.minLeaseProposedTs and a Lease.ProposedTs. These fields are
used to enforce the fact that only leases proposed after a particular
time will be used for proposing new commands, or for serving reads. On
a transfer, the field for the replica in question is set to the present
time. On restart, the field is set for all the replicas to the current
time, after the Server has waited such that it has a guarantee that the
current HLC is above what it might have been on previous incarnations.
This ensures that, in both the transfer and the restart case, leases
that have been proposed before the transfer/restart and were in flight
at the time when the transfer/restart happens are not eligible to be
used.

This patch also changes the way waiting on startup for a server's HLC to
be guaranteed to be monotonic wrt to the HLC before the restart works.
Before, we were waiting just before starting serving, but after all the
stores had been started. The point of the waiting was to not serve reads
thinking we have a lower timestamp than before the restart. I'm unclear
about whether that was correct or not, considering that, for example, a
store's queues were already running and potentially doing work.
Now that we also rely on that waiting for not reusing old leases (we
need to initialize replica.minLeaseProposedTs with a value higher than
the node had before the restart), I've had to do the waiting before
initializing the stores. I could have done the setting of that field
late, but it seems even more dangerous than before to allow queues to do
work with potentially bad leases.
We lost the amortization of this wait time with the store creation
process... But note that, for empty engines (e.g. in tests) we don't do
any waiting.

Fixes #7996

THIS PATCH NEEDS TO BE DEPLOYED THROUGH A "STOP THE WORLD"
Because of the field being added to the Lease proto, we can't have
leases proposed by new servers being applied by old servers - they're
going to be serialized differently and the consistency checker will flag
it.
Note that a "freeze" is not required, though. The new field is nullable,
so any lease requests that might be in flight at the time of the world
restart will be serialized the same as they have been by the nodes that
already applied them before the restart.

cc @petermattis @tschottdorf @bdarnell
@tamird for validating what I've said above about not needing the freeze
@jseldess for keeping in mind that the beta where we roll this out with need a "stop the world"


This change is Reviewable

@andreimatei andreimatei added the S-1-stability Severe stability issues that can be fixed by upgrading, but usually don’t resolve by restarting label Nov 2, 2016
@andreimatei
Copy link
Contributor Author

Review status: 0 of 13 files reviewed at latest revision, 1 unresolved discussion, some commit checks failed.


pkg/storage/replica.go, line 3647 at r2 (raw file):

// particular, we shouldn't be gossiping if we previously initiated a transfer
// and then died.
func (r *Replica) leaseIsValidForGossiping() bool {

I'm not entirely clear on when exactly we want to gossip and when we don't want to... I'm not entirely sure of the consequences of gossiping at the wrong time. Maybe @tschottdorf or someone else can suggest a better comment.


Comments from Reviewable

@tamird
Copy link
Contributor

tamird commented Nov 3, 2016

Needs a rebase and has some failing tests. Should we be reviewing before the tests are passing?

@andreimatei
Copy link
Contributor Author

Yeah, please review. The tests are about the proto that changed, and
something about range stats differing from a golden number, I think.

Btw I'm also writing a test for a server restart, but that needs some new
test infrastructure.

On Nov 2, 2016 10:05 PM, "Tamir Duberstein" [email protected]
wrote:

Needs a rebase and has some failing tests. Should we be reviewing before
the tests are passing?


You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
#10420 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAXBcZSELg5IJ-w1xPJvRJbRfTUxIeTvks5q6UF3gaJpZM4Knytg
.

@bdarnell
Copy link
Contributor

bdarnell commented Nov 3, 2016

:lgtm:


Reviewed 13 of 13 files at r2.
Review status: all files reviewed at latest revision, 5 unresolved discussions, some commit checks failed.


pkg/roachpb/data.proto, line 335 at r2 (raw file):

  optional ReplicaDescriptor replica = 3 [(gogoproto.nullable) = false];

  // The current timestamp of the proposed at the time when this lease has been

s/of the proposed//


pkg/storage/replica.go, line 618 at r2 (raw file):

  // overridden.
  if !r.store.cfg.TestingKnobs.DontPreventUseOfOldLeaseOnStart {
      r.mu.minLeaseProposedTs = clock.Now()

This will give all the replicas (slightly) different timestamps. This could be a single value chosen by the store, although that's probably not worth the plumbing.


pkg/storage/replica.go, line 3647 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I'm not entirely clear on when exactly we want to gossip and when we don't want to... I'm not entirely sure of the consequences of gossiping at the wrong time. Maybe @tschottdorf or someone else can suggest a better comment.

Gossip is inherently inconsistent and asynchronous, so while it would be bad for a node to gossip long after its lease has expired, all the subtleties of the HLC are largely unnecessary here. We're just using the lease as a way to ensure that only one node gossips at a time. I think this check is correct - we should stop gossiping after we have transferred the lease away.

We don't really need a gossip-specific definition of lease validity; this could use the haveLease method being introduced in #10305.


pkg/storage/replica_test.go, line 492 at r2 (raw file):

// Test the behavior of a replica while a range lease transfer is in progress:
// - while the transfer is in progress, reads should return errors pointing to
// the transfer targer

s/targer/target/


pkg/storage/store.go, line 144 at r2 (raw file):

          // the restart, to make their life easy. They're not concerned with the
          // reasons why normally we dissalow those leases from being used.
          DontPreventUseOfOldLeaseOnStart: true,

Can we set this in those tests instead of for all tests that use TestStoreConfig?


Comments from Reviewable

@tbg
Copy link
Member

tbg commented Nov 3, 2016

:lgtm_strong: good test (ok, bit hard to follow along, but that's just how those tests are). Does this affect the runtime of the chaos acceptance tests at all? Those are the only tests that would still potentially eat longer unavailability because of kill -9, any others would have nodes (at least in the future) transfer all of their leases away before shutdown.


Reviewed 13 of 13 files at r2.
Review status: all files reviewed at latest revision, 18 unresolved discussions, some commit checks failed.


pkg/roachpb/data.proto, line 335 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

s/of the proposed//

give this `(gogoproto.customname) = "ProposedTS"` and then `s/Ts/TS/` throughout (including commit message).

pkg/roachpb/data.proto, line 340 at r2 (raw file):

  // This is nullable to help with the rollout (such that a lease applied by
  // some nodes before the rollout and some nodes after the rollout is
  // serialized the same). It should be non-nullable subsequently.

Make this TODO(andrei).

Also, while you're here, you may want to fix the ordering - I think we generally want to order by tag number for simplicity (?). I definitely just wasted time thinking why you'd skipped 4.


pkg/server/server.go, line 561 at r2 (raw file):

  // information. For example, a client might have written a value at a
  // timestamp that's in the future of the restarted node's clock, and if we
  // don't do something, the same client's read would not return it. So, we

it is a little too ambiguous, perhaps s/it/their write/.


pkg/server/server.go, line 562 at r2 (raw file):

  // timestamp that's in the future of the restarted node's clock, and if we
  // don't do something, the same client's read would not return it. So, we
  // wait up to MaxOffset. We assume we couldn't have served timestamps more

We don't assume, we know.


pkg/storage/replica.go, line 297 at r2 (raw file):

      // minLeaseProposedTs is the minimum acceptable proposedTs lease; only
      // leases proposed after this timestamp can be used for proposing commands.
      minLeaseProposedTs hlc.Timestamp

s/Ts/TS/g including the commit message.


pkg/storage/replica.go, line 3647 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Gossip is inherently inconsistent and asynchronous, so while it would be bad for a node to gossip long after its lease has expired, all the subtleties of the HLC are largely unnecessary here. We're just using the lease as a way to ensure that only one node gossips at a time. I think this check is correct - we should stop gossiping after we have transferred the lease away.

We don't really need a gossip-specific definition of lease validity; this could use the haveLease method being introduced in #10305.

Agreed, though I think there's some value in making it more transparent what we can use the lease for (`haveLease` is pretty ambiguous again). Maybe something like `canPropose` is more descriptive. Well, a bike shed for another day. Removing the copy pasta in `maybeGossip.*` is definitely a step up.

pkg/storage/replica_range_lease.go, line 139 at r2 (raw file):

      // transfer will not still apply. That's OK, however, as the "in transfer"
      // state maintained by the pendingLeaseRequest is not relied on for
      // correctness, and reseting the state is beneficial as it'll allow the

s/reset/resett/

Maybe hint at the mechanism that does ensure correctness.


pkg/storage/replica_range_lease.go, line 193 at r2 (raw file):

// correctness: if a previous transfer has returned an error, TransferInProgress
// will return `false`, but that doesn't necessarily mean that the transfer
// cannot still apply.

Maybe hint at the mechanism that does ensure correctness.


pkg/storage/replica_test.go, line 298 at r2 (raw file):

  rngDesc := *tc.rng.Desc()
  rngDesc.Replicas = append(rngDesc.Replicas, secondReplica)
  tc.rng.setDescWithoutProcessUpdate(&rngDesc)

I know this isn't a new hack (and thanks for the de-pasta'ing), but no bueno: this lets in-memory and on-disk state diverge and is something someone might have to clean up and shake their head in disbelief over in the future.
I think you can still do this, but run a put that persists this descriptor as well (then call r.assertState() at the end to see whether it checks out).


pkg/storage/replica_test.go, line 501 at r2 (raw file):

  tc := testContext{}
  tsc := TestStoreConfig()
  var leaseAcquisitionTrap func(ts hlc.Timestamp)

nit: might looked better as an atomic.Value


pkg/storage/replica_test.go, line 508 at r2 (raw file):

          leaseAcquisitionTrap(ts)
      }
      defer leaseAcquisitionTrapMu.Unlock()

could go away with atomic.Value, but if not move it adjacent to the Lock().


pkg/storage/replica_test.go, line 511 at r2 (raw file):

  }
  transferSem := make(chan struct{})
  tsc.TestingKnobs.TestingCommandFilter =

This one is probably fine with proposer-evaluated KV (since you're doing single-node stuff and don't really care where in the proposal you trap the thing).


pkg/storage/replica_test.go, line 519 at r2 (raw file):

              <-transferSem
              // Return an error, so that the pendingLeaseRequest considers the
              // transferred failed.

s/red//


pkg/storage/replica_test.go, line 553 at r2 (raw file):

  repDesc, err := tc.rng.getReplicaDescriptorLocked()
  if err != nil {
      t.Fatal(err)

missing Unlock.


pkg/storage/replica_test.go, line 592 at r2 (raw file):

  // apply.
  // Concretely, we're going to check that a read triggers a new lease
  // acqusition.

acquisition.


Comments from Reviewable

@tamird
Copy link
Contributor

tamird commented Nov 3, 2016

Reviewed 1 of 1 files at r1, 10 of 13 files at r2.
Review status: all files reviewed at latest revision, 18 unresolved discussions, some commit checks failed.


pkg/server/server.go, line 562 at r2 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

We don't assume, we know.

No we don't - the previous incarnation could have been running with a higher MaxOffset.

Comments from Reviewable

@andreimatei
Copy link
Contributor Author

still haven't written that restart test... but I will.
And haven't checked the chaos tests, will do.


Review status: all files reviewed at latest revision, 18 unresolved discussions, some commit checks failed.


pkg/roachpb/data.proto, line 335 at r2 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

give this (gogoproto.customname) = "ProposedTS" and then s/Ts/TS/ throughout (including commit message).

Done.

pkg/roachpb/data.proto, line 340 at r2 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

Make this TODO(andrei).

Also, while you're here, you may want to fix the ordering - I think we generally want to order by tag number for simplicity (?). I definitely just wasted time thinking why you'd skipped 4.

Done.

pkg/server/server.go, line 561 at r2 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

it is a little too ambiguous, perhaps s/it/their write/.

Done.

pkg/server/server.go, line 562 at r2 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

We don't assume, we know.

Done. "JERRY: Well, why would I assume. I never assume. Leads to assumptions."

pkg/storage/replica.go, line 297 at r2 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

s/Ts/TS/g including the commit message.

Done. But... why tho? "Timestamp" is one word...

pkg/storage/replica.go, line 618 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

This will give all the replicas (slightly) different timestamps. This could be a single value chosen by the store, although that's probably not worth the plumbing.

yeah, I've considered that. In fact that's how I wrote it initially, but, yeah, too much plumbing. Also these replicas already use divergent timestamps for stuff (e.g. the tsCache low-water mark)

pkg/storage/replica.go, line 3647 at r2 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

Agreed, though I think there's some value in making it more transparent what we can use the lease for (haveLease is pretty ambiguous again). Maybe something like canPropose is more descriptive. Well, a bike shed for another day. Removing the copy pasta in maybeGossip.* is definitely a step up.

I've renamed the function and rewritten the comment; see if you like it. I don't want to introduce more consolidation of checks before #10305...

pkg/storage/replica_range_lease.go, line 139 at r2 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

s/reset/resett/

Maybe hint at the mechanism that does ensure correctness.

Done.

pkg/storage/replica_range_lease.go, line 193 at r2 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

Maybe hint at the mechanism that does ensure correctness.

Done.

pkg/storage/replica_test.go, line 298 at r2 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

I know this isn't a new hack (and thanks for the de-pasta'ing), but no bueno: this lets in-memory and on-disk state diverge and is something someone might have to clean up and shake their head in disbelief over in the future.
I think you can still do this, but run a put that persists this descriptor as well (then call r.assertState() at the end to see whether it checks out).

Done. Although shaking your head in disbelief is definitely part of the job :)

pkg/storage/replica_test.go, line 492 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

s/targer/target/

Done.

pkg/storage/replica_test.go, line 501 at r2 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

nit: might looked better as an atomic.Value

Done.

pkg/storage/replica_test.go, line 508 at r2 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

could go away with atomic.Value, but if not move it adjacent to the Lock().

Done.

pkg/storage/replica_test.go, line 511 at r2 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

This one is probably fine with proposer-evaluated KV (since you're doing single-node stuff and don't really care where in the proposal you trap the thing).

right... But this comment sounds like it's in contrast with something else not being fine with prop-eval KV... But I don't see that something.

pkg/storage/replica_test.go, line 519 at r2 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

s/red//

Done.

pkg/storage/replica_test.go, line 553 at r2 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

missing Unlock.

Done.

pkg/storage/replica_test.go, line 592 at r2 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

acquisition.

Done.

pkg/storage/store.go, line 144 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Can we set this in those tests instead of for all tests that use TestStoreConfig?

Well, the problem is that no test fails without this :) So I was afraid I was changing test behavior and invalidating tests. But I've looked through the tests doing restarts, and only one seemed to want to preserve leases. The others were in `client_raft_test.go` and `client_split_test.go`, and, at least on the very surface, don't seem like they'd care about the state of leases. So I've let them be.

Comments from Reviewable

@tamird
Copy link
Contributor

tamird commented Nov 3, 2016

Reviewed 14 of 14 files at r3, 12 of 15 files at r4.
Review status: 13 of 16 files reviewed at latest revision, 19 unresolved discussions, some commit checks failed.


pkg/server/server.go, line 562 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

Done.
"JERRY: Well, why would I assume. I never assume. Leads to assumptions."

again, the previous incarnation could have had a higher maxoffset. you should also link to that issue you filed about this.

pkg/storage/client_raft_test.go, line 309 at r4 (raw file):

  // range can cause spurious lease transfers which cause this test to fail.
  sc.TestingKnobs.DisablePeriodicGossips = true
  sc.TestingKnobs.DontPreventUseOfOldLeaseOnStart = true

comment.


Comments from Reviewable

@tbg
Copy link
Member

tbg commented Nov 3, 2016

Reviewed 1 of 14 files at r3, 15 of 15 files at r4.
Review status: all files reviewed at latest revision, 9 unresolved discussions, some commit checks failed.


pkg/server/server.go, line 562 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

again, the previous incarnation could have had a higher maxoffset. you should also link to that issue you filed about this.

Yeah, a comment about what would need to be done to support changing of `MaxOffset` safely would be good, though the main comment of that type maybe already exists elsewhere. Perhaps just link the issue as @tamird suggested.

pkg/storage/replica.go, line 297 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

Done.
But... why tho? "Timestamp" is one word...

Hypertext is also a word and yet it is `HTTP`, not `HtTP`.

pkg/storage/replica_test.go, line 511 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

right... But this comment sounds like it's in contrast with something else not being fine with prop-eval KV... But I don't see that something.

Sorry, just an isolated comment. Basically "I looked at this because I didn't want it to break with proposer-evaluated KV, and I don't think it would".

Comments from Reviewable

@andreimatei
Copy link
Contributor Author

Review status: 0 of 17 files reviewed at latest revision, 9 unresolved discussions.


pkg/server/server.go, line 562 at r2 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

Yeah, a comment about what would need to be done to support changing of MaxOffset safely would be good, though the main comment of that type maybe already exists elsewhere. Perhaps just link the issue as @tamird suggested.

Done.

pkg/storage/client_raft_test.go, line 309 at r4 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

comment.

well the comment from the definition of the field applies sufficiently well

Comments from Reviewable

@tamird
Copy link
Contributor

tamird commented Nov 4, 2016

Reviewed 1 of 16 files at r5, 12 of 16 files at r6.
Review status: 13 of 17 files reviewed at latest revision, 9 unresolved discussions, all commit checks successful.


pkg/storage/client_raft_test.go, line 309 at r4 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

well the comment from the definition of the field applies sufficiently well

it doesn't explain what this test does that requires this flag

Comments from Reviewable

petermattis added a commit to petermattis/cockroach that referenced this pull request Nov 7, 2016
Before (`allocsim -n 4 -w 4`):

_elapsed__ops/sec___errors_replicas_________1_________2_________3_________4
    5m0s   1448.8        0     1686   559/554     383/0     331/1     413/2

After:

_elapsed__ops/sec___errors_replicas_________1_________2_________3_________4
    5m0s   1413.7        0     1658   406/132   422/141   410/136   420/138

I do see a little bit of thrashiness with the lease-transfer heuristics,
though it settles down relatively quickly so I'm not sure if it is worth
addressing yet.

Depends on cockroachdb#10420
petermattis added a commit to petermattis/cockroach that referenced this pull request Nov 7, 2016
Before (`allocsim -n 4 -w 4`):

_elapsed__ops/sec___errors_replicas_________1_________2_________3_________4
    5m0s   1448.8        0     1686   559/554     383/0     331/1     413/2

After:

_elapsed__ops/sec___errors_replicas_________1_________2_________3_________4
    5m0s   1413.7        0     1658   406/132   422/141   410/136   420/138

I do see a little bit of thrashiness with the lease-transfer heuristics,
though it settles down relatively quickly so I'm not sure if it is worth
addressing yet.

Depends on cockroachdb#10420
petermattis added a commit to petermattis/cockroach that referenced this pull request Nov 7, 2016
Before (`allocsim -n 4 -w 4`):

_elapsed__ops/sec___errors_replicas_________1_________2_________3_________4
    5m0s   1448.8        0     1686   559/554     383/0     331/1     413/2

After:

_elapsed__ops/sec___errors_replicas_________1_________2_________3_________4
    5m0s   1413.7        0     1658   406/132   422/141   410/136   420/138

I do see a little bit of thrashiness with the lease-transfer heuristics,
though it settles down relatively quickly so I'm not sure if it is worth
addressing yet.

Depends on cockroachdb#10420
…sfer

Before this patch, leases held before a restart could be used after the
restart. This is incorrect, since the command queue has been wiped in
the restart and so reads and writes are not properly sequenced with
possible in-flight commands.

There was also a problem before this patch, related to lease transfers:
if a transfer command returned an error to the `replica.Send()` call
that proposed it, we'd clear our "in transfer" state and allow the
replica to use the existing lease. However, such errors are generally
ambiguous - there's no guarantee that the transfer will still apply. In
such cases, the replica was actually breaking its promise to not use the
lease after initiating the tranfer.

The current patch addresses both these problems by introducing a
`replica.minLeaseProposedTS` and a `Lease.ProposedTS`. These fields are
used to enforce the fact that only leases proposed after a particular
time will be used *for proposing* new commands, or for serving reads. On
a transfer, the field for the replica in question is set to the present
time. On restart, the field is set for all the replicas to the current
time, after the Server has waited such that it has a guarantee that the
current HLC is above what it might have been on previous incarnations.
This ensures that, in both the transfer and the restart case, leases
that have been proposed before the transfer/restart and were in flight
at the time when the transfer/restart happens are not eligible to be
used.

This patch also changes the way waiting on startup for a server's HLC to
be guaranteed to be monotonic wrt to the HLC before the restart works.
Before, we were waiting just before starting serving, but after all the
stores had been started. The point of the waiting was to not serve reads
thinking we have a lower timestamp than before the restart. I'm unclear
about whether that was correct or not, considering that, for example, a
store's queues were already running and potentially doing work.
Now that we also rely on that waiting for not reusing old leases (we
need to initialize replica.minLeaseProposedTS with a value higher than
the node had before the restart), I've had to do the waiting *before*
initializing the stores. I could have done the setting of that field
late, but it seems even more dangerous than before to allow queues to do
work with potentially bad leases.
We lost the amortization of this wait time with the store creation
process... But note that, for empty engines (e.g. in tests) we don't do
any waiting.

Fixes cockroachdb#7996

THIS PATCH NEEDS TO BE DEPLOYED THROUGH A "STOP THE WORLD"
Because of the field being added to the Lease proto, we can't have
leases proposed by new servers being applied by old servers - they're
going to be serialized differently and the consistency checker will flag
it.
Note that a "freeze" is not required, though. The new field is nullable,
so any lease requests that might be in flight at the time of the world
restart will be serialized the same as they have been by the nodes that
already applied them before the restart.
@andreimatei
Copy link
Contributor Author

the only chaos test that's not skipped is TestNodeRestart, which takes anywhere between 15 and 22s, with or without this PR.


Review status: 0 of 17 files reviewed at latest revision, 9 unresolved discussions, some commit checks pending.


pkg/storage/client_raft_test.go, line 309 at r4 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

it doesn't explain what this test does that requires this flag

added a comment, but I think it's as redundant as they get.

Comments from Reviewable

@tamird
Copy link
Contributor

tamird commented Nov 7, 2016

Reviewed 1 of 17 files at r7, 13 of 16 files at r8.
Review status: 14 of 17 files reviewed at latest revision, 9 unresolved discussions, all commit checks successful.


pkg/storage/replica_range_lease.go, line 217 at r8 (raw file):

// sent on the returned chan.
func (r *Replica) requestLeaseLocked(timestamp hlc.Timestamp) <-chan *roachpb.Error {
  if r.store.TestingKnobs().LeaseRequestEvent != nil {

slight preference for a binding local to this if


Comments from Reviewable

@andreimatei andreimatei deleted the lease-proposed-time branch November 7, 2016 18:02
petermattis added a commit to petermattis/cockroach that referenced this pull request Nov 7, 2016
Before (`allocsim -n 4 -w 4`):

_elapsed__ops/sec___errors_replicas_________1_________2_________3_________4
    5m0s   1448.8        0     1686   559/554     383/0     331/1     413/2

After:

_elapsed__ops/sec___errors_replicas_________1_________2_________3_________4
    5m0s   1413.7        0     1658   406/132   422/141   410/136   420/138

I do see a little bit of thrashiness with the lease-transfer heuristics,
though it settles down relatively quickly so I'm not sure if it is worth
addressing yet.

Depends on cockroachdb#10420
petermattis added a commit to petermattis/cockroach that referenced this pull request Nov 8, 2016
Before (`allocsim -n 4 -w 4`):

_elapsed__ops/sec___errors_replicas_________1_________2_________3_________4
    5m0s   1472.5        0     2124   704/687     492/0     522/0     406/0

After:

_elapsed__ops/sec___errors_replicas_________1_________2_________3_________4
    5m0s   1506.7        0     2157   520/183   548/180   547/166   542/180

I do see a little bit of thrashiness with the lease-transfer heuristics,
though it settles down relatively quickly so I'm not sure if it is worth
addressing yet.

Depends on cockroachdb#10420
petermattis added a commit to petermattis/cockroach that referenced this pull request Nov 8, 2016
Before (`allocsim -n 4 -w 4`):

_elapsed__ops/sec___errors_replicas_________1_________2_________3_________4
    5m0s   1472.5        0     2124   704/687     492/0     522/0     406/0

After:

_elapsed__ops/sec___errors_replicas_________1_________2_________3_________4
    5m0s   1506.7        0     2157   520/183   548/180   547/166   542/180

I do see a little bit of thrashiness with the lease-transfer heuristics,
though it settles down relatively quickly so I'm not sure if it is worth
addressing yet.

Depends on cockroachdb#10420
petermattis added a commit to petermattis/cockroach that referenced this pull request Nov 8, 2016
Before (`allocsim -n 4 -w 4`):

_elapsed__ops/sec___errors_replicas_________1_________2_________3_________4
    5m0s   1472.5        0     2124   704/687     492/0     522/0     406/0

After:

_elapsed__ops/sec___errors_replicas_________1_________2_________3_________4
    5m0s   1506.7        0     2157   520/183   548/180   547/166   542/180

I do see a little bit of thrashiness with the lease-transfer heuristics,
though it settles down relatively quickly so I'm not sure if it is worth
addressing yet.

Depends on cockroachdb#10420
petermattis added a commit to petermattis/cockroach that referenced this pull request Nov 8, 2016
Before (`allocsim -n 4 -w 4`):

_elapsed__ops/sec___errors_replicas_________1_________2_________3_________4
    5m0s   1472.5        0     2124   704/687     492/0     522/0     406/0

After:

_elapsed__ops/sec___errors_replicas_________1_________2_________3_________4
    5m0s   1506.7        0     2157   520/183   548/180   547/166   542/180

I do see a little bit of thrashiness with the lease-transfer heuristics,
though it settles down relatively quickly so I'm not sure if it is worth
addressing yet.

Depends on cockroachdb#10420
petermattis added a commit to petermattis/cockroach that referenced this pull request Nov 8, 2016
Before (`allocsim -n 4 -w 4`):

_elapsed__ops/sec___errors_replicas_________1_________2_________3_________4
    5m0s   1472.5        0     2124   704/687     492/0     522/0     406/0

After:

_elapsed__ops/sec___errors_replicas_________1_________2_________3_________4
    5m0s   1506.7        0     2157   520/183   548/180   547/166   542/180

I do see a little bit of thrashiness with the lease-transfer heuristics,
though it settles down relatively quickly so I'm not sure if it is worth
addressing yet.

Depends on cockroachdb#10420
@jseldess jseldess removed the docs-todo label Nov 9, 2016
petermattis added a commit to petermattis/cockroach that referenced this pull request Nov 9, 2016
Before (`allocsim -n 4 -w 4`):

_elapsed__ops/sec___errors_replicas_________1_________2_________3_________4
    5m0s   1472.5        0     2124   704/687     492/0     522/0     406/0

After:

_elapsed__ops/sec___errors_replicas_________1_________2_________3_________4
    5m0s   1506.7        0     2157   520/183   548/180   547/166   542/180

I do see a little bit of thrashiness with the lease-transfer heuristics,
though it settles down relatively quickly so I'm not sure if it is worth
addressing yet.

Depends on cockroachdb#10420
petermattis added a commit to petermattis/cockroach that referenced this pull request Nov 14, 2016
Before (`allocsim -n 4 -w 4`):

_elapsed__ops/sec___errors_replicas_________1_________2_________3_________4
    5m0s   1472.5        0     2124   704/687     492/0     522/0     406/0

After:

_elapsed__ops/sec___errors_replicas_________1_________2_________3_________4
    5m0s   1506.7        0     2157   520/183   548/180   547/166   542/180

I do see a little bit of thrashiness with the lease-transfer heuristics,
though it settles down relatively quickly so I'm not sure if it is worth
addressing yet.

Depends on cockroachdb#10420
petermattis added a commit to petermattis/cockroach that referenced this pull request Nov 14, 2016
Before (`allocsim -n 4 -w 4`):

_elapsed__ops/sec___errors_replicas_________1_________2_________3_________4
    5m0s   1472.5        0     2124   704/687     492/0     522/0     406/0

After:

_elapsed__ops/sec___errors_replicas_________1_________2_________3_________4
    5m0s   1506.7        0     2157   520/183   548/180   547/166   542/180

I do see a little bit of thrashiness with the lease-transfer heuristics,
though it settles down relatively quickly so I'm not sure if it is worth
addressing yet.

Depends on cockroachdb#10420
petermattis added a commit to petermattis/cockroach that referenced this pull request Nov 14, 2016
Before (`allocsim -n 4 -w 4`):

_elapsed__ops/sec___errors_replicas_________1_________2_________3_________4
    5m0s   1472.5        0     2124   704/687     492/0     522/0     406/0

After:

_elapsed__ops/sec___errors_replicas_________1_________2_________3_________4
    5m0s   1506.7        0     2157   520/183   548/180   547/166   542/180

I do see a little bit of thrashiness with the lease-transfer heuristics,
though it settles down relatively quickly so I'm not sure if it is worth
addressing yet.

Depends on cockroachdb#10420
tbg added a commit to tbg/cockroach that referenced this pull request Feb 28, 2018
This actually highlights some open questions that I am posting this PR
for.

Touches cockroachdb#10420.

Release note: None
tbg added a commit to tbg/cockroach that referenced this pull request Mar 1, 2018
While we were already making sure that a lease obtained before a node
restart was not used after, the new requested lease would usually be an
extension of the old. As such, commands proposed under both would be
able to apply under the new one, which could theoretically cause
consistency issues as the previous commands would not be tracked by the
command queue (though it would be hard to engineer and has not been
observed in practice, to the best of our knowledge).

This change plugs that hole by preventing an extension of a previously
held lease post restart.

Touches cockroachdb#10420.

Release note (bug fix): Prevent potential consistency issues when a node
is stopped and restarted in rapid succession.
tbg added a commit to tbg/cockroach that referenced this pull request Mar 2, 2018
While we were already making sure that a lease obtained before a node
restart was not used after, the new requested lease would usually be an
extension of the old. As such, commands proposed under both would be
able to apply under the new one, which could theoretically cause
consistency issues as the previous commands would not be tracked by the
command queue (though it would be hard to engineer and has not been
observed in practice, to the best of our knowledge).

This change plugs that hole by preventing an extension of a previously
held lease post restart.

Touches cockroachdb#10420.

Release note (bug fix): Prevent potential consistency issues when a node
is stopped and restarted in rapid succession.
tbg added a commit to tbg/cockroach that referenced this pull request Mar 2, 2018
While we were already making sure that a lease obtained before a node
restart was not used after, the new requested lease would usually be an
extension of the old. As such, commands proposed under both would be
able to apply under the new one, which could theoretically cause
consistency issues as the previous commands would not be tracked by the
command queue (though it would be hard to engineer and has not been
observed in practice, to the best of our knowledge).

This change plugs that hole by preventing an extension of a previously
held lease post restart.

Touches cockroachdb#10420.

Release note (bug fix): Prevent potential consistency issues when a node
is stopped and restarted in rapid succession.
tbg added a commit to tbg/cockroach that referenced this pull request Mar 2, 2018
While we were already making sure that a lease obtained before a node
restart was not used after, the new requested lease would usually be an
extension of the old. As such, commands proposed under both would be
able to apply under the new one, which could theoretically cause
consistency issues as the previous commands would not be tracked by the
command queue (though it would be hard to engineer and has not been
observed in practice, to the best of our knowledge).

This change plugs that hole by preventing an extension of a previously
held lease post restart.

Touches cockroachdb#10420.

Release note (bug fix): Prevent potential consistency issues when a node
is stopped and restarted in rapid succession.
tbg added a commit to tbg/cockroach that referenced this pull request Mar 5, 2018
While we were already making sure that a lease obtained before a node
restart was not used after, the new requested lease would usually be an
extension of the old. As such, commands proposed under both would be
able to apply under the new one, which could theoretically cause
consistency issues as the previous commands would not be tracked by the
command queue (though it would be hard to engineer and has not been
observed in practice, to the best of our knowledge).

This change plugs that hole by preventing an extension of a previously
held lease post restart.

Touches cockroachdb#10420.

Release note (bug fix): Prevent potential consistency issues when a node
is stopped and restarted in rapid succession.
tbg added a commit to tbg/cockroach that referenced this pull request Mar 6, 2018
While we were already making sure that a lease obtained before a node
restart was not used after, the new requested lease would usually be an
extension of the old. As such, commands proposed under both would be
able to apply under the new one, which could theoretically cause
consistency issues as the previous commands would not be tracked by the
command queue (though it would be hard to engineer and has not been
observed in practice, to the best of our knowledge).

This change plugs that hole by preventing an extension of a previously
held lease post restart.

Touches cockroachdb#10420.

Release note (bug fix): Prevent potential consistency issues when a node
is stopped and restarted in rapid succession.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-1-stability Severe stability issues that can be fixed by upgrading, but usually don’t resolve by restarting
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants