Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvserver: Raft CheckQuorum causes failover/partial/lease-liveness unavailability #107060

Closed
erikgrinaker opened this issue Jul 18, 2023 · 8 comments
Assignees
Labels
branch-master Failures and bugs on the master branch. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked.

Comments

@erikgrinaker
Copy link
Contributor

erikgrinaker commented Jul 18, 2023

Coincides with #104042. Given the gradual throughput increase, I'm guessing we're waiting for sequential Raft election timeouts or something.

https://roachperf.crdb.dev/?filter=&view=failover%2Fpartial%2Flease-liveness&tab=gce

Screenshot 2023-07-18 at 10 44 40

Jira issue: CRDB-29865

Epic CRDB-25199

@erikgrinaker erikgrinaker added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. branch-master Failures and bugs on the master branch. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-kv-replication labels Jul 18, 2023
@blathers-crl
Copy link

blathers-crl bot commented Jul 18, 2023

cc @cockroachdb/replication

@erikgrinaker erikgrinaker self-assigned this Jul 18, 2023
@erikgrinaker
Copy link
Contributor Author

Confirmed that this is PreVote+CheckQuorum by running with COCKROACH_RAFT_ENABLE_CHECKQUORUM=false.

@erikgrinaker
Copy link
Contributor Author

This scenario is covered by the integration test TestRequestsOnFollowerWithNonLiveLeaseholder though, which passes.

@erikgrinaker
Copy link
Contributor Author

erikgrinaker commented Jul 22, 2023

Seems like ForgetLeader isn't firing here for some reason:

I230722 09:54:45.068122 177 go.etcd.io/raft/v3/raft.go:956 ⋮ [T1,n6,s6,r136/4:‹/Table/106/1/6{30111…-48375…}›] 85342  4 is starting a new election at term 7
I230722 09:54:45.068136 177 go.etcd.io/raft/v3/raft.go:899 ⋮ [T1,n6,s6,r136/4:‹/Table/106/1/6{30111…-48375…}›] 85343  4 became pre-candidate at term 7
I230722 09:54:45.068151 177 go.etcd.io/raft/v3/raft.go:1030 ⋮ [T1,n6,s6,r136/4:‹/Table/106/1/6{30111…-48375…}›] 85344  4 [logterm: 7, index: 855] sent ‹MsgPreVote› request to 2 at term 7
I230722 09:54:45.068170 177 go.etcd.io/raft/v3/raft.go:1030 ⋮ [T1,n6,s6,r136/4:‹/Table/106/1/6{30111…-48375…}›] 85345  4 [logterm: 7, index: 855] sent ‹MsgPreVote› request to 5 at term 7
I230722 09:54:45.068196 177 kv/kvserver/replica_raft.go:833 ⋮ [T1,n6,s6,r136/4:‹/Table/106/1/6{30111…-48375…}›,raft] 85346  raft leader changed: 2 -> 0
I230722 09:54:45.068259 183 go.etcd.io/raft/v3/raft.go:1043 ⋮ [T1,n6,s6,r136/4:‹/Table/106/1/6{30111…-48375…}›] 85347  4 received ‹MsgPreVoteResp› from 4 at term 7
I230722 09:54:45.068273 183 go.etcd.io/raft/v3/raft.go:1650 ⋮ [T1,n6,s6,r136/4:‹/Table/106/1/6{30111…-48375…}›] 85348  4 has received 1 ‹MsgPreVoteResp› votes and 0 vote rejections
I230722 09:54:45.729359 180 go.etcd.io/raft/v3/raft.go:870 ⋮ [T1,n6,s6,r136/4:‹/Table/106/1/6{30111…-48375…}›] 85403  4 became follower at term 7
I230722 09:54:45.729411 180 kv/kvserver/replica_raft.go:833 ⋮ [T1,n6,s6,r136/4:‹/Table/106/1/6{30111…-48375…}›,raft] 85404  raft leader changed: 0 -> 2
I230722 09:54:48.022513 173 go.etcd.io/raft/v3/raft.go:1063 ⋮ [T1,n6,s6,r136/4:‹/Table/106/1/6{30111…-48375…}›] 85621  4 [logterm: 7, index: 855, vote: 0] ignored ‹MsgPreVote› from 5 [logterm: 7, index: 855] at term 7: lease is not expired (remaining ticks: 3)
I230722 09:54:48.731873 184 go.etcd.io/raft/v3/raft.go:1063 ⋮ [T1,n6,s6,r136/4:‹/Table/106/1/6{30111…-48375…}›] 85679  4 [logterm: 7, index: 855, vote: 0] ignored ‹MsgPreVote› from 5 [logterm: 7, index: 855] at term 7: lease is not expired (remaining ticks: 4)

@erikgrinaker
Copy link
Contributor Author

Liveness on n6 claims that n5 is still live, but it isn't.

@erikgrinaker
Copy link
Contributor Author

erikgrinaker commented Jul 22, 2023

Yeah, the local liveness record expired at 10:25:33, but it's still marked as IsLive at 10:25:51:

I230722 10:25:51.898526 189 kv/kvserver/replica_raft.go:2180 ⋮ [T1,n6,s6,r92/3:‹/Table/106/1/-3{37885…-19621…}›] 97030
    maybe forget leader: node 5 liveness is true:
    livenesspb.IsLiveMapEntry{Liveness:livenesspb.Liveness{NodeID:5, Epoch:19, Expiration:hlc.LegacyTimestamp{WallTime:1690021533700321096, Logical:0, Synthetic:(*bool)(nil)}, Draining:false, Membership:‹0›}, IsLive:true}

@erikgrinaker
Copy link
Contributor Author

erikgrinaker commented Jul 22, 2023

Right, it's because the node has an RPC connection to the partitioned leader, which overrides IsLive. This liveness stuff is pretty tangled, cc @andrewbaptist.

// Liveness claims that this node is down, but ConnHealth gets the last say
// because we'd rather quiesce a range too little than one too often. Note
// that this policy is different from the one governing the releasing of
// proposal quota; see comments over there.
//
// NB: This has false negatives when we haven't attempted to connect to the
// node yet, where it will return rpc.ErrNotHeartbeated regardless of
// whether the node is up or not. Once connected, the RPC circuit breakers
// will continually probe the connection. The check can also have false
// positives if the node goes down after populating the map, but that
// matters even less.
entry.IsLive = !s.TestingKnobs().DisableLivenessMapConnHealth &&
(s.cfg.NodeDialer.ConnHealth(nodeID, rpc.SystemClass) == nil)

shouldCampaignOnLeaseRequestRedirect avoids this by explicitly computing IsLive instead of looking at the field.

// NOTE: we intentionally do not look at the IsLiveMapEntry.IsLive field,
// which accounts for whether the leader is reachable from this node (see
// Store.updateLivenessMap). We only care whether the leader is currently live
// according to node liveness because this determines whether it will be able
// to acquire an epoch-based lease.
return !livenessEntry.Liveness.IsLive(now)

shouldForgetLeaderOnVoteRequest needs to do the same.

@erikgrinaker
Copy link
Contributor Author

Confirmed that explicitly checking IsLive(now) resolves failover/partial/lease-liveness, will submit a PR shortly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch-master Failures and bugs on the master branch. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant