kvserver: avoid hanging proposal after leader goes down #46045
Conversation
Deflake `gossip/chaos` by adding a missing `waitForFullReplication`.

This test loops, killing a node and then verifying that the remaining nodes in the cluster stabilize on the same view of gossip connectivity. Periodically the test was failing because gossip wasn't stabilizing. The root issue was that the SQL query retrieving the gossip connectivity from one node was hanging, and that query was hanging because a range was unavailable. Logs showed that the leaseholder for that range was on a down node and that the range only seemed to contain a single replica. This could happen near the start of the test if we started killing nodes before full replication was achieved.

Fixes cockroachdb#38829

Release note: None
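For readers who haven't seen the helper, the gist of the missing step is sketched below. This is not the actual roachtest helper; the SQL query, the replication factor of three, and the polling interval are assumptions used for illustration.

```go
package main

import (
	"database/sql"
	"time"
)

// waitForFullReplication (sketch only, not the real roachtest helper): poll
// any node until no range reports fewer than three replicas, so that the
// chaos loop cannot kill a node while some range still has a lone replica.
func waitForFullReplication(db *sql.DB) error {
	for {
		var underReplicated int
		// crdb_internal.ranges exposes the replica IDs of each range as an array.
		if err := db.QueryRow(
			`SELECT count(*) FROM crdb_internal.ranges WHERE array_length(replicas, 1) < 3`,
		).Scan(&underReplicated); err != nil {
			return err
		}
		if underReplicated == 0 {
			return nil
		}
		time.Sleep(time.Second)
	}
}
```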
Release justification: testing change
Release note: None

Release justification: comment-only change
Release note: None
Thanks for tracking this down! Might want to get another set of #kv eyes on this in order to spread the knowledge.
Reviewed 1 of 1 files at r1, 1 of 1 files at r2, 2 of 2 files at r3.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten and @petermattis)
nice find! How far back do we need to backport this?
Reviewed 1 of 1 files at r1, 1 of 1 files at r2, 2 of 2 files at r3, 3 of 3 files at r4.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @tbg)
pkg/kv/kvserver/replica_raft.go, line 415 at r4 (raw file):
    err := r.withRaftGroupLocked(true, func(raftGroup *raft.RawNode) (bool, error) {
        numFlushed := r.mu.proposalBuf.Len()
        if err := r.mu.proposalBuf.FlushLockedWithRaftGroup(raftGroup); err != nil {

I would lean towards returning the number of commands that were flushed from this function (i.e. the count used in `FlushLockedWithRaftGroup`). That avoids extra atomic accesses that seem a little racy, though I can't construct a scenario where anything actually goes wrong.
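For concreteness, the suggested shape would look roughly like this (a sketch against the snippet above; the real method signatures may differ):

```go
hadProposals := false
err := r.withRaftGroupLocked(true, func(raftGroup *raft.RawNode) (bool, error) {
	// Hypothetical signature: the flush itself reports how many proposals it
	// handed to the raft group, so no separate Len() read is needed.
	numFlushed, err := r.mu.proposalBuf.FlushLockedWithRaftGroup(raftGroup)
	if err != nil {
		return false, err
	}
	hadProposals = numFlushed > 0
	// The boolean result asks withRaftGroupLocked to unquiesce and wake the
	// leader; do so whenever something was actually proposed.
	return hadProposals, nil
})
```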
pkg/kv/kvserver/replica_raft.go, line 415 at r4 (raw file):
    err := r.withRaftGroupLocked(true, func(raftGroup *raft.RawNode) (bool, error) {
        numFlushed := r.mu.proposalBuf.Len()
        if err := r.mu.proposalBuf.FlushLockedWithRaftGroup(raftGroup); err != nil {

An alternative approach is to propagate `raft.ErrProposalDropped` errors up to here and unquiesce in that case as well. I'll defer to your preference.
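The alternative, sketched under the same caveat (a hypothetical shape, not the change that was ultimately committed):

```go
err := r.withRaftGroupLocked(true, func(raftGroup *raft.RawNode) (bool, error) {
	// Assumes FlushLockedWithRaftGroup is changed to return
	// raft.ErrProposalDropped rather than swallow it.
	if err := r.mu.proposalBuf.FlushLockedWithRaftGroup(raftGroup); err != nil {
		if err == raft.ErrProposalDropped {
			// Raft dropped the proposal because no leader is known; unquiesce
			// and wake the group so an election can establish one and the
			// command can be reproposed.
			return true, nil
		}
		return false, err
	}
	return false, nil
})
```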
pkg/kv/kvserver/store_raft.go, line 230 at r4 (raw file):
    // quiescing if there's outstanding work.
    r.mu.Lock()
    status := r.raftStatusRLocked()

Would the `BasicStatus()` do?
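For reference, `RawNode.BasicStatus()` in etcd/raft returns the `HardState` and `SoftState` (so the leader ID and raft state) without copying the per-follower `Progress` map, which is the expensive part of `Status()`. Something like the following would suffice for a leader-known check (a sketch against the etcd/raft API, not the kvserver code):

```go
import "go.etcd.io/etcd/raft"

// leaderKnown checks leader knowledge without the allocation cost of Status().
func leaderKnown(raftGroup *raft.RawNode) bool {
	bs := raftGroup.BasicStatus() // no Progress map is built or copied
	return bs.Lead != raft.None   // Lead comes from the embedded SoftState
}
```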
Force-pushed from 903f1e5 to 71e5ab5.
There was a bug in range quiescence due to which commands would hang in raft for minutes before actually getting replicated. This would occur whenever a range was quiesced but a follower replica which didn't know the (Raft) leader would receive a request. This request would be evaluated and put into the Raft proposal buffer, and a ready check would be enqueued. However, no ready would be produced (since the proposal got dropped by raft; leader unknown) and so the replica would not unquiesce.

This commit prevents this by always waking up the group if the proposal buffer was initially nonempty, even if an empty Ready is produced. It goes further than that by trying to ensure that a leader is always known while quiesced. Previously, on an incoming request to quiesce, we did not verify that the raft group had learned the leader's identity.

One shortcoming here is that in the situation in which the proposal would originally hang "forever", it will now hang for one heartbeat timeout where ideally it would be proposed more reactively. Since this is so rare I didn't try to address this. Instead, refer to the ideas in cockroachdb#37906 (comment) and cockroachdb#21849 for future changes that could mitigate this.

Without this PR, the test would fail around 10% of the time. With this change, it passed 40 iterations in a row without a hitch, via:

    ./bin/roachtest run -u tobias --count 40 --parallelism 10 --cpu-quota 1280 gossip/chaos/nodes=9

Release justification: bug fix

Release note (bug fix): a rare case in which requests to a quiesced range could hang in the KV replication layer was fixed. This would manifest as a message saying "have been waiting ... for proposing" even though no loss of quorum occurred.
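In code terms, the first part of the fix (waking the group when the proposal buffer was nonempty) looks roughly like this. This is a sketch using the names from the review snippets above, not the literal diff:

```go
var hadProposals bool
err := r.withRaftGroupLocked(true, func(raftGroup *raft.RawNode) (bool, error) {
	hadProposals = r.mu.proposalBuf.Len() > 0
	if err := r.mu.proposalBuf.FlushLockedWithRaftGroup(raftGroup); err != nil {
		return false, err
	}
	// Unquiesce whenever the buffer held proposals, even if raft dropped them
	// (leader unknown) and therefore produces no Ready this cycle. Waking the
	// group lets an election or heartbeat surface a leader, after which the
	// command is reproposed instead of hanging.
	return hadProposals, nil
})
```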
bors r=nvanbenschoten,petermattis
How far back do we need to backport this?
I don't see why this wasn't always a problem (though the roachtest didn't always fail...)
I'll definitely backport to 19.2 and there should be a similar simplified patch we can make to 19.1.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @nvanbenschoten and @petermattis)
pkg/kv/kvserver/replica_raft.go, line 415 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
An alternative approach is to propagate `raft.ErrProposalDropped` errors up to here and unquiesce in that case as well. I'll defer to your preference.
I'll leave as is.
bors pls

bors r=nvanbenschoten,petermattis
Build succeeded
Reviewed 4 of 4 files at r5.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale)
Deflakes gossip/chaos/nodes=9, i.e.
Fixes #38829.
There was a bug in range quiescence due to which commands would hang in
raft for minutes before actually getting replicated. This would occur
whenever a range was quiesced but a follower replica which didn't know
the (Raft) leader would receive a request. This request would be
evaluated and put into the Raft proposal buffer, and a ready check would
be enqueued. However, no ready would be produced (since the proposal got
dropped by raft; leader unknown) and so the replica would not unquiesce.
This commit prevents this by always waking up the group if the proposal
buffer was initially nonempty, even if an empty Ready is produced.
It goes further than that by trying to ensure that a leader is always
known while quiesced. Previously, on an incoming request to quiesce, we
did not verify that the raft group had learned the leader's identity.
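That second safeguard can be pictured as a check like the one below (a hypothetical helper for illustration, not the actual kvserver function):

```go
import "go.etcd.io/etcd/raft"

// shouldQuiesce: only honor a quiesce request if the local raft group knows
// its leader; otherwise stay awake so a later proposal isn't silently dropped
// while the range sleeps.
func shouldQuiesce(status raft.BasicStatus) bool {
	return status.Lead != raft.None // Lead comes from the embedded SoftState
}
```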
One shortcoming here is that in the situation in which the proposal
would originally hang "forever", it will now hang for one heartbeat
or election timeout where ideally it would be proposed more reactively. Since
this is so rare I didn't try to address this. Instead, refer to
the ideas in
#37906 (comment)
and
#21849
for future changes that could mitigate this.
Without this PR, the test would fail around 10% of the time. With this
change, it passed 40 iterations in a row without a hitch, via:

    ./bin/roachtest run -u tobias --count 40 --parallelism 10 --cpu-quota 1280 gossip/chaos/nodes=9
Release justification: bug fix
Release note (bug fix): a rare case in which requests to a quiesced
range could hang in the KV replication layer was fixed. This would
manifest as a message saying "have been waiting ... for proposing" even
though no loss of quorum occurred.