Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Fixes cockroachdb#104696. Fixes cockroachdb#104697. Fixes cockroachdb#104698. Part of cockroachdb#98703. In 072c16d (added as part of cockroachdb#95637) we re-worked the locking structure around the RaftTransport's per-RPC class level send queues. When new send queues are instantiated or old ones deleted, we now also maintain the kvflowcontrol connection tracker, so such maintenance now needs to happen while holding a kvflowcontrol mutex. When rebasing \cockroachdb#95637 onto master, we accidentally included earlier queue deletion code without holding the appropriate mutex. Queue deletions now happened twice which made it possible to hit a RaftTransport assertion about expecting the right send queue to already exist. Specifically, the following sequence was possible: - (*RaftTransport).SendAsync is invoked, observes no queue for <nodeid,class>, creates it, and tracks it in the queues map. - It invokes an async worker W1 to process that send queue through (*RaftTransport).startProcessNewQueue. The async worker is responsible for clearing the tracked queue in the queues map once done. - W1 expects to find the tracked queue in the queues map, finds it, proceeds. - W1 is done processing. On its way out, W1 clears <nodeid,class> from the queues map the first time. - (*RaftTransport).SendAsync is invoked by another goroutine, observes no queue for <nodeid,class>, creates it, and tracks it in the queues map. - It invokes an async worker W2 to process that send queue through (*RaftTransport).startProcessNewQueue. The async worker is responsible for clearing the tracked queue in the queues map once done. - W1 blindly clears the <nodeid,class> raft send queue the second time. - W2 expects to find the queue in the queues map, but doesn't, and fatals. Release note: None
- Loading branch information