
kvserver: ignore draining nodes in proposal quota #55806

Open
tbg opened this issue Oct 21, 2020 · 3 comments
Labels
A-kv-replication Relating to Raft, consensus, and coordination.
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-kv KV Team

Comments

@tbg
Member

tbg commented Oct 21, 2020

Describe the problem

It doesn't seem like we take the Draining status of a node into account in the quota pool. This means that when the node terminates, from the quota pool's point of view it has simply disappeared.

I think we mostly get this right, though perhaps accidentally:

	if !r.mu.lastUpdateTimes.isFollowerActiveSince(
		ctx, rep.ReplicaID, now, r.store.cfg.RangeLeaseActiveDuration(),
	) {
		return
	}
	// Only consider followers that have "healthy" RPC connections.
	if err := r.store.cfg.NodeDialer.ConnHealth(rep.NodeID, r.connectionClass.get()); err != nil {
		return
	}

Note the ConnHealth check here, which would presumably go red fairly quickly, on the order of an RPC heartbeat interval:

// defaultRPCHeartbeatInterval is the default value of RPCHeartbeatInterval
// used by the rpc context.
defaultRPCHeartbeatInterval = 3 * time.Second

The isFollowerActiveSince check, by contrast, will be somewhat slower to fire (perhaps a few seconds more; I didn't check). Either way, if we run out of quota during that window, the range will stall until one of the checks clears.
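
To make the failure mode concrete, here is a toy model of the stall (my own sketch, not the actual quota pool implementation; all names are made up): quota is acquired per proposal and only released once every follower the pool still tracks has acknowledged the entry, so a tracked follower that silently stops acking eventually pins all of the quota.

package main

import "fmt"

// toyQuotaPool is a deliberately simplified model, not the real kvserver
// code: quota is taken per proposal and only handed back once every follower
// the pool still tracks has acknowledged the entry.
type toyQuotaPool struct {
	available int64
}

// acquire takes quota for a proposal. The real pool blocks the proposer
// here; this toy just reports failure.
func (q *toyQuotaPool) acquire(size int64) bool {
	if q.available < size {
		return false
	}
	q.available -= size
	return true
}

// release hands quota back, but only if all tracked followers acked.
func (q *toyQuotaPool) release(size int64, ackedByAllTracked bool) {
	if ackedByAllTracked {
		q.available += size
	}
}

func main() {
	q := &toyQuotaPool{available: 3}
	for i := 0; i < 5; i++ {
		ok := q.acquire(1)
		fmt.Printf("proposal %d: acquired=%t, available=%d\n", i, ok, q.available)
		// A follower on a draining node that is still tracked never acks,
		// so no quota ever comes back.
		q.release(1, false)
	}
}

In the real pool the proposer blocks instead of failing fast, which is exactly the write stall described above.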

Even if the current checks are mostly good enough in practice, it seems desirable to exclude a node from quota pool considerations the moment it starts draining, to avoid write stalls that could last several seconds.

cc @aayushshah15 and @knz since you're both on related topics.

To Reproduce

I don't have a reproduction. One would involve writing at full speed to a given range, gracefully draining one of its members, and asserting that the write latency remains constant throughout.

Expected behavior
Ignore the node for purposes of the quota pool when it has a Draining liveness record.
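
A minimal sketch of what that could look like, layered on top of the two checks quoted above. Everything here (LivenessMap, shouldCountForQuota, the boolean parameters) is a hypothetical stand-in rather than the real kvserver/liveness API:

package main

import "fmt"

// Hypothetical stand-ins for kvserver/liveness types; the names do not
// correspond to the real CockroachDB code.
type NodeID int

type Liveness struct {
	NodeID   NodeID
	Draining bool
}

type LivenessMap map[NodeID]Liveness

// shouldCountForQuota mirrors the two existing checks (recent follower
// activity, healthy RPC connection) and adds the one proposed here: a
// follower on a draining node is excluded from quota accounting right away,
// rather than waiting for its connection or activity checks to go red.
func shouldCountForQuota(lm LivenessMap, id NodeID, recentlyActive, connHealthy bool) bool {
	if !recentlyActive || !connHealthy {
		return false
	}
	if l, ok := lm[id]; ok && l.Draining {
		return false
	}
	return true
}

func main() {
	lm := LivenessMap{
		1: {NodeID: 1},
		2: {NodeID: 2, Draining: true},
	}
	fmt.Println(shouldCountForQuota(lm, 1, true, true)) // true
	fmt.Println(shouldCountForQuota(lm, 2, true, true)) // false: draining, excluded immediately
}

The point is only the ordering: the draining check fires the moment the liveness record flips, without waiting for ConnHealth or the follower-activity check to notice anything.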


Jira issue: CRDB-3627

Epic CRDB-39898

@tbg added the A-kv-replication label Oct 21, 2020

@tbg added the C-enhancement label Oct 21, 2020
@jlinder added the T-kv label Jun 16, 2021
@erikgrinaker
Contributor

Related to #77251.


We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB!
