kvserver: separate Raft RPC connection class #111262
cc @cockroachdb/replication

Currently raft messages for a range are sent via that range's connection class: `SystemClass` for system ranges, `DefaultClass` for everything else. We can introduce a separate `RaftClass`.

Do we want to preserve the system/default distinction at this level though, to separate raft traffic for system ranges vs other ranges? I.e. do we want just a single `RaftClass`, or a `RaftSystemClass`/`RaftDefaultClass` pair? In the latter case, do we want to replace the flat enum by a bitmask so that more combinatorics is possible?

```go
// ConnectionClass is a bitmask:
//   - 1 bit for the system/default distinction (or more, if we want some QoS notion)
//   - 2 bits for ConnectionStream (or more, if we want to "reserve" room for more types)
//   - the remaining bits can be used for extra sub-ConnectionStream sharding
type ConnectionClass int8

// ConnectionStream is the "logical" class for the connection.
// Can be Rangefeed, Raft, Snapshot, or "all other" RPCs.
type ConnectionStream int8

// IsSystem reports whether the class carries system-range traffic.
func (c ConnectionClass) IsSystem() bool {
	return int8(c)&1 == 0
}

// Stream extracts the ConnectionStream bits.
func (c ConnectionClass) Stream() ConnectionStream {
	return ConnectionStream((int8(c) >> 1) & 3)
}

// Alternatively, with a wider QoS field:
//
//	func (c ConnectionClass) QOS() ConnectionQOS {
//		return ConnectionQOS(int8(c) & (1<<qosBits - 1))
//	}
//
//	func (c ConnectionClass) Stream() ConnectionStream {
//		return ConnectionStream((int8(c) >> qosBits) & 3)
//	}

// ... plus other helpers.
```
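For illustration, here is one way concrete classes could be composed under this layout. This is a sketch: `MakeClass`, the `Stream*` constants, and the resulting class names are hypothetical, not existing CockroachDB identifiers.

```go
const qosBits = 1 // bit 0: system (0) vs default (1), matching IsSystem above

// Stream values occupy the two bits above the QoS bit.
const (
	StreamDefault   ConnectionStream = iota // all "other" RPCs
	StreamRaft                              // raft messages
	StreamRangefeed                         // rangefeed traffic
	StreamSnapshot                          // snapshot transfers
)

// MakeClass packs a system/default bit and a stream into a ConnectionClass.
func MakeClass(system bool, s ConnectionStream) ConnectionClass {
	c := ConnectionClass(int8(s) << qosBits)
	if !system {
		c |= 1 // bit 0 set means "default"; see IsSystem above
	}
	return c
}

// Every (QoS, stream) combination then falls out of composition, e.g.:
var (
	RaftSystemClass  = MakeClass(true, StreamRaft)
	RaftDefaultClass = MakeClass(false, StreamRaft)
)
```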
We'll need to keep using a single connection, and prioritize the traffic within it.

How? I'm not aware of a QoS mechanism in gRPC, and TCP has limited support for this anyway (for a single connection). See e.g. grpc/grpc-go#1448.

So we have another option: are we happy if all the system traffic uses the same class regardless of the traffic type? I.e. all the system ranges send all their raft, snapshot, rangefeed, etc. traffic via the same `SystemClass` connection.

It would be good to understand how exactly the classes differ, and whether the class space is single- or multi-dimensional. My current impression is that it is multi-dimensional. I don't mean a concrete QoS/priority mechanism; we don't use any today. Logically though, the "system" class feels like "high priority", while all other classes feel like "default priority" and differ only in the type of traffic they process (e.g. they have different parameters), giving some isolation/fairness across the classes.

For example, imagine we want to separate and prioritize system rangefeeds from non-system rangefeeds. Do we introduce a `RangefeedSystemClass`?
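Under the bitmask sketch above, such a class would not need a new enum member; it falls out of composition (again with hypothetical names):

```go
// A system-priority rangefeed class, composed rather than enumerated,
// using the hypothetical MakeClass/StreamRangefeed from the sketch above.
var RangefeedSystemClass = MakeClass(true, StreamRangefeed)
```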
Yes, I think this seems reasonable. Note that Raft heartbeats are always sent across the `SystemClass` connection.

It's basically "anything that's important and needs low latency should go via `SystemClass`". If we want to do this properly, we should use QUIC and assign per-stream priorities that translate to actual QoS priorities through the networking stack. That would allow us to specify QoS priorities down to individual gRPC requests, without head-of-line blocking. But that's not practical in the near term, and until then we should just be pragmatic about this -- we won't be able to design a good QoS scheme without proper protocol support.
We should just use the default window size settings for the new classes, rather than the rangefeed-specific ones. These settings were a kludge due to the number of gRPC streams we needed to set up with the legacy rangefeed protocol (one per range, which can blow up to >100k streams, each with their own 2 MB buffer, which can cause OOMs). In 24.1 we unconditionally use the mux rangefeed protocol, which only uses a single stream for each client/node pair. We can possibly get rid of those settings entirely in 24.1.
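For context, a minimal sketch of how such per-class window settings are typically wired up, assuming grpc-go's `grpc.WithInitialWindowSize`/`grpc.WithInitialConnWindowSize` dial options; the helper name and the sizes are illustrative, not CockroachDB's actual values:

```go
package main

import "google.golang.org/grpc"

// rangefeedWindowSize is illustrative: with one stream per range under the
// legacy rangefeed protocol, per-stream flow-control buffers had to stay
// small to bound memory at >100k streams.
const rangefeedWindowSize = 128 << 10 // 128 KiB

// rangefeedDialOpts returns dial options with reduced HTTP/2 flow-control
// windows for a rangefeed-class connection.
func rangefeedDialOpts() []grpc.DialOption {
	return []grpc.DialOption{
		grpc.WithInitialWindowSize(rangefeedWindowSize),         // per stream
		grpc.WithInitialConnWindowSize(4 * rangefeedWindowSize), // per connection
	}
}
```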
Currently, most RPC traffic is sent over a single TCP connection (the `default` class). Unrelated streams on this connection can be subject to head-of-line blocking under heavy load, often due to RPC processing latency rather than TCP processing.

We should consider splitting out a separate connection class for Raft traffic, and potentially one for snapshot traffic as well, to reduce interference with foreground RPC traffic. This might also, to a small extent, mitigate the bandwidth-delay product impact on high-latency links (see #111241) and packet loss (see #110099).
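As an illustrative sketch of the proposal, assuming a new `rpc.RaftClass` value next to the existing `rpc.SystemClass`/`rpc.DefaultClass` (the helper name is hypothetical, and `RaftClass` does not exist yet):

```go
import "github.com/cockroachdb/cockroach/pkg/rpc"

// classForRaftTraffic (hypothetical helper) picks the connection class for
// an outgoing Raft message: system ranges keep the isolated SystemClass
// connection, while all other Raft traffic moves off DefaultClass onto the
// proposed RaftClass, away from foreground RPC traffic.
func classForRaftTraffic(isSystemRange bool) rpc.ConnectionClass {
	if isSystemRange {
		return rpc.SystemClass
	}
	return rpc.RaftClass // proposed in this issue; does not exist yet
}
```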
Jira issue: CRDB-31842
Epic CRDB-32846