rangefeed: enable and improve rangefeed connection class #108992

erikgrinaker · 2023-08-18T13:57:02Z

We have a separate RPC connection class for rangefeeds, but it's disabled by default via kv.rangefeed.use_dedicated_connection_class.enabled.

Rangefeeds can be high-volume during e.g. initial/catchup-scans, and have a high restart cost if they e.g. hit NetworkTimeout. We should consider enabling this class by default, plumbing it through not only for the DistSender connections but also the DistSQL flow, and use a higher network timeout since they're not as sensitive to availability following outages.

Jira issue: CRDB-30735

Epic CRDB-32846

The text was updated successfully, but these errors were encountered:

blathers-crl · 2023-08-18T13:57:05Z

cc @cockroachdb/replication

erikgrinaker · 2023-08-18T13:57:10Z

cc @cockroachdb/cdc

pav-kv · 2024-01-05T14:16:07Z

My understanding of #74222 is that the RangefeedClass is a stopgap for mitigating OOMs in the older rangefeed architecture where every range had a dedicated gRPC stream.

If I understand correctly, the premise of this issue is to "promote" this class from a stopgap to something that gives more benefits. SGTM.

From #111262 (comment), my take is that we can remove the special casing of the window size for RangefeedClass for >= 24.1. Can be done as part of this issue.

erikgrinaker · 2024-01-05T14:33:39Z

Yeah, I think the specific actions we can take here are:

Remove the setting, and always use a separate rangefeed connection class: kvcoord: use rangefeed connection class #117730
Remove the window special casing, now that mux rangefeeds are used unconditionally: rpc: rm rangefeed RPC stream window special case #117545
Use the rangefeed class for the DistSQL flows too (otherwise, only node-local traffic from the DistSQL processor will use the rangefeed class in the common case, and inter-node DistSQL traffic will use the default class).
- Not needed for now: DistSQL only transfers control information; bulk traffic goes to sink directly.
Consider higher network and heartbeat timeouts -- we don't need to aggressively detect failures and fail over for rangefeeds, since they are less latency sensitive and have a high restart cost (have to restart the entire changefeed).
- Not needed for now: there is no evidence it's a pressing issue.

Wdyt @miretskiy?

Why use a separate rangefeed class? I think we want to split off the main classes of "bulk" traffic (rangefeeds and Raft) from foreground RPC requests, such that e.g. large bulk transfers will have less of an impact on read requests.

pav-kv · 2024-01-09T14:31:35Z

~~Move some critical rangefeeds to the system class~~

Discussed with the team, there is no clear need for making some rangefeeds more responsive. There are additional complications like #110883.

117505: roachtest: assign adminui ports dynamically for virtual clusters r=srosenberg,renatolabs a=DarrylWong This was originally removed in #115599 due to #114097 merging, but adminui was reverted in #117141 and mistakenly did not revert the special case for virtual clusters. We unskip the multitenant/distsql tests as well. Release note: None Fixes: #117150 Fixes: #117149 Epic: None 117545: rpc: rm rangefeed RPC stream window special case r=erikgrinaker,miretskiy a=pav-kv The rangefeed stream window size tuning was introduced to mitigate OOM in rangefeeds caused by the excessive number of streams (one per `Range`). Since we now use mux rangefeeds (which multiplexes all the rangefeed traffic into a single stream), this setting is no longer needed, so this commit removes it. Part of #108992 Release note (ops change): `COCKROACH_RANGEFEED_RPC_INITIAL_WINDOW_SIZE` env variable has been removed, and rangefeed connection now uses the same window size as other RPC connections. 117554: sqlproxyccl: improve authentication throttle error r=JeffSwenson a=JeffSwenson The sql proxy will throttle connection attempts if a (client IP, tenant cluster) pair has too many authentication failures. The error is usually caused by a misconfigured password in a connection pool. This change replaces the "connection attempt throttled" error message with "too many failed authentication attempts". There is a hint that includes this message but not all drivers are configured to log hints. Fixes #117552 Co-authored-by: DarrylWong <[email protected]> Co-authored-by: Pavel Kalinnikov <[email protected]> Co-authored-by: Jeff <[email protected]>

pav-kv · 2024-01-12T15:44:26Z

Use the rangefeed class for the DistSQL flows too (otherwise, only node-local traffic from the DistSQL processor will use the rangefeed class in the common case, and inter-node DistSQL traffic will use the default class).

@miretskiy Could you give some code pointers to where these DistSQL connections happen?

miretskiy · 2024-01-12T16:43:50Z

@pav-kv -- I believe that happens when you make a transport.NextInternalClient (newTransportForRange) in dist sender rangefeed.

pav-kv · 2024-01-12T16:49:19Z

Yeah, that would be the line that I updated here: https://github.com/cockroachdb/cockroach/pull/117730/files#diff-0774117800fc8884b0ef1dc4344112bea392a465c5ea9b0903ac3369d711bf33R758. Though it seems like Erik means some other RPC connection (in DistSQL rather than DistSender).

erikgrinaker · 2024-01-15T10:55:41Z

My very superficial understanding is that the outbound connection from the coordinator to the remote changefeed processor nodes used to set up the flow is configured here:

cockroach/pkg/sql/distsql_running.go

Line 120 in 0b370fe

    
           conn, err := req.sqlInstanceDialer.Dial(req.ctx, roachpb.NodeID(req.sqlInstanceID), rpc.DefaultClass)

And the reverse connection that streams events from the remote changefeed processors to the coordinator nodes is set up here:

cockroach/pkg/sql/execinfra/outboxbase.go

Line 47 in 711e228

    
           conn, err = dialer.DialNoBreaker(ctx, roachpb.NodeID(sqlInstanceID), rpc.DefaultClass)

I'm not intimately familiar with this though, so you may want to ask someone in SQL queries like @yuzefovich. We're going to need some plumbing to allow the DistSQL coordinator (i.e. changefeeds) to specify the connection class for the flow, probably by adding a field to SetupFlowRequest or some such.

117730: kvcoord: use rangefeed connection class r=erikgrinaker,miretskiy a=pav-kv This change switches rangefeed to use `RangefeedClass` for RPC traffic by default. The corresponding cluster setting is removed since we don't expect needing to dynamically change this. We leave an escape option: the new env variable allows using `DefaultClass` instead. This is analogous to the usage of `RaftClass` for raft traffic. Part of #108992 Release note (performance improvement): this change separates the rangefeed traffic into its own RPC connection class. This improves isolation and reduces interference with the foreground SQL traffic, which reduces chances of head-of-line blocking caused by unrelated traffic. The new `COCKROACH_RANGEFEED_USE_DEFAULT_CONNECTION_CLASS` env variable can be set to use the `DefaultClass` instead (which was the previous default choice for rangefeeds). Co-authored-by: Pavel Kalinnikov <[email protected]>

pav-kv · 2024-01-15T15:10:59Z

I browsed the code for a bit to see if there is an easy way to plumb the connection class, but it's not apparent to me. Help from SQL and changefeed teams would be appreciated. @miretskiy @yuzefovich

My understanding:

startDistChangefeed is roughly where the changefeed flow originates from
it creates a plan and a bunch of other things, and
runs the plan
which sets up some flows
which creates runnerRequests and runs them
which, finally, sets up RPC connections

Currently step (6) uses rpc.DefaultClass for the connection. We would like it to be rpc.RangefeedClass for changefeeds.

This can be achieved by passing some information down to step 6 (unless it already has some information in its various fields that makes this distinction possible). A couple of approaches:

[worse?] Step 6 understands that this is a changefeed plan/flow/request/connection/whatever, so uses a custom connection class for it. This seems like a leaky abstraction approach.
[better?] Have a more generic way of passing just the connection class from steps 1-3 (changefeed) all the way down through 4-6 (DistSQL).

I haven't looked at the outbox side of things. I'm guessing I would need help there too.

erikgrinaker · 2024-01-15T15:20:25Z

FWIW, I think the main concern is getting the outbox connection class right, since this is where the actual rangefeed data stream is sent. I don't think it's a big deal if the flow setup goes across DefaultClass, since it's a small request and not particularly latency-sensitive -- but if we're going to be plumbing through the connection class anyway, we may as well use it for the setup too, and I think it's more understandable that way.

erikgrinaker · 2024-01-15T15:26:57Z

Btw, there's also another source of bulk rangefeed traffic: changefeed initial scans, which emit existing table data to the changefeed before starting the actual rangefeeds. These should probably go across the rangefeed class too, to fully separate off changefeed traffic. These are just plain Scan RPC requests that are sent here:

cockroach/pkg/ccl/changefeedccl/kvfeed/scanner.go

Line 217 in c926799

if err := txn.Run(ctx, b); err != nil {

Unfortunately, there's currently no way to specify which class to use, the determination is made inside the DistSender based on the keyspan -- so we'll need some plumbing for this too:

cockroach/pkg/kv/kvclient/kvcoord/dist_sender.go

Line 2291 in b2b8d78

    
           class:                  rpc.ConnectionClassForKey(desc.RSpan().Key, rpc.DefaultClass),

yuzefovich · 2024-01-15T21:44:36Z

I agree with Erik - in DistSQL we establish connections at two different steps in the flow lifecycle:

during the setup of the distributed plan - the gateway node establishes connections to all other nodes to perform the setup (SetupFlowRequest). This is done only once and almost no information is communicated on this connection.
each DistSQL "producer" processor ("outbox") establishes a connection to a DistSQL "consumer" processor ("inbox") on another node - this is done in GetConnForOutbox. This is done by each pair of connected processors in the distributed plan, and this is where all of the intermediate information gets sent in the DistSQL flow.

Teaching only the latter about a different connection class would be sufficient.

pav-kv · 2024-01-16T14:31:07Z

Discussed with the team offline. Punting the last 2 action items from this list, for reasons:

DistSQL flows

We realized that changefeed sends the bulk traffic directly to the sink (that's what makes it scale). DistSQL flows only transfer control information, like checkpoints. In the common case, the amount of this data is small.

Exception: core changefeed (see comparison with the enterprise changefeed). For core changefeeds, the bulk traffic goes through DistSQL and the gateway node. It potentially could carry large amount of data, but is not recommended for use in such cases (production use, or with large / frequently updated tables) - because it's bottlenecked by the gateway.

Custom RPC settings and timeouts

There is no evidence timeouts happen frequently in changefeed, esp. after we increased the default timeout in #109578.

We still have the initial scans in scope:

move the initial scan KV requests to RangefeedClass: kvfeed: send scan requests via RANGEFEED class #117810

117810: kvfeed: send scan requests via RANGEFEED class r=erikgrinaker,miretskiy a=pav-kv This PR makes changefeed initial scan traffic go through the `RANGEFEED` connection class, by setting the new `rpc.ConnectionClass` parameter in the `BatchRequest` header. Since this extends the API, extra care is taken to treat unknown enum values as `DEFAULT`, e.g. this can be needed in mixed-version clusters. This PR also introduces a `COCKROACH_RPC_USE_DEFAULT_CONNECTION_CLASS` env variable which overrides all non-`SYSTEM` traffic go through the `DEFAULT` class. This is an escape hatch, in case we need to revert to the previous behaviour. --- Part of #108992 Release note (performance improvement): this change makes changefeed initial scan traffic, which can be bulky, go through a different RPC/TCP connection than the foreground SQL/KV traffic. This reduces interference between workloads, and reduces chances of head-of-line blocking issues. Release note (ops change): the new environment variable, `COCKROACH_RPC_USE_DEFAULT_CONNECTION_CLASS`, instructs CRDB to send most network/RPC workloads, except the system traffic, through a single "default" RPC/TCP connection. This is a safety mechanism, it reverts to pre-`v24.1` behaviour in case the environment does not tolerate multiple TCP connections. The default behaviour in `v24.1`+ is having multiple connections dedicated to different types of traffic, such as raft and rangefeeds. Co-authored-by: Pavel Kalinnikov <[email protected]>

pav-kv · 2024-01-22T12:55:27Z

@erikgrinaker considering this done for now? We could also tweak gRPC settings for the RANGEFEED class (timeouts etc), but I don't know if it's needed - do we have any data?

erikgrinaker · 2024-01-22T13:44:00Z

Let's punt that for now. We don't have a pressing need for it with mux rangefeeds. We can revisit as needed.

erikgrinaker added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-kv-replication labels Aug 18, 2023

erikgrinaker mentioned this issue Sep 26, 2023

kv: partition raft transport over high RTT links across multiple TCP connections #111238

Closed

exalate-issue-sync bot added T-kv KV Team and removed T-kv-replication labels Jan 2, 2024

exalate-issue-sync bot assigned pav-kv Jan 2, 2024

pav-kv mentioned this issue Jan 9, 2024

rpc: rm rangefeed RPC stream window special case #117545

Merged

pav-kv mentioned this issue Jan 12, 2024

kvcoord: use rangefeed connection class #117730

Merged

pav-kv mentioned this issue Jan 16, 2024

kvfeed: send scan requests via RANGEFEED class #117810

Merged

pav-kv closed this as completed Jan 22, 2024

github-project-automation bot added this to KV Aug 28, 2024

github-project-automation bot moved this to Closed in KV Aug 28, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

rangefeed: enable and improve rangefeed connection class #108992

rangefeed: enable and improve rangefeed connection class #108992

erikgrinaker commented Aug 18, 2023 •

edited by exalate-issue-sync bot

Loading

blathers-crl bot commented Aug 18, 2023

erikgrinaker commented Aug 18, 2023

pav-kv commented Jan 5, 2024 •

edited

Loading

erikgrinaker commented Jan 5, 2024 •

edited by pav-kv

Loading

pav-kv commented Jan 9, 2024 •

edited

Loading

pav-kv commented Jan 12, 2024

miretskiy commented Jan 12, 2024

pav-kv commented Jan 12, 2024

erikgrinaker commented Jan 15, 2024 •

edited

Loading

pav-kv commented Jan 15, 2024 •

edited

Loading

erikgrinaker commented Jan 15, 2024

erikgrinaker commented Jan 15, 2024 •

edited

Loading

yuzefovich commented Jan 15, 2024

pav-kv commented Jan 16, 2024 •

edited

Loading

pav-kv commented Jan 22, 2024

erikgrinaker commented Jan 22, 2024

rangefeed: enable and improve rangefeed connection class #108992

rangefeed: enable and improve rangefeed connection class #108992

Comments

erikgrinaker commented Aug 18, 2023 • edited by exalate-issue-sync bot Loading

blathers-crl bot commented Aug 18, 2023

erikgrinaker commented Aug 18, 2023

pav-kv commented Jan 5, 2024 • edited Loading

erikgrinaker commented Jan 5, 2024 • edited by pav-kv Loading

pav-kv commented Jan 9, 2024 • edited Loading

pav-kv commented Jan 12, 2024

miretskiy commented Jan 12, 2024

pav-kv commented Jan 12, 2024

erikgrinaker commented Jan 15, 2024 • edited Loading

pav-kv commented Jan 15, 2024 • edited Loading

erikgrinaker commented Jan 15, 2024

erikgrinaker commented Jan 15, 2024 • edited Loading

yuzefovich commented Jan 15, 2024

pav-kv commented Jan 16, 2024 • edited Loading

DistSQL flows

Custom RPC settings and timeouts

pav-kv commented Jan 22, 2024

erikgrinaker commented Jan 22, 2024

erikgrinaker commented Aug 18, 2023 •

edited by exalate-issue-sync bot

Loading

pav-kv commented Jan 5, 2024 •

edited

Loading

erikgrinaker commented Jan 5, 2024 •

edited by pav-kv

Loading

pav-kv commented Jan 9, 2024 •

edited

Loading

erikgrinaker commented Jan 15, 2024 •

edited

Loading

pav-kv commented Jan 15, 2024 •

edited

Loading

erikgrinaker commented Jan 15, 2024 •

edited

Loading

pav-kv commented Jan 16, 2024 •

edited

Loading