
rangefeed: enable and improve rangefeed connection class #108992

Closed · erikgrinaker opened this issue Aug 18, 2023 · 16 comments
Labels: C-enhancement (solution expected to add code/behavior + preserve backward-compat; pg compat issues are the exception), T-kv (KV Team)

erikgrinaker (Contributor) commented Aug 18, 2023

We have a separate RPC connection class for rangefeeds, but it's disabled by default via kv.rangefeed.use_dedicated_connection_class.enabled.

Rangefeeds can be high-volume, e.g. during initial and catch-up scans, and have a high restart cost if they hit e.g. a NetworkTimeout. We should consider enabling this class by default, plumbing it through not only the DistSender connections but also the DistSQL flow, and using a higher network timeout, since rangefeeds are not as sensitive to availability following outages.
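For context, the gate is just a boolean cluster setting consulted when the rangefeed transport picks its class. A minimal sketch of that selection in Go, assuming the setting is registered roughly as below (the helper name and registration details are approximations, not the actual kvcoord code):

```go
import (
	"github.com/cockroachdb/cockroach/pkg/rpc"
	"github.com/cockroachdb/cockroach/pkg/settings"
)

// Approximate registration of the setting named above.
var useDedicatedConnectionClass = settings.RegisterBoolSetting(
	settings.SystemOnly,
	"kv.rangefeed.use_dedicated_connection_class.enabled",
	"use a dedicated RPC connection class for rangefeeds",
	false, // disabled by default, which is what this issue proposes to change
)

// rangefeedConnectionClass is a hypothetical helper showing the gate:
// rangefeeds dial with RangefeedClass only when the setting is on.
func rangefeedConnectionClass(sv *settings.Values) rpc.ConnectionClass {
	if useDedicatedConnectionClass.Get(sv) {
		return rpc.RangefeedClass
	}
	return rpc.DefaultClass
}
```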

Jira issue: CRDB-30735

Epic CRDB-32846

erikgrinaker added the C-enhancement and T-kv-replication labels Aug 18, 2023
blathers-crl (bot) commented Aug 18, 2023

cc @cockroachdb/replication

erikgrinaker (Contributor, Author) commented:

cc @cockroachdb/cdc

pav-kv (Collaborator) commented Jan 5, 2024

My understanding of #74222 is that the RangefeedClass is a stopgap for mitigating OOMs in the older rangefeed architecture where every range had a dedicated gRPC stream.

If I understand correctly, the premise of this issue is to "promote" this class from a stopgap to something that gives more benefits. SGTM.

From #111262 (comment), my take is that we can remove the special-casing of the window size for RangefeedClass in >= 24.1. This can be done as part of this issue.

erikgrinaker (Contributor, Author) commented Jan 5, 2024

Yeah, I think the specific actions we can take here are:

  • Remove the setting, and always use a separate rangefeed connection class: kvcoord: use rangefeed connection class #117730
  • Remove the window special casing, now that mux rangefeeds are used unconditionally: rpc: rm rangefeed RPC stream window special case #117545
  • Use the rangefeed class for the DistSQL flows too (otherwise, only node-local traffic from the DistSQL processor will use the rangefeed class in the common case, and inter-node DistSQL traffic will use the default class).
    • Not needed for now: DistSQL only transfers control information; bulk traffic goes to sink directly.
  • Consider higher network and heartbeat timeouts -- we don't need to aggressively detect failures and fail over for rangefeeds, since they are less latency sensitive and have a high restart cost (have to restart the entire changefeed).

Wdyt @miretskiy?

Why use a separate rangefeed class? I think we want to split off the main classes of "bulk" traffic (rangefeeds and Raft) from foreground RPC requests, such that e.g. large bulk transfers will have less of an impact on read requests.
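To make that concrete: each connection class maps to its own gRPC (and hence TCP) connection per peer, so a saturated bulk stream cannot head-of-line block foreground RPCs at the transport level. A conceptual sketch, not CockroachDB's actual rpc.Context internals (all names here are illustrative):

```go
import (
	"context"
	"sync"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

// ConnectionClass stands in for rpc.ConnectionClass.
type ConnectionClass int

const (
	DefaultClass ConnectionClass = iota
	SystemClass
	RangefeedClass
	RaftClass
)

// peerConns keeps one gRPC (TCP) connection per class for a single peer, so
// e.g. a bulk rangefeed catch-up scan and a foreground read never share a
// transport-level send queue.
type peerConns struct {
	mu    sync.Mutex
	addr  string
	conns map[ConnectionClass]*grpc.ClientConn
}

func (p *peerConns) get(ctx context.Context, class ConnectionClass) (*grpc.ClientConn, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if c, ok := p.conns[class]; ok {
		return c, nil
	}
	c, err := grpc.DialContext(ctx, p.addr,
		grpc.WithTransportCredentials(insecure.NewCredentials())) // TLS elided for brevity
	if err != nil {
		return nil, err
	}
	if p.conns == nil {
		p.conns = map[ConnectionClass]*grpc.ClientConn{}
	}
	p.conns[class] = c
	return c, nil
}
```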

pav-kv (Collaborator) commented Jan 9, 2024

  • Move some critical rangefeeds to the system class

Discussed with the team, there is no clear need for making some rangefeeds more responsive. There are additional complications like #110883.

craig bot pushed a commit that referenced this issue Jan 10, 2024
117505: roachtest: assign adminui ports dynamically for virtual clusters r=srosenberg,renatolabs a=DarrylWong

This was originally removed in #115599 after #114097 merged, but adminui was reverted in #117141, which mistakenly did not revert the special case for virtual clusters. We unskip the multitenant/distsql tests as well.

Release note: None
Fixes: #117150
Fixes: #117149
Epic: None

117545: rpc: rm rangefeed RPC stream window special case r=erikgrinaker,miretskiy a=pav-kv

The rangefeed stream window size tuning was introduced to mitigate OOMs in rangefeeds caused by an excessive number of streams (one per `Range`). Since we now use mux rangefeeds (which multiplex all rangefeed traffic into a single stream), this setting is no longer needed, so this commit removes it.

Part of #108992

Release note (ops change): the `COCKROACH_RANGEFEED_RPC_INITIAL_WINDOW_SIZE` env variable has been removed; rangefeed connections now use the same window size as other RPC connections.

117554: sqlproxyccl: improve authentication throttle error r=JeffSwenson a=JeffSwenson

The sql proxy will throttle connection attempts if a (client IP, tenant cluster) pair has too many authentication failures. The error is usually caused by a misconfigured password in a connection pool. This change replaces the "connection attempt throttled" error message with "too many failed authentication attempts". There is a hint that includes this message but not all drivers are configured to log hints.

Fixes #117552

Co-authored-by: DarrylWong <[email protected]>
Co-authored-by: Pavel Kalinnikov <[email protected]>
Co-authored-by: Jeff <[email protected]>
pav-kv (Collaborator) commented Jan 12, 2024

Use the rangefeed class for the DistSQL flows too (otherwise, only node-local traffic from the DistSQL processor will use the rangefeed class in the common case, and inter-node DistSQL traffic will use the default class).

@miretskiy Could you give some code pointers to where these DistSQL connections happen?

miretskiy (Contributor) commented:

@pav-kv -- I believe that happens when you call transport.NextInternalClient (via newTransportForRange) in the DistSender rangefeed.

pav-kv (Collaborator) commented Jan 12, 2024

Yeah, that would be the line that I updated here: https://github.com/cockroachdb/cockroach/pull/117730/files#diff-0774117800fc8884b0ef1dc4344112bea392a465c5ea9b0903ac3369d711bf33R758. Though it seems like Erik means some other RPC connection (in DistSQL rather than DistSender).
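For readers without the diff open, the change is essentially a one-argument swap where the per-range transport is constructed. A paraphrase of the #117730 diff, with the surrounding names approximated rather than copied from the PR:

```go
// Approximate shape of the DistSender rangefeed transport construction
// after #117730: the SendOptions now carry the dedicated class.
func (ds *DistSender) rangefeedTransport(replicas ReplicaSlice) (Transport, error) {
	opts := SendOptions{
		class: rpc.RangefeedClass, // was rpc.DefaultClass before #117730
	}
	return ds.transportFactory(opts, ds.nodeDialer, replicas)
}
```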

erikgrinaker (Contributor, Author) commented Jan 15, 2024

My very superficial understanding is that the outbound connection from the coordinator to the remote changefeed processor nodes used to set up the flow is configured here:

conn, err := req.sqlInstanceDialer.Dial(req.ctx, roachpb.NodeID(req.sqlInstanceID), rpc.DefaultClass)

And the reverse connection that streams events from the remote changefeed processors to the coordinator nodes is set up here:

conn, err = dialer.DialNoBreaker(ctx, roachpb.NodeID(sqlInstanceID), rpc.DefaultClass)

I'm not intimately familiar with this though, so you may want to ask someone in SQL queries like @yuzefovich. We're going to need some plumbing to allow the DistSQL coordinator (i.e. changefeeds) to specify the connection class for the flow, probably by adding a field to SetupFlowRequest or some such.
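A sketch of what that plumbing could look like, assuming a hypothetical class field is threaded from SetupFlowRequest into the runner request behind the first excerpt above (neither the field nor the helper exists today):

```go
import (
	"context"

	"github.com/cockroachdb/cockroach/pkg/base"
	"github.com/cockroachdb/cockroach/pkg/roachpb"
	"github.com/cockroachdb/cockroach/pkg/rpc"
	"github.com/cockroachdb/cockroach/pkg/rpc/nodedialer"
	"google.golang.org/grpc"
)

// runnerRequest mirrors the struct behind the first excerpt above; the
// connClass field is the hypothetical addition (today the Dial call
// hardcodes rpc.DefaultClass).
type runnerRequest struct {
	ctx               context.Context
	sqlInstanceDialer *nodedialer.Dialer
	sqlInstanceID     base.SQLInstanceID
	connClass         rpc.ConnectionClass // e.g. set from a new SetupFlowRequest field
}

func (req runnerRequest) dial() (*grpc.ClientConn, error) {
	// Same call as the excerpt above, with the class plumbed through.
	return req.sqlInstanceDialer.Dial(req.ctx, roachpb.NodeID(req.sqlInstanceID), req.connClass)
}
```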

craig bot pushed a commit that referenced this issue Jan 15, 2024
117730: kvcoord: use rangefeed connection class r=erikgrinaker,miretskiy a=pav-kv

This change switches rangefeeds to use `RangefeedClass` for RPC traffic by default. The corresponding cluster setting is removed since we don't expect to need to change this dynamically. We leave an escape option: a new env variable allows using `DefaultClass` instead.

This is analogous to the usage of `RaftClass` for raft traffic.

Part of #108992

Release note (performance improvement): this change separates the rangefeed traffic into its own RPC connection class. This improves isolation and reduces interference with the foreground SQL traffic, which reduces chances of head-of-line blocking caused by unrelated traffic. The new `COCKROACH_RANGEFEED_USE_DEFAULT_CONNECTION_CLASS` env variable can be set to use the `DefaultClass` instead (which was the previous default choice for rangefeeds).

Co-authored-by: Pavel Kalinnikov <[email protected]>
pav-kv (Collaborator) commented Jan 15, 2024

I browsed the code for a bit to see if there is an easy way to plumb the connection class, but it's not apparent to me. Help from SQL and changefeed teams would be appreciated. @miretskiy @yuzefovich

My understanding:

  1. startDistChangefeed is roughly where the changefeed flow originates from
  2. it creates a plan and a bunch of other things, and
  3. runs the plan
  4. which sets up some flows
  5. which creates runnerRequests and runs them
  6. which, finally, sets up RPC connections

Currently step (6) uses rpc.DefaultClass for the connection. We would like it to be rpc.RangefeedClass for changefeeds.

This can be achieved by passing some information down to step 6 (unless it already has some information in its various fields that makes this distinction possible). A couple of approaches:

  • [worse?] Step 6 understands that this is a changefeed plan/flow/request/connection/whatever, so uses a custom connection class for it. This seems like a leaky abstraction approach.
  • [better?] Have a more generic way of passing just the connection class from steps 1-3 (changefeed) all the way down through 4-6 (DistSQL).

I haven't looked at the outbox side of things. I'm guessing I would need help there too.

erikgrinaker (Contributor, Author) commented:

FWIW, I think the main concern is getting the outbox connection class right, since this is where the actual rangefeed data stream is sent. I don't think it's a big deal if the flow setup goes across DefaultClass, since it's a small request and not particularly latency-sensitive -- but if we're going to be plumbing through the connection class anyway, we may as well use it for the setup too, and I think it's more understandable that way.

erikgrinaker (Contributor, Author) commented Jan 15, 2024

Btw, there's also another source of bulk rangefeed traffic: changefeed initial scans, which emit existing table data to the changefeed before starting the actual rangefeeds. These should probably go across the rangefeed class too, to fully separate off changefeed traffic. These are just plain Scan RPC requests that are sent here:

if err := txn.Run(ctx, b); err != nil {

Unfortunately, there's currently no way to specify which class to use; the determination is made inside the DistSender based on the keyspan, so we'll need some plumbing for this too:

class: rpc.ConnectionClassForKey(desc.RSpan().Key, rpc.DefaultClass),
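For illustration, here is roughly how the call site above could look once the class can be attached to the batch, as #117810 later did via a new field in the `BatchRequest` header (the exact field name and placement are assumptions based on that PR's description):

```go
// Initial-scan batch tagged with the rangefeed class up front, instead of
// letting the DistSender derive the class from the key span.
b := txn.NewBatch()
b.Header.ConnectionClass = rpc.RangefeedClass // hypothetical field name
for _, sp := range spans {
	b.Scan(sp.Key, sp.EndKey)
}
if err := txn.Run(ctx, b); err != nil {
	return err
}
```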

yuzefovich (Member) commented:

I agree with Erik - in DistSQL we establish connections at two different steps in the flow lifecycle:

  • during the setup of the distributed plan - the gateway node establishes connections to all other nodes to perform the setup (SetupFlowRequest). This is done only once and almost no information is communicated on this connection.
  • each DistSQL "producer" processor ("outbox") establishes a connection to a DistSQL "consumer" processor ("inbox") on another node - this is done in GetConnForOutbox. This is done by each pair of connected processors in the distributed plan, and this is where all of the intermediate information gets sent in the DistSQL flow.

Teaching only the latter about a different connection class would be sufficient.
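A sketch of that minimal change, with the class parameter being the assumed addition (the existing GetConnForOutbox does not take one; names and signature here are illustrative):

```go
import (
	"context"

	"github.com/cockroachdb/cockroach/pkg/base"
	"github.com/cockroachdb/cockroach/pkg/roachpb"
	"github.com/cockroachdb/cockroach/pkg/rpc"
	"github.com/cockroachdb/cockroach/pkg/rpc/nodedialer"
	"google.golang.org/grpc"
)

// getConnForOutbox is a hypothetical variant of GetConnForOutbox that
// accepts the class chosen by the flow coordinator; the body is the
// DialNoBreaker call quoted earlier, minus the hardcoded rpc.DefaultClass.
func getConnForOutbox(
	ctx context.Context,
	dialer *nodedialer.Dialer,
	sqlInstanceID base.SQLInstanceID,
	class rpc.ConnectionClass,
) (*grpc.ClientConn, error) {
	return dialer.DialNoBreaker(ctx, roachpb.NodeID(sqlInstanceID), class)
}
```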

pav-kv (Collaborator) commented Jan 16, 2024

Discussed with the team offline. We're punting on the last two action items from the list above, for the following reasons:

DistSQL flows

We realized that changefeed sends the bulk traffic directly to the sink (that's what makes it scale). DistSQL flows only transfer control information, like checkpoints. In the common case, the amount of this data is small.

Exception: core changefeeds (see the comparison with enterprise changefeeds). For core changefeeds, the bulk traffic does go through DistSQL and the gateway node. They could potentially carry large amounts of data, but they are not recommended for such cases (production use, or large / frequently updated tables) because they are bottlenecked by the gateway.

Custom RPC settings and timeouts

There is no evidence that timeouts happen frequently in changefeeds, especially after we increased the default timeout in #109578.


We still have the initial scans in scope:

craig bot pushed a commit that referenced this issue Jan 22, 2024
117810: kvfeed: send scan requests via RANGEFEED class r=erikgrinaker,miretskiy a=pav-kv

This PR makes changefeed initial scan traffic go through the `RANGEFEED` connection class, by setting the new `rpc.ConnectionClass` parameter in the `BatchRequest` header. Since this extends the API, extra care is taken to treat unknown enum values as `DEFAULT`, e.g. this can be needed in mixed-version clusters.

This PR also introduces a `COCKROACH_RPC_USE_DEFAULT_CONNECTION_CLASS` env variable which forces all non-`SYSTEM` traffic to go through the `DEFAULT` class. This is an escape hatch, in case we need to revert to the previous behaviour.

---

Part of #108992

Release note (performance improvement): this change makes changefeed initial scan traffic, which can be bulky, go through a different RPC/TCP connection than the foreground SQL/KV traffic. This reduces interference between workloads, and reduces chances of head-of-line blocking issues.

Release note (ops change): the new environment variable, `COCKROACH_RPC_USE_DEFAULT_CONNECTION_CLASS`, instructs CRDB to send most network/RPC workloads, except system traffic, through a single "default" RPC/TCP connection. This is a safety mechanism: it reverts to the pre-`v24.1` behaviour in case the environment does not tolerate multiple TCP connections. The default behaviour in `v24.1`+ is to maintain multiple connections dedicated to different types of traffic, such as raft and rangefeeds.

Co-authored-by: Pavel Kalinnikov <[email protected]>
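The escape hatch described in this commit is straightforward to picture: when the env variable is set, every class except `SYSTEM` collapses back to `DEFAULT`. A sketch using CockroachDB's envutil helper, with the wrapper function itself being hypothetical:

```go
import (
	"github.com/cockroachdb/cockroach/pkg/rpc"
	"github.com/cockroachdb/cockroach/pkg/util/envutil"
)

var useDefaultConnectionClass = envutil.EnvOrDefaultBool(
	"COCKROACH_RPC_USE_DEFAULT_CONNECTION_CLASS", false)

// effectiveClass is a hypothetical wrapper showing the override: system
// traffic keeps its dedicated connection, everything else folds into DEFAULT.
func effectiveClass(c rpc.ConnectionClass) rpc.ConnectionClass {
	if useDefaultConnectionClass && c != rpc.SystemClass {
		return rpc.DefaultClass
	}
	return c
}
```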
pav-kv (Collaborator) commented Jan 22, 2024

@erikgrinaker shall we consider this done for now? We could also tweak gRPC settings for the RANGEFEED class (timeouts etc.), but I don't know if it's needed. Do we have any data?

erikgrinaker (Contributor, Author) commented:

Let's punt that for now. We don't have a pressing need for it with mux rangefeeds. We can revisit as needed.

pav-kv closed this as completed Jan 22, 2024
github-project-automation (bot) moved this to Closed in KV Aug 28, 2024