
listener: performance degradation when exact balance used with original dst #15146

Closed
caitong93 opened this issue Feb 23, 2021 · 7 comments · Fixed by #15842

Comments

caitong93 commented Feb 23, 2021

Title: listener: performance degradation when exact balance used with original dst

Description:

We use Envoy as a sidecar: all outbound traffic is first redirected to 127.0.0.1:15001 (Envoy) by iptables and then forwarded to different listeners by original dst. When exact balance is enabled (on all listeners), we found that the connection balance got worse. Tested with 32 downstream connections and 2 workers, the connection distribution between the two handlers is always 1:31 (sometimes 2:30), as observed via downstream_cx_active.
When a new connection is received, it is first handled by ExactConnectionBalancer, which increases numConnections() of the selected handler by one. The connection is then forwarded to a new listener in ConnectionHandlerImpl::ActiveTcpSocket::newConnection(), which decreases that gauge immediately.
If connections arrive quickly, there is a high chance that the first handler is always selected, since numConnections() of all handlers is zero.
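
For context, here is a minimal sketch of the kind of two-listener setup described above. It is only a sketch, not the reporter's actual config: the listener and cluster names, the tcp_proxy chains, and port 9080 are illustrative, and fields such as use_original_dst, bind_to_port, and connection_balance_config are written from memory and worth checking against your Envoy version.

```yaml
static_resources:
  listeners:
  # Catch-all listener; iptables redirects all outbound traffic here first.
  - name: virtual_outbound
    address:
      socket_address: { address: 0.0.0.0, port_value: 15001 }
    # Hand connections whose restored destination matches another listener
    # off to that listener.
    use_original_dst: true
    # Restores the original destination address on the accepted socket.
    listener_filters:
    - name: envoy.filters.listener.original_dst
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.filters.listener.original_dst.v3.OriginalDst
    # Exact balance on the catch-all listener is where the imbalance shows up:
    # the connection is counted against the chosen worker, then handed off
    # almost immediately, so numConnections() is back to zero for every worker
    # by the time the next connection is balanced.
    connection_balance_config:
      exact_balance: {}
    filter_chains:
    - filters:
      - name: envoy.filters.network.tcp_proxy
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
          stat_prefix: passthrough
          cluster: passthrough
  # Per-port listener that ends up owning the connection after the handoff.
  - name: outbound_9080
    address:
      socket_address: { address: 0.0.0.0, port_value: 9080 }
    bind_to_port: false  # reached only via the handoff from 15001
    connection_balance_config:
      exact_balance: {}
    filter_chains:
    - filters:
      - name: envoy.filters.network.tcp_proxy
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
          stat_prefix: outbound_9080
          cluster: original_dst_9080
  clusters:
  # Both clusters simply forward to the restored original destination.
  - name: passthrough
    connect_timeout: 5s
    type: ORIGINAL_DST
    lb_policy: CLUSTER_PROVIDED
  - name: original_dst_9080
    connect_timeout: 5s
    type: ORIGINAL_DST
    lb_policy: CLUSTER_PROVIDED
```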

caitong93 added the bug and triage (Issue requires triage) labels on Feb 23, 2021

lambdai commented Feb 23, 2021

IMHO the rebalance should be applied to the second listener. In most cases, the first listener doesn't own any connections.


caitong93 commented Feb 23, 2021

> IMHO the rebalance should be applied to the second listener. In most cases, the first listener doesn't own any connections.

I also expect the rebalance to be applied to the second listener. But (correct me if I am wrong) it seems the rebalance can only happen at the first listener, see the comments here. If exact_balance is only enabled for the second listener, I guess it won't work.


lambdai commented Feb 23, 2021

@caitong93 It won't work under the current code.

This is a reasonable scenario in which to apply the rebalance at the second listener. @mattklein123 I can change this if you agree.

@mattklein123

Yeah, I agree we probably need to handle this case specially: for forwarded connections, do the rebalance at that point and not initially.

mattklein123 added the area/listener, area/perf, and help wanted (Needs help!) labels and removed the bug and triage (Issue requires triage) labels on Feb 23, 2021
lambdai self-assigned this on Feb 23, 2021

lambdai commented Mar 9, 2021

Plan to fix this along with #15126


boeboe commented Apr 16, 2021

@lambdai

Will the fix mean that users have to configure exact_balance on the first catch_all listener 0.0.0.0:15001, or do users have to configure it on the next listener 0.0.0.0:9080 (in case the upstream service/cluster is at 9080)?

Related to istio/istio#18152, where @hobbytp tried to apply the exact_balance on the second listener.

From an end-user perspective, somebody digging into this setting is doing so for performance tuning in high-throughput, low-latency environments, and I would assume they expect to have to tune this setting only once, instead of once for every target cluster handled by a separate second-in-line 0.0.0.0:<svc_port> listener. Or do you foresee that users should be able to configure this per second-in-line-listener/upstream-service pair?


lambdai commented Apr 20, 2021

> Will the fix mean that users have to configure exact_balance on the first catch_all listener 0.0.0.0:15001, or do users have to configure it on the next listener 0.0.0.0:9080 (in case the upstream service/cluster is at 9080)?

Sorry for the late reply.
For Istio, where the 15001 listener usually doesn't hold connections, 15001 should use the no-op balancer (the goal is to reduce latency), and the large number of "9080" sub-listeners should use the exact balancer (trading cross-thread migration for balancing).

Yeah, you can also use the exact balancer for the 9080 listener and no balancer for the 9070 listener.
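
Concretely, a hedged sketch of the split described above (listener names and port 9080 are illustrative; filter chains and clusters are omitted, see the fuller sketch in the issue description; field names should be checked against your Envoy version): with the fix in #15842, where the target listener's balancer field decides whether to rebalance, the catch-all 15001 listener stays on the default no-op balancer and only the per-port sub-listeners enable exact_balance.

```yaml
# Catch-all listener: no connection_balance_config, i.e. the default no-op
# balancer, since it hands connections off almost immediately.
- name: virtual_outbound
  address:
    socket_address: { address: 0.0.0.0, port_value: 15001 }
  use_original_dst: true
  listener_filters:
  - name: envoy.filters.listener.original_dst
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.listener.original_dst.v3.OriginalDst

# Per-port sub-listener: exact balance here, trading a cross-thread handoff
# for an even connection distribution across workers.
- name: outbound_9080
  address:
    socket_address: { address: 0.0.0.0, port_value: 9080 }
  bind_to_port: false
  connection_balance_config:
    exact_balance: {}
```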

htuch pushed a commit that referenced this issue Apr 29, 2021
…15842)

If listener1 redirects the connection to listener2, the balancer field in listener2 decides whether to rebalance.
Previously we relied on rebalancing at listener1; however, that rebalance is weak because listener1 is likely not to own any connections, so the rebalance is a no-op.

Risk Level: MID. Rebalancing may introduce latency. Users need to clear the balancer field of listener2 to recover the original behavior.

Fix #15146 #16113

Signed-off-by: Yuchen Dai <[email protected]>