Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Global Rate Limit Listener Opt Out #18678

Closed
wez470 opened this issue Oct 19, 2021 · 4 comments · Fixed by #18876
Closed

Global Rate Limit Listener Opt Out #18678

wez470 opened this issue Oct 19, 2021 · 4 comments · Fixed by #18876

Comments

@wez470
Copy link
Contributor

wez470 commented Oct 19, 2021

Title: Ability to opt out a listener from the global_downstream_max_connections limit

Description:
The use case for this comes up when running Envoy on behalf of others. It is desirable to be able to probe Envoy to see if it's still running properly and take action if it's not. This is not possible with current configuration. Although you can set per listener limits, we want to guarantee a global limit across all listeners created for the user while opting out listeners created for the party running Envoy on their behalf. For the admin interface, this would also allow stats to still be collected while the load balancer is at its limit.

Another alternative could potentially be just opting out the admin interface from this limit if a per listener setting is not desired.

See https://www.envoyproxy.io/docs/envoy/latest/configuration/operations/overload_manager/overload_manager for docs on configuring global_downstream_max_connections

@wez470 wez470 added enhancement Feature requests. Not bugs or questions. triage Issue requires triage labels Oct 19, 2021
@htuch
Copy link
Member

htuch commented Oct 19, 2021

@KBaichoo any thoughts on what makes sense here? I think we generally want admin to be consistent with other listeners as we want to move admin to a listener one day, as per #2763.

@KBaichoo
Copy link
Contributor

So IIUC, the issue is that the admin might not be accessible due to overload of global_downstream_max_connections. The point of the mechanism is to put a hard limit to avoid the job running out of fds by bounding the number of downstream connections (it seems to assume downstream connections will account for ~ half of fds used, though that might not be the case e.g. if downstream is H2 and upstream H1).

Since there’s plans to move the admin interface to a listener, I think having the ability to have particular listeners opted out of the global rate limit makes sense. There might be a desire to locally limit opt-ed out listeners to avoid running out of fds.

@wez470
Copy link
Contributor Author

wez470 commented Oct 20, 2021

Just to add some more thoughts from my perspective:

So IIUC, the issue is that the admin might not be accessible due to overload of global_downstream_max_connections.

Yes and ideally other listeners created by and for a party running envoy in the case of running on behalf of others. e.g. if I'm running Envoy for someone, I may set 80, 443 listeners for them, but maybe also a direct 200 response listener on port 10000 that I can probe to see Envoy is still working properly. Ideally that port 10000 listener could opt out of the global max cons.

The point of the mechanism is to put a hard limit to avoid the job running out of fds by bounding the number of downstream connections

Yes but also for commodity hardware, Envoy can start maxing out resources (e.g. CPU) w/o running out of FDs. In that case the limit can be used to keep users from overwhelming the hardware/envoy in that regard.

Since there’s plans to move the admin interface to a listener, I think having the ability to have particular listeners opted out of the global rate limit makes sense. There might be a desire to locally limit opt-ed out listeners to avoid running out of fds.

SGTM (although we personally may not need the local limit)

@wez470
Copy link
Contributor Author

wez470 commented Oct 26, 2021

I'd like to implement this. Should be easy enough to add an opt out field to the listener proto to start and then it seems we'd need to update the ListenerConfig https://github.com/envoyproxy/envoy/blob/main/envoy/network/listener.h#L113 here so the lower level classes can access that info. Maybe we can add a bool globalRateLimitOptOut func (or some other name) there. Then add the bool here https://github.com/envoyproxy/envoy/blob/main/envoy/event/dispatcher.h#L233 and plumb it down so the TcpListenerImpl can use it to skip the check. Seem okay?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants