
HTTP routers should use bounded buffering. #287

Closed
olix0r opened this issue Feb 7, 2018 · 15 comments · Fixed by #398
Comments

@olix0r
Member

olix0r commented Feb 7, 2018

Currently, the Inbound and Outbound HTTP routers use unbounded buffers:

impl<B> Recognize for Inbound<B> {
    // ...
    type Service = Buffer<bind::Service<B>>;
    // ...
}

impl<B> Recognize for Outbound<B> {
    // ...
    type Service = Buffer<Balance<
        load::WithPendingRequests<Discovery<B>>,
        choose::PowerOfTwoChoices<rand::ThreadRng>,
    >>;
    // ...
}
Each produces services wrapped in tower_buffer::Buffer, whose module docs read:

//! Buffer requests when the inner service is out of capacity.
//!
//! Buffering works by spawning a new task that is dedicated to pulling requests
//! out of the buffer and dispatching them to the inner service. By adding a
//! buffer and a dedicated task, the `Buffer` layer in front of the service can
//! be `Clone` even if the inner service is not.
//!
//! Currently, `Buffer` uses an unbounded buffering strategy, which is not a
//! good thing to put in production situations. However, it illustrates the idea
//! and capabilities around adding buffering to an arbitrary `Service`.

At a glance, it's not immediately obvious to me why Buffer needs to use mpsc::unbounded -- if we used mpsc::channel, I'd think Buffer::poll_ready could call mpsc::Sender::poll_ready. Is it more complicated than that? @carllerche @seanmonstar

@olix0r
Member Author

olix0r commented Feb 7, 2018

It's actually not as simple as just switching to a bounded channel: there's a race between poll_ready and call, since another clone of the sender can consume the last slot after we've observed capacity. We'll need to adopt an mpsc variant that reserves a slot in the channel in a way that makes this race impossible.
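
For illustration, here's a sketch of that slot-reserving idea against today's tokio mpsc (tokio 1.x's try_reserve, which postdates this thread): the returned Permit owns a slot, so no other clone of the sender can take it between the readiness check and the send.

use tokio::sync::mpsc;

#[tokio::main]
async fn main() {
    let (tx, mut rx) = mpsc::channel::<u32>(1);

    // The "poll_ready" step: reserve the channel's only slot. The Permit
    // owns that slot until it is used or dropped.
    let permit = tx.try_reserve().expect("channel has capacity");

    // Another clone of the sender can no longer sneak in between our
    // readiness check and our send; the slot is already spoken for.
    let tx2 = tx.clone();
    assert!(tx2.try_send(9).is_err());

    // The "call" step: sending through the permit cannot fail for lack
    // of capacity.
    permit.send(42);
    assert_eq!(rx.recv().await, Some(42));
}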

@seanmonstar
Contributor

I think the reason it is unbounded is that the router doesn't currently know what to do if the underlying service isn't ready. They're essentially ReadyServices, right?

So, if we put a limit on the buffer, then we need to figure out what to do when that limit is hit. Respond with a 503? GOAWAY with ENHANCE_YOUR_CALM?

@carllerche
Contributor

The main issue is that the whole point of Buffer is to allow an arbitrary Service value to be Clone. If you have a non-Clone service, wrapping it with Buffer adds a queue and spawns a task to manage that queue. The resulting Buffer service is then Clone.

Now, services that are Clone could potentially be cloned for each request so that the service handle can be moved into the response future. This is to support "and_then" kinds of cases. With the current mpsc channel, cloning the Sender for each request makes the channel essentially unbounded: each Sender clone is guaranteed a slot of its own, so capacity grows with the number of clones.
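
The first point, that Buffer exists to make a non-Clone service Clone by parking it behind a queue and a worker task, is easy to see in a minimal sketch against today's tower API (tower 0.4's Buffer::new, not the 2018 crate quoted above):

use tower::{buffer::Buffer, service_fn, BoxError, ServiceExt};

#[tokio::main]
async fn main() -> Result<(), BoxError> {
    // The inner service is moved into a dedicated worker task; callers
    // only ever hold a channel sender pointing at that task.
    let svc = service_fn(|req: u32| async move { Ok::<_, BoxError>(req + 1) });

    let mut a = Buffer::new(svc, 16);
    let mut b = a.clone(); // cloning the handle is just cloning a sender

    assert_eq!(a.ready().await?.call(1).await?, 2);
    assert_eq!(b.ready().await?.call(2).await?, 3);
    Ok(())
}

Those cheap clones are exactly why handing a fresh clone to every request reintroduces unboundedness unless the channel reserves slots, per the second point above.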

@briansmith briansmith added this to the Conduit 0.3 milestone Feb 8, 2018
@briansmith
Contributor

If I understand @carllerche correctly, it seems like we don't need to use Buffer at all since we control our whole stack of services and we can (AFAICT) make them all Clone. Is that what we should do?

I'm nominating this for 0.3 since it's a quality issue that affects one of the central selling points of Conduit (reliability and resource consumption).

@carllerche
Contributor

@briansmith I don't recall all the details of the existing stack, but I believe that some flavors of middleware require the concept of Buffer in order to be Clone (in a futures world that is).

If all Service impls used in conduit are already Clone, then Buffer should probably be removed either way.

@olix0r
Member Author

olix0r commented Feb 8, 2018

@carllerche the issue is that things like Reconnect are not Clone, and it would probably be complicated to make them Clone.

@carllerche
Contributor

Ah yes... the way to make Reconnect Clone is to wrap it with a buffer layer.

@carllerche
Contributor

So, I believe that the strategy is to limit the max number of in-flight requests. As far as I can tell, there are two options:

  • Limit on a per endpoint basis. Once the limit is reached, requests for that endpoint are dropped.
  • Limit on a router basis. Once this limit is reached, accepting requests is paused until there is capacity.

What should the strategy be?

@olix0r
Member Author

olix0r commented Feb 16, 2018

@carllerche we should ideally be able to enforce this limit per endpoint so that slow endpoints can't DoS the whole proxy

@carllerche
Contributor

@olix0r Then, the strategy would be to "load shed" (drop) requests to slow endpoints. If that is OK, then the implementation is relatively straightforward.
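
Here's a sketch of that load-shed strategy using today's tower middleware (load_shed and concurrency_limit from tower 0.4, not the 2018 conduit code): the concurrency limit makes a saturated endpoint report not-ready, and LoadShed turns that into an immediate failure instead of queueing the request.

use std::time::Duration;
use tower::{service_fn, BoxError, ServiceBuilder, ServiceExt};

#[tokio::main]
async fn main() {
    let mut svc = ServiceBuilder::new()
        .load_shed()          // turn "inner not ready" into a fast error
        .concurrency_limit(1) // cap in-flight requests (tiny, for the demo)
        .service(service_fn(|_req: ()| async {
            // A slow endpoint that holds its slot for a long time.
            tokio::time::sleep(Duration::from_secs(60)).await;
            Ok::<_, BoxError>(())
        }));

    // The first request takes the only slot; holding the un-awaited
    // response future keeps that slot occupied.
    let _in_flight = svc.ready().await.unwrap().call(());

    // The second request is shed immediately instead of buffering.
    let shed = svc.ready().await.unwrap().call(()).await;
    assert!(shed.is_err());
}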

@olix0r
Member Author

olix0r commented Feb 19, 2018

In progress: tower-rs/tower#49

@carllerche
Contributor

How do we want to set the max in-flight requests per endpoint?

@olix0r
Member Author

olix0r commented Feb 20, 2018

probably as part of the environment-defined configuration. I would also accept a const in the short-term...

@carllerche
Contributor

What would be a good default value?

@olix0r
Member Author

olix0r commented Feb 20, 2018

@carllerche at a guess, 10_000?
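
For the short term, the shape might look like a hardcoded default with an environment override (the variable name below is invented for illustration; the thread doesn't settle on one).

use std::env;

// Default cap on in-flight requests per endpoint, per the guess above.
const DEFAULT_MAX_IN_FLIGHT: usize = 10_000;

fn max_in_flight() -> usize {
    // CONDUIT_PROXY_MAX_IN_FLIGHT is a hypothetical configuration key.
    env::var("CONDUIT_PROXY_MAX_IN_FLIGHT")
        .ok()
        .and_then(|s| s.parse().ok())
        .unwrap_or(DEFAULT_MAX_IN_FLIGHT)
}

fn main() {
    println!("max in-flight requests: {}", max_in_flight());
}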

carllerche added a commit to carllerche/conduit that referenced this issue Feb 20, 2018
Currently, the max number of in-flight requests in the proxy is
unbounded. This is due to the `Buffer` middleware being unbounded.

This is resolved by adding an instance of `InFlightLimit` around
`Buffer`, capping the max number of in-flight requests for a given
endpoint.

Currently, the limit is hardcoded to 10,000. However, this will
eventually become a configuration value.

Closes linkerd#287
@olix0r olix0r added the review/ready Issue has a reviewable PR label Feb 20, 2018
carllerche added a commit that referenced this issue Feb 21, 2018
@adleong adleong removed the review/ready Issue has a reviewable PR label Feb 21, 2018
khappucino pushed a commit to Nordstrom/linkerd2 that referenced this issue Mar 5, 2019
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 18, 2021