
Combining a tower Service with a tonic client panics #547

Closed
iffyio opened this issue Jan 26, 2021 · 7 comments
Comments

@iffyio

iffyio commented Jan 26, 2021

prost = "0.7.0"
http = "0.2"
tokio = { version = "=1.1.0", features = ["full"] }
tonic = { version = "0.4.0", features = ["tls", "tls-roots" ]}
tower = "0.4.4"

Hi, I get the following panic while updating my deps to tokio v1 (and tonic/tower along with it):

panicked at 'buffer full; poll_ready must be called first'

Can confirm that poll_ready was in fact called and returned Poll::Ready(Ok(())) before call was invoked.
Looking at the code, it seems related to cloning a Semaphore, which resets its state to State::Empty, so call now always runs with State::Empty even though a permit was acquired successfully? Or has something fundamentally changed in 0.4 regarding how this works?

Here is a small example scenario that hits this issue and hopefully illustrates the setup:

// Imports assumed for this snippet (tonic 0.4 / tower 0.4 paths);
// MyGrpcClient and RpcPayload stand in for the generated client/message types.
use std::{future::Future, pin::Pin, task::{Context, Poll}};
use http::{Request, Response};
use tonic::body::BoxBody;
use tonic::transport::{Body, Channel, ClientTlsConfig};
use tower::Service;

#[derive(Clone)]
pub struct Svc(Channel);

impl Service<Request<BoxBody>> for Svc {
    type Response = Response<Body>;
    type Error = Box<dyn std::error::Error + Send + Sync>;
    type Future = Pin<Box<dyn Future<Output = Result<Self::Response, Self::Error>> + Send>>;

    fn poll_ready(&mut self, cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>> {
        self.0.poll_ready(cx).map_err(Into::into)
    }

    fn call(&mut self, req: Request<BoxBody>) -> Self::Future {
        let mut inner = self.0.clone();
        Box::pin(async move {
            inner.call(req).await.map_err(Into::into)
        })
    }
}

#[tokio::main]
async fn main() {
    type Client = MyGrpcClient<Svc>;
    let domain = "example.com";
    let tls_config = ClientTlsConfig::new().domain_name(domain);
    let conn = Channel::from_shared("https://".to_owned() + domain)
        .unwrap()
        .tls_config(tls_config)
        .unwrap()
        .connect()
        .await
        .unwrap();

    Client::new(Svc(conn)).rpc_request(RpcPayload {}).await.unwrap();
}
@davidpdrsn
Member

Hm, yeah, that does look odd. I'm not very familiar with this part of the code, but it seems odd to me that Semaphore is Clone in the first place. I would expect it not to be, requiring an Arc to make it cloneable. tokio::sync::Semaphore, for reference, is not Clone. However, it also doesn't have any methods that take &mut self, so I'm not sure that's possible for tower's Semaphore.

@hawkw Do you know?

@olix0r
Collaborator

olix0r commented Jan 26, 2021

impl Service<Request<BoxBody>> for Svc {
    type Response = Response<Body>;
    type Error = Box<dyn std::error::Error + Send + Sync>;
    type Future = Pin<Box<dyn Future<Output = Result<Self::Response, Self::Error>> + Send>>;

    fn poll_ready(&mut self, cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>> {
        self.0.poll_ready(cx).map_err(Into::into)
    }

    fn call(&mut self, req: Request<BoxBody>) -> Self::Future {
        let mut inner = self.0.clone();
        Box::pin(async move {
            inner.call(req).await.map_err(Into::into)
        })
    }
}

This subtly breaks the contract: the service is driven to readiness and then cloned before it is invoked. The original service is ready, but the clone is not necessarily ready.

To fix this, the call function could be rewritten as:

       Box::pin(self.0.call(req).err_into::<Error>())

(using futures::TryFutureExt::err_into), which avoids the cloning entirely.

If cloning is really necessary, you could clone first and then mem::replace to "take" the ready service and leave the clone in its place: let clone = self.0.clone(); let mut inner = std::mem::replace(&mut self.0, clone); (cloning inline in the mem::replace call won't borrow-check, since self.0 would be borrowed mutably and immutably at once).

@iffyio
Author

iffyio commented Jan 27, 2021

Ah, I see. It's unfortunate that this accidentally worked. I initially thought of the replace alternative, but that felt like a hack. It would've been nice to have an API that didn't allow this (e.g. by not making things cloneable) rather than panicking 🤔 Thanks for clarifying!

@iffyio iffyio closed this as completed Jan 27, 2021
@davidpdrsn
Member

Uhh, that's really subtle and not something I had thought of. I think it makes sense to mention it in the Service docs. I added a PR for that: #548.

@davidpdrsn
Member

Some more context for people who might discover this in the future:

I have tried implementing a version of Buffer that doesn't have this problem, but without success. The root problem is that in poll_ready we want to reserve space in the channel used to send requests to the background worker, and then use that reserved space in call. This leads to a few challenges.

It is currently done using a semaphore, but could in theory also be done using tokio::sync::mpsc::Sender::reserve. Both methods give you some kind of permit, that is, something that proves you have reserved capacity in the channel. So in poll_ready we obtain such a permit and then use it in call. That means we need to store the permit on Buffer to carry it from poll_ready to call. However, these permits cannot be Clone, since that would allow sending without reserving capacity first. So if Buffer stores such a permit, it cannot be Clone either, which breaks one of the main use cases of Buffer: making services Clone.

The current implementation gets around this by having Semaphore store the permit internally, with call taking that permit and panicking if for some reason there isn't one. As mentioned, Semaphores are made Clone by discarding the permit they store internally.

We could just ignore poll_ready and both reserve and use the channel capacity in call, but then other middleware could not check a Buffer's readiness without also calling it. That would break things like load balancers, so it isn't acceptable either.

I think the "token" solution suggested here could fix this. We would probably be able to make Buffer::Token the permit and require users to pass it to call, thus removing the need for Buffer to store the permit internally.

Arqu added a commit to n0-computer/iroh that referenced this issue Oct 6, 2022
Arqu added a commit to n0-computer/iroh that referenced this issue Oct 17, 2022
Arqu added a commit to n0-computer/iroh that referenced this issue Oct 18, 2022
@0xAlcibiades

0xAlcibiades commented Sep 21, 2023

Has anyone looked into a fix for this here in tower, or in tonic?

@erichulburd

erichulburd commented Jun 27, 2024

Is there an issue with using std::future::poll_fn within the Pin<Box<dyn Future>>? For instance, something like the following:

impl Service<Request<BoxBody>> for Svc {
    type Response = Response<Body>;
    type Error = Box<dyn std::error::Error + Send + Sync>;
    type Future = Pin<Box<dyn Future<Output = Result<Self::Response, Self::Error>> + Send>>;

    fn poll_ready(&mut self, cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>> {
        self.0.poll_ready(cx).map_err(Into::into)
    }

    fn call(&mut self, req: Request<BoxBody>) -> Self::Future {
        let mut inner = self.0.clone();
        Box::pin(async move {
            // Drive the clone to readiness first (`.and_then` can't `.await`
            // inside its closure, so the two steps are sequenced with `?`).
            std::future::poll_fn(|cx| inner.poll_ready(cx)).await?;
            inner.call(req).await.map_err(Into::into)
        })
    }
}

I guess I should mention that my real use case is implementing retries within call, which requires invoking inner.call().await multiple times. Some preliminary testing has checked out, but I'm not sure how well it would hold up in a more concurrent environment. Here's a hand-wavy idea of what I'd be doing:

    fn call(&mut self, req: Request<BoxBody>) -> Self::Future {
        let mut inner = self.0.clone();
        Box::pin(async move {
            let mut last_err = None;
            for _ in 0..3 {
                // Drive the clone to readiness before each attempt.
                std::future::poll_fn(|cx| inner.poll_ready(cx)).await?;
                // `clone_request` is a hypothetical helper: retrying requires
                // rebuilding the request per attempt, since http requests
                // aren't Clone (hence the tonic issue linked below).
                match inner.call(clone_request(&req)).await {
                    Ok(res) => return Ok(res),
                    Err(e) => last_err = Some(e.into()),
                }
            }
            Err(last_err.unwrap())
        })
    }

This is a stop-gap for a more fully baked version of ergonomic retries (#682) and a tonic version that relies on an http crate with cloneable requests (see hyperium/tonic#733).
