A question about slots size #205

Closed
oriontvv opened this issue May 19, 2022 · 6 comments
Labels
A-redis Area: Redis / deadpool-redis bug Category: This is a bug.

Comments

@oriontvv

oriontvv commented May 19, 2022

Hi, folks! Thanks for your contribution!
I have an actix-web service and deadpool-redis shows strange values for the slots VecDeque size:

DURING HIGHLOAD
before: Status {
    max_size: 24,
    size: 33,
    available: -69,
}

after: Status {
    max_size: 24,
    size: 33,
    available: -70,
}
==================
AFTER HIGHLOAD
before: Status {
    max_size: 24,
    size: 2,
    available: 1,
}

after: Status {
    max_size: 24,
    size: 2,
    available: 0,
}

As I understand it, the allocated slots VecDeque is dropped inside return_object and new connections to Redis are allocated again afterwards. Is it possible not to deallocate them and keep the pool's size at max_size elements?
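
For reference, a minimal sketch of how values like these can be captured via Pool::status(); handle_request is just a placeholder, not the actual handler:

use deadpool_redis::Pool;

async fn handle_request(pool: &Pool) {
    // Status snapshot before taking a connection from the pool.
    println!("before: {:?}", pool.status());

    // While this guard is alive, the object counts as "in use".
    let _connection = pool.get().await.unwrap();

    // Status snapshot after taking a connection; `available` drops by one.
    println!("after: {:?}", pool.status());
}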

@bikeshedder bikeshedder added question Further information is requested A-redis Area: Redis / deadpool-redis labels May 19, 2022
@oriontvv oriontvv changed the title A question aboud slots size A question about slots size May 19, 2022
@bikeshedder
Collaborator

I have an actix-web service and deadpool-redis shows strange values for the slots VecDeque size: (...)

The numbers during the load look a bit strange to me. Unless you are shrinking the pool during the load, the size should never exceed max_size. I've taken a look at the implementation and don't see how something like that could ever happen. 🤔

Could you please share the exact versions you're using and maybe your pool configuration?

(...) Is it possible not to deallocate them and keep the pool's size with max_size elements?

With the redis crate and actix-web it's quite possible that you end up with a lot of broken connections at the end of a load run due to the fact that its implementation is not cancellation safe:

It is quite possible that there are ~70 workers waiting for an object from the pool. The workers currently in possession of an object get cancelled first and return the now-broken objects. The waiting workers are cancelled as well, but not perfectly in parallel, so a few of them end up grabbing those just-returned objects from the pool only to find them broken and discard them. When all workers are finally cancelled, most of the dead connections are gone and only a few connections remain in the pool, possibly broken, too.

It's pure chance how many objects are left in the pool after such a load, and most of them are dead.

Are you load testing the service with wrk or any other tool that doesn't perform a clean shutdown of the connections after reaching the time limit?

I'm only concerned about the size exceeding max_size during the load. Due to redis-rs/redis-rs#325 there are no guarantees that redis connections survive the cancellation of a future, and the only mitigation for that would be to move all redis commands inside a tokio::spawn, which doesn't cancel the query if the HTTP connection is closed prematurely.
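
A rough sketch of that mitigation (hypothetical handler code, not taken from this project; it assumes the redis crate re-exported by deadpool-redis):

use deadpool_redis::{redis::cmd, Pool};

async fn get_value(pool: Pool, key: String) -> Option<String> {
    // The spawned task owns its own pool handle and keeps running even if
    // the surrounding request future is dropped when the client disconnects.
    let task = tokio::spawn(async move {
        let mut conn = pool.get().await.ok()?;
        cmd("GET")
            .arg(&key)
            .query_async::<_, String>(&mut conn)
            .await
            .ok()
    });
    // A JoinError is simply treated as "no value" here.
    task.await.ok().flatten()
}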

@oriontvv
Author

Oh, I'm sorry, I didn't mention the versions:

deadpool-redis = { version = "0.10.0", features = ["serde"] }
redis = { version = "0.21.3", features = ["cluster", "tokio-comp", "connection-manager", "streams"] }
actix-http = "=3.0.0"
actix-rt = "2"
actix-web = "=4.0.1"
tokio = { version = "1.12.0", features = ["full"] }

There is nothing special about the pool:

use deadpool_redis::{Config, Pool, Runtime};

let cfg = Config::from_url(address);
let pool: Pool = cfg.create_pool(Some(Runtime::Tokio1)).unwrap();

// usage like
let mut connection = pool.get().await.unwrap();
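
(For completeness, a sketch of what an explicit max_size would look like, assuming PoolConfig is re-exported at the deadpool_redis crate root; the code above just relies on the default:)

use deadpool_redis::{Config, PoolConfig, Runtime};

let mut cfg = Config::from_url(address);
// Pin the pool to exactly 24 slots instead of relying on the default size.
cfg.pool = Some(PoolConfig::new(24));
let pool = cfg.create_pool(Some(Runtime::Tokio1)).unwrap();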

The test load was generated with the simple tool Locust; it just sends HTTP requests like wrk and doesn't deal with the connections directly.
I'm curious why the connections could be broken; I expected the size to stay at exactly max_size. A pool with a size of only 1-2 connections looks like a useless entity. I will investigate more.
Thanks for the quick reply!

@bikeshedder
Collaborator

After several hours of trying to reproduce this issue I finally managed to write a test reproducing the bug you ran into: 1b09998

Funnily enough, this bug was already fixed by a recent refactoring and clean-up of the pool internals (7ffa964). I just didn't publish a new version to crates.io because I considered it mainly a refactoring that didn't fix any known bugs.

I just released deadpool v0.9.5 on crates.io: https://crates.io/crates/deadpool/0.9.5

Could you please run cargo update and see if this fixes your issues?

@bikeshedder bikeshedder added bug Category: This is a bug. and removed question Further information is requested labels May 20, 2022
@oriontvv
Author

oriontvv commented May 20, 2022

Yep, the bug is fixed, good job, thanks!

DURING LOAD
Before: Status {
    max_size: 24,
    size: 14,
    available: -540,
}

After: Status {
    max_size: 24,
    size: 16,
    available: -538,
}

================
AFTER LOAD
Before: Status {
    max_size: 24,
    size: 2,
    available: 1,
}

After: Status {
    max_size: 24,
    size: 2,
    available: 0,
}

Btw, would you consider the option of keeping the buffer full with max_size elements to optimize allocations?

@bikeshedder
Collaborator

Btw, would you consider the option of keeping the buffer full with max_size elements to optimize allocations?

I need to explain how deadpool works so it makes more sense why it behaves like that (a rough code sketch follows the list below):

  • Every time Pool::get is called it grabs a semaphore permit. There are max_size permits available in total.
  • Once the semaphore permit is acquired, it checks a queue of idle objects (currently called slots).
  • If the object is usable (checked via Manager::recycle) it is returned.
  • If the object can't be recycled, the next object is tested.
  • If no objects remain in the queue, a new object is created.
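
A condensed, self-contained toy model of that retrieval strategy (hypothetical names and types, not the actual deadpool internals; note that the real pool keeps the permit bound to the object until it is returned):

use std::collections::VecDeque;
use std::sync::Mutex;
use tokio::sync::Semaphore;

struct Conn {
    healthy: bool, // stand-in for what Manager::recycle would check
}

struct ToyPool {
    semaphore: Semaphore,         // max_size permits in total
    slots: Mutex<VecDeque<Conn>>, // the queue of idle objects
}

impl ToyPool {
    fn new(max_size: usize) -> Self {
        Self {
            semaphore: Semaphore::new(max_size),
            slots: Mutex::new(VecDeque::new()),
        }
    }

    async fn get(&self) -> Conn {
        // 1. Grab a permit; at most max_size callers get past this point.
        let _permit = self.semaphore.acquire().await.unwrap();
        loop {
            // 2. Pop the next idle object from the queue.
            let candidate = self.slots.lock().unwrap().pop_front();
            match candidate {
                // 3. Usable object: hand it out.
                Some(conn) if conn.healthy => return conn,
                // 4. Broken object: drop it and test the next one.
                Some(_broken) => continue,
                // 5. Queue exhausted: create a brand-new object.
                None => return Conn { healthy: true },
            }
        }
    }
}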

Let's assume you have a pool with a max_size of 24 and all objects are currently in use. Now all 24 connections get dropped while they are still running queries. Due to redis-rs/redis-rs#325 those connections will fail to recycle.

If you time this perfectly you can end up with a pool status claiming a size of 24. The next time a worker tries to get a connection from the pool it will find all objects unusable, remove them from the pool and create a single new connection, thus reducing the pool size to 0 first and then increasing it back to 1.

There are multiple ways to solve this. One is changing the strategy to recycle connections when they are returned. This would require the pool to have one or more active workers that take returned connections, recycle them and push them back to the queue. This, however, adds delay, and just because the connection was recycled when it was returned to the pool doesn't mean it is safe to use when it is retrieved again. In an application that only serves a few requests a day it can easily happen that the connection in the pool is dead long before it is retrieved again.

My thinking here is:

  • In a highly contended pool, ahead-of-time recycling doesn't help at all. The added worker task can even reduce performance as there are more context switches involved.
  • In a non-contended pool it could reduce latency, since the recycle operation can be skipped on object retrieval.

I opted for a pool implementation that favors the high-load scenario over the low-latency one. As a nice side effect the implementation ended up simpler than other solutions.

I do plan on adding a way to do some ahead-of-time object creation and recycling. I won't bake it into the core of deadpool but will add some extension points so users can pick and choose what strategy they want to use.

@oriontvv
Author

Got it. I really appreciate the explanation. Thanks!
