
An attempt to optimize AsyncConsumerWorkService.WorkPool dispatch loop #352

Closed
wants to merge 1 commit into from

Conversation

vendre21
Contributor

@vendre21 vendre21 commented Sep 14, 2017

Further discussion on this PR:
#350 (comment)

@bording
Collaborator

bording commented Sep 14, 2017

Adding a BlockingCollection to a code path that is supposed to be async defeats the purpose of having an async code path. This PR means the thread is blocked waiting for the collection.

@michaelklishin
Member

What problem does this PR solve? I don't think the comment linked answers this question.

@vendre21
Contributor Author

@YulerB has suggested an improvement to this class. A BlockingCollection is a good alternative to the Task.Delay and while loop, which hold the current thread inefficiently. So this PR should speed up consumers.
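Roughly, the two loop shapes being compared look like this (a simplified sketch only; the names and structure do not match the actual client code):

using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

static class DispatchLoopShapes
{
    // Old shape: poll a ConcurrentQueue, backing off with Task.Delay while it is empty.
    public static async Task PollingLoop(ConcurrentQueue<Action> queue, CancellationToken token)
    {
        while (!token.IsCancellationRequested)
        {
            if (queue.TryDequeue(out var work))
            {
                work();
            }
            else
            {
                // Yields the thread while idle; the loop resumes on a pool thread later.
                await Task.Delay(10);
            }
        }
    }

    // Proposed shape: consume a BlockingCollection; the thread is parked while the queue is idle.
    public static void BlockingLoop(BlockingCollection<Action> queue, CancellationToken token)
    {
        // GetConsumingEnumerable throws OperationCanceledException once the token is cancelled.
        foreach (var work in queue.GetConsumingEnumerable(token))
        {
            work();
        }
    }
}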

@michaelklishin
Member

michaelklishin commented Sep 14, 2017 via email

@YulerB
Contributor

YulerB commented Sep 14, 2017

The BlockingCollection isn't blocking the asynchrony. It pauses execution when the queue is empty.

The code is greatly simplified by this PR.

It uses fewer op-codes in user code.

It's more readable.

It's now something people can understand.

The existing code took a long time to mentally check.

The existing code may have unforeseen bugs relating to the try methods failing, which I for one cannot follow.

This PR reduces the cyclomatic complexity of the method.

The BlockingCollection was designed for this purpose.

As for performance, I'm sure it's going to win out.

A lot of the code is smart, but apply the KISS principle where possible.

And it's cheaper for all of us to leverage the framework rather than implement our own.

@michaelklishin
Member

michaelklishin commented Sep 15, 2017

@YulerB @vendre21 so are there any benchmarks or profiling data that your team used to come up with this and can share?

I can see the argument that this simplifies the code. A BlockingCollection in this specific case may be OK. But from my experience (and I'm sure @bording would concur) humans are terrible at reasoning about concurrent program execution and efficiency even for a small number of workloads.
Benchmarks and profiling are more reliable tools than developer gut feeling.

@YulerB
Contributor

YulerB commented Sep 15, 2017

Please see bench code attached.

I had to pull code out of the client to test.

Batch script to run to collect results:

perfbenchworkpool new single
perfbenchworkpool new multiple
perfbenchworkpool old single
perfbenchworkpool old multiple

Each is run separately to get accurate results, since there seems to be a memory leak in the old version due to excessive locking.

Results:
C:>perfbenchworkpool new single
New Single Producer - Avg:17.36ms, Max:70ms, Min:10ms, Memory:78784

C:>perfbenchworkpool new multiple
New Multiple Producers - Avg:113.08ms, Max:385ms, Min:70ms, Memory:85568

C:>perfbenchworkpool old single
Old Single Producer - Avg:44.37ms, Max:152ms, Min:3ms, Memory:260016376

C:>perfbenchworkpool old multiple
Old Multiple Producers - Avg:179.63ms, Max:1114ms, Min:37ms, Memory:1439127520

Program.cs.txt

@YulerB
Contributor

YulerB commented Sep 15, 2017

What we see is that when Stop is called, the current code doesn't stop until the queue is empty. The updated code stops as soon as the cancellation token is cancelled.
So maybe not a memory leak.
I'll leave it up to you to figure out whether this is a bug.

@YulerB
Contributor

YulerB commented Sep 15, 2017

If it's a bug, put this in the while loop:

if (tokenSource.IsCancellationRequested) {
    break;
}

Then the numbers will probably be pretty much the same.
And if they are roughly the same, I'd change the code so this question/complexity doesn't come back to haunt us all.
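For context, a minimal sketch of where that check would sit in the old-style polling loop (names are illustrative and do not match the actual client code):

using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

static class EarlyStopLoop
{
    public static async Task RunAsync(ConcurrentQueue<Action> queue, CancellationTokenSource tokenSource)
    {
        while (true)
        {
            // The suggested early exit: stop dispatching as soon as the token is cancelled,
            // even if items are still queued (matching the BlockingCollection version's behaviour).
            if (tokenSource.IsCancellationRequested)
            {
                break;
            }

            if (queue.TryDequeue(out var work))
            {
                work();
            }
            else
            {
                await Task.Delay(10); // idle back-off while the queue is empty
            }
        }
    }
}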

@YulerB
Contributor

YulerB commented Sep 15, 2017

Updated bench and results
Program.cs.txt
New Single - Avg:108ms, Max:190ms, Min:85ms, Memory:214392, Processed:1202632
Old Single Producer - Avg:82.94ms, Max:300ms, Min:58ms, Memory:225480, Processed:584946
New Multiple - Avg:553.69ms, Max:771ms, Min:405ms, Memory:212472, Processed:4292341
Old Multiple - Avg:450.7ms, Max:626ms, Min:354ms, Memory:213776, Processed:2297539

I realize the BlockingCollection is slightly slower now after fixing the bug in the old while loop, but check out how many items got processed.

@michaelklishin
Member

@YulerB we intentionally process outstanding consumer operations before stopping. Not doing so would be considered a bug by some. With automatic acknowledgements this can effectively lose/ignore some deliveries, which is not necessarily a major issue given the "fire and forget" nature of that mode, but it would still be a surprising breaking change.

With manual acknowledgements it can result in a certain number of delivered messages being requeued, which is fine.

So I'd really like to preserve the current behavior.
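In loop form, the drain-on-stop behaviour being described is roughly the following (a sketch assuming a ConcurrentQueue-based loop, not the actual client code):

using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

static class DrainOnStop
{
    // Keep dispatching until cancellation has been requested AND the queue is empty,
    // so outstanding consumer operations are still processed after Stop() is called.
    public static async Task RunAsync(ConcurrentQueue<Action> queue, CancellationToken token)
    {
        while (true)
        {
            if (queue.TryDequeue(out var work))
            {
                work(); // dispatch outstanding work even after cancellation
            }
            else if (token.IsCancellationRequested)
            {
                break; // stop only once the queue has drained
            }
            else
            {
                await Task.Delay(10); // idle back-off while waiting for new work
            }
        }
    }
}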

@michaelklishin
Member

@YulerB averages and min/max values are not ideal in benchmarks; we switched PerfTest to use median/95th/99th percentiles earlier this year, for example. I'd also consider increasing the number of iterations to, say, 10M or 100M to let the benchmark run longer.

Thank you for producing it, by the way!

@michaelklishin
Member

@bording @danielmarbach so we seem to "win some, lose some" here. WDYT of this proposal? Blocking collection use aside, I'd agree that it is a certain simplification.

Are there any NServiceBus benchmarks you can run to compare the two approaches?

@danielmarbach
Collaborator

@michaelklishin I don't understand how simplification of the code is an argument here. And unfortunately the blocking collection cannot just be left aside in the argument we are having.

When you look at concurrency and parallelism, how you use the thread pool is essential. Every thread we can free up is worth freeing up, because during that time it can work on other things, and we reduce the chance of hill-climbing in the thread pool (ramping up threads). With the while loop in combination with the delay, we might pay a context switch, but we free up the thread when the queue is empty. The blocking collection, on the other hand, internally uses wait handles that block the thread entirely, plus some non-trivial cancellation token source linking that has other memory implications. When the blocking collection is empty, the thread is blocked and cannot be used. That might not be an issue from the perspective of a single consumer thread, but it becomes one when you look at the process as a whole, which potentially contains multiple such consumers.

I cannot judge whether you as maintainers are willing to make those trade-offs for the argument of simplicity. I would not.
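For comparison, the kind of non-blocking wait being contrasted with BlockingCollection could look roughly like this (a sketch only; this pattern is not part of the PR, and the names are illustrative):

using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

static class AsyncWaitLoop
{
    // A ConcurrentQueue paired with a SemaphoreSlim: WaitAsync hands the thread back to
    // the pool while the queue is empty, whereas BlockingCollection parks the thread.
    public static async Task RunAsync(ConcurrentQueue<Action> queue, SemaphoreSlim available, CancellationToken token)
    {
        while (!token.IsCancellationRequested)
        {
            // Asynchronous wait; throws OperationCanceledException if cancelled while waiting.
            await available.WaitAsync(token);
            if (queue.TryDequeue(out var work))
            {
                work();
            }
        }
    }

    // The producer enqueues an item and releases the semaphore once per item.
    public static void Enqueue(ConcurrentQueue<Action> queue, SemaphoreSlim available, Action work)
    {
        queue.Enqueue(work);
        available.Release();
    }
}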

@michaelklishin
Member

@danielmarbach thanks for your feedback. I'd like to take a closer look at where this collection is used, since most concurrent systems are snowflakes, even though they are built from common primitives.

I have a couple of things I'd like to clarify. In this PR, a blocking collection is used in the consumer work service to "batch" pending operation dispatch in a loop. Can that thread be reused for something else in practice? Good question, I assume the answer is "yes" but I don't know enough about the .NET runtime or TPL task scheduling.

There is one WorkPool instance per channel as of #307. @danielmarbach is there a scenario in which once the number of consumers (or channels) grows, we can see increased contention around the BlockingCollection operations?

Currently this PR doesn't seem to be an obvious improvement even with 5 consumers. You win some on some metrics, you lose some in terms of latency.

@YulerB can you please add a few more versions of your benchmark that have 100, 250, 500 and 1000 "consumers" (tasks) sharing a pool? Those numbers may seem unusual, but on a system with 16, 32 or more cores, having that many consumers no longer seems crazy. I expect the results can be quite different from the currently posted ones with 1 and 5 "consumers".

Optimizing for a single consumer is not something our team usually does (even though we see arguments for that from time to time), both in client libraries and in RabbitMQ itself.

@michaelklishin michaelklishin changed the title AsyncConsumerWorkService improvement An attempt to optimize AsyncConsumerWorkService.WorkPool dispatch loop Sep 15, 2017
@YulerB
Contributor

YulerB commented Sep 16, 2017

So, if the items in the queue are allowed to complete when Stop is called, then we are queueing an action on a ConcurrentQueue only to queue it again on the thread pool queue, and we only need to stop queueing new work when Stop is called.

If that's the case, we could skip the WorkPool and queue directly on the thread pool.

internal class AsyncConsumerWorkService : ConsumerWorkService {
    // Channels for which Stop(model) has been called; work for them is no longer scheduled.
    readonly SynchronizedList<IModel> workPools = new SynchronizedList<IModel>();
    bool go = true;

    public void Schedule<TWork>(ModelBase model, TWork work) where TWork : Work {
        // Queue the work item straight onto the thread pool unless the service
        // or this particular channel has been stopped.
        if (go && !workPools.Contains(model)) Task.Run(() => work.Execute(model));
    }

    public void Stop(IModel model) {
        workPools.Add(model);
    }

    public void Stop() {
        go = false;
    }
}

@michaelklishin
Member

@YulerB WorkPool's responsibility is to offer a per-channel dispatch ordering guarantee. It's a port of the same idea in the Java client. Wouldn't the snippet above create a race condition between tasks for a single channel (there's no ordering guarantee between channels, by design)?
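To illustrate the distinction (a sketch only, not the client's actual WorkPool implementation): posting every item straight to Task.Run gives no ordering, while draining a per-channel queue sequentially preserves ordering within that channel.

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

static class OrderingSketch
{
    // Unordered: each work item is queued to the thread pool independently, so two
    // items for the same channel can run in either order (the race described above).
    public static void DispatchUnordered(Func<Task> work)
    {
        Task.Run(work);
    }

    // Ordered per channel: one loop drains one channel's queue sequentially, awaiting
    // each item before starting the next. Separate channels get separate loops, so
    // there is still no ordering guarantee between channels.
    public static async Task DrainChannelQueueAsync(ConcurrentQueue<Func<Task>> channelQueue)
    {
        while (channelQueue.TryDequeue(out var work))
        {
            await work(); // the next item starts only after the previous one completes
        }
    }
}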

@YulerB
Contributor

YulerB commented Sep 18, 2017

If it's truly async, then you cannot guarantee ordering.
If it's a background thread processing the queue sequentially, then yes, you would have ordering.
In that case, async simply means on a single thread.

I reckon not all of the operations require ordering; maybe only message delivery does, and only for some applications. If so, we should add all entries directly to the thread pool except message delivery, which would be performed sequentially on the background thread.

So, there really are 3 use cases,

  1. Sync (Original)
  2. Background thread per channel (The current Async)
  3. Async (The code above)

I've also added await/async for read operations to our version. The BinaryReader is more trouble than it's worth. BitConverter is a great class.

@danielmarbach
Collaborator

danielmarbach commented Sep 18, 2017

If it’s truly async, then you cannot guarantee ordering.

Async != concurrent

If this is the case, async simply means on a single thread.

Async != single thread

The code above would offload an inherently IO-bound problem to the worker pool, which is not desirable. When the async consumer service was introduced, it was meant as a first step, an enabler for asynchronous consumer code. We knew the current model code is still IO-bound and blocking, but the consumer service would at least allow existing asynchronous third-party code to be executed asynchronously and to be combined more naturally with the new async-enabled APIs. Happy to talk this through, but I think we need to clarify a few terms first so that we are all talking about the same thing (no offence meant).

@YulerB
Contributor

YulerB commented Sep 18, 2017

@michaelklishin, yes it would create a race condition if ordering was required.
My use case doesn't require ordering for any of my consumers.
I'm processing tons of messages now concurrently (thanks @danielmarbach).

@michaelklishin
Member

@YulerB I don't subscribe to the idea that "if it's truly async you cannot guarantee ordering". You can guarantee per-channel dispatch ordering. Concurrently running consumer operations that require synchronisation are the application developer's concern, and libraries cannot fully avoid concurrency hazards.

@michaelklishin
Member

@YulerB thanks for your time on this, but unless we have benchmarks that prove this is more efficient with a very large number of "consumers", this PR has no chance of getting in. It's not an obvious improvement according to the basic benchmarks we have, and the subtle behaviour changes that were discussed and tested in @danielmarbach's async dispatcher PR are completely overlooked here.

@michaelklishin
Member

michaelklishin commented Sep 18, 2017 via email

@vendre21
Contributor Author

Thank you very much for everyone's effort on this PR. I'm going to close it, as it has been looked over enough.

@vendre21 vendre21 closed this Sep 18, 2017
@michaelklishin
Member

michaelklishin commented Sep 18, 2017 via email
