keep pushback intact on Multi.groupItems().intoLists().of(size,duration) #493

Conversation

voortwis
Contributor

@voortwis voortwis commented Mar 4, 2021

In the previous implementation the logic was partly handled by calling intoMultis().every(Duration), which caused an unlimited number of items to be requested from upstream.
When processing a Kafka stream with many small messages that fit in memory, the throttled policy would eventually (after 60 seconds) see 'stale' unprocessed messages, causing an exception that shut processing down completely. This problem is solved by using MultiBufferWithTimeoutOp directly.

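For context on the demand mechanics, here is a minimal standalone sketch (not part of this PR; it assumes a Reactive Streams based Mutiny version and uses a plain range source instead of Kafka) that probes the request amount the grouping operator forwards upstream when the downstream only asks for a few lists:

import java.time.Duration;
import java.util.List;

import org.reactivestreams.Subscriber;
import org.reactivestreams.Subscription;

import io.smallrye.mutiny.Multi;

public class DemandProbe {

    public static void main(String[] args) throws InterruptedException {
        Multi.createFrom().range(0, 10_000)
                // Probe for the demand the grouping operator forwards upstream.
                // Before this change it was effectively unbounded (Integer.MAX_VALUE
                // in the reporter's probe); afterwards it is derived from the list size.
                .onRequest().invoke(n -> System.out.println("upstream sees request(" + n + ")"))
                .groupItems().intoLists().of(100, Duration.ofMillis(1000))
                .subscribe().withSubscriber(new Subscriber<List<Integer>>() {
                    @Override
                    public void onSubscribe(Subscription subscription) {
                        subscription.request(16); // a deliberately small downstream request
                    }

                    @Override
                    public void onNext(List<Integer> batch) {
                        System.out.println("received a batch of " + batch.size());
                    }

                    @Override
                    public void onError(Throwable failure) {
                        failure.printStackTrace();
                    }

                    @Override
                    public void onComplete() {
                        System.out.println("completed");
                    }
                });

        Thread.sleep(2_000); // let the time-based flush run before the JVM exits
    }
}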
@codecov

codecov bot commented Mar 4, 2021

Codecov Report

Merging #493 (5ef9bed) into master (f8bb371) will increase coverage by 0.21%.
The diff coverage is 100.00%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master     #493      +/-   ##
============================================
+ Coverage     89.95%   90.16%   +0.21%     
- Complexity     2931     2937       +6     
============================================
  Files           382      382              
  Lines         11340    11340              
  Branches       1420     1422       +2     
============================================
+ Hits          10201    10225      +24     
+ Misses          578      562      -16     
+ Partials        561      553       -8     
Impacted Files Coverage Δ Complexity Δ
...io/smallrye/mutiny/groups/MultiGroupIntoLists.java 100.00% <100.00%> (ø) 5.00 <1.00> (-1.00)
...tiny/operators/uni/builders/DefaultUniEmitter.java 93.10% <0.00%> (-6.90%) 12.00% <0.00%> (-1.00%)
...mallrye/mutiny/helpers/queues/MpscLinkedQueue.java 90.32% <0.00%> (-6.46%) 27.00% <0.00%> (-1.00%)
...smallrye/mutiny/operators/multi/MultiBufferOp.java 88.70% <0.00%> (-0.81%) 5.00% <0.00%> (ø%)
...java/io/smallrye/mutiny/helpers/Subscriptions.java 79.00% <0.00%> (+0.55%) 47.00% <0.00%> (ø%)
...allrye/mutiny/operators/uni/UniAndCombination.java 86.04% <0.00%> (+1.16%) 4.00% <0.00%> (ø%)
.../io/smallrye/mutiny/helpers/queues/DrainUtils.java 77.77% <0.00%> (+2.22%) 12.00% <0.00%> (+1.00%)
...y/operators/multi/builders/IterableBasedMulti.java 86.17% <0.00%> (+4.25%) 4.00% <0.00%> (ø%)
...lrye/mutiny/subscription/SerializedSubscriber.java 87.73% <0.00%> (+5.66%) 27.00% <0.00%> (+2.00%)
...tiny/operators/multi/MultiBufferWithTimeoutOp.java 72.88% <0.00%> (+6.77%) 4.00% <0.00%> (+1.00%)
... and 2 more

@jponge jponge requested a review from cescoffier March 4, 2021 20:59
@jponge jponge added this to the 0.15.0 milestone Mar 4, 2021
@jponge
Member

jponge commented Mar 5, 2021

Would you have a complete use-case to understand the context of this pull request and what exactly it fixes?

@voortwis voortwis changed the title keep pushback intact on size+duration keep pushback intact on Multi.groupItems().intoLists().of(size,duration) Mar 5, 2021
@voortwis
Contributor Author

voortwis commented Mar 5, 2021

Would you have a complete use-case to understand the context of this pull request and what exactly it fixes?

When reading a high-volume Kafka topic we first need to do some processing, and then we store the resulting events in an Elasticsearch index. We insert these in batches (of configurable size) into Elasticsearch to gain a massive performance increase. Using the throttled setting on the mutiny-kafka-connector we ensure asynchronous, at-least-once processing. The problem was that the upstream would read all data from Kafka, resulting in events that were not processed (acked) in time.

The line .groupItems().intoLists().of(100, Duration.ofMillis(1000)) was requesting Integer.MAX_VALUE; after the change it requests the list size.

Example simulation code:

    @Incoming("events")
    @Outgoing("events-persisted")
    public Multi<Message<Integer>> persistEventsMulti(Multi<Message<Integer>> events) {
        return events
                //.onRequest().invoke(reqSize -> { log.info("Events-Request: "+reqSize); }) // reqSize: 16
                .emitOn(Infrastructure.getDefaultWorkerPool())
                //.onRequest().invoke(reqSize -> { log.info("Events-Request-group-list: "+reqSize); }) // reqSize: 100 - this was: Integer.MAX_VALUE
                .groupItems().intoLists().of(100, Duration.ofMillis(1000))
                //.onRequest().invoke(reqSize -> { log.info("Events-Request-flatmap: "+reqSize); }) // reqSize: 1
                .flatMap(messages -> {
                    persistBatch(messages);
                    return Multi.createFrom().iterable(messages);
                })
                //.onRequest().invoke(reqSize -> { log.info("Events-Request-after-persist: "+reqSize); }) // reqSize: 16
                .emitOn(Infrastructure.getDefaultExecutor())
                ;
    }

    public void persistBatch(List<Message<Integer>> events) {
        Utils.sleep(100);
    }

    @Incoming("events-persisted")
    public CompletionStage<Void> messageAcknowledging(Message<Integer> message){
        return message.ack();
    }

@voortwis
Contributor Author

voortwis commented Mar 5, 2021

Update: this change fixed the pushback problem. However, testing it in a more realistic setting (Kubernetes, dev environment) produced an exception.
It occurs after first processing 1M+ events, then a couple of minutes with nothing, then producing (and processing restarting) again.
I will look into this; maybe one of you can give me a hint about what to check.

2021-03-05 09:11:11,760 ERROR [io.sma.rea.mes.provider] (executor-thread-18) SRMSG00201: Error caught while processing a message: io.smallrye.mutiny.subscription.BackPressureFailure: Cannot emit item due to lack of requests

@cescoffier
Contributor

Very good catch!

MultiWindowOnDurationOp is using request(Long.MAX_VALUE); which in this case is not correct.

Thanks!
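As a side note, here is a tiny standalone illustration (not the operator code itself; it just pairs a Mutiny range source with a hand-written Reactive Streams subscriber) of why the amount passed to request() matters: requesting Long.MAX_VALUE removes all pushback and lets the source emit everything, while a bounded request paces it:

import java.util.concurrent.atomic.AtomicLong;

import org.reactivestreams.Subscriber;
import org.reactivestreams.Subscription;

import io.smallrye.mutiny.Multi;

public class RequestSemantics {

    static long consume(long demand) {
        AtomicLong received = new AtomicLong();
        Multi.createFrom().range(0, 1_000_000)
                .subscribe().withSubscriber(new Subscriber<Integer>() {
                    @Override
                    public void onSubscribe(Subscription subscription) {
                        subscription.request(demand); // the only knob the downstream has
                    }

                    @Override
                    public void onNext(Integer item) {
                        received.incrementAndGet();
                    }

                    @Override
                    public void onError(Throwable failure) {
                    }

                    @Override
                    public void onComplete() {
                    }
                });
        return received.get();
    }

    public static void main(String[] args) {
        // An unbounded request drains the whole source; a bounded one stops at the requested amount.
        System.out.println("request(Long.MAX_VALUE) -> " + consume(Long.MAX_VALUE) + " items emitted");
        System.out.println("request(100)            -> " + consume(100) + " items emitted");
    }
}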

@cescoffier cescoffier merged commit 37def3f into smallrye:master Mar 5, 2021
@cescoffier cescoffier added the bug Something isn't working label Mar 5, 2021
@cescoffier
Contributor

Damned, didn't see your last comment.

The back pressure failure you see is due to a consumer not consuming fast enough.

@cescoffier
Contributor

Actually, pretty sure it comes from io.smallrye.mutiny.subscription.MultiSubscriber#onFailure line 161.

Basically, at every time window we try to emit the accumulated items. If we don't have requests, we fail, as we can't continue accumulating (that would eventually lead to an OOM).
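A heavily simplified sketch of that decision (illustration only, not the real MultiBufferWithTimeoutOp; the class and method names here are made up): when the window timer fires, the buffer is emitted if the downstream has outstanding requests, otherwise the operator fails rather than accumulating without bound.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

import io.smallrye.mutiny.subscription.BackPressureFailure;

public class TimeoutFlushSketch {

    private final AtomicLong requested = new AtomicLong();
    private final List<Integer> buffer = new ArrayList<>();

    void request(long n) {
        requested.addAndGet(n);
    }

    void add(int item) {
        buffer.add(item);
    }

    void onWindowTimeout() {
        if (buffer.isEmpty()) {
            return; // nothing accumulated during this window
        }
        if (requested.get() > 0) {
            requested.decrementAndGet();
            System.out.println("emit buffer of " + buffer.size());
            buffer.clear();
        } else {
            // Mirrors the "Cannot emit item due to lack of requests" failure seen in the logs.
            throw new BackPressureFailure("Cannot emit item due to lack of requests");
        }
    }

    public static void main(String[] args) {
        TimeoutFlushSketch op = new TimeoutFlushSketch();
        op.add(1);
        op.add(2);
        op.request(1);
        op.onWindowTimeout(); // emits a buffer of 2

        op.add(3);
        try {
            op.onWindowTimeout(); // no outstanding requests -> fails instead of accumulating
        } catch (BackPressureFailure e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}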

@voortwis
Contributor Author

voortwis commented Mar 5, 2021

Actually, pretty sure it comes from io.smallrye.mutiny.subscription.MultiSubscriber#onFailure line 161.

That is the only place with that exact exception text. What I see is that my downstream requested 16. The MultiBufferWithTimeoutOp then produces these 16 lists, counting down at line 144 - most of the time the count jumps back from 1 to 16, but sometimes it ends at 0. I am currently trying to get a better understanding of why.

@voortwis
Contributor Author

voortwis commented Mar 5, 2021

The exception only occurs when there is no .emitOn(Infrastructure.getDefaultWorkerPool()) before the group-into-lists with size and duration. Maybe one of you can explain this difference in behaviour?
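For clarity, here are the two pipeline shapes being compared, reconstructed from the earlier example as a standalone sketch (the range source and the sizes are placeholders, not the actual Kafka setup):

import java.time.Duration;
import java.util.List;

import io.smallrye.mutiny.Multi;
import io.smallrye.mutiny.infrastructure.Infrastructure;

public class EmitOnVariants {

    // Variant A: hop to a worker thread before grouping (did not reproduce the failure).
    static Multi<List<Integer>> withEmitOn(Multi<Integer> events) {
        return events
                .emitOn(Infrastructure.getDefaultWorkerPool())
                .groupItems().intoLists().of(100, Duration.ofMillis(1000));
    }

    // Variant B: group directly on the emitting thread (reproduced the failure).
    static Multi<List<Integer>> withoutEmitOn(Multi<Integer> events) {
        return events
                .groupItems().intoLists().of(100, Duration.ofMillis(1000));
    }

    public static void main(String[] args) throws InterruptedException {
        Multi<Integer> source = Multi.createFrom().range(0, 1_000);
        withEmitOn(source).subscribe().with(batch -> System.out.println("A: batch of " + batch.size()));
        withoutEmitOn(source).subscribe().with(batch -> System.out.println("B: batch of " + batch.size()));
        Thread.sleep(2_000); // allow the time-based buffers to flush
    }
}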
