keep pushback intact on Multi.groupItems().intoLists().of(size,duration) #493

Conversation

voortwis
Contributor

@voortwis voortwis commented Mar 4, 2021

In the previous implementation the logic was partly handled by calling intoMultis().every(Duration), which caused an unlimited number of items to be requested from upstream.
When processing a Kafka stream with many small messages that fit in memory, the throttled policy would eventually (after 60 seconds) see 'stale' unprocessed messages, causing an exception that shut processing down completely. This problem is solved by using MultiBufferWithTimeoutOp directly.

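For context on the demand mechanics, here is a minimal standalone sketch (not part of this PR; it assumes a Reactive Streams based Mutiny version and uses a plain range source instead of Kafka) that probes the request amount the grouping operator forwards upstream when the downstream only asks for a few lists:

import java.time.Duration;
import java.util.List;

import org.reactivestreams.Subscriber;
import org.reactivestreams.Subscription;

import io.smallrye.mutiny.Multi;

public class DemandProbe {

    public static void main(String[] args) throws InterruptedException {
        Multi.createFrom().range(0, 10_000)
                // Probe for the demand the grouping operator forwards upstream.
                // Before this change it was effectively unbounded (Integer.MAX_VALUE
                // in the reporter's probe); afterwards it is derived from the list size.
                .onRequest().invoke(n -> System.out.println("upstream sees request(" + n + ")"))
                .groupItems().intoLists().of(100, Duration.ofMillis(1000))
                .subscribe().withSubscriber(new Subscriber<List<Integer>>() {
                    @Override
                    public void onSubscribe(Subscription subscription) {
                        subscription.request(16); // a deliberately small downstream request
                    }

                    @Override
                    public void onNext(List<Integer> batch) {
                        System.out.println("received a batch of " + batch.size());
                    }

                    @Override
                    public void onError(Throwable failure) {
                        failure.printStackTrace();
                    }

                    @Override
                    public void onComplete() {
                        System.out.println("completed");
                    }
                });

        Thread.sleep(2_000); // let the time-based flush run before the JVM exits
    }
}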
@codecov

codecov bot commented Mar 4, 2021

Codecov Report

Merging #493 (5ef9bed) into master (f8bb371) will increase coverage by 0.21%.
The diff coverage is 100.00%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master     #493      +/-   ##
============================================
+ Coverage     89.95%   90.16%   +0.21%     
- Complexity     2931     2937       +6     
============================================
  Files           382      382              
  Lines         11340    11340              
  Branches       1420     1422       +2     
============================================
+ Hits          10201    10225      +24     
+ Misses          578      562      -16     
+ Partials        561      553       -8     
Impacted Files Coverage Δ Complexity Δ
...io/smallrye/mutiny/groups/MultiGroupIntoLists.java 100.00% <100.00%> (ø) 5.00 <1.00> (-1.00)
...tiny/operators/uni/builders/DefaultUniEmitter.java 93.10% <0.00%> (-6.90%) 12.00% <0.00%> (-1.00%)
...mallrye/mutiny/helpers/queues/MpscLinkedQueue.java 90.32% <0.00%> (-6.46%) 27.00% <0.00%> (-1.00%)
...smallrye/mutiny/operators/multi/MultiBufferOp.java 88.70% <0.00%> (-0.81%) 5.00% <0.00%> (ø%)
...java/io/smallrye/mutiny/helpers/Subscriptions.java 79.00% <0.00%> (+0.55%) 47.00% <0.00%> (ø%)
...allrye/mutiny/operators/uni/UniAndCombination.java 86.04% <0.00%> (+1.16%) 4.00% <0.00%> (ø%)
.../io/smallrye/mutiny/helpers/queues/DrainUtils.java 77.77% <0.00%> (+2.22%) 12.00% <0.00%> (+1.00%)
...y/operators/multi/builders/IterableBasedMulti.java 86.17% <0.00%> (+4.25%) 4.00% <0.00%> (ø%)
...lrye/mutiny/subscription/SerializedSubscriber.java 87.73% <0.00%> (+5.66%) 27.00% <0.00%> (+2.00%)
...tiny/operators/multi/MultiBufferWithTimeoutOp.java 72.88% <0.00%> (+6.77%) 4.00% <0.00%> (+1.00%)
... and 2 more

@jponge jponge requested a review from cescoffier March 4, 2021 20:59
@jponge jponge added this to the 0.15.0 milestone Mar 4, 2021
@jponge
Member

jponge commented Mar 5, 2021

Would you have a complete use-case to understand the context of this pull request and what exactly it fixes?

@voortwis voortwis changed the title keep pushback intact on size+duration keep pushback intact on Multi.groupItems().intoLists().of(size,duration) Mar 5, 2021
@voortwis
Contributor Author

voortwis commented Mar 5, 2021

Would you have a complete use-case to understand the context of this pull request and what exactly it fixes?

When reading a high-volume Kafka topic we first need to do some processing, and then we store the resulting events in an Elasticsearch index. We insert these in batches (of configurable size) into Elasticsearch to gain a massive performance increase. Using the throttled setting on the mutiny-kafka-connector we ensure asynchronous, at-least-once processing. The problem was that the upstream would read all data from Kafka, resulting in events that were not processed (acked) in time.

The line .groupItems().intoLists().of(100, Duration.ofMillis(1000)) was requesting Integer.MAX_VALUE; after the change it requests the list size.

Example simulation code:

    @Incoming("events")
    @Outgoing("events-persisted")
    public Multi<Message<Integer>> persistEventsMulti(Multi<Message<Integer>> events) {
        return events
                //.onRequest().invoke(reqSize -> { log.info("Events-Request: "+reqSize); }) // reqSize: 16
                .emitOn(Infrastructure.getDefaultWorkerPool())
                //.onRequest().invoke(reqSize -> { log.info("Events-Request-group-list: "+reqSize); }) // reqSize: 100 - this was: Integer.MAX_VALUE
                .groupItems().intoLists().of(100, Duration.ofMillis(1000))
                //.onRequest().invoke(reqSize -> { log.info("Events-Request-flatmap: "+reqSize); }) // reqSize: 1
                .flatMap(messages -> {
                    persistBatch(messages);
                    return Multi.createFrom().iterable(messages);
                })
                //.onRequest().invoke(reqSize -> { log.info("Events-Request-after-persist: "+reqSize); }) // reqSize: 16
                .emitOn(Infrastructure.getDefaultExecutor())
                ;
    }

    public void persistBatch(List<Message<Integer>> events) {
        Utils.sleep(100);
    }

    @Incoming("events-persisted")
    public CompletionStage<Void> messageAcknowledging(Message<Integer> message){
        return message.ack();
    }

@voortwis
Contributor Author

voortwis commented Mar 5, 2021

Update: this change fixed the pushback problem. However, testing it in a more realistic setting (Kubernetes, dev environment) produced an exception.
It occurs after first processing 1M+ events, then a couple of minutes with nothing, then producing (and processing restarting) again.
I will look into this; maybe one of you can give me a hint about what to check.

2021-03-05 09:11:11,760 ERROR [io.sma.rea.mes.provider] (executor-thread-18) SRMSG00201: Error caught while processing a message: io.smallrye.mutiny.subscription.BackPressureFailure: Cannot emit item due to lack of requests

@cescoffier
Contributor

Very good catch!

MultiWindowOnDurationOp is using request(Long.MAX_VALUE); which in this case is not correct.

Thanks!
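As a side note, here is a tiny standalone illustration (not the operator code itself; it just pairs a Mutiny range source with a hand-written Reactive Streams subscriber) of why the amount passed to request() matters: requesting Long.MAX_VALUE removes all pushback and lets the source emit everything, while a bounded request paces it:

import java.util.concurrent.atomic.AtomicLong;

import org.reactivestreams.Subscriber;
import org.reactivestreams.Subscription;

import io.smallrye.mutiny.Multi;

public class RequestSemantics {

    static long consume(long demand) {
        AtomicLong received = new AtomicLong();
        Multi.createFrom().range(0, 1_000_000)
                .subscribe().withSubscriber(new Subscriber<Integer>() {
                    @Override
                    public void onSubscribe(Subscription subscription) {
                        subscription.request(demand); // the only knob the downstream has
                    }

                    @Override
                    public void onNext(Integer item) {
                        received.incrementAndGet();
                    }

                    @Override
                    public void onError(Throwable failure) {
                    }

                    @Override
                    public void onComplete() {
                    }
                });
        return received.get();
    }

    public static void main(String[] args) {
        // An unbounded request drains the whole source; a bounded one stops at the requested amount.
        System.out.println("request(Long.MAX_VALUE) -> " + consume(Long.MAX_VALUE) + " items emitted");
        System.out.println("request(100)            -> " + consume(100) + " items emitted");
    }
}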

@cescoffier cescoffier merged commit 37def3f into smallrye:master Mar 5, 2021
@cescoffier cescoffier added the bug Something isn't working label Mar 5, 2021
@cescoffier
Contributor

Damned, didn't see your last comment.

The back pressure failure you see is due to a consumer not consuming fast enough.

@cescoffier
Contributor

Actually, pretty sure it comes from io.smallrye.mutiny.subscription.MultiSubscriber#onFailure line 161.

Basically, at every time window we try to emit the accumulated items. If we don't have requests, we fail, as we can't continue accumulating (that would eventually lead to an OOM).
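A heavily simplified sketch of that decision (illustration only, not the real MultiBufferWithTimeoutOp; the class and method names here are made up): when the window timer fires, the buffer is emitted if the downstream has outstanding requests, otherwise the operator fails rather than accumulating without bound.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

import io.smallrye.mutiny.subscription.BackPressureFailure;

public class TimeoutFlushSketch {

    private final AtomicLong requested = new AtomicLong();
    private final List<Integer> buffer = new ArrayList<>();

    void request(long n) {
        requested.addAndGet(n);
    }

    void add(int item) {
        buffer.add(item);
    }

    void onWindowTimeout() {
        if (buffer.isEmpty()) {
            return; // nothing accumulated during this window
        }
        if (requested.get() > 0) {
            requested.decrementAndGet();
            System.out.println("emit buffer of " + buffer.size());
            buffer.clear();
        } else {
            // Mirrors the "Cannot emit item due to lack of requests" failure seen in the logs.
            throw new BackPressureFailure("Cannot emit item due to lack of requests");
        }
    }

    public static void main(String[] args) {
        TimeoutFlushSketch op = new TimeoutFlushSketch();
        op.add(1);
        op.add(2);
        op.request(1);
        op.onWindowTimeout(); // emits a buffer of 2

        op.add(3);
        try {
            op.onWindowTimeout(); // no outstanding requests -> fails instead of accumulating
        } catch (BackPressureFailure e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}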

@voortwis
Contributor Author

voortwis commented Mar 5, 2021

Actually, pretty sure it comes from io.smallrye.mutiny.subscription.MultiSubscriber#onFailure line 161.

That is the only place with that exact exception text. What I see is that my downstream requested 16. The MultiBufferWithTimeoutOp then produces these 16 lists, counting down at line 144 - most of the time the count jumps back from 1 to 16, but sometimes it ends at 0. I am currently trying to get a better understanding of why.

@voortwis
Contributor Author

voortwis commented Mar 5, 2021

The exception only occurs when there is no .emitOn(Infrastructure.getDefaultWorkerPool()) before the group-into-lists with size and duration. Maybe one of you can explain this difference in behaviour?
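For clarity, here are the two pipeline shapes being compared, reconstructed from the earlier example as a standalone sketch (the range source and the sizes are placeholders, not the actual Kafka setup):

import java.time.Duration;
import java.util.List;

import io.smallrye.mutiny.Multi;
import io.smallrye.mutiny.infrastructure.Infrastructure;

public class EmitOnVariants {

    // Variant A: hop to a worker thread before grouping (did not reproduce the failure).
    static Multi<List<Integer>> withEmitOn(Multi<Integer> events) {
        return events
                .emitOn(Infrastructure.getDefaultWorkerPool())
                .groupItems().intoLists().of(100, Duration.ofMillis(1000));
    }

    // Variant B: group directly on the emitting thread (reproduced the failure).
    static Multi<List<Integer>> withoutEmitOn(Multi<Integer> events) {
        return events
                .groupItems().intoLists().of(100, Duration.ofMillis(1000));
    }

    public static void main(String[] args) throws InterruptedException {
        Multi<Integer> source = Multi.createFrom().range(0, 1_000);
        withEmitOn(source).subscribe().with(batch -> System.out.println("A: batch of " + batch.size()));
        withoutEmitOn(source).subscribe().with(batch -> System.out.println("B: batch of " + batch.size()));
        Thread.sleep(2_000); // allow the time-based buffers to flush
    }
}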
