fixes performance degradation when fragmentation is used #995
Conversation
Unfortunately, this fix breaks the idea of fragmentation and introduces head-of-line blocking, since now basically all frames are fragmented and sent one by one, meaning that if one huge frame has to be sent, it will block the others. I believe what you need is another fragmentation level, which WebSocket provides by default.
@OlegDokuka " that if one huge frame has to be sent, it will be blocking the others." I am not sure I understand. Why it should be blocking others? original code is doing the same. There is only exception that concatMap has condition that when no fragmentation is not needed we do not need to send with "fragmented" branch. |
I did. What you do is basically: you have a concatMap which sends all the frames in a single sequential flow.
Is delegate.send blocking the others? Then multiple streams could not work on the same channel at all.
It sequentially drains frames, one by one.
I mean, looking at your fix and the explanation, I believe that you need something different, at the Netty level. Can you please look at my comments on #994?
What about flatMap then?
@OlegDokuka Why is this line not blocking in the same way you say my version is? That concatMap also works one by one, and if there is a huge data frame, the others wait in the queue:

```java
@Override
public Mono<Void> send(Publisher<ByteBuf> frames) {
  return Flux.from(frames).concatMap(this::sendOne).then();
}
```
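For comparison, my fix keeps a single concatMap over the outgoing frames and only takes the fragmenting branch when a frame is actually over the MTU; roughly like this (shouldFragment and fragmentFrame stand in for the real helpers, so this is a sketch rather than the exact diff):

```java
@Override
public Mono<Void> send(Publisher<ByteBuf> frames) {
  return Flux.from(frames)
      .concatMap(frame -> shouldFragment(frame, mtu)
          // over the MTU: split the frame and send the fragments in order
          ? delegate.send(fragmentFrame(frame, mtu))
          // under the MTU: send directly and skip the fragmentation machinery
          : delegate.send(Mono.just(frame)))
      .then();
}
```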
@koldat this code is weird; it is kind of the same and kind of not. The main difference is that every sub-flux generated within your concatMap has to be fully drained before the next frame can go out.
Actually, it can be a good alternative, and I guess we can land it as an improvement. So, your code is good; let's try to migrate from concatMap to flatMap and see whether anything breaks.
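That is, just swapping the operator, roughly (sendOne as in the snippet quoted above):

```java
// before: strictly sequential, the next frame waits for the previous send
Flux.from(frames).concatMap(this::sendOne).then();

// after: inner sends are subscribed to eagerly and may interleave
Flux.from(frames).flatMap(this::sendOne).then();
```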
Changed to flatMap. Performance is almost the same (5 seconds, vs. 30 and more without the fix). Tests pass (locally).
LGTM
From my private discussion with @rstoyanchev, we figured out that flatMap may break frame ordering, so frames for the same stream could potentially be reordered, which is something we don't want. Actually, it turned out that the previous impl may do the same, so we need to iterate a little more to ensure we have good performance and do not break frame ordering.
@koldat the head-of-line and performance issues with fragmentation are well-known limitations in 1.0.x; fixing them required the significant rework that was done for 1.1 (take a look at #761 and the issues linked to it). Given there are no easy solutions, this will likely remain a limitation in 1.0.x, and you'll need to upgrade to 1.1 to get the benefits of the rework. You mentioned Spring Boot 2.3, which has 3 months of support remaining as well, so you'll need to upgrade to 2.4, which is based on RSocket Java 1.1. As @OlegDokuka mentioned, fragmenting at the transport (Netty/WebSocket) level may be an alternative.
@koldat as it turned out, you were right about the behavior: we were in effect exploiting a reactor-netty bug that shuffled frames, and that bug is now fixed, which means we do have head-of-line blocking at the moment, which is unfortunate. We chatted with @rstoyanchev and figured out that we can use flatMap, but we need to put a groupBy operator in front of it to ensure that flatMap does not reorder frames for the same streamId. I suggest doing the following:
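In sketch form (streamIdOf stands in for however the stream id is read off the frame, so treat this as the idea rather than the final code):

```java
Flux.from(frames)
    // partition the outgoing frames by stream id
    .groupBy(frame -> streamIdOf(frame))
    // let different streams interleave freely...
    .flatMap(group ->
        // ...while frames of the same stream stay strictly ordered
        group.concatMap(this::sendOne))
    .then();
```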
Can you please check whether that solution is still good enough for you?
If the above does not work well for you, I guess we can stick with concatMap (instead of flatMap) and simply state that we have a head-of-line blocking problem in 1.0.x even though we support fragmentation (which is rather useless in that case).
I think groupBy is not a good idea, as it keeps every stream inside the groupBy operator forever (internally in a Map). I would go with concatMap, as you said: the performance is the same and it has the least chance of introducing an issue. Scaling can easily be done by load balancing (more connections). But I still do not think this is a real problem, because a connection only serves the application that uses it, so a fully utilized wire cannot get any faster; having a way to interleave the streams does not increase the final throughput.

@rstoyanchev yes, 3 months sounds short, but some deployments do not move that fast, especially in production. Yes, we plan to move forward and upgrade, but we also want the current version to be stable and performant.

Regarding the comment on changing the fragmentation size: that is exactly the problem. Setting any value causes this issue; it does not matter what the value is. Should I switch back to concatMap, or do you not want to include it in the release?
Technically, we can track the end of the stream, so it will not be a problem: we can cancel the inner Flux when we see the terminal frame for that streamId. I will try to do that for 1.0.5 (if we end up doing a 1.0.5).
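In sketch form, on top of the groupBy idea above (isTerminal stands in for the terminal-frame check):

```java
Flux.from(frames)
    .groupBy(frame -> streamIdOf(frame))
    .flatMap(group -> group
        // takeUntil completes the group right after the stream's terminal frame
        // (the matching frame is still emitted), so groupBy does not keep
        // finished streams in its internal map forever
        .takeUntil(frame -> isTerminal(frame))
        .concatMap(this::sendOne))
    .then();
```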
@koldat yes, please |
Signed-off-by: Tomas Kolda <[email protected]>
Change done.
@koldat thanks for your contribution!
When one defines a custom MTU to be used as the fragment size, it significantly degrades performance. The attached example code sends 1M records of 5 bytes each. Before the fix it takes 39 seconds; after the fix it takes 5 seconds (the same time as with no custom fragmentation). We need to enable fragmentation because the WebSocket max data size is 64 kB and we support both transports.
See #994
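For reference, this is roughly how we enable fragmentation on the client with the 1.0.x API (the URI and MTU values here are illustrative; we pick an MTU below the 64 kB WebSocket limit):

```java
import io.rsocket.RSocket;
import io.rsocket.RSocketFactory;
import io.rsocket.transport.netty.client.WebsocketClientTransport;
import java.net.URI;

public class FragmentingClient {
  public static void main(String[] args) {
    RSocket rSocket =
        RSocketFactory.connect()
            // any non-zero MTU enables fragmentation and triggered the slowdown
            .fragment(16 * 1024)
            .transport(WebsocketClientTransport.create(URI.create("ws://localhost:8080")))
            .start()
            .block();
    rSocket.dispose();
  }
}
```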