-
Notifications
You must be signed in to change notification settings - Fork 422
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Updated Http4sWebSockets. #3393
Conversation
ignorePing
to WebSocketBodyOutput.ignorePing
to WebSocketBodyOutput. WIP
ignorePing
to WebSocketBodyOutput. WIP
For comparison here is a flame graph for http4s with blaze only (no tapir) under the same load: @kciesielski @adamw Is there anything left that could defer merging? |
@plokhotnyuk the PR was in draft mode, didn't notice it's finalised now. Will take a look on Monday probably :) |
pipe: Pipe[F, REQ, RESP], | ||
o: WebSocketBodyOutput[Pipe[F, REQ, RESP], REQ, RESP, _, Fs2Streams[F]] | ||
): F[Pipe[F, Http4sWebSocketFrame, Http4sWebSocketFrame]] = { | ||
if ((!o.concatenateFragmentedFrames) && (!o.ignorePong) && (!o.autoPongOnPing) && o.autoPing.isEmpty) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not quite sure what's the real difference between the fast & slow tracks - if the flags above are set to appropriate values, apart from some additional initialisation code, the fs2 pipeline seems to end up the same in both cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Second question: why these particular flag combination has been chosen for the fast track? The default is to have auto pings (which I think is reasonable), auto pongs, ignore pongs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The difference between the fast and slow track is that a fast track merely pipes the input stream through the business logic, whereas "slow" track involves a heavier machinery of concurrently merging a) business logic output b) autoPing and c) autoPongOnPing streams. This also answers the second question - if we want to avoid concurrent merging of streams for performance reasons, a certain flag combination is enforced, unfortunately. That said, looking at the flags in the fast track:
concatenateFragmentedFrames
is currently a no-op,ignorePong
andautoPongOnPing
can be handled on thehttp4s
level. (autoPongOnPing
is a bit of an unfortunate flag as it conflates two distinct behaviors - auto pong AND filtering ping. We cosidered addingWebSocketBodyOutput.ignorePing
but ditched the idea due to binary compatibility issues. )- admittedly, the one remaing is
autoPing == false
, which has to be handled somewhere else (server layer or busines layer)
Also, even the "slow" track is quite performant compared to the status quo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the reply - so in the "slow path" we are allocating a Channel
to do the concurrent merging.
We are using the channel for two purposes: auto-pong and auto-ping. The other flags simply modify the input / output streams, leaving them unchanged if they are false
. So maybe we could relax the condition for the fast path, so that it's taken when auto-pong=false & auto-ping=false? I think we should be able to do the concatenating / close-decoding transformations on the fast path as well?
Secondly, I think it would be great to add some documentation, explaining how to best use http4s websockets. As I understand, http4s has also a built-in auto-ping mechanism, which should preferrably be used - that should end up in the docs (a similar note is present in the akka docs, btw.). Similarly, information about the slow/fast paths, what is the performance difference would be great to have, so that people could make an informed choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will revise, cannot promise before Christmas, though :)
Done with the code part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regrettably, http4s
lacks the auto-ping mechanism and doing it on the Stream (tapir) level turns out to be a (unbelievably) costly operation. Here is why: merging a stream with an empty(!) stream reduces the speed by x800
:
val n = 10_000
val m = 100
val s = Stream(1).covary[IO].repeatN(n)
val s2 = s.mergeHaltL(Stream.empty)
s.compile.last.replicateA_(m).timed.map(_._1.toMillis).flatMap(IO.println) >>
s2.compile.last.replicateA_(m).timed.map(_._1.toMillis).flatMap(IO.println)
130
105394
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regrettably, http4s lacks the auto-ping mechanism and doing it on the Stream (tapir) level turns out to be a (unbelievably) costly operation. Here is why: merging a stream with an empty(!) stream reduces the speed by x800:
That looks ... really bad. To be honest I did not suspect that usage of fs2 might dominate I/O, but I guess that's what we're looking at here.
Unless we're using fs2 in a totally wrong way ...
…pdated stream merging logic.
Updated tests (increased number of connections from 10k to 25k). (results for current tapir not included as it failed to finish the test for perf reasons) Optimizes
Performance resultshttp4s + blaze:
after standard_path :
after fast_path :
Histogramstapir_after_fast_track vs plain http4s+blaze: tapir_after_fast_track vs tapir_after_slow_track vs plain http4s+blaze: CPU pressureNumber of async-profiler samples:
Benchmark repo:https://github.com/kamilkloch/websocket-benchmark/tree/master/results |
Great, thanks! :) I can merge & release this now, and we can do the documentation / comments (so that future generation - that is us in a month - will know why the |
I think it a good idea. By that time we might perhaps land an auto-ping in the http4s/blaze layer, and then we will be abel to add more comprehensive docs, or even change tapir defaults. |
Ok, a release is on its way. Please don't forget about the docs ;) I'll be waiting ;) |
I have not forgotten, waiting until we land a |
Fixes #3397.
Optimizes
Http4sWebSockets.pipeToBody
twofold:concatenateFragmentedFrames
,ignorePong
,autoPongOnPing
,autoPing
flags inWebSocketBodyOutput
are allfalse
, liftHttp4sWebSocketFrames
intoREQ
, run through pipe, convertRESP
back toHttp4sWebSocketFrame
, use fs2 Chunks for mapping output.autoPings
,autoPongOnPing
. Use fs2.Channel to perform the merge (more efficient thanStream#mergeHaltL
/Stream#parJoin
.Performance results
before:
after standard_path :
after fast_path :
Histograms
tapir_before vs tapir_after:
tapir_after vs plain http4s+blaze:
CPU pressure
Number of async-profiler samples:
2M -> 400K means 5x reduction of CPU cycles.
before:
after standard_path:
after fast_path:
Benchmark repo:
https://github.com/kamilkloch/websocket-benchmark/tree/master/results