Optimize SimpleSubscriber for Netty #3583
Conversation
```scala
val finalArray = ByteBufUtil.getBytes(byteBuf)
byteBuf.release()
resultBlockingQueue.add(Right(finalArray))
resultPromise.success(finalArray)
```
Shouldn't the subscriber enter some state where another arriving content chunk would report an exception? It's possible that the incoming header is malformed.
I added a handler for this case - now if the result queue's `offer` call returns false, we follow up by cancelling the subscription - as far as I checked, that's the recommended way to 'abort'.
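For reference, a minimal sketch of that guard, assuming a capacity-1 result queue and the `Subscription` captured in `onSubscribe` (the class and field names here are illustrative, not the exact ones from this PR):

```scala
import java.util.concurrent.LinkedBlockingQueue
import org.reactivestreams.Subscription

// Sketch of the offer-then-cancel guard. The queue has capacity 1: a second
// result means the stream misbehaved (e.g. a malformed header produced extra
// content), so we cancel the subscription instead of blocking or dropping.
class ResultGuard(subscription: Subscription) {
  private val resultQueue =
    new LinkedBlockingQueue[Either[Throwable, Array[Byte]]](1)

  def deliver(result: Either[Throwable, Array[Byte]]): Unit =
    if (!resultQueue.offer(result)) {
      // Reactive Streams allows a Subscriber to call Subscription.cancel()
      // at any time to stop receiving further signals.
      subscription.cancel()
    }
}
```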
```scala
private val resultBlockingQueue = new LinkedBlockingQueue[Either[Throwable, Array[Byte]]]()
private val buffers = new ConcurrentLinkedQueue[ByteBuf]()
```
actually ... what are the concurrency guarantees for a subscriber - can multiple `onNext` be called concurrently? or maybe `onNext` + `onError`? I'm wondering if we (a) need a concurrent data structure here at all and (b) if concurrency is allowed, is the impl safe
The `onNext`, `onError`, and `onComplete` operations are guaranteed to be called sequentially without concurrency, so we are safe to replace the concurrent data structure with something simpler. I tried with a `ListBuffer` and it actually gave another noticeable boost to throughput and latency for `PostLongBytes`.
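As an illustration of that simplification, a sketch of sequential accumulation with a plain `ListBuffer` (names assumed; the cross-thread visibility caveat discussed below still applies):

```scala
import io.netty.buffer.ByteBuf
import scala.collection.mutable

// Sketch: Reactive Streams rule 1.3 guarantees onNext/onError/onComplete
// are signalled serially, so no CAS or locking is needed while appending.
class BufferAccumulator {
  private val buffers = new mutable.ListBuffer[ByteBuf]()
  private var totalLength = 0

  def append(buf: ByteBuf): Unit = {
    buffers += buf
    totalLength += buf.readableBytes() // only one signal runs at a time
  }
}
```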
Nice results! :)
```scala
import scala.concurrent.{Future, Promise}
```
```diff
 private[netty] class SimpleSubscriber(contentLength: Option[Int]) extends PromisingSubscriber[Array[Byte], HttpContent] {
   private var subscription: Subscription = _
   private val resultPromise = Promise[Array[Byte]]()
   private var totalLength = 0
   private val resultBlockingQueue = new LinkedBlockingQueue[Either[Throwable, Array[Byte]]]()
-  private val buffers = new ConcurrentLinkedQueue[ByteBuf]()
+  private val buffers = new mutable.ListBuffer[ByteBuf]()
```
follow-up question (sorry ;) ) - onNext/onComplete/onError are guaranteed to be called from one thread, but is it going to be the same thread? that is, does `buffers` need to be `volatile`?
We don't have such a guarantee, and maybe that was why I had a `ConcurrentLinkedQueue` for byte arrays in the previous implementation, but I forgot :) This means I either fall back to it, or use a volatile `ListBuffer`. I guess `var totalLength` is also unsafe and should be converted to an `AtomicInteger`, right?
Update: You can't have a volatile `val`, so I fell back to `ConcurrentLinkedQueue`. For `totalLength` I chose `@volatile` instead of `AtomicInteger`, because we are using this variable sequentially; our only scenario is increasing it in `onNext` and reading it in `onComplete`.
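A sketch of that single-writer reasoning (names illustrative): signals are serialized, so only one thread at a time performs the read-modify-write on `totalLength`, and `@volatile` covers visibility when consecutive signals arrive on different threads:

```scala
import java.util.concurrent.ConcurrentLinkedQueue
import io.netty.buffer.ByteBuf

// Sketch (names illustrative). There is never a concurrent increment:
// serialized signals mean a single writer at any moment, and the volatile
// write/read pair makes each increment visible to whichever thread delivers
// the next signal. An AtomicInteger's CAS would buy nothing extra here.
class LengthTracker {
  private val buffers = new ConcurrentLinkedQueue[ByteBuf]()
  @volatile private var totalLength = 0

  def onNextBuf(buf: ByteBuf): Unit = {
    buffers.add(buf)
    totalLength += buf.readableBytes()
  }

  def finalLength: Int = totalLength // read once, in onComplete
}
```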
but we can have a volatile var with an immutable list - always fewer synchronisations
Ok, I replaced the `ConcurrentLinkedQueue` with a volatile var `ListBuffer`. In theory, `ListBuffer` should have slightly cheaper append time than a `Vector`, and we don't need `Vector`'s fast random access, which comes at the additional cost of maintaining a more complex underlying structure.
I don't see any improvement in throughput, but there's a slight improvement in latency.
I'm not sure this properly protects the value - `ListBuffer` is mutable, and the `volatile` only ensures there's a memory barrier before reading the `buffers` reference (not the references inside the `ListBuffer`). But maybe since we have a memory barrier, everything will be synchronized correctly ... as you can't really access the inner references before first reading the `buffers` reference (which creates the barrier).
Anyway, I thought about a simpler design, using immutable data structures, where you don't have to think that much ;-) But maybe this one works as well :)
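For comparison, a minimal sketch of the immutable variant suggested above - a `@volatile var` holding an immutable `Vector`, so every publication goes through a single volatile write and readers can never observe a half-built structure (names illustrative):

```scala
import io.netty.buffer.ByteBuf

// Sketch of the immutable alternative (names illustrative). Each append
// replaces the whole Vector via one volatile write; an immutable structure
// is safely published by that write, so there is no mutable state to leak
// past the barrier.
class ImmutableAccumulator {
  @volatile private var buffers: Vector[ByteBuf] = Vector.empty

  def append(buf: ByteBuf): Unit =
    buffers = buffers :+ buf // copy-on-write; safe with serialized signals

  def snapshot: Vector[ByteBuf] = buffers
}
```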
Fixes #3548

This PR updates the `SimpleSubscriber` with the following improvements: accumulate incoming `ByteBuf`s without rewriting them into arrays, then copy them into the final array in `onComplete`. Disclaimer: initially I wanted to cover this case with Netty's `CompositeByteBuf`, but it turned out to be a bad idea. It does some reallocations to resize its internal representation underneath, making the overall performance significantly worse.

Results:
`PostBytes` simulation - latency improvement: (benchmark figures not reproduced here)

`PostLongBytes` simulation - latency improvement: (benchmark figures not reproduced here)
I haven't measured it, but there should also be noteworthy gains in memory allocations.
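Pulling the thread's pieces together, a minimal sketch of the accumulate-then-copy approach the PR describes (illustrative names, not the exact PR code):

```scala
import io.netty.buffer.ByteBuf

// Sketch: onNext only stores the ByteBuf; onComplete sizes the final array
// once, copies every buffer into it, and releases the buffers back to Netty.
class AccumulateThenCopy {
  @volatile private var buffers: Vector[ByteBuf] = Vector.empty
  @volatile private var totalLength = 0

  def onNextBuf(buf: ByteBuf): Unit = {
    buffers = buffers :+ buf
    totalLength += buf.readableBytes()
  }

  def onCompleteBytes(): Array[Byte] = {
    val result = new Array[Byte](totalLength)
    var offset = 0
    buffers.foreach { buf =>
      val n = buf.readableBytes()
      buf.readBytes(result, offset, n) // copy straight into the final array
      offset += n
      buf.release()                    // hand the buffer back to Netty
    }
    result
  }
}
```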