Publisher.toStream can demand for more than one element #2666
Conversation
Context: https://discord.com/channels/632277896739946517/632310980449402880/894663878133317722 Asking for more than one element allows much better performance. For example, when subscribing to a mongo publisher, we can use the opened cursor in far fewer queries.
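The performance argument can be sketched with some back-of-the-envelope arithmetic. This is plain Scala, not fs2 code, and `roundTrips` is a made-up helper: with `total` upstream elements and a demand of `n` per `request(n)` call, the subscriber needs ceil(total / n) round-trips to the publisher (e.g. mongo cursor fetches).

```scala
// Hypothetical sketch, not fs2 code: count publisher round-trips for a
// given per-request demand.
object RoundTrips {
  def roundTrips(total: Int, demand: Int): Int =
    math.ceil(total.toDouble / demand).toInt
}
```

With a demand of 1, streaming 1000 elements costs 1000 round-trips; a demand of 16 cuts that to 63.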
605b55a to 301a575
```diff
 def apply[F[_]: Async, A]: F[StreamSubscriber[F, A]] =
-  fsm[F, A].map(new StreamSubscriber(_))
+  apply(bufferSize = 10L)
```
This changes the previous behavior of asking for elements one by one.
It's also a magic number, coming from quick thinking about the memory vs. performance trade-off.
Should we handle that differently? Use `1L` to keep the previous behavior? Add some configuration?
What's the behavior if we request 10 but only 5 are available?
This does feel a bit too magical IMO -- I lean towards setting it to 1 or an arbitrary power of 2 :)
If only 5 are available, then we build a chunk of 5 elements, and it's sent in the OnComplete step: https://github.com/typelevel/fs2/pull/2666/files#diff-493b2533a887032dfbf8ea09dca7201413e124b5f45732ad6a877b8d7f69ba2cR178-R180
Setting it to 1 by default can lead to bad performance by default. I'd suggest picking an arbitrary number, not too high, to avoid buffering too many elements in memory. 16?
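The "request 10, receive 5" case described above can be sketched in plain Scala. This is a hedged stand-in, not the fs2 implementation, and all names are made up: elements accumulate until the demand is met, and whatever partial buffer remains is flushed as a smaller chunk on completion.

```scala
// Hypothetical sketch of the buffering behavior, not the fs2 code.
object DemandSketch {
  final class BoundedBuffer(demand: Int) {
    private var buffer = Vector.empty[Int]
    private var emitted = Vector.empty[Vector[Int]]

    // Accumulate; emit a chunk whenever the buffer reaches `demand` elements.
    def onNext(a: Int): Unit = {
      buffer = buffer :+ a
      if (buffer.size == demand) {
        emitted = emitted :+ buffer
        buffer = Vector.empty
      }
    }

    // On completion, flush whatever partial chunk remains.
    def onComplete(): Vector[Vector[Int]] =
      if (buffer.isEmpty) emitted else emitted :+ buffer
  }

  def run(elements: Seq[Int], demand: Int): Vector[Vector[Int]] = {
    val b = new BoundedBuffer(demand)
    elements.foreach(b.onNext)
    b.onComplete()
  }
}
```

Requesting 10 from an upstream of only 5 elements yields a single chunk of 5, emitted at completion.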
I would strongly recommend not changing existing behavior in source-compatible ways
Yep agreed, let's stick with 1 here as that's what everyone has used for years.
Good point.
I see 2 possible options:
- Option 1: we set the default buffer size to 1. No surprise for existing users. Maybe a surprise for new users who are struggling with bad performance.
- Option 2: we change the API to emit a compilation warning (see Publisher.toStream can demand for more than one element #2666 (comment)). Existing users will get a compilation warning, grabbing their attention on this particular change.
  - 2.a) compilation warning + keep the existing behavior with a buffer size of 1
  - 2.b) compilation warning + use a buffer size of 16

My 2 cents from my experience as a new user: I'd expect FS2 to be fast by default, and not slow by default unless I deeply understand the API and change it.
I'm not the maintainer of this lib. I let you decide.
1 versus 10 versus 16 could make a performance difference in certain types of streams but not in others, where folks would need 100 or 1000 or more. Hence, I think it's best if we make no assumptions at all. So let's go with deprecating the old API and forcing the buffer size decision onto users.
4ad2bbb to keep the existing behavior + deprecations
@yanns Thanks for the Pull Request! If the current integration with … One small question, that may have been discussed but missed: how does the use of …
```diff
@@ -69,15 +69,19 @@ final class StreamSubscriber[F[_], A](
     sub.onError(t)
   }

-  def stream(subscribe: F[Unit]): Stream[F, A] = sub.stream(subscribe)
+  def streamChunk(subscribe: F[Unit]): Stream[F, Chunk[A]] = sub.stream(subscribe)
+  def stream(subscribe: F[Unit]): Stream[F, A] = streamChunk(subscribe).unchunks
```
Echoing @diesalbla's comment here -- I think we should offer just the `stream(...): Stream[F, A]` method and let folks manually call `.chunks` if they want to operate on chunks explicitly.
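The round trip this suggestion relies on can be sketched with `Vector` as a stand-in for fs2 streams (none of this is fs2 API): flattening chunks recovers the elements, and callers who want chunk-level access can regroup on their side.

```scala
// Hypothetical sketch using Vector in place of fs2 streams.
object ChunkRoundTrip {
  val chunks: Vector[Vector[Int]] = Vector(Vector(1, 2), Vector(3, 4, 5))
  // "unchunks": flatten back to individual elements
  val elements: Vector[Int] = chunks.flatten
  // caller-side ".chunks"-style regrouping, here into pairs
  val regrouped: Vector[Vector[Int]] = elements.grouped(2).toVector
}
```

Offering only the element-level method keeps the public surface small while losing no expressiveness.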
Looks great -- some minor comments
```diff
   private def nonNull[B](b: B): Unit = if (b == null) throw new NullPointerException()
 }

 object StreamSubscriber {

+  def apply[F[_]: Async, A](bufferSize: Long): F[StreamSubscriber[F, A]] =
```
The `bufferSize` should be an `Int` here, as `Chunk` is `Int`-based.
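There is also an interop reason behind this review note: Reactive Streams' `request(n)` takes a `Long` (with `Long.MaxValue` meaning unbounded demand), while JVM arrays and fs2's `Chunk` are `Int`-indexed. A `Long` buffer size would therefore need clamping anyway, as in this hypothetical helper:

```scala
// Hypothetical sketch: clamp a Reactive Streams Long demand to an Int
// capacity usable as a chunk/array size on the JVM.
object ClampDemand {
  def toChunkCapacity(requested: Long): Int =
    math.min(requested, Int.MaxValue.toLong).toInt
}
```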
```diff
-  def stream(subscribe: F[Unit])(implicit ev: ApplicativeError[F, Throwable]): Stream[F, A] =
+  def stream(
+      subscribe: F[Unit]
+  )(implicit ev: ApplicativeError[F, Throwable]): Stream[F, Chunk[A]] =
```
Change the return type here to `Stream[F, A]`.
```diff
@@ -126,9 +134,10 @@ object StreamSubscriber {

   sealed trait State
   case object Uninitialized extends State
-  case class Idle(sub: Subscription) extends State
+  case class Idle(sub: Subscription, buffer: Vector[A]) extends State
```
Change `Vector[A]` to `Chunk[A]` and use `++` to build up the buffer.
```diff
   case class RequestBeforeSubscription(req: Out => Unit) extends State
-  case class WaitingOnUpstream(sub: Subscription, elementRequest: Out => Unit) extends State
+  case class WaitingOnUpstream(sub: Subscription, buffer: Vector[A], elementRequest: Out => Unit)
```
Same here: change the buffer to `Chunk[A]`.
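The buffered `WaitingOnUpstream` transition under review can be sketched as a tiny state machine. This is a simplified stand-in, not the fs2 code: the real states carry a `Subscription` and a callback, while here `Int` elements and a fixed demand keep the sketch small.

```scala
// Hypothetical, simplified transition for the buffered state: accumulate
// until `demand` elements are buffered, then emit the chunk and go Idle.
object StateSketch {
  sealed trait State
  case class Idle(buffer: Vector[Int]) extends State
  case class WaitingOnUpstream(buffer: Vector[Int]) extends State

  val demand = 3

  // onNext appends to the buffer; a full buffer is emitted as one chunk.
  def onNext(s: State, a: Int): (State, Option[Vector[Int]]) = s match {
    case WaitingOnUpstream(buf) =>
      val next = buf :+ a
      if (next.size == demand) (Idle(Vector.empty), Some(next))
      else (WaitingOnUpstream(next), None)
    case other => (other, None)
  }
}
```

With two elements already buffered and a demand of 3, a third `onNext` emits the full chunk and resets the state.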
One question about backwards compatibility: for the moment, I've gone with duplicating the API, like: …
I could also go with extending the current API, using default values, like: …
Calling …
🚀
@yanns That solution would be mostly source-compatible. However, we are also keen on binary compatibility, which is to say, the contents of the compiled bytecode across versions. The current version:

```scala
// current API
def toStream[F[_]: Async]: Stream[F, A] =
  fromPublisher(publisher)
```

compiles to a method without the buffer-size parameter in the bytecode. The proposed version:

```scala
def toStream[F[_]: Async](bufferSize: Int = StreamSubscriber.DefaultBufferSize): Stream[F, A] =
  fromPublisher(publisher, bufferSize)
```

is compiled in the bytecode to a single method with two parameters, not to two methods with or without the parameter. Default values are inserted, by the compiler, in the "caller" bytecode, as arguments for the two-parameter method. However, the existing method would then be missing in the new bytecode version. As a consequence, "callers" to that method, when linked to the new bytecode, would fail. Thus, I would recommend keeping the existing method, but adding a deprecation notice to it.
Deprecate API where the buffer size is not given.
```diff
@@ -57,6 +67,11 @@ package object reactivestreams {
   implicit final class PublisherOps[A](val publisher: Publisher[A]) extends AnyVal {

+    /** Creates a lazy stream from an `org.reactivestreams.Publisher` */
+    def toStream[F[_]: Async](bufferSize: Int): Stream[F, A] =
```
@yanns we probably need some docs here explaining how buffer size affects performance and what value to choose.
What do you think of 357dbc0 ?
Perfect.
Can someone approve the workflow to check if all the tests pass?
We now have a breaking change:
Can someone check if it's OK to exclude this: e618c8d?
@mpilquist I think all the points you mentioned have been addressed. Can you check again when you can?
Are we sure that the mima exclusion can safely be suppressed? It doesn't look so to me.
I confirmed this change is not binary compatible. Compiling this program against 3.1.4 and then running against this PR results in the following linkage error:

```scala
package test

import fs2._, fs2.interop.reactivestreams._
import cats.effect._

object App extends IOApp.Simple {
  def run = {
    val upstream: Stream[IO, Int] = Stream(1, 2, 3).covary[IO]
    val publisher: Resource[IO, StreamUnicastPublisher[IO, Int]] = upstream.toUnicastPublisher
    val downstream: Stream[IO, Int] = Stream.resource(publisher).flatMap(_.toStream[IO])
    downstream.foreach(IO.println).compile.drain
  }
}
```
@mpilquist Thanks for having checked! I don't really understand why adding a new method with the same name breaks binary compatibility. If someone understands, I'd be glad to learn about it. I've removed the mima exclusion: b6e1847. If you have better suggestions, please feel free to comment.
Definitely not a binary compatibility expert here, but here are the diffs in the bytecode: https://editor.mergely.com/Pq7KTAXv/
Can someone trigger the workflow?
@mpilquist the binary issue is now fixed. Can you review again?
I've made a little talk about this PR: https://yanns.github.io/blog/2021/12/21/fs2-mongo-reactivestreams/
The change was tried with the following app: https://github.com/yanns/mongo-fs2/