backpressure story #71
My opinion about back pressure is that you either have a truly reactive system where the producer is fully in charge and the onus is on the consumer to keep up; or a truly interactive system where the consumer is fully in charge and the onus is on the producer to keep up. This is one of the cases where having different types of collections makes sense (just like we have different collections like lists, maps, sets, ... with different performance characteristics and different operations). @benjchristensen has a different opinion on that. |
Any system that has hidden unbounded buffers is a recipe for disaster. RxJava was rightfully dismissed outright by many because it would consume an entire stream into unbounded buffers in Just because some use cases should be "completely reactive" (such as mouse events), it does not mean that signals are not needed for backpressure. Even with mouse events, if the consumer is slow, an unbounded buffer in In other words, an application should not silently drop data or buffer it without bound. Doing either should always be explicit. So even for the mouse event use case, the backpressure mechanism becomes a signal for flow control to kick in (conditionally drop, buffer, debounce, throttle, etc only when backpressure happens) or errors out and tells the developer to add flow control either conditionally (i.e. onBackpressure*) or permanently (temporal operators like debounce, sample, throttle, etc). If the "reactive collection" doesn't have unbounded buffers, then I'm fine with it existing separately, but that means it can't have any async operators like The signals can be ignored. And |
I suggest that the collaboration for Reactive Streams hardened the API, interaction model, and contracts quite well. It would look something like this in JavaScript:
interface Observable<T> {
subscribe(o: Observer<T>) : void; // or could return Subscription
lift<R>(operator): Observable<R>;
}
interface Subscription {
unsubscribe(): void;
isUnsubscribed(): boolean;
request(n: number): void // optional if backpressure is added
}
interface Observer<T> {
onSubscribe(s: Subscription): void;
onNext(t: T): void;
onError(e: any): void;
onComplete(): void;
}
The consumer then can either invoke If the |
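To make the `request(n)` interaction in the interfaces above concrete, here is a minimal TypeScript sketch. It is not code from RxJava or Reactive Streams: `RangeSubscription`, `BatchObserver`, and `subscribeRange` are hypothetical names invented for illustration, and the sketch assumes a purely synchronous producer.

```typescript
// Hypothetical illustration of the request(n) contract; not a real library API.
interface Subscription {
  unsubscribe(): void;
  isUnsubscribed(): boolean;
  request(n: number): void;
}

interface Observer<T> {
  onSubscribe(s: Subscription): void;
  onNext(t: T): void;
  onError(e: any): void;
  onComplete(): void;
}

// Produces 0..count-1, but emits only as many items as have been requested.
class RangeSubscription implements Subscription {
  private count: number;
  private observer: Observer<number>;
  private requested = 0;
  private next = 0;
  private done = false;
  private draining = false;

  constructor(count: number, observer: Observer<number>) {
    this.count = count;
    this.observer = observer;
  }

  request(n: number): void {
    if (n <= 0 || this.done) return;
    this.requested += n;
    if (this.draining) return; // re-entrant request: the running loop picks it up
    this.draining = true;
    while (this.requested > 0 && !this.done && this.next < this.count) {
      this.requested--;
      this.observer.onNext(this.next++);
    }
    this.draining = false;
    if (!this.done && this.next === this.count) {
      this.done = true;
      this.observer.onComplete();
    }
  }

  unsubscribe(): void { this.done = true; }
  isUnsubscribed(): boolean { return this.done; }
}

// Asks for `batch` items at a time and asks again once the batch is used up.
class BatchObserver implements Observer<number> {
  received: number[] = [];
  completed = false;
  private batch: number;
  private outstanding = 0;
  private subscription: Subscription | null = null;

  constructor(batch: number) { this.batch = batch; }

  onSubscribe(s: Subscription): void {
    this.subscription = s;
    this.outstanding = this.batch;
    s.request(this.batch);
  }
  onNext(t: number): void {
    this.received.push(t);
    if (--this.outstanding === 0 && this.subscription) {
      this.outstanding = this.batch;
      this.subscription.request(this.batch);
    }
  }
  onError(e: any): void { throw e; }
  onComplete(): void { this.completed = true; }
}

function subscribeRange(count: number, observer: Observer<number>): Subscription {
  const s = new RangeSubscription(count, observer);
  observer.onSubscribe(s);
  return s;
}
```

A consumer that wants firehose behaviour would simply call `request(Number.MAX_VALUE)` once in `onSubscribe`; a slow consumer requests small batches, and the producer never emits more than the outstanding demand.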
People have been using If you are doing a Also note that anytime you use, say, a http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/Semaphore.html or a http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/Executor.html you are using a hidden unbounded buffer. Saying "unbounded buffers are verboten" is as fundamentalist as Haskell people saying all side-effects are bad, or the Akka folks saying everything is an actor with strict asynchronous boundaries, or Java insisting on checked exceptions. Programming is tastefully composing side effects; to write real stuff you have to break the rules, so you never say never, or always say always. |
I know. That's why RxJava never holds a lock when emitting, and never uses Same issue with
Using synchronization, and call-stack blocking to avoid
This is not a fundamentalist approach. In fact, I'd argue that you are taking the fundamentalist approach claiming "reactive" requires unbounded buffers. I'm arguing against the "purity" of pure-push, and for allowing the type to support moving between push and pull depending on the speed of the producer/consumer relationship. It is embracing the messiness of our applications and abstractions. I do not say that "unbounded buffers are verboten"; I say that "hidden unbounded buffers are verboten". If someone wants to say
It is not correct to say "no problems". People work around them because they haven't been given tools to do otherwise. Consider, for example, naively consuming a WebSocket stream using RxJS in a browser. I myself have had these problems. Ben Lesh linked to someone else having similar issues in another environment where RxJS will be used: Node. We can state that an |
I don't think you guys fundamentally disagree, but as you can see from the quotes, we just need to find a way to determine consumer behavior when the producer is too fast. Implicit ("hidden") has gotchas for developers. Explicit definition of lossy or lossless consumption should be how the consumer handles its onus. |
@benjchristensen could you give us a few use-cases for backpressure we'll run into at Netflix? (since that's part of the reason I'm proposing this) and I know that you and @jhusain disagree on the need for backpressure in RxJS. |
Agreed, and I suggest that this is exactly what @headinthebox and I spent 9+ months with many in the RxJava community figuring out with "backpressure" which behaves as an upstream signal from the consumer to the producer. Hence my advocating for it here. It's something we have implemented and proven in production. It's not just a theory. Here is more information on it:
|
@benjchristensen can we get some use cases specifically to help make sure we're making good decisions? I mean, it seems like one could do something like this primitive example to deal with backpressure to some degree:
var requestor = new BehaviorSubject(true);
source.buffer(requestor)
.flatMap(buffer => Observable.from(buffer)
.finally(() => requestor.next(true)))
.subscribe(x => console.log(x)); |
I'm not for or against the idea (I created the issue, after all), but I want it considered up front. |
@Blesh yes, I can do that a little later tonight |
@headinthebox going back to your point about having multiple collection types, I'd like to explore that idea with you. Let's say we had something like this:
interface Flowable<T> {
subscribe(o: FlowObserver<T>) : void; // or could return Subscription
lift<R>(operator): Flowable<R>;
}
interface FlowableSubscription {
unsubscribe(): void;
isUnsubscribed(): boolean;
request(n: number): void // optional if backpressure is added
}
interface FlowObserver<T> {
onSubscribe(s: FlowableSubscription): void;
onNext(t: T): void;
onError(e: any): void;
onComplete(): void;
}
Then Observable as the original and simpler:
interface Observable<T> {
subscribe(o: Observer<T>) : void; // or could return Subscription
lift<R>(operator): Observable<R>;
}
interface Subscription {
unsubscribe(): void;
isUnsubscribed(): boolean;
add(s: Subscription): void
remove(s: Subscription): void
}
interface Observer<T> extends Subscription {
onNext(t: T): void;
onError(e: any): void;
onComplete(): void;
}
Then on A Is this what you're thinking instead of having one type support both use cases? |
Use cases for backpressure:
This is a dumping of thoughts, some are more important than others, and all definitely don't apply to all environments (desktop browser, mobile browser, mobile app, Node.js server), but they are representative of the type of stuff we will deal with as we embrace more stream oriented programming and IO. |
Okay, it doesn't seem like the use cases you're mentioning are at all edge cases. In fact, some seem fairly common at least in server-land where you want to maximize data throughput. So I think what we need to weigh here is:
I'm not too worried about cost in terms of "size" of the library, because there are plenty of ways to get around that. |
@Blesh people are pretty satisfied with the current backpressure mechanisms with |
Let me state upfront that I am not against backpressure, not at all. All the use cases above make total sense. What I am against is the way backpressure is implemented in RxJava/Reactive Streams where every operator has to know about it. That is clowny and inefficient. Just like the selector function of To achieve backpressure you wire up things like a classical feedback system https://en.wikipedia.org/wiki/PID_controller. There the plant/process, consisting of Rx operators, just transforms input to output streams. It knows absolutely nothing about backpressure. That is all done by the feedback loop where at one point you measure the output against a setpoint (which has no "knowledge" of the plant/process) and use that to throttle the input source, which does know how to handle backpressure (and thus probably has an interface like Note that in the current ReactiveStream model, you still need such a feedback loop + a controller to know how much to request. If you ever request > 1, you run the risk that the consumer gets overwhelmed. |
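The feedback-loop arrangement described in the comment above can be sketched roughly as follows. This is only an illustration under simplifying assumptions: time is modelled as discrete ticks, the controller uses a proportional term only (no integral or derivative part), and `plant` and `simulate` are hypothetical names that do not come from Rx.

```typescript
// Hypothetical sketch of a feedback loop around a backpressure-unaware pipeline.

// The "plant": an ordinary transformation that knows nothing about backpressure.
const plant = (x: number): number => x * 2;

// Each tick: the controller decides how many items the source may emit, the plant
// transforms them, and a slow consumer drains at most `drainPerTick` items.
function simulate(
  ticks: number,
  setpoint: number,
  drainPerTick: number
): { maxDepth: number; processed: number[] } {
  const queue: number[] = [];
  const processed: number[] = [];
  let nextValue = 0;
  let maxDepth = 0;

  for (let t = 0; t < ticks; t++) {
    // Controller: measure the queue depth and admit only the shortfall below the setpoint.
    const error = setpoint - queue.length;
    const admit = Math.max(0, error);

    // Throttled source -> plant -> queue in front of the consumer.
    for (let k = 0; k < admit; k++) {
      queue.push(plant(nextValue++));
    }
    maxDepth = Math.max(maxDepth, queue.length);

    // Slow consumer.
    for (let k = 0; k < drainPerTick && queue.length > 0; k++) {
      processed.push(queue.shift() as number);
    }
  }
  return { maxDepth, processed };
}
```

With `simulate(4, 3, 1)` the queue never grows past the setpoint of 3, even though the source could produce without limit; only the source and the controller know that throttling is happening.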
I find myself agreeing with the sentiment of both @headinthebox and @benjchristensen here. While I am unfamiliar with the PID controller design, I think this argument echoes what we have found with the limits of where Rx is suitable. Rx is great for composing many sequences of events. Rx is not great at workflow. Rx is not great at parallel processing. The concern for hidden unbounded buffers seems to be more suitably addressed by introducing explicit buffers; however, I don't think these belong in the Rx query. Keep Rx a single direction event query API. |
@headinthebox That is how it's implemented in RxJava. Note that map (https://github.com/ReactiveX/RxJava/blob/1.x/src/main/java/rx/internal/operators/OperatorMap.java) does not know anything about backpressure. Filter has to know a little (https://github.com/ReactiveX/RxJava/blob/1.x/src/main/java/rx/internal/operators/OperatorFilter.java#L57) because it affects the stream (dropping data), but it is trivial. Data flows through them "as fast as they can".
I understand this argument as we've discussed it during the RxJava backpressure design last year. However, it does not work well in practice because hidden unbounded queues in The end result is people choose (generally rightfully) to not use Rx. This was a massive push-back I received for server-side use of Rx in Java, and the same will apply to the Node.js ecosystem (which I am now involved in building for).
The feedback loop is handled automatically within the operator implementations, generally by the internal bounded queue depth. And this only applies to async operators that have queues, such as @mattpodwysocki The The ReactiveStreams/RxJava model has been proven to compose and work well in large (distributed Netflix API and stream processing) and small (Android) systems, enabling it to be built without hidden unbounded buffers, and working over async subscription boundaries, such as networks. |
@LeeCampbell thanks for getting involved!
This effectively means RxJava v1 should not exist as designed and that most of the use cases I currently use Rx for should stop using Rx. Nor will I be able to use Rx for the stream-oriented IO I'm working on between Java and Node.js. The reason is that Rx without backpressure support can't easily compose with the needed "push/pull" semantics, so I'll end up instead using RxJava/ReactiveStream APIs and exposing those, and then for composition reasons, adding operators to that library and end up with a competitor to Rx whose only difference is that I support the upstream |
@LeeCampbell My original vision (which has not changed) for Rx was as a single direction event query API (it used to be called "LINQ to Events") where the producer determines the speed. For (async) pull, where the consumer determines the speed, we shipped @benjchristensen I was just using Adding backpressure to RxJava has resulted in a very long bug tail that is still dragging on. The implementation of By factoring out back pressure to a special backpressure sensitive source that can be sped up or slowed down under the control of a feedback loop, you factor out all that complexity into a single place. And, if you don't use it, you don't pay. |
If we were to merge the |
A 30LOC implementation in Java would have to assume use of I will stop arguing this point. But without composable backpressure in RxJS I will not be able to use it for the Node.js apps I am building so will end up building a competitor. |
Doesn't node.js already have a stream API https://nodejs.org/docs/latest/api/stream.html? |
I would also like back pressure control compose through. @benjchristensen setting aside operator complexity, is it possible to quantify the runtime cost when it isn't used? @Blesh it doesn't seem like something you can solve with lift since it affects operator behavior. If I understand correctly, you'd have to override every operator on a BackpressureObservable to respect back pressure semantics.
|
That's exactly right. I was just trying to think of a compromise. |
At this point, I'm generally for an implementation with a solid backpressure story, as long as the impact on performance isn't grotesque. I think there are plenty of things we can do to make sure the queuing internally is highly performant, such as leveraging libraries like @petkaantonov's deque, which we should honestly be investigating regardless. Right now we have unbounded arrays in operators like zip, and there are definitely plenty of real-world use cases where that becomes problematic. Also, I'm not a big fan of Iterator of Promise, or Promises in general, so I find the idea that I'd have to use something like that for backpressure control across network boundaries really gross. I want to set up a separate issue to discuss joining the Subscription and Observer as a single type (with separate interfaces), as discussed above. I think that's a related issue that needs to be fleshed out. |
Yes, it was painful retrofitting it into existing code. We had 50+ operators all implemented over 18+ months without it and have had to put in a long effort to go back and rewrite them. We also had to make complicated concessions in the design to add backpressure without breaking the existing APIs. Starting from scratch is a very different thing, especially after having done it once.
Yes it does. But from what I can tell it targets byte streams, like File IO, not Objects. Nor is it as flexible as Rx from what I can tell looking at its API.
In JS I can't give useful information unless we start building and compare. In Java we of course are concerned with performance as well since we execute 10s of billions of However, it has also allowed us to do optimizations that were unavailable before. For example, we always use pooled ring buffers now, instead of requiring unbounded buffers that can grow. So our object allocation and GC behavior is actually better. This took effort, and was important to achieve good performance, but was an option we didn't even have previously since we had to support unbounded growth. I suggest two approaches to move forward:
|
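As an illustration of the bounded buffering mentioned above, here is a minimal fixed-capacity ring buffer in TypeScript. It is a sketch only, not RxJava's implementation (which is pooled and designed for multi-threaded use); `RingBuffer`, `offer`, and `poll` are names chosen for this example. The point is that a full buffer reports the overflow to the caller instead of growing silently.

```typescript
// Hypothetical sketch: a fixed-capacity queue that never grows.
class RingBuffer<T> {
  private items: (T | undefined)[];
  private head = 0; // index of the oldest item
  private size = 0;
  private capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
    this.items = new Array<T | undefined>(capacity);
  }

  // Returns false instead of growing when the buffer is full.
  offer(item: T): boolean {
    if (this.size === this.capacity) return false;
    this.items[(this.head + this.size) % this.capacity] = item;
    this.size++;
    return true;
  }

  // Returns undefined when the buffer is empty.
  poll(): T | undefined {
    if (this.size === 0) return undefined;
    const item = this.items[this.head];
    this.items[this.head] = undefined; // release the reference
    this.head = (this.head + 1) % this.capacity;
    this.size--;
    return item;
  }

  get length(): number {
    return this.size;
  }
}
```

An operator built on such a buffer can treat a `false` from `offer` as the signal to apply an explicit policy (error out, drop, or request less upstream) rather than hiding the problem in an ever-growing array.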
A note on |
Nobody can, really. It'll obviously have some impact, but it's hard to say what. I'd like to make some decisions about #75 before implementing this, because I think it could drastically change the implementation. I could, of course, be totally wrong about that. |
Here are some concrete numbers of throughput for These results show ops/second. I annotated the first few to show how many onNext/second it translates into.
Note of course that these async cases are demonstrating synchronization across multiple threads, which would not exist in Javascript. |
Ok. ;-) |
@Blesh @trxcllnt and I spent some time at a whiteboard ... since JS is a single-threaded environment, we likely can have very good fast-path options for performance, even if we have a single The options I see right now after that conversation are:
1) Different types
Converting between them would look something like this:
observable.toCleverName(Strategy.DROP)
observable.toCleverName(Strategy.BUFFER)
observable.toCleverName(Strategy.THROTTLE, 5_SECONDS) // pseudo code obviously
cleverName.toObservable() // no strategies needed since it will just subscribe with Number.MAX_VALUE
2) Observable with RxJava v2 (Reactive Stream) semantics
This has an
interface Observer<T> {
onSubscribe(s: Subscription): void;
onNext(t: T): void;
onError(e: any): void;
onComplete(): void;
}
This is a cleaner type model, but means a firehose
Observable.create(o => {
var s = new Subscription();
o.onSubscribe(s);
var i=0;
while(!s.isUnsubscribed()) {
o.onNext(i++)
}
})
The backpressure capable case would look like this:
Observable.create(o => {
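var requested=0;
var i=0;
var s = new Subscription(n => {
// start emitting only when going from 0 outstanding to n; a re-entrant request just adds to the count
if((requested+=n) == n) {
// emit while demand remains, consuming one unit of demand per item
while(requested > 0 && !s.isUnsubscribed()) {
o.onNext(i++);
requested--;
}
}
});
o.onSubscribe(s);
})
3) Observable with RxJava v1
interface Observer<T> extends Subscription {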
setProducer(s: Producer): void;
onNext(t: T): void;
onError(e: any): void;
onComplete(): void;
}
This is more complicated than option 2 as it involves both
Observable.create(s => {
var i=0;
while(!s.isUnsubscribed()) {
s.onNext(i++)
}
})
The backpressure capable case would look like this:
Observable.create(s => {
var requested=0;
var i=0;
s.setProducer(new Producer(n => {
// start emitting only when going from 0 outstanding to n; a re-entrant request just adds to the count
if((requested+=n) == n) {
while(requested > 0 && !s.isUnsubscribed()) {
s.onNext(i++);
requested--;
}
}
}));
})
Next Steps?
I suggest we start building option 2 and let performance numbers prove out whether we should end up with a single type or two types, and then pursue the bikeshedding on naming the thing. I agree with @Blesh that if we do option 1 that we should have both types in the same RxJS library so the community, mind-share, collaboration, interoperability, etc all coexist. They will however be modularized so the artifacts can be published and depended upon independently. Anyone have a recommendation for a better way forward? |
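For option 1, the conversion point is where a strategy has to be chosen. The following TypeScript sketch shows one way a DROP or BUFFER strategy could sit between a push-based producer and a consumer that signals demand with `request(n)`. `PushToPullBridge` and its members are hypothetical names for this illustration, not a proposed RxJS API, and the THROTTLE strategy is left out because it needs a scheduler.

```typescript
// Hypothetical sketch of an explicit strategy at the push -> request(n) boundary.
type Strategy = "DROP" | "BUFFER";

class PushToPullBridge<T> {
  dropped = 0;
  private demand = 0;
  private buffer: T[] = [];
  private strategy: Strategy;
  private deliver: (value: T) => void;

  constructor(strategy: Strategy, deliver: (value: T) => void) {
    this.strategy = strategy;
    this.deliver = deliver;
  }

  // Called by the push-based producer whenever it has a value.
  push(value: T): void {
    if (this.demand > 0 && this.buffer.length === 0) {
      this.demand--;
      this.deliver(value);
    } else if (this.strategy === "BUFFER") {
      // Unbounded, but chosen explicitly by whoever picked the strategy.
      this.buffer.push(value);
    } else {
      this.dropped++;
    }
  }

  // Called by the consumer to signal that it can take n more values.
  request(n: number): void {
    this.demand += n;
    while (this.demand > 0 && this.buffer.length > 0) {
      this.demand--;
      this.deliver(this.buffer.shift() as T);
    }
  }
}
```

Going the other way needs no strategy: a plain Observable consumer is equivalent to calling `request(Number.MAX_VALUE)` once, so every pushed value is delivered immediately.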
@benjchristensen, per our in-person discussion with @trxcllnt, I'd like people to better understand the implications backpressure has on For example, we can fast-path I'm not saying this is good or bad, I'm saying it's the nature of real backpressure control that actually composes. |
Also, some "Netflixy" context:
This type will exist somewhere
@benjchristensen's team has a very real need for an Observable type with backpressure. As such, this type WILL exist. And it can exist in one of three ways:
Different strokes for different folks within Netflix
@trxcllnt and @jhusain have need for an Observable type that has very short function call stacks and is very, very fast. This is because they need an Observable that will operate in non-JIT'ed environments on very weak hardware (that cheap smart TV you bought three years ago, for example). ... so this decision will not be met with any small amount of debate here at Netflix. I'm really hoping the larger community of GitHub comes in handy with thoughtful comments and concerns on both sides. |
Speaking more with @jhusain, I'm fine with RxJS not including backpressure for now. I will use alternate APIs (i.e. Reactive Streams) at the network protocol layer that expose backpressure over the network and then bridge into RxJS which will request 'infinite'. The primary use cases right now in JS are tackling small enough datasets that they can be swallowed into buffers. Focusing on a highly performant RxJS is higher priority than supporting backpressure at this time, even if separated into a different yet-unnamed type |
Since JS is primarily used to build user interfaces (and 90+% on the average browser if you consider the whole community), backpressure should not be a priority. If a push/pull sequence with backpressure is needed on Node.js, that isn't a use case for Rx. It's pretty "cheap" to create yet another JS library (much cheaper/quicker than to create a Java library, for instance), so I don't see why we should force Rx to solve all use cases. |
+1 |
Since @benjchristensen was the primary advocate for this feature, I'm closing this issue for now. If at a later time we decide to add an additional type to support this, we can revisit. |
@benjchristensen Just for documentation completeness, Node.js's stream API does support objects as well as byte and string streams. Just set objectMode: true when creating a stream. https://nodejs.org/api/stream.html#stream_object_mode I do agree that the Node.js stream API is not nearly as flexible as Rx. |
It's a real shame RxJS 5.0 doesn't yet support backpressure. It precludes its use on the backend, or on the front-end when you're producing data. Simple example I wanted to use RxJS for and can't because of this: interactive CSV parse from DOM Reactive-programming without back-pressure is |
@timruffles You can handle back pressure in RxJS by using a BehaviorSubject and building an Observable chain that subscribes to itself. Something like this might do:
// this behavior subject is basically your "give me the next batch" mechanism.
// in this example, we're going to make 5 async requests back to back before requesting more.
const BATCH_SIZE = 5;
const requests = new BehaviorSubject(BATCH_SIZE); // start by requesting five items
// for every request, pump out a stream of events that represent how many you have left to fulfill
requests.flatMap((count) => Observable.range(0, count).map(n => count - n - 1))
// then concat map that into an observable of what you want to control with backpressure
// you might have some parameterization here you need to handle, this example is simplified
// handle side effects with a `do` block
.concatMap(() => getSomeObservableOfDataHere().do(stuffWithIt), (remaining) => remaining)
// narrow it down to when there are no more left to request,
// and pump another batch request into the BehaviorSubject
.filter(remaining => remaining === 0)
.mapTo(BATCH_SIZE)
.subscribe(requests); |
@timruffles keep in mind, you might want to code in delays or pauses for whatever other processes you might need to run. When you do that, you need to do it within that concatMap, most likely. In some cases you might want to do it just before the So you can do back pressure, it's just more explicit. Which has pros and cons, of course. I hope this helps. |
@timruffles I'd enjoy eventually adding RxJava-style compositional back-pressure support to RxJS, but I doubt I could get @Blesh to merge the PR ;-). So instead here's another example of something sort of backpressure-y that sounds similar to your use case. Recomputing the thresholds and limits on each new request would probably get it closer to something you could use. |
Is there any mechanism for adding metadata to or classifying the hidden queue that forms behind a slow consumer? If there were a way to tag a consumer's queue, then a producer could specify rules which state "when a consumer downstream of me (or elsewhere in the same production) that is tagged with 'SLOW' has over n items in its queue, then I will be paused". I'm thinking of using RxJS to control the virtual machine for a game-like system of mine. I can't predict the maximum number of graphics objects that a rendering frame will need to draw, because the user is in charge of that in the little programs they write to run on the virtual machine. My renderer is double buffered, and I'd like the virtual machine to be paused while there is more than one undrawn frame in the rendering queue. I sort of do this pausing manually today when this condition arises, but I'm not happy with it. Perhaps there are more performant ways to do the scheduling I want to do than to use RxJS, but I'm interested in at least orchestrating the big state transitions in the system with RxJS. Having that kind of organization would definitely help me get a better handle on the asynchrony and frequent yielding that is all over my system. Perhaps simply being able to tap into the statistics of a queue leading up to a consumer would be enough for me to implement something like this flow control myself. Please pardon my ignorance, I have some experience tinkering with RxJS and perhaps what I am describing is antithetical to the design or how the internals actually work. |
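The kind of queue-statistics hook asked about above is not something RxJS provides as far as this thread shows, but the idea can be sketched independently of Rx. In the TypeScript below, `FrameQueue`, `maxPending`, `onPause`, and `onResume` are all hypothetical names: the queue exposes its depth and tells the producer (the virtual machine) to pause once more than `maxPending` frames are waiting, and to resume when the renderer has caught up.

```typescript
// Hypothetical sketch: an explicit, observable queue between a producer and a slow consumer.
class FrameQueue<F> {
  private frames: F[] = [];
  private paused = false;
  private maxPending: number;
  private onPause: () => void;
  private onResume: () => void;

  constructor(maxPending: number, onPause: () => void, onResume: () => void) {
    this.maxPending = maxPending;
    this.onPause = onPause;
    this.onResume = onResume;
  }

  // Queue statistics the producer or a controller can inspect.
  get pending(): number {
    return this.frames.length;
  }
  get isPaused(): boolean {
    return this.paused;
  }

  // Producer side: the virtual machine hands over a finished frame.
  enqueue(frame: F): void {
    this.frames.push(frame);
    if (!this.paused && this.frames.length > this.maxPending) {
      this.paused = true;
      this.onPause();
    }
  }

  // Consumer side: the renderer takes the next frame to draw.
  dequeue(): F | undefined {
    const frame = this.frames.shift();
    if (this.paused && this.frames.length <= this.maxPending) {
      this.paused = false;
      this.onResume();
    }
    return frame;
  }
}
```

With `maxPending` set to 1 this matches the double-buffered case described above: the second undrawn frame triggers `onPause`, and drawing one frame triggers `onResume`.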
I spotted this project under the ReactiveX organisation: Is IxJS the proposed solution for "pull sequences"? I.e instead of an observable, use an iterable/async iterable? |
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
This issue on StackOverflow illustrates a problem with backpressure in current RxJS, I think
While I think this guy could solve his problem with an Iterator of Promise, I think it would be better if RxJS could support this with fewer object allocations.
cc/ @benjchristensen