[Proposal] remove most rxjs-related code from the SegmentBuffers and transport code #916

peaBerberian · 2021-03-09T17:05:04Z

This PR is a first attempt to remove most RxJS-related code from the RxPlayer code. Here it is mainly done for the code inside src/core/segment_buffers.

There are multiple reasons for this weird new idea:

Observables in general are complex beasts. By default, they are lazy, they are called as much times as they are subscribed to, they are "teared down" (all linked side-effects such as event listeners are cleaned-up) automatically on unsubscription,
depending on the scheduler used they may emit or be subscribed to in a synchronous or multiple types of asynchronous ways, they keep being open until you either complete them, emit an error or unsubscribe from them - when you're not careful, you thus have a chance of leaking memory.

Even if we understand it well, we still make frequent mistakes with it, some having pretty big consequences (e.g. big memory leak in fix memory leak in RepresentationStream #913, API events sent too much time in eme/api: stop sending eme-related events twice #850 - well to be honest - I am guilty on both I think!).

And that's only talking about us, with multiple years of experience with it. Newer developper may find it even more complex to understand.
Observables are IMO too much used in the code, even when Promises seems like a simpler and better solution (for example when there is only one item emitted).
They are too risky (in my opinion) when we're talking about code with important side-effects (e.g. HTTP requests, SourceBuffer interactions, API interactions) due to the easyness with which we can perform them multiple times by mistake.
I find Observables tear-down too implicit (the fact that it happens silently on unsubscription, mostly) for what we use it for:
Unsubscribing from some Observables will stop the content, others will stop segments from being pushed, others will cancel requests.

Here I would prefer a much more explicit solution: one where we explicitly write a line of code anouncing that operations are aborted at the time we abort them.
Call stacks are unreadable and too long currently. I don't know if we ever exploited any call stack information because of that.

The fact that call stacks are unreadable seems to be directly linked to the heavy usage of RxJS operators.
Observables cannot be used with async/await. We often end-up with a huge callback hell when inner Observables contain other inner Observables and so one, whereas it could in effect be much simpler.

I still think that there are real good use cases for RxJS in the RxPlayer, for example I found the text synchronization code in the
HTMLTextSegmentBuffer really simple to grasp and functional.

In this PR, I also created the TaskCanceller util, which allows to implement explicit Promise cancellation roughly the same way that is is done for the native fetch API (on that note, I was heavily inspired by the AbortController interface - used to cancel fetch operations - to write it).

Thoughts?

peaBerberian · 2021-03-09T17:06:01Z

src/core/segment_buffers/garbage_collector.ts

@@ -26,7 +26,9 @@ import {
  mergeMap,
 } from "rxjs/operators";
 import log from "../../log";
+import observablifyCancellablePromise from "../../utils/observablify_promise";


Very ugly name, I know. Though I wanted to communicate the fact that a TaskCanceller was used here (but perhaps "observablify" is the uglier part of the name :p)

Every segment parser in `src/transports` (which allows to parse things like data, encryption metadata and time information from loaded segments according to the current streaming protocol) do so synchronously. However, those are typed as Observables. My guess (this was before my time on the project) is that it was mainly for two reasons: 1. Manifest parsing can be asynchronous (for example to fetch MPD xlinks or the clock) and it follows the same pipeline architecture (a loader and a parser) than segments. Letting segment parser be asynchronous thus allows to harmonize all resource-fetching code to a similar interface. 2. Theorically, segment parsing could be asynchronous. As far as I can see for now, this could be because of two reasons: - We need to perform an HTTPS request to fully parse segment information. We already encountered such case when trying to implement the "sidx ref type 1" feature (#754). However we did not merge that work for now due to the complexity and most of all because no real DASH MPD seems to rely on this feature (nor ar we able to envisage a real use-case for it). - segment parsing is performed in a Worker. But We do not feel the need to do that for now as the CPU footprint of parsing segments is really low. So theorically, there's good reasons to make the parsing operation asynchronous, but reastically for now, there's none. Meanwhile, making that operation asynchronous lead to big headaches: - it is one of the main reasons why the PR on making the init and first media segment's requests in parallel [#918] was not merged. Subtle bugs, some that would only be seen in a case where the parsing is asynchronous (so not really existant for now) could arise because of that. - Even if the response of the parser is wrapped in an Observable, it wasn't lazy (the code would be called even if the parser was not subscribed to). We could fix that by wrapping all parsers in a `defer` call from RxJS, but it would complexify even more the code. - Handling Observables is a LOT harder than just using directly a response in general. There's pending work about not depending so much on RxJS in the future [#916] and I think beginning to remove it where it is not even needed is a good first step. If it become needed again in the future (e.g. to support the "sidx ref type 1" feature), we could always revert that work, maybe even with a more Promise-based solution which might be much easier to reason about. Thoughts?

This commit proposes a simplification of types used by the segment parsers by defining a unique segment data type instead of two: one for the init segment and the other for media segments. As `SegmentBuffers` - on which those segment data will be pushed - are only generic based on a single segment data type, this is not really necessary anyway. The real reason behind creating two types was to better indicate that text and image Representation had no initialization segment (by setting their type to `null`) but considering that a parsed media segment could theoretically be `null` as well, nothing is lost by using the same type for both. Being generic over a single type simplify the way we approach segment parsers in my opinion. Now we're thinking: "which type is the parsed segment data in?" not "which type is a parsed init segment data and which type is a media segment data?". Both should generally be in the same format and the only difference until now was that one could be always `null`. Also, considering that media segments could theorically contain initialization data themselves, this is one more argument in the direction of only defining a single type. --- Another improvement in that commit is to limit `any` typings to the small `segment_fetcher_creator.ts` file. Doing this there should reduce possible typing errors as segment loader and parsers are now properly type-checked (both their inputs and outputs).

This commit is a first attempt to remove most RxJS-related code from the RxPlayer code. Here it is mainly done for the code inside `src/core/segment_buffers`. There are multiple reasons for this weird new idea: 1. Observables in general are complex beasts. By default, they are lazy, they are called as much times as they are subscribed to, they are "teared down" (all linked side-effects such as event listeners are cleaned-up) automatically on unsubscription, depending on the scheduler used they may emit or be subscribed to in a synchronous or multiple types of asynchronous ways, they keep being open until you either complete them, emit an error or unsubscribe from them - when you're not careful, you thus have a chance of leaking memory. Even if we understand it well, we still make frequent mistakes with it, some having pretty big consequences (e.g. big memory leak in #913, API events sent too much time in #850 - well to be honest - I am guilty on both I think!). And that's only talking about us, with multiple years of experience with it. Newer developper may find it even more complex to understand. 2. Observables are IMO too much used in the code, even when `Promises` seems like a simpler and better solution (for example when there is only one item emitted). 3. They are too risky (in my opinion) when we're talking about code with important side-effects (e.g. HTTP requests, SourceBuffer interactions, API interactions) due to the easyness with which we can perform them multiple times by mistake. 4. I find Observables tear-down too implicit (the fact that it happens silently on unsubscription, mostly) for what we use it for: Unsubscribing from some Observables will stop the content, others will stop segments from being pushed, others will cancel requests. Here I would prefer a much more explicit solution: one where we explicitly write a line of code anouncing that operation are aborted at the time we abort them. 5. Call stacks are unreadable and too long currently. I don't know if we ever exploited any call stack information because of that. The fact that call stacks are unreadable seems to be directly linked to the heavy usage of RxJS operators. 6. Observables cannot be used with async/await. We often end-up with a huge callback hell when inner Observables contain other inner Observables and so one, whereas it could in effect be much simpler. I still think that there are real good use cases for RxJS in the RxPlayer, for example I found the text synchronization code in the `HTMLTextSegmentBuffer` really simple to grasp and functional. In this commit, I also created the `TaskCanceller` util, which allows to implement explicit Promise cancellation roughly the same way that is is done for the native `fetch` API (on that note, I was heavily inspired by the `AbortController` interface - used to cancel fetch operations - to write it).

After doing a proof-of-concept looking at how some parts of the code looks like without RxJS (#916), this is a first functional proposal which looks good enough to me to be merged. It removes all RxJS code from the `transports` code in `src/transports`. As a reminder, the reasons for doing this are: 1. Observables are complicated and full of implicit behaviors (lazily running, sync or async, unsubscribing automatically after the last unsubscription etc.) which is difficult to reason about, especially for a newcomer. Things like exploiting schedulers through the `deferSubscriptions` util to work-around some subtle potential race-conditions, or merging Observables in a specific order for similar reasons, are ugly hacks that are difficult to explain to someone not familiar with that code. Even for us with multiple years of experience with it, we sometimes struggle with it. 2. Promises, event listeners - and direct callbacks in general - are generally much more explicit and most developpers (at least JS/TS devs) are familiar with them. 3. Call stacks are close to inexploitable when using RxJS. 4. Promises provide async/await syntax which can improve drastically the readability of our async-heavy code, which for the moment suffer from callback hells almost everywhere. However, I'm still not sure if this wish (getting rid of RxJS) is shared by other maintainers and/or contributors, so it is still only a proposal. Thoughts?

peaBerberian · 2021-05-20T14:25:30Z

Closed for now in profit of #962