Stream audio/image input values into and out of the server #2249
@@ -0,0 +1,133 @@
defmodule LivebookWeb.Helpers.Codec do
Beautifully done!
Beautiful. I have added some additional comments and the previous note about tests, then we can ship it! :)
Co-authored-by: José Valim <[email protected]>
ship it
Currently we pass the values for audio/image as LV events to and from the client encoded with base64. For large files (especially audio) base64 encoding takes a notable amount of time. More importantly, passing such large payloads using the LV socket is bad because it can block other events.
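As a rough illustration of the base64 overhead (a standalone sketch, not code from this PR): every 3 raw bytes become 4 encoded bytes, so payloads grow by about a third before they even reach the socket, on top of the encoding time itself.

```elixir
# Standalone illustration (not from this PR): base64 inflates binary
# payloads by 4/3, which compounds the cost of sending large audio
# values over the LV socket.
raw = :crypto.strong_rand_bytes(300_000)
encoded = Base.encode64(raw)

IO.puts(byte_size(raw))     # 300000
IO.puts(byte_size(encoded)) # 400000
```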
This PR introduces out-of-band streaming in both directions. For sending the binary value to the server we use LV uploads, and for getting the value preview to all clients we fetch the data from a separate endpoint.

In case of audio, the endpoint always returns a WAV file, so we can use it as the `<audio>` `src` directly; if the data we keep is PCM, we encode on the fly. The endpoint also supports Range requests, so that seeking works as expected.

In case of images, we always return the whole binary; for png/jpg we just set the `<img>` `src` directly; for pixel data we `fetch` the binary and fill in a canvas.

In line with all these changes, we no longer keep the values in memory as binaries; instead, we store them in files (essentially mirroring the file input). We are making this change all the way to Kino, so `Kino.Input.read(...)`
is going to return a file reference rather than the binary itself.

Random highlights

In case of audio, we decode the file on the client and get some metadata, like `sampling_rate` and `channels`. We need to send that metadata to the server alongside the actual binary. To do that, we build an annotated binary, which is `<<metadata_size, metadata_json, binary>>`. However, when doing the LV upload, we don't want all of this to end up in the uploaded file; instead, we want to consume the metadata and then store only the actual binary in a file. Doing that was very straightforward with a custom upload writer, namely `LivebookWeb.AnnotatedTmpFileWriter`. // EDIT: changed in favour of phoenixframework/phoenix_live_view@d386704

In the controller endpoint, we need to know the input value that we want to return. We could ask the session process, but this (a) increases the load on that single process and (b) does not work with per-user inputs (only the LV knows their values). Both of these are elegantly solved by generating a `Phoenix.Token`
with the LV pid and input id, which we decode in the controller and use to ask the LV for the input value directly (which also ensures the LV is still running).

Demo

Perhaps the coolest thing of all is that, for a large audio file, we now show both a decoding indicator and a progress bar during the actual upload:

input_stream.mp4
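As a footnote on the annotated binary described under the highlights, the layout can be sketched with plain binary construction and pattern matching. This is a standalone sketch: the 32-bit big-endian size prefix and the sample values are assumptions for illustration, not taken from the PR (and the custom writer itself was superseded by the linked LiveView change).

```elixir
# Sketch of the <<metadata_size, metadata_json, binary>> layout.
# The 32-bit size prefix is an assumed encoding for illustration.
metadata = ~s/{"sampling_rate":44100,"channels":2}/
binary = <<0, 1, 2, 3>>

# Building the annotated binary on the client side
annotated = <<byte_size(metadata)::32, metadata::binary, binary::binary>>

# Consuming it on the server: read the size, split the metadata JSON
# from the payload, and keep only the payload for the file
<<size::32, meta::binary-size(size), payload::binary>> = annotated

IO.puts(meta)       # {"sampling_rate":44100,"channels":2}
IO.inspect(payload) # <<0, 1, 2, 3>>
```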
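Similarly, the token scheme from the highlights can be sketched with `Phoenix.Token` directly. This is a minimal standalone sketch, not Livebook's actual code: the salt, `max_age`, and the shape of the signed data are assumptions for illustration.

```elixir
# Pull in Phoenix for Phoenix.Token (version constraint is an assumption)
Mix.install([{:phoenix, "~> 1.7"}])

# A string secret key base works as the signing context; a real app
# would use its endpoint instead.
secret_key_base = :crypto.strong_rand_bytes(64) |> Base.encode64()

# The LV signs its own pid together with the input id...
data = %{pid: self(), input_id: "input1"}
token = Phoenix.Token.sign(secret_key_base, "session input", data)

# ...and the controller verifies the token and can then message the LV
# pid for the input value (verification also bounds the token's age).
{:ok, %{pid: pid, input_id: input_id}} =
  Phoenix.Token.verify(secret_key_base, "session input", token, max_age: 3600)

IO.inspect(input_id) # "input1"
```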