
Adding the ability to manually flush to segment generators. #13

Merged
13 commits merged into main from chongz--add_manual_flushing on Nov 27, 2024

Conversation

@chongzluong (Contributor) commented Nov 24, 2024

Overview

We're adding the ability to manually flush through the Python Client.

A flush is triggered by calling flush(), which returns a generator function. When called, the generator iterates over all audio generated for the transcripts submitted since the previous flush() call (or since the start of the context, if flush() has not been called before).

In other words, if you do the following:

ws = await client.tts.websocket()
ctx = ws.context()
await ctx.send(..., transcript=transcript_1, ...)
await ctx.send(..., transcript=transcript_2, ...)
receiver_1 = await ctx.flush()
await ctx.send(..., transcript=transcript_3, ...)
receiver_2 = await ctx.flush()

Then iterating over the receivers will get you the following audio:

async for output in receiver_1():
  ...  # Audio chunks for transcripts 1 and 2

async for output in receiver_2():
  ...  # Audio chunks for transcript 3

Previously, Cartesia TTS only supported multiple inputs to a single output receiver, at least with regard to continuations, because we perform smart splitting and merging on our end to optimize model performance.

However, many existing TTS architectures map multiple inputs to multiple output generators, so we should support this flexibility to make integrating with other providers as frictionless as possible.

You would call flush() when you've performed the chunking yourself (ideally, each submitted transcript is a complete sentence).

Implementation

Currently this is implemented using a mechanism similar to how we multiplex contexts over a single websocket: a background task iterates over the receiver asynchronously and separates chunks into async queues keyed by their context ID. Here, we instead separate chunks into lists of async queues per context ID, where a chunk's flush ID is the index of the queue being populated. When flush() is called, we append a new async queue to the list for that context ID.
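As a concrete illustration, here is a minimal sketch of that routing logic. All names here (_ContextDemux, route, on_flush) are hypothetical and for illustration only; they are not the client's actual internals.

```python
import asyncio
from collections import defaultdict


class _ContextDemux:
    """Illustrative sketch: routes incoming messages into per-context lists of queues."""

    def __init__(self) -> None:
        # context_id -> list of async queues; each queue holds one flush "segment",
        # and a message's flush_id is the index of the queue it belongs to.
        self._queues: dict[str, list[asyncio.Queue]] = defaultdict(
            lambda: [asyncio.Queue()]
        )

    def route(self, message: dict) -> None:
        # Called by the background task that iterates over the shared websocket.
        queues = self._queues[message["context_id"]]
        flush_id = message.get("flush_id", len(queues) - 1)
        queues[flush_id].put_nowait(message)

    def on_flush(self, context_id: str) -> int:
        # Called when flush() is invoked: append a fresh queue so that subsequent
        # chunks for this context land in the next segment.
        self._queues[context_id].append(asyncio.Queue())
        return len(self._queues[context_id]) - 1
```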

[Diagram: multiplexing_flushing_python — chunks routed into per-context, per-flush queues]

In a previous iteration of this PR, this was done non-deterministically by asynchronously polling both queue N and queue N+1. Instead, we introduced a flush_done event that is fired after a manual flush, just before the flush_id is incremented. This lets us deterministically signal that the generator function has completed.
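Continuing the hypothetical sketch above, the generator function returned by flush() can then drain its own queue until it sees the flush_done (or final done) marker:

```python
import asyncio


async def _segment_generator(queue: asyncio.Queue):
    # Yields audio messages for one flush segment and stops deterministically
    # once the flush_done (or final done) marker for that segment arrives.
    while True:
        message = await queue.get()
        if message.get("type") in ("flush_done", "done"):
            break
        yield message
```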

Testing

Tested against a local API version. Prod already has the manual flush capabilities, but my local branch contains the changes that incorporate the flush_done event. Confirmed that, given 3 transcripts, I can create the 3 generator functions and iterate over each to receive the distinct audio generations.

Also added a unit test that creates 3 generators from 3 transcript + flush pairs and iterates over each of them. This currently passes against the staging deployment; this PR is blocked until the flush_done emission is deployed in production.
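For a rough sense of what that test exercises, here is a small self-check written in terms of the hypothetical sketch above (not the repo's actual test code):

```python
import asyncio

# Assumes _ContextDemux and _segment_generator from the sketches above are in scope.
async def _check_two_segments() -> None:
    demux = _ContextDemux()
    ctx_id = "ctx-1"

    # Segment 0: two chunks, then the server's flush_done marker.
    demux.route({"context_id": ctx_id, "flush_id": 0, "type": "chunk", "data": b"a"})
    demux.route({"context_id": ctx_id, "flush_id": 0, "type": "chunk", "data": b"b"})
    demux.route({"context_id": ctx_id, "flush_id": 0, "type": "flush_done"})

    # flush() appends a new queue; segment 1 receives the remaining chunk.
    new_id = demux.on_flush(ctx_id)
    demux.route({"context_id": ctx_id, "flush_id": new_id, "type": "chunk", "data": b"c"})
    demux.route({"context_id": ctx_id, "flush_id": new_id, "type": "done"})

    first = [m["data"] async for m in _segment_generator(demux._queues[ctx_id][0])]
    second = [m["data"] async for m in _segment_generator(demux._queues[ctx_id][new_id])]
    assert first == [b"a", b"b"]
    assert second == [b"c"]


asyncio.run(_check_two_segments())
```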

@chongzluong marked this pull request as ready for review November 25, 2024 12:56
@chongzluong (Contributor, Author) commented:

Tests will pass when the production deployment is updated (I ran the tests on staging).

@chongzluong merged commit e0f5d14 into main Nov 27, 2024
6 checks passed
@chongzluong deleted the chongz--add_manual_flushing branch November 27, 2024 01:54
@chongzluong mentioned this pull request Nov 27, 2024
chongzluong added a commit that referenced this pull request Nov 27, 2024
Bumping client to `1.3.0`.

Primary change:
- [Adding manual flushing for multiple generators](#13)