
Adding the ability to manually flush to segment generators. #13

Merged
13 commits merged into main from chongz--add_manual_flushing on Nov 27, 2024

Conversation

@chongzluong (Contributor) commented Nov 24, 2024

Overview

We're adding the ability to manually flush through the Python Client.

A flush is triggered by calling flush(), which returns a generator function. When called, the generator iterates over all audio generated for the transcripts submitted since the previous flush() call (or since the start of the context, if flush() has not been called before).

In other words, if you do the following:

ws = await client.tts.websocket()
ctx = ws.context()
await ctx.send(..., transcript=transcript_1, ...)
await ctx.send(..., transcript=transcript_2, ...)
receiver_1 = await ctx.flush()
await ctx.send(..., transcript=transcript_3, ...)
receiver_2 = await ctx.flush()

Then iterating over the receivers will get you the following audio:

async for output in receiver_1():
  ...  # Audio chunks for transcripts 1 and 2

async for output in receiver_2():
  ...  # Audio chunks for transcript 3

Previously, Cartesia TTS only supported multiple inputs to a single output receiver, at least with regard to continuations, because we perform smart splitting and merging on our end to optimize model performance.

However, many existing TTS architectures map multiple inputs to multiple output generators, so we should support this flexibility to make integrating with other providers as frictionless as possible.

You would call flush() when you've performed the chunking yourself (ideally, each submitted transcript is a complete sentence).

Implementation

Currently this is implemented using a mechanism similar to how we multiplex contexts over a single websocket: a background task iterates over the receiver asynchronously and separates chunks into async queues keyed by their context ID. Here, we instead separate chunks into lists of async queues per context ID, where a chunk's flush ID is the index of the queue being populated. When flush() is called, we append a new async queue to the list for that context ID.
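As a concrete illustration, here is a minimal sketch of that routing logic. All names here (_ContextDemux, route, on_flush) are hypothetical and for illustration only; they are not the client's actual internals.

```python
import asyncio
from collections import defaultdict


class _ContextDemux:
    """Illustrative sketch: routes incoming messages into per-context lists of queues."""

    def __init__(self) -> None:
        # context_id -> list of async queues; each queue holds one flush "segment",
        # and a message's flush_id is the index of the queue it belongs to.
        self._queues: dict[str, list[asyncio.Queue]] = defaultdict(
            lambda: [asyncio.Queue()]
        )

    def route(self, message: dict) -> None:
        # Called by the background task that iterates over the shared websocket.
        queues = self._queues[message["context_id"]]
        flush_id = message.get("flush_id", len(queues) - 1)
        queues[flush_id].put_nowait(message)

    def on_flush(self, context_id: str) -> int:
        # Called when flush() is invoked: append a fresh queue so that subsequent
        # chunks for this context land in the next segment.
        self._queues[context_id].append(asyncio.Queue())
        return len(self._queues[context_id]) - 1
```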

[Diagram: multiplexing_flushing_python — chunks routed into per-context, per-flush queues]

In a previous iteration of this PR, this was done non-deterministically by asynchronously polling both queue N and queue N+1. Instead, we introduced a flush_done event that is fired after a manual flush, just before the flush_id is incremented. This lets us deterministically signal that the generator function has completed.
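Continuing the hypothetical sketch above, the generator function returned by flush() can then drain its own queue until it sees the flush_done (or final done) marker:

```python
import asyncio


async def _segment_generator(queue: asyncio.Queue):
    # Yields audio messages for one flush segment and stops deterministically
    # once the flush_done (or final done) marker for that segment arrives.
    while True:
        message = await queue.get()
        if message.get("type") in ("flush_done", "done"):
            break
        yield message
```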

Testing

Tested against a local API version. Prod already has the manual flush capabilities, but my local branch contains the changes that incorporate the flush_done event. Confirmed that, given 3 transcripts, I can create the 3 generator functions and iterate over each to receive the distinct audio generations.

Also added a unit test that creates 3 generators from 3 transcript + flush pairs and iterates over each of them. This currently passes against the staging deployment; this PR is blocked until the flush_done emission is deployed in production.
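For a rough sense of what that test exercises, here is a small self-check written in terms of the hypothetical sketch above (not the repo's actual test code):

```python
import asyncio

# Assumes _ContextDemux and _segment_generator from the sketches above are in scope.
async def _check_two_segments() -> None:
    demux = _ContextDemux()
    ctx_id = "ctx-1"

    # Segment 0: two chunks, then the server's flush_done marker.
    demux.route({"context_id": ctx_id, "flush_id": 0, "type": "chunk", "data": b"a"})
    demux.route({"context_id": ctx_id, "flush_id": 0, "type": "chunk", "data": b"b"})
    demux.route({"context_id": ctx_id, "flush_id": 0, "type": "flush_done"})

    # flush() appends a new queue; segment 1 receives the remaining chunk.
    new_id = demux.on_flush(ctx_id)
    demux.route({"context_id": ctx_id, "flush_id": new_id, "type": "chunk", "data": b"c"})
    demux.route({"context_id": ctx_id, "flush_id": new_id, "type": "done"})

    first = [m["data"] async for m in _segment_generator(demux._queues[ctx_id][0])]
    second = [m["data"] async for m in _segment_generator(demux._queues[ctx_id][new_id])]
    assert first == [b"a", b"b"]
    assert second == [b"c"]


asyncio.run(_check_two_segments())
```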

@chongzluong marked this pull request as ready for review November 25, 2024 12:56
@chongzluong (Contributor, Author) commented:

Tests will pass when the production deployment is updated (I ran the tests on staging).

@chongzluong merged commit e0f5d14 into main Nov 27, 2024
6 checks passed
@chongzluong deleted the chongz--add_manual_flushing branch November 27, 2024 01:54
@chongzluong mentioned this pull request Nov 27, 2024
chongzluong added a commit that referenced this pull request Nov 27, 2024
Bumping client to `1.3.0`.

Primary change:
- [Adding manual flushing for multiple generators](#13)