Adding the ability to manually flush to segment generators. #13
Merged
nimz
reviewed
Nov 24, 2024
nimz
reviewed
Nov 24, 2024
nimz
reviewed
Nov 24, 2024
Tests will pass when production deployment is updated (I ran the tests on staging)
kbrgl
approved these changes
Nov 25, 2024
ad12
reviewed
Nov 26, 2024
Merged
chongzluong
added a commit
that referenced
this pull request
Nov 27, 2024
Bumping client to `1.3.0`. Primary change: - [Adding manual flushing for multiple generators](#13)
Overview
We're adding the ability to manually flush through the Python client.
In this context, a flush is triggered by calling `flush()`, which returns a generator function. When called, that generator iterates over all audio generated for the transcripts submitted since the last time (if ever) `flush()` was called.

In other words, if you do the following:
Then iterating over the receivers will get you the following audio:
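As a rough illustration of these semantics, here is a minimal sketch built on a hypothetical in-memory `FakeContext`. It is not the real client: the `send` and `flush` method names on this class, the transcripts, and the placeholder "audio" strings are all assumptions made for the example. It only shows that each receiver covers the transcripts submitted since the previous flush.

```python
from typing import Callable, Iterator, List


class FakeContext:
    """Hypothetical in-memory stand-in for a TTS context (not the real client)."""

    def __init__(self) -> None:
        # Transcripts submitted since the last flush.
        self._pending: List[str] = []

    def send(self, transcript: str) -> None:
        self._pending.append(transcript)

    def flush(self) -> Callable[[], Iterator[str]]:
        # Snapshot what was submitted since the last flush, then start fresh.
        segment, self._pending = self._pending, []

        def receiver() -> Iterator[str]:
            for transcript in segment:
                # Placeholder standing in for the audio of this transcript.
                yield f"<audio for {transcript!r}>"

        return receiver


ctx = FakeContext()
ctx.send("Hello. ")
receiver_1 = ctx.flush()
ctx.send("How are ")
ctx.send("you?")
receiver_2 = ctx.flush()

first = list(receiver_1())   # audio for "Hello. " only
second = list(receiver_2())  # audio for "How are " and "you?"
```

Note that the real service may merge or split audio chunks differently; the one-placeholder-per-transcript mapping here is a simplification.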
Previously, Cartesia TTS only supported multiple inputs feeding a single output receiver, at least with regard to continuations. This is because we perform smart splitting and merging on our end to optimize model performance.
However, many existing TTS architectures map multiple inputs to multiple output generators, so we should support this flexibility to make integrating with other providers as frictionless as possible.
If you call `flush()`, it'll be because you've performed chunking yourself (ideally, the transcript or transcripts submitted form a sentence).

Implementation
Currently this is implemented using a mechanism similar to how we wrapped multiplexing over a single websocket. We did that by iterating over the receiver asynchronously in the background and separating chunks into async queues mapped by their context ID. In this case, we separate chunks into lists of async queues, where the flush ID is the index in the list of the queue we're populating. When `flush()` is called, we append a new async queue to the list for that context ID.

In a previous iteration of this PR I was doing this non-deterministically, using async polling from the N queue and the N+1 queue. Instead, we introduced the notion of a `flush_done` event, fired after manual flushes prior to incrementing the `flush_id`. This allows us to deterministically indicate that the generator function has completed.

Testing
Tested against a local API version. Prod already has the manual flush capabilities, but my local branch has the changes that incorporate the `flush_done` event. Confirmed that, given 3 transcripts, I can create the 3 generator functions and iterate over each for the distinct audio generations.

Also added a unit test that creates 3 generators from 3 transcripts + flushes and iterates over each of them. This currently passes against the staging deployment; this PR is blocked until the `flush_done` emission is deployed in production.
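To make the mechanism described under Implementation more concrete, here is a minimal, self-contained sketch of the list-of-queues idea. It is not the code from this PR: `FlushRouter`, `fake_server`, and the message shape (`context_id`, `type`, `data`) are hypothetical names chosen for the example. It assumes the server emits a `flush_done` message after the last chunk of each flush, and it mirrors the 3-generator test described above.

```python
import asyncio
from collections import defaultdict
from typing import AsyncIterator, Callable, Dict, List

_END = object()  # sentinel marking the end of one flush's audio


class FlushRouter:
    """Hypothetical sketch: routes chunks into per-context lists of queues."""

    def __init__(self) -> None:
        # context_id -> list of queues; the list index plays the role of flush_id.
        self._queues: Dict[str, List[asyncio.Queue]] = defaultdict(list)
        # context_id -> flush_id the router is currently populating.
        self._current: Dict[str, int] = defaultdict(int)
        # context_id -> number of generators handed out so far.
        self._handed_out: Dict[str, int] = defaultdict(int)

    def _queue(self, context_id: str, flush_id: int) -> asyncio.Queue:
        queues = self._queues[context_id]
        while len(queues) <= flush_id:
            queues.append(asyncio.Queue())
        return queues[flush_id]

    def flush(self, context_id: str) -> Callable[[], AsyncIterator[bytes]]:
        flush_id = self._handed_out[context_id]
        self._handed_out[context_id] += 1
        queue = self._queue(context_id, flush_id)
        # Append the queue that will collect audio generated after this flush.
        self._queue(context_id, flush_id + 1)

        async def generator() -> AsyncIterator[bytes]:
            while True:
                item = await queue.get()
                if item is _END:
                    return  # flush_done was seen: this generator is complete
                yield item

        return generator

    async def route(self, messages: AsyncIterator[dict]) -> None:
        # Background loop: demultiplex messages by context ID and flush ID.
        async for msg in messages:
            cid = msg["context_id"]
            queue = self._queue(cid, self._current[cid])
            if msg["type"] == "chunk":
                queue.put_nowait(msg["data"])
            elif msg["type"] == "flush_done":
                # Close the current queue first, then increment the flush ID.
                queue.put_nowait(_END)
                self._current[cid] += 1


async def fake_server() -> AsyncIterator[dict]:
    # Stand-in for the websocket: two chunks per flush, then flush_done.
    for i in range(3):
        yield {"context_id": "ctx", "type": "chunk", "data": f"audio-{i}a".encode()}
        yield {"context_id": "ctx", "type": "chunk", "data": f"audio-{i}b".encode()}
        yield {"context_id": "ctx", "type": "flush_done"}


async def main() -> List[List[bytes]]:
    router = FlushRouter()
    generators = [router.flush("ctx") for _ in range(3)]
    task = asyncio.create_task(router.route(fake_server()))
    results = []
    for gen in generators:
        results.append([chunk async for chunk in gen()])
    await task
    return results


results = asyncio.run(main())
```

Using an explicit end-of-flush sentinel, driven by `flush_done`, is what lets each generator terminate without polling the next queue to guess whether the current one is finished.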