This demo provides three sample use cases for applying Sieve's cloud-based AI functions to Daily video call recordings.
This demo was tested with Python version 3.11.6. We recommend running this in a virtual environment.
Ensure that you have FFmpeg installed on your machine.
- Clone this repository.
- Copy the `.env.sample` file to `.env`. DO NOT commit your `.env` file to version control.
- Update your `.env` with your `DAILY_API_KEY`.
In the root of the repository on your local machine, run the following commands:

```shell
python3 -m venv venv
source venv/bin/activate
```
You can sign up for an account at https://www.sievedata.com/. If you'd like to use the TTS lip-syncing function, you'll also need an ElevenLabs account in order to use their TTS models.
Add secrets to Sieve
Some Sieve functions, such as the `text_to_video_lipsync` function, rely on external API keys to work. Rather than setting these as environment variables locally, you'll want to visit https://www.sievedata.com/dashboard/settings/secrets and set them there.
In the virtual environment, run the following:

- Run `pip install -r requirements.txt` from the root directory of this repo on your local machine.
- Start the server: run `quart --app server/index.py --debug run` in your terminal.
- Start the client: inside the `vite` directory, run `npm install` followed by `npm run dev`.
Now, open the localhost address shown in your terminal after the last step above, which should be `localhost:5173`. You should see the front end of the demo, which lets you fetch your latest Daily recordings.
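The recording fetch goes through Daily's REST API (`GET /v1/recordings`). Here's a minimal, stdlib-only sketch of that call, assuming `DAILY_API_KEY` is set in your environment; the demo's actual server code may use a different HTTP client:

```python
import json
import os
import urllib.request

DAILY_API_URL = "https://api.daily.co/v1"

def fetch_latest_recordings(limit=10):
    """Return the most recent Daily recordings via the REST API."""
    req = urllib.request.Request(
        f"{DAILY_API_URL}/recordings?limit={limit}",
        headers={"Authorization": f"Bearer {os.environ['DAILY_API_KEY']}"},
    )
    with urllib.request.urlopen(req) as resp:
        # The response body holds a "data" list of recording objects
        # (id, room_name, duration, status, and so on).
        return json.loads(resp.read())["data"]
```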
This demo starts by enabling users to fetch their latest Daily video recordings, using Daily's REST API. Once obtained, users are presented with three choices of Sieve functions to run on their Daily recordings:
- `audio_enhancement`
- `text_to_video_lipsync`
- Video Dubbing, which comprises four different Sieve functions
All Sieve functions follow more or less the same usage pattern:
- Upload your video or audio to Sieve
- Fetch the Sieve function of your choice
- Run the Sieve function
For example:
```python
import sieve

# Step 1: Upload your video/audio to Sieve
audio = sieve.Audio(url="https://storage.googleapis.com/sieve-prod-us-central1-public-file-upload-bucket/79543930-5a71-45d9-b690-77f4f0b2bfaa/1a704dda-d8be-4ae1-9894-b4ee63c69567-input-audio.mp3")

# Step 2: Fetch the Sieve function of your choice
audio_enhancement = sieve.function.get("sieve/audio_enhancement")

# Step 3: Run the Sieve function (and capture the output)
filter_type = "all"
enhance_speed_boost = False
enhancement_steps = 50
output = audio_enhancement.run(audio, filter_type, enhance_speed_boost, enhancement_steps)
```
Closer inspection of the demo code will reveal that not all Daily recordings are passed to Sieve functions as-is.
For example, for both of the lip-syncing demos, we had to shave off the first second of the Daily recording, since recordings begin with a brief black screen instead of the speaker's face. Additionally, the lip-syncing functions tend not to work on videos with more than one active speaker. The input requirements for each Sieve function are clearly outlined in its README on Sieve's website, so be sure to reference those when troubleshooting.
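The trimming step can be sketched with FFmpeg's input seeking. `build_trim_command` below is a hypothetical helper, not the demo's actual code:

```python
import subprocess

def build_trim_command(src, dst, offset=1.0):
    # Placing -ss before -i makes FFmpeg seek the input quickly;
    # re-encoding (rather than "-c copy") avoids a broken first frame
    # when the cut lands mid-GOP.
    return ["ffmpeg", "-y", "-ss", str(offset), "-i", src, dst]

def trim_leading_black(src, dst, offset=1.0):
    """Write `dst` starting `offset` seconds into `src`."""
    subprocess.run(build_trim_command(src, dst, offset), check=True)
```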
It's also worth noting that the demos in this app solely rely on standard Daily recording formats, but more possibilities open up when using Sieve functions on Daily raw track recordings, so if you're looking for something more custom, that's something to look into!
This demo contains no authentication features. Processed videos are placed into a public folder that anyone can reach, associated with a UUID. Should a malicious actor guess or brute-force a valid project UUID, they can download processed output associated with that ID. For a production use case, access to output files should be gated.
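One way to gate output files, sketched below as a hypothetical approach (not part of this demo), is to hand out HMAC-signed, expiring download tokens instead of serving files from a public folder keyed only by UUID:

```python
import hashlib
import hmac
import time

# Assumption: a server-side secret kept out of version control.
SECRET = b"replace-with-a-server-side-secret"

def make_download_token(project_uuid, ttl_seconds=300):
    """Issue a token granting temporary access to one project's output."""
    expires = int(time.time()) + ttl_seconds
    payload = f"{project_uuid}:{expires}"
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}:{sig}"

def verify_download_token(token):
    """Reject tokens that are expired or whose signature doesn't match."""
    project_uuid, expires, sig = token.rsplit(":", 2)
    payload = f"{project_uuid}:{expires}"
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected) and int(expires) > time.time()
```

With this in place, guessing a valid project UUID is no longer enough; an attacker would also need a fresh, correctly signed token.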
This demo implements basic error handling in the form of writing errors to `stderr`. For a production use case, appropriate logging and metrics should be added.
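As a minimal step in that direction, structured, leveled logging could replace bare `stderr` writes. A sketch, with `run_job` as a hypothetical wrapper around a Sieve call:

```python
import logging
import sys

# Leveled, timestamped logging instead of raw stderr writes.
logger = logging.getLogger("sieve_demo")
_handler = logging.StreamHandler(sys.stderr)
_handler.setFormatter(
    logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
)
logger.addHandler(_handler)
logger.setLevel(logging.INFO)

def run_job(fn, *args, **kwargs):
    """Run a Sieve job callable, logging a full traceback on failure."""
    try:
        return fn(*args, **kwargs)
    except Exception:
        logger.exception("Sieve job failed")
        raise
```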