feat: voice replacement agent #154
base: main
Walkthrough
A new
Changes
Sequence Diagram(s)
sequenceDiagram
    participant U as User
    participant CH as ChatHandler
    participant VRA as VoiceReplacementAgent
    participant EL as ElevenLabsTool
    participant VDB as VideoDBTool
    U->>CH: Request voice replacement
    CH->>VRA: Invoke run() method with required parameters
    VRA->>VRA: Download video (_download_video_file)
    VRA->>VRA: Download audio (_download_audio_file)
    VRA->>VRA: Extract audio (_extract_audio_from_video)
    VRA->>VRA: Retrieve transcript (_get_transcript)
    alt Existing voice provided?
        VRA->>EL: Retrieve voice (get_voice)
    else
        VRA->>EL: Clone voice (clone_audio)
    end
    VRA->>EL: Synthesize text to audio (synthesis_text)
    VRA->>VDB: Upload audio & get URL (generate_url)
    VRA->>CH: Return AgentResponse (success/failure)
    CH->>U: Relay final response
Actionable comments posted: 1
🧹 Nitpick comments (8)
backend/director/tools/elevenlabs.py (3)
233-239: Add error handling around voice cloning. Currently, the method relies on the caller to handle exceptions. Consider wrapping the self.client.clone() call in a try/except block, logging errors, and returning a more descriptive message if cloning fails.
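A minimal sketch of the suggested wrapper, assuming the tool holds an SDK client exposing `clone(name=..., files=..., description=...)`; those keyword names mirror the call referenced above but are an assumption, not a confirmed SDK signature. The function name `clone_voice_safely` and its result dict are hypothetical.

```python
import logging
from typing import Any

logger = logging.getLogger(__name__)


def clone_voice_safely(client: Any, name: str, files: list, description: str = "") -> dict:
    """Wrap a voice-clone call so failures are logged and reported instead of raised.

    The client.clone(...) keyword arguments are assumed for illustration.
    """
    try:
        voice = client.clone(name=name, files=files, description=description)
    except Exception as e:  # no specific SDK exception types are assumed here
        logger.error("Voice cloning failed for %r: %s", name, e)
        return {"success": False, "error": f"Could not clone voice '{name}': {e}"}
    return {"success": True, "voice": voice}
```

Returning a structured result lets the agent turn a failed clone into a user-facing AgentResponse message rather than an unhandled traceback.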
241-243: Handle non-existent or restricted voices. If the voice lookup fails or if the user does not have access to a voice, this method may raise exceptions. Adding a try/except here would improve reliability and provide clearer error messages.
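One way to apply this, sketched under the assumption that the lookup goes through an accessor like `client.voices.get(voice_id)`; that accessor and the helper name `get_voice_or_none` are hypothetical placeholders for whatever `get_voice()` actually calls.

```python
import logging
from typing import Any, Optional

logger = logging.getLogger(__name__)


def get_voice_or_none(client: Any, voice_id: str) -> Optional[Any]:
    """Look up a voice, returning None (and logging why) instead of raising.

    client.voices.get(voice_id) is a hypothetical accessor used for illustration.
    """
    if not voice_id:
        logger.error("No voice_id supplied")
        return None
    try:
        return client.voices.get(voice_id)
    except Exception as e:  # e.g. an unknown ID or a voice the account cannot access
        logger.error("Could not retrieve voice %s: %s", voice_id, e)
        return None
```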
245-252: Use a logger for errors instead of print statements. While the exception handling is good, consider using logger.error(...) for consistency instead of a raw print():
-except Exception as e:
-    print(f"Error while text synthesis {e}")
-    return None
+except Exception as e:
+    logger.error(f"Error while text synthesis {e}")
+    return None
backend/director/agents/voice_replacement.py (5)
78-101: Use consistent error logging. _download_video_file uses a raw print() statement for errors (line 99), whereas _download_audio_file uses logger.error(). For uniformity, switch to logger.error() here as well to ensure consistent logging.
- print(f"Failed to download {video_url}: {e}")
+ logger.error(f"Failed to download {video_url}: {e}")
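As a possible refinement of the suggestion, `logger.exception()` inside an `except` block logs at ERROR level and also records the traceback, and %-style arguments defer string formatting until the record is actually emitted. In this sketch, `fetch` is a hypothetical injected callable standing in for the real HTTP download, and the logger name is assumed.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("voice_replacement")


def download_file(url: str, fetch: Callable[[str], bytes]) -> Optional[bytes]:
    """Download url via the injected fetch callable; log and return None on failure."""
    try:
        return fetch(url)
    except Exception:
        # logger.exception logs at ERROR level and attaches the active traceback.
        logger.exception("Failed to download %s", url)
        return None
```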
102-124: Check content type before saving. Unlike _download_video_file, _download_audio_file doesn't verify that the resource is actually audio (e.g., Content-Type: audio/*). Adding a similar check to avoid silently saving invalid files would harden the logic.
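A small helper for that check, sketched as a pure function so it can be reused by both download methods. It ignores parameters such as `; charset=...` and compares case-insensitively, because media types are case-insensitive. The name `is_audio_content_type` is hypothetical.

```python
def is_audio_content_type(content_type: str) -> bool:
    """Return True if a Content-Type header value denotes audio (audio/*)."""
    if not content_type:
        return False
    # Drop any parameters after ';' and normalise case before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type.startswith("audio/")
```

With requests, this could be applied as `is_audio_content_type(response.headers.get("Content-Type", ""))` before writing the file. One caveat: some servers label audio as `application/octet-stream`, so a strict check may reject valid files; whether to allow that case is a judgment call.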
125-135: Clarify audio extraction flow. The method uploads the entire video as media_type="audio" and relies on the backend to extract the audio. Consider documenting this behavior more explicitly or verifying the backend's success before proceeding to download.
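A sketch of the "verify before download" option. Here `upload` is a hypothetical stand-in for the VideoDB upload call made with `media_type="audio"`, and the `id`/`url` keys of its result are assumptions about what the agent receives back.

```python
from typing import Any, Callable, Dict


def extract_audio_url(upload: Callable[..., Dict[str, Any]], video_url: str) -> str:
    """Upload a video as audio and return the extracted audio's URL, failing loudly.

    The upload callable and the shape of its result are assumed for illustration.
    """
    result = upload(url=video_url, media_type="audio")
    if not result or not result.get("id"):
        raise RuntimeError(f"Audio extraction did not return an asset for {video_url}")
    if not result.get("url"):
        raise RuntimeError(f"Audio asset {result['id']} has no downloadable URL")
    return result["url"]
```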
188-233: Robust approach to sampling and cloning. Using the partial video stream from start_time to end_time is a flexible way to capture audio. The fallback to an existing cloned_voice_id, or creating a new voice otherwise, is well handled. Just be mindful that extremely short or long audio samples might affect clone quality.
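To guard against unsuitable samples, the window could be validated before cloning. This is a sketch only: the bounds below are hypothetical placeholders, not values from the PR or from ElevenLabs, and trimming over-long windows (rather than rejecting them) is a design choice.

```python
from typing import Tuple

# Hypothetical bounds for illustration; tune against the provider's own guidance.
MIN_SAMPLE_SECONDS = 30.0
MAX_SAMPLE_SECONDS = 300.0


def validate_sample_window(start_time: float, end_time: float, video_length: float) -> Tuple[float, float]:
    """Clamp [start_time, end_time] to the video and enforce sample-length bounds."""
    start = max(0.0, start_time)
    end = min(end_time, video_length)
    if end <= start:
        raise ValueError("Sample window is empty after clamping to the video length")
    duration = end - start
    if duration < MIN_SAMPLE_SECONDS:
        raise ValueError(f"Sample is {duration:.1f}s; at least {MIN_SAMPLE_SECONDS:.0f}s is expected")
    if duration > MAX_SAMPLE_SECONDS:
        # Trim rather than reject: keep the first MAX_SAMPLE_SECONDS of the window.
        end = start + MAX_SAMPLE_SECONDS
    return start, end
```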
234-310: Mixed error-handling behavior within the loop. If text synthesis fails (line 243), the method returns immediately, but if uploading fails (line 258), it continues to process the other videos. Consider aligning the behavior: either skip each failure or return immediately. Consistent logic helps users predict outcomes.
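A sketch of the "skip each failure" option, where every failure is recorded and the loop moves on. `synthesize` and `upload` are hypothetical stand-ins for the agent's synthesis and upload steps, each assumed to return None on failure.

```python
from typing import Callable, Dict, List, Optional


def process_videos(
    video_ids: List[str],
    synthesize: Callable[[str], Optional[bytes]],
    upload: Callable[[str, bytes], Optional[str]],
) -> Dict[str, List[tuple]]:
    """Apply one policy to every failure: record it, skip that video, keep going."""
    succeeded: List[tuple] = []
    failed: List[tuple] = []
    for video_id in video_ids:
        audio = synthesize(video_id)
        if audio is None:
            failed.append((video_id, "text synthesis failed"))
            continue
        url = upload(video_id, audio)
        if url is None:
            failed.append((video_id, "audio upload failed"))
            continue
        succeeded.append((video_id, url))
    return {"succeeded": succeeded, "failed": failed}
```

Failing fast on the first error would be equally consistent; skip-and-report tends to suit batch jobs, because one bad video does not discard the work already done for the others.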
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
backend/director/agents/voice_replacement.py (1 hunks)
backend/director/handler.py (2 hunks)
backend/director/tools/elevenlabs.py (2 hunks)
backend/director/tools/videodb_tool.py (1 hunks)
🔇 Additional comments (8)
backend/director/handler.py (2)
28-28: Add basic unit tests for the newly introduced agent. The import for VoiceReplacementAgent is straightforward; however, ensure that the new agent is covered by unit tests (or integration tests) to confirm correct behavior and avoid regressions, especially given its external dependencies.
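Because the agent depends on ElevenLabs and VideoDB, tests would typically replace both tools with mocks. The sketch below shows that pattern against a hypothetical `StandInVoiceAgent` (not the real class), whose tools are injected and whose `get_transcript`/`synthesis_text` calls are simplified assumptions.

```python
import unittest
from unittest.mock import MagicMock


class StandInVoiceAgent:
    """Hypothetical stand-in for VoiceReplacementAgent with injected tools."""

    def __init__(self, elevenlabs_tool, videodb_tool):
        self.elevenlabs_tool = elevenlabs_tool
        self.videodb_tool = videodb_tool

    def run(self, video_id: str, voice_id: str, is_authorized_to_clone_voice: bool) -> dict:
        if not is_authorized_to_clone_voice:
            return {"status": "error", "message": "Not authorized to clone voice"}
        text = self.videodb_tool.get_transcript(video_id)
        audio = self.elevenlabs_tool.synthesis_text(text=text, voice_id=voice_id)
        if audio is None:
            return {"status": "error", "message": "Text synthesis failed"}
        return {"status": "success", "audio": audio}


class VoiceAgentTest(unittest.TestCase):
    def setUp(self):
        self.eleven = MagicMock()
        self.vdb = MagicMock()
        self.vdb.get_transcript.return_value = "hello world"
        self.agent = StandInVoiceAgent(self.eleven, self.vdb)

    def test_refuses_without_authorization(self):
        result = self.agent.run("v1", "voice-1", is_authorized_to_clone_voice=False)
        self.assertEqual(result["status"], "error")
        self.eleven.synthesis_text.assert_not_called()

    def test_reports_synthesis_failure(self):
        self.eleven.synthesis_text.return_value = None
        result = self.agent.run("v1", "voice-1", is_authorized_to_clone_voice=True)
        self.assertEqual(result["message"], "Text synthesis failed")

    def test_success_path(self):
        self.eleven.synthesis_text.return_value = b"audio-bytes"
        result = self.agent.run("v1", "voice-1", is_authorized_to_clone_voice=True)
        self.assertEqual(result, {"status": "success", "audio": b"audio-bytes"})
        self.eleven.synthesis_text.assert_called_once_with(text="hello world", voice_id="voice-1")
```

Saved as a test module, this runs with `python -m unittest`; for the real agent, the same mocks could be patched onto its tool attributes.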
73-73: Confirmed integration of the new agent. No issues found with appending the agent to self.agents. This change properly registers the VoiceReplacementAgent in the handler.
backend/director/tools/videodb_tool.py (1)
154-154: LGTM: Providing audio URLs is beneficial. Adding the generated URL to the dictionary return value is a helpful enhancement. Make sure any upstream consumers handle potential failures in audio.generate_url(), but otherwise this looks good.
backend/director/tools/elevenlabs.py (2)
6-6: Import usage looks valid. Importing RequestOptions is consistent with its usage in synthesis_text(). No issues here.
232-232: No functional changes. This blank line appears to be purely stylistic. No further action required.
backend/director/agents/voice_replacement.py (3)
136-148: Efficient fallback for no transcript. The retry logic is clear: if a transcript is unavailable, you index the spoken words and then retrieve it. This is a neat pattern. Just ensure that indexing won't cause indefinite retries if something fails repeatedly (here it appears to raise on error, which is sufficient).
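The single-retry shape described above can be sketched as follows. It assumes `video` exposes `get_transcript_text()` and `index_spoken_words()`, which is how the VideoDB Python SDK's video object is understood to work here; treat those method names as assumptions to verify.

```python
from typing import Any


def get_transcript(video: Any) -> str:
    """Return the transcript, indexing spoken words once if none exists yet.

    Only one retry is made, so a persistent failure surfaces as an exception
    instead of looping forever.
    """
    try:
        return video.get_transcript_text()
    except Exception:
        video.index_spoken_words()  # may raise; that error propagates to the caller
        return video.get_transcript_text()  # a second failure also propagates
```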
149-159: Overlay logic is straightforward. Adding the video inline first and the audio as an overlay next is clear. Ensure large audio clips or misaligned durations are handled gracefully. Otherwise, this function looks good.
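For the duration concern, one option is to compute the overlay bounds before building the timeline. This hypothetical helper trims audio that runs past the end of the video and reports any uncovered tail when the audio is shorter.

```python
from typing import Tuple


def fit_audio_to_video(audio_length: float, video_length: float) -> Tuple[float, float, float]:
    """Return (audio_start, audio_end, silent_tail) for an overlay placed at t=0."""
    if audio_length <= 0 or video_length <= 0:
        raise ValueError("Durations must be positive")
    # Audio longer than the video is cut at the video's end.
    audio_end = min(audio_length, video_length)
    # Shorter audio leaves part of the video without the cloned voice.
    silent_tail = max(0.0, video_length - audio_length)
    return 0.0, audio_end, silent_tail
```

The start/end pair could feed the audio asset's trim parameters, assuming the timeline asset accepts them (VideoDB's `AudioAsset` appears to take `start` and `end`; worth confirming). A non-zero `silent_tail` could instead be surfaced to the user as a warning.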
160-186: Authorization check is valid. Returning an error immediately when is_authorized_to_clone_voice is false ensures that no voice is cloned unintentionally. Good approach.
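A possible hardening of that guard, sketched with a hypothetical `AgentResult` in place of the real AgentResponse. If the flag is ever populated from model output, it could arrive as a string such as `"false"`, which is truthy; testing `is not True` rejects anything other than a real boolean `True`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class AgentResult:
    """Hypothetical minimal stand-in for the agent's AgentResponse."""
    status: str
    message: str
    data: Dict[str, Any] = field(default_factory=dict)


def start_voice_clone(is_authorized_to_clone_voice: bool, sample_url: str) -> AgentResult:
    # Strict identity check: the string "false" is truthy and would pass `if not flag`.
    if is_authorized_to_clone_voice is not True:
        return AgentResult("error", "Voice cloning requires explicit authorization from the user.")
    # Downloading the sample and cloning would follow here.
    return AgentResult("success", "Authorized; proceeding to clone.", {"sample_url": sample_url})
```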
video = self.videodb_tool.get_video(video_id=video_id)
text_to_synthesis = self._get_transcript(video_id=video_id)

self.output_message.actions.append(f"Synthesising {video["name"]}'s transcript in cloned voice")
Fix f-string quoting errors.
These strings nest double quotes inside double-quoted f-strings, which is a syntax error on Python versions before 3.12 (PEP 701 lifted the restriction in 3.12). Replace video["name"] with video['name'] inside the f-strings:
- f"Synthesising {video["name"]}'s transcript in cloned voice"
+ f"Synthesising {video['name']}'s transcript in cloned voice"
- status_message=f"Adding cloned voice to {video["name"]} failed"
+ status_message=f"Adding cloned voice to {video['name']} failed"
- status_message=f"Adding cloned voice to {video["name"]}"
+ status_message=f"Adding cloned voice to {video['name']}"
- f"Here is your video {video["name"]} with the cloned voice"
+ f"Here is your video {video['name']} with the cloned voice"
Also applies to: 261-261, 270-270, 284-284
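A quick way to check the quoting behavior on a given interpreter: the portable form works everywhere, while the nested form is compiled from a string at runtime so the file itself still parses on older versions. `video` is a sample dict for illustration.

```python
import sys

video = {"name": "demo.mp4"}

# Portable on all Python 3.6+ interpreters: inner single quotes.
message = f"Synthesising {video['name']}'s transcript in cloned voice"

# The original spelling, compiled at runtime so this file parses on any version.
nested_source = 'f"Adding cloned voice to {video["name"]}"'
try:
    compile(nested_source, "<nested>", "eval")
    nested_ok = True
except SyntaxError:
    nested_ok = False

print(message)
print(f"Python {sys.version_info.major}.{sys.version_info.minor} accepts nested quotes: {nested_ok}")
```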
Added a voice replacement agent. This agent takes a voice sample from a sample video, clones the voice, and overlays the given videos with the cloned voice.
Inputs:
Output: