[Help]: How to use MaskGCT to generate translated audio from an english video? #335

KylinMountain · 2024-11-06T23:59:47Z

Problem Overview

I have a video with English speech, and I want it to speak Chinese at the same speed, keeping the video and audio synchronized.

How can I do that? Are there any instructions? Thank you.


synthere · 2024-11-08T03:39:10Z

The general steps would be:

1. Get the text from the audio in the English video, using a tool such as Whisper or FunASR.
2. Translate the text into Chinese.
3. Generate Chinese audio from the translated text in step 2, using a TTS tool such as MaskGCT.
4. Resync the audio with the original video.
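As a rough illustration of how these steps fit together, here is a minimal sketch. It assumes the ASR step has already produced timestamped segments; `Segment`, `dub_segments`, and the `translate` / `synthesize` callables are hypothetical names for this example, not part of Whisper, FunASR, or MaskGCT. You would plug in your own translation and TTS calls.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Segment:
    """One timestamped utterance from the ASR step (hypothetical structure)."""
    start: float  # seconds from the beginning of the original video
    end: float    # seconds from the beginning of the original video
    text: str     # recognized English text

    @property
    def duration(self) -> float:
        return self.end - self.start


def dub_segments(
    segments: List[Segment],
    translate: Callable[[str], str],
    synthesize: Callable[[str, float], bytes],
) -> List[Tuple[float, bytes]]:
    """Translate and synthesize each segment.

    `translate` maps English text to Chinese text.
    `synthesize` takes (text, target_duration_seconds) and returns audio bytes.
    Both are placeholders for whatever tools you choose.

    Returns (start_time, audio) pairs so each clip can later be placed
    at the same position it had in the original video.
    """
    dubbed = []
    for seg in segments:
        zh_text = translate(seg.text)
        audio = synthesize(zh_text, seg.duration)
        dubbed.append((seg.start, audio))
    return dubbed
```

Working per segment, rather than on the whole transcript at once, keeps the original start times available for the resync step.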

KylinMountain · 2024-11-08T03:49:03Z

@synthere If we do it like this, we can't copy the accent of the original audio or match the TTS speed to the original.

synthere · 2024-11-08T05:19:35Z

The accent can be cloned using the voice cloning function, and the TTS speed can also be adjusted. Actually, I built a video dubbing tool the other day, which you can try here: syntheredub

synthere · 2024-11-08T09:34:58Z

I also tried MaskGCT, which can control the target duration. But the resulting audio is not exactly aligned with the original, as shown below (top is the original audio, bottom is the generated audio).

So precise alignment and resynchronization are sometimes necessary.
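One possible way to do that resynchronization is to time-stretch each generated clip to the length of its original segment. The sketch below computes the tempo factor and builds a filter string for ffmpeg's `atempo` audio filter. It assumes each `atempo` instance should stay within 0.5 to 2.0 (the classic limit; newer ffmpeg builds may accept a wider range), so larger changes are chained. The function names are made up for this example.

```python
def tempo_factor(generated_sec: float, target_sec: float) -> float:
    """Speed-up factor that makes the generated clip last `target_sec`.

    A factor above 1 plays the clip faster (shorter), below 1 slower (longer).
    """
    if generated_sec <= 0 or target_sec <= 0:
        raise ValueError("durations must be positive")
    return generated_sec / target_sec


def atempo_chain(factor: float, lo: float = 0.5, hi: float = 2.0) -> str:
    """Build an ffmpeg audio filter string for the given tempo factor.

    Factors outside [lo, hi] are split into several chained atempo
    filters whose product equals the requested factor.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    steps = []
    remaining = factor
    while remaining > hi:
        steps.append(hi)
        remaining /= hi
    while remaining < lo:
        steps.append(lo)
        remaining /= lo
    steps.append(remaining)
    return ",".join(f"atempo={s:.4f}" for s in steps)
```

For example, a 3.0 s generated clip that must fit a 2.5 s slot gives a factor of 1.2, and the resulting string could be passed to something like `ffmpeg -i generated.wav -filter:a "atempo=1.2000" stretched.wav`. Large stretch factors tend to sound unnatural, so it may be better to regenerate with a different target duration when the factor is far from 1.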
