subtitles with timings #90

DiLfestr · 2025-01-16T18:49:39Z

Hi,
Thanks for your work on this.
Can you tell me how to dub external SRT subtitles while keeping their timings? Right now it just speaks the text, without the pauses and timings.

rotemdan · 2025-01-16T19:07:57Z

I'm not 100% sure what you meant, but I guess you want to synthesize the subtitles so that they match the start and end times of the individual cues?

Just matching the timing of the cues isn't necessarily what most users want when dubbing. There could also be lip syncing involved, which is far beyond what a library like this is designed to do, or matching an existing speaker's voice (which is not easy to do with a natural-sounding result). I guess you mean you want some sort of voice-over.

It's technically possible to some extent, but the challenge is to match the speech rate so that it fits the cue (individual subtitle) without pushing the next cue forward. Just speeding up the speech when the normal speech rate wouldn't fit the cue may not sound natural.
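A minimal sketch of that trade-off, assuming the duration of each cue's synthesized speech at the normal rate is already known. The `Cue`, `Placement` and `fit_cues` names and the 1.5x speed cap are hypothetical and not part of this library. Each cue is sped up only as much as needed to fit, the speed is capped, and if the speech still doesn't fit, the overflow delays the start of the next cue.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float            # cue start time, in seconds
    end: float              # cue end time, in seconds
    speech_duration: float  # hypothetical input: synthesized speech length at the normal rate


@dataclass
class Placement:
    start: float  # when the speech actually starts
    speed: float  # speed-up factor applied (1.0 = normal rate)
    end: float    # when the speech actually ends


def fit_cues(cues: list[Cue], max_speed: float = 1.5) -> list[Placement]:
    """Hypothetical scheduler: speed up each cue just enough to fit, up to max_speed."""
    placements: list[Placement] = []
    previous_end = 0.0
    for cue in cues:
        # Speech can't start before the previous cue's speech has finished,
        # so a cue that overflowed pushes this one later.
        start = max(cue.start, previous_end)
        available = cue.end - start
        if available <= 0:
            # Already past this cue's end: use the fastest allowed rate.
            speed = max_speed
        else:
            # Never slow down; speed up just enough to fit, but not beyond the cap.
            speed = min(max(1.0, cue.speech_duration / available), max_speed)
        end = start + cue.speech_duration / speed
        placements.append(Placement(start, speed, end))
        previous_end = end
    return placements


# Example: the third cue would need 3x speed to fit in its 1-second slot.
cues = [Cue(0.0, 2.0, 1.5), Cue(2.0, 3.0, 1.2), Cue(3.0, 4.0, 3.0), Cue(4.5, 6.0, 1.0)]
placements = fit_cues(cues)
for p in placements:
    print(p)
```

Here the third cue is capped at 1.5x, so its speech runs until 5.0 s and the fourth cue is delayed from 4.5 s to 5.0 s. Raising `max_speed` keeps more cues on time, but at the cost of speech that may not sound natural.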

In general it's possible to synthesize speech that approximately matches the cue times, but it's difficult to make it work well in every case. For example, in some instances either the cue timing assumes the speech is faster than the actual synthesized speech, or the cue isn't timed correctly, and it's difficult to know for sure which is which. Also, if a cue is too late, it may force the algorithm to synthesize at a very fast speed.

This kind of feature will take some time to get done, and there are some cases where it would be difficult for the algorithm to know the best thing to do, so the result may not sound natural or desirable.

It's a possible future feature. I'm not sure how much demand there is for it, though (at least at the moment). I'll tag this as a feature suggestion. Thanks for suggesting it.

DiLfestr · 2025-01-16T21:14:09Z

I have so many movies that only have subtitles, and any kind of dubbing would be a great joy.
In the future, when transcribing the original, we could record the start of each phrase and also the voice parameters. (I read that neural networks can recognize emotions in the voice, such as happiness, sadness or anger, and can also determine the speaker's gender.) Then, when doing the voice-over in another language, we could take these parameters into account.

rotemdan added the "synthesis" (issue related to speech synthesis) and "feature" (issue proposes a new feature) labels on Jan 16, 2025