subtitles with timings #90

Open
DiLfestr opened this issue Jan 16, 2025 · 2 comments
Labels
feature Issue proposes a new feature synthesis Issue related to speech synthesis

Comments

@DiLfestr

Hi,
Thanks for your work on this.
Can you tell me how to dub SRT subtitles obtained from an external source while keeping their timings? Right now it just speaks the text, without pauses or timings.

@rotemdan rotemdan added synthesis Issue related to speech synthesis feature Issue proposes a new feature labels Jan 16, 2025
@rotemdan
Member

rotemdan commented Jan 16, 2025

I'm not 100% sure what you mean, but I guess you want to synthesize subtitles so that the speech matches the start and end times of the individual cues?

Just matching the timing of the cues isn't necessarily what most users want when dubbing. Dubbing can also involve lip syncing, which is far beyond what a library like this is designed to do, or matching an existing speaker (which is not easy to do with a natural-sounding result). I guess you mean you want some sort of voiceover.

It's technically possible to some extent, but the challenge is to match the speech rate so that the speech fits within the cue (an individual subtitle) without pushing the next cue forward. Simply speeding up the speech whenever the normal speech rate wouldn't fit the cue may not sound natural.
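As a rough illustration of the rate-matching problem (a sketch under stated assumptions, not how this library works): given how long a cue's speech lasts at the normal rate, compute the speed-up needed to fit the cue and cap it. The function name `fit_rate` and the default cap of 1.3 are hypothetical, chosen only for this example.

```python
def fit_rate(natural_duration, cue_start, cue_end, max_rate=1.3):
    """Return (rate, overflow_seconds) for a single cue.

    natural_duration: length of the synthesized speech at normal rate, in seconds.
    max_rate: hypothetical cap; faster speech is assumed to sound unnatural.
    """
    window = cue_end - cue_start
    if window <= 0:
        raise ValueError("cue must end after it starts")
    # Never slow down: speech shorter than the cue is simply followed by silence.
    needed = max(1.0, natural_duration / window)
    rate = min(needed, max_rate)
    # Time by which the speech still runs past the cue end after capping the rate.
    overflow = max(0.0, natural_duration / rate - window)
    return rate, overflow
```

A nonzero overflow is the case described above: even at the capped rate, the speech runs past the end of the cue and would push the next cue forward.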

In general, it's possible to synthesize speech that approximately matches the cue times, but it's difficult to make it work well in every case. For example, in some instances either the cue timing assumes faster speech than what is actually synthesized, or the cue isn't timed correctly, and it's difficult to know for sure which is which. Also, if a cue is too late, it may force the algorithm to synthesize at a very high speed.
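To make the scheduling problem concrete, here is a minimal sketch that reads SRT-style timestamps and plans where each cue's speech would go. Everything in it is an assumption for illustration: the names `parse_time` and `plan_timeline`, the rate cap, and the choice to let speech spill into the gap before the next cue. It is not an existing feature of this project.

```python
import re

TIME_RE = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")

def parse_time(stamp):
    """Convert an SRT timestamp such as '00:01:02,500' to seconds."""
    match = TIME_RE.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"bad timestamp: {stamp!r}")
    h, m, s, ms = (int(g) for g in match.groups())
    return h * 3600 + m * 60 + s + ms / 1000

def plan_timeline(cues, durations, max_rate=1.3):
    """Plan silence and speech rate for each cue.

    cues: list of (start, end) pairs in seconds, in playback order.
    durations: natural (normal-rate) speech length for each cue, in seconds.
    """
    plan = []
    cursor = 0.0  # end time of the audio planned so far
    for i, ((start, _end), natural) in enumerate(zip(cues, durations)):
        # Speech cannot begin before the previous cue's speech has finished.
        begin = max(start, cursor)
        # Assumption: speech may use the gap up to the next cue's start.
        limit = cues[i + 1][0] if i + 1 < len(cues) else float("inf")
        window = max(limit - begin, 1e-6)
        rate = min(max(1.0, natural / window), max_rate)
        plan.append({
            "silence": begin - cursor,  # pause to insert before this cue
            "rate": rate,               # speed-up factor, capped at max_rate
            "late_by": begin - start,   # drift caused by earlier overflow
        })
        cursor = begin + natural / rate
    return plan
```

A positive `late_by` marks a cue that was pushed back because earlier speech did not fit even at the capped rate. The sketch cannot tell whether that is because the cue was mistimed or because the original speech was simply faster, which is the ambiguity mentioned above.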

This kind of feature would take some time to get done, and there are some cases where it would be difficult for the algorithm to know what the best thing to do is, so the result may not sound natural or desirable.

It's a possible future feature. I'm not sure how much demand there is for it, though (at the moment at least). I'll tag this as a feature suggestion. Thanks for suggesting it.

@DiLfestr
Author

DiLfestr commented Jan 16, 2025

I have so many movies that only have subtitles. If there were any kind of dubbing, it would be a great joy.
And in the future, when transcribing the original audio, we could record the beginning of each phrase along with the voice parameters. (I read that neural networks are able to recognize emotions in a voice, such as happiness, sadness, or anger, as well as determine the gender of the speaker.) Then, when doing a voice-over in another language, we could take these parameters into account.
