Speech cut off mid sentence and punctuation error #57
Comments
Regarding sentences being cut off, I am able to reproduce this. For instance with this phrase: '“I always do my best to treat people, including those I disagree with, respectfully and will continue to do so.”' When I extract ONLY the paragraph that sentence appears in and run just that through epub2tts, it reads the whole sentence. However, when I run the entire chapter of that book, it only reads “I always do my best to treat people, including those I disagree with”. This, and the punctuation issues, are coming from Coqui-TTS. Dropping part of the longer sentences seems to be related to the overall size of the text being read, but I don't know what could be done about it. It would be interesting to try dramatically shortening what is sent for TTS (i.e. instead of entire chapters, send only paragraphs). This would result in WAY more WAV files being created and then concatenated, so it would likely have a performance penalty. I haven't tried it yet because I'm not sure (programmatically) how best to break only on paragraphs while still maintaining grouping, so that you ultimately end up with chapter/section breaks in the m4b file (those breaks are based on each individual WAV file). I'll leave this open because it is a legitimate issue even if it's caused by a dependency (Coqui-TTS). Maybe someone else will take a shot at fixing this :) |
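One possible way to break on paragraphs while keeping the chapter grouping is to tag every paragraph with the index of the chapter it came from. This is only a minimal sketch, not what epub2tts does: `split_paragraphs` and `plan_chunks` are hypothetical names, and it assumes each chapter is already available as a plain-text string with paragraphs separated by blank lines.

```python
import re


def split_paragraphs(chapter_text):
    """Split one chapter on blank lines and collapse whitespace in each paragraph."""
    parts = re.split(r"\n\s*\n", chapter_text)
    return [" ".join(p.split()) for p in parts if p.strip()]


def plan_chunks(chapters):
    """Return (chapter_index, paragraph_index, text) for every paragraph.

    Assumption: `chapters` is a list of plain-text strings, one per chapter.
    The chapter index travels with each paragraph, so the per-paragraph audio
    can later be grouped back into one file per chapter.
    """
    plan = []
    for c_idx, chapter_text in enumerate(chapters):
        for p_idx, para in enumerate(split_paragraphs(chapter_text)):
            plan.append((c_idx, p_idx, para))
    return plan


if __name__ == "__main__":
    demo = ["First paragraph.\n\nSecond, with commas, here.", "Next chapter."]
    for chunk in plan_chunks(demo):
        print(chunk)
```

Each tuple could then be synthesized on its own, with the chapter index deciding which chapter-level WAV the result belongs to.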
Ugh, now I wonder if this problem is larger than I realized :(. I'm reading along and caught another sentence in the same chapter that is cut off. Similarly, it's one with multiple commas, and it drops everything after the last comma. This is going to inspire me to try smaller chunks for reading after all. Thank you @Aamir3d for bringing this up! |
You're welcome @aedocw ! Hope this gets sorted. Yours is an excellent project for audiobook conversion. I saw another interesting project that I would like to bring to your attention https://github.com/bnsantoso/sub-to-audio . Another request would be to create a GUI for ease of use. |
sub-to-audio looks really interesting, I'll take a closer look. It's nice to see something properly written vs. this hack job haha. GUI for ease of use would be nice, and is something I have had in mind for a while. What I plan to do first is make this run as a daemon with an API, then put a web interface in front of that. That could easily be run with docker/docker-compose so you could get a relatively easy path to GUI. I am on linkedin at https://linkedin.com/in/christopheraedo |
Connected with you. Thanks for your effort and interaction on this one! |
That looks cool, thanks for sharing. I have played a little bit with Bark but it's not usable without a decent GPU, and I have not played with it enough to see if it's remotely usable for long-form stuff. I am eagerly awaiting a pre-trained model here https://github.com/yl4579/StyleTTS2 - it's got a lot going for it and seems to do well with long segments. Probably still going to absolutely require a GPU but if it works well I'll definitely build that in as an option once it stabilizes some. |
Thanks, I'll take a look at this one. You're correct, a GPU (and a lot of VRAM) definitely helps! Although, I've found you don't need a very beefy GPU. An Nvidia 3060 with 12GB is enough to run local LLMs (7B), Stable Diffusion XL and most TTS. |
Try the branch "chunky", which creates an individual wave file for each sentence, but still rolls them up into chapter wave files so the current chapter splits work. VERY distressing, it did not fix the problem :(
It also skipped this part, probably because of the smart quotes: I have noticed this in books I've listened to, but never dug into it. I'm going to have to spend some time seeing what I can do about removing special characters like those smart-quotes (and maybe all quotes in general because that should not impact the reading). As far as I can tell, Coqui-TTS does notice commas and it introduces a slight pause, so I do not want to remove them (but that's the next thing I'll do to see if that makes any difference, just for fun). |
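For anyone curious how per-sentence WAV files can be rolled up into a single chapter file, here is a rough sketch using only the standard-library `wave` module. It is not the code from the "chunky" branch: `concat_wavs` is a hypothetical helper, and it assumes every part has the same channel count, sample width, and sample rate (which should hold when they all come from the same TTS model).

```python
import wave


def concat_wavs(part_paths, out_path):
    """Concatenate WAV files with identical audio format into out_path."""
    if not part_paths:
        raise ValueError("need at least one WAV file to concatenate")
    with wave.open(out_path, "wb") as out:
        params_set = False
        for path in part_paths:
            with wave.open(path, "rb") as part:
                if not params_set:
                    # Copy channels, sample width and frame rate from the first part.
                    out.setparams(part.getparams())
                    params_set = True
                # Append the raw audio frames; the header frame count is
                # fixed up automatically when the output file is closed.
                out.writeframes(part.readframes(part.getnframes()))
```

Calling this once per chapter, with that chapter's sentence files in order, keeps one WAV per chapter, so the chapter splits in the m4b would still line up.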
Yeah, epubs do have a lot of quotes and extended punctuation, and this would be problematic. OT- rsxdalv/tts-generation-webui#191 (comment) Maybe the issue is with Coqui overall in these complex cases. |
I think I found the issue, at least with the book I was testing with. The problem was quotes and smart quotes. I'm stripping them now before sending to Coqui-TTS and it's working great for me. Please test with your known-bad book and see if it works properly now, let me know, thanks! (BTW I have not looked at Piper TTS but I will add it to my list of things to check out :) ) |
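A minimal sketch of what stripping quotes before TTS could look like, for reference. The exact character set epub2tts removes is not spelled out in this thread, so `QUOTE_CHARS` and `strip_quotes` are assumptions for illustration. This version deletes only straight and curly double quotes and leaves apostrophes alone, so contractions like "I’m" stay intact.

```python
# Assumed character set: straight double quote plus left/right curly double quotes.
# Apostrophes (' and ’) are deliberately kept so contractions survive.
QUOTE_CHARS = '"\u201c\u201d'
_STRIP_TABLE = str.maketrans("", "", QUOTE_CHARS)


def strip_quotes(text):
    """Remove double quotes and smart double quotes from text before synthesis."""
    return text.translate(_STRIP_TABLE)


if __name__ == "__main__":
    print(strip_quotes("\u201cThank you,\u201d the doctor said."))
```

Commas and other punctuation pass through untouched, so the slight pauses Coqui-TTS adds at commas should be unaffected.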
Thanks! I'm going to try this out later today and see how it works with another couple ebooks! I'll share an update. |
@aedocw This worked really well! Thank you for your assistance with this. |
EXCELLENT! Thank you so much for finding this issue and noting it here, I really appreciate it, and finding and fixing this is a big improvement! |
Not sure if I raised this issue here before.
Issue 1: When using the epub2tts script with the default settings, I noticed many sentences were cut off in the middle, and the speech got muddled and skipped to the next sentence. This happens across models.
For example, this text from the epub:
_“I’m sorry we’re late,” Captain Malloy said to Yousif’s father.
“You’re welcome any time,” the doctor answered, shaking his hand.
“The District Commissioner planned to be here,” Malloy explained. “But at the last minute something came up and he couldn’t make it. He asked me to convey to you his regrets and his congratulations. Mabrook.”
“Thank you,” the doctor said.
“The house is truly magnificent.”
“You’re very kind.”_
The audio:
OTOH_error.-.Copy.mp4
Issue 2: There are punctuation errors where the letters after an apostrophe (') are vocalized separately. E.g., "They're" is spoken as "They Re".