Synthesis: VITS voices have various issues related to model training #14

rotemdan · 2023-07-28T18:49:53Z

For example, when the default English voice (Amy / Low) is given a single-word utterance such as "two", it seems to mispronounce it as something closer to "ten". Other voices have far more serious issues. The Greek voice, for instance, may produce bizarre, nonsensical output when given English text, most likely because it was never trained on English, or on Latin characters in general, and doesn't know how to handle them.

This is an issue with how the models were trained. It is not related to the Echogarden code itself.

These models are trained as part of the Piper speech system, mostly by Michael Hansen. You can check out the Piper issue tracker to give feedback on these sorts of problems.

Echogarden doesn't actually use the Piper system, but reimplements it in JavaScript, with several enhancements that are not present in the original C++ code. Only the ONNX models are shared.

The original ONNX models are published on the piper-voices Hugging Face repository. I repackage them as tar.gz archives and upload them to the echogarden-packages Hugging Face repository, from which they (and all other packages) are downloaded when needed.
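To illustrate the download step, here is a minimal sketch of how a URL for one of the repackaged archives could be built. It assumes the standard Hugging Face `resolve` URL layout for a model-type repository. The `buildPackageUrl` helper, the `PackageLocation` type, and the owner and package names in the usage example are hypothetical placeholders, not Echogarden's actual code or naming.

```typescript
// Hypothetical description of where a packaged archive lives.
interface PackageLocation {
  owner: string;       // Hugging Face account or organization (placeholder)
  repo: string;        // repository name, e.g. "echogarden-packages"
  revision: string;    // branch, tag, or commit, e.g. "main"
  packageName: string; // hypothetical package identifier, without extension
}

// Builds a download URL using the Hugging Face "resolve" path layout:
// https://huggingface.co/<owner>/<repo>/resolve/<revision>/<file>
function buildPackageUrl(loc: PackageLocation): string {
  const file = `${loc.packageName}.tar.gz`;
  const parts = [loc.owner, loc.repo, "resolve", loc.revision, file]
    .map((part) => encodeURIComponent(part));
  return `https://huggingface.co/${parts.join("/")}`;
}

// Usage with placeholder names (assumptions, not real package identifiers).
const exampleUrl = buildPackageUrl({
  owner: "example-owner",
  repo: "echogarden-packages",
  revision: "main",
  packageName: "example-voice",
});
console.log(exampleUrl);
```

A real downloader would also need to fetch the archive, extract it, and cache the result locally. Those steps are omitted here.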

rotemdan added the bug, synthesis, and external labels · Jul 28, 2023
