About the multi_speaker implementation #37

Hi, I read about your multi_speaker implementation of Tacotron 2. It seems that different speakers correspond to different text inputs, and that you did not use a speaker embedding. Am I right? If so, the speaker information is encoded in the text, which seems unnecessary.

Comments
I just expanded the symbol table, and each symbol offset represents one speaker as an implicit embedding.
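To make the offset idea concrete, here is a minimal sketch (not the repository's actual code; the class name, symbol count, and embedding size are assumptions): the symbol table is replicated once per speaker, so the token IDs themselves carry the speaker identity and the embedding table learns speaker-specific rows.

```python
import torch
import torch.nn as nn

NUM_SYMBOLS = 150    # assumed size of the base symbol set
NUM_SPEAKERS = 4     # assumed number of speakers in the corpus

class OffsetSymbolEmbedding(nn.Module):
    """Symbol table expanded NUM_SPEAKERS times; the offset acts as an implicit speaker embedding."""
    def __init__(self, num_symbols=NUM_SYMBOLS, num_speakers=NUM_SPEAKERS, dim=512):
        super().__init__()
        self.num_symbols = num_symbols
        # one embedding row per (speaker, symbol) pair
        self.embedding = nn.Embedding(num_symbols * num_speakers, dim)

    def forward(self, symbol_ids, speaker_id):
        # shift every symbol id into the block that belongs to this speaker
        return self.embedding(symbol_ids + speaker_id * self.num_symbols)

# the same sentence yields different, speaker-specific embeddings per speaker
text_ids = torch.randint(0, NUM_SYMBOLS, (1, 20))
print(OffsetSymbolEmbedding()(text_ids, speaker_id=2).shape)  # torch.Size([1, 20, 512])
```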
Thanks for your reply! I understand what you have done. I think this implementation may introduce unnecessary trouble if I want to preserve the prosody of a reference utterance (from speaker A) while keeping the timbre of speaker B. Do you know of any other multi_speaker Tacotron implementations?
Well, this project does not model speaker prosody explicitly; in other words, the prosody of each speaker is independent of the others. If you want to transfer the prosody of another speaker, an extra explicit prosody embedding is needed. Unfortunately, as far as I know, current deep learning approaches do not handle this perfectly. Global Style Tokens (GST) for Tacotron is one such approach. A good PyTorch project is available, though it is based on Tacotron 1. I do not know whether that project suits you.
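For reference, a rough sketch of the GST idea mentioned above (shapes, module names, and hyperparameters are my own assumptions, not taken from the cited project): a reference encoder summarizes a reference mel spectrogram into one vector, and multi-head attention over a small bank of learned style tokens turns that vector into a prosody embedding that can condition the decoder.

```python
import torch
import torch.nn as nn

class GlobalStyleTokens(nn.Module):
    """Attention over a bank of learned style tokens, queried by a reference embedding."""
    def __init__(self, ref_dim=128, num_tokens=10, token_dim=256, num_heads=4):
        super().__init__()
        self.tokens = nn.Parameter(torch.randn(num_tokens, token_dim))
        self.query_proj = nn.Linear(ref_dim, token_dim)
        self.attn = nn.MultiheadAttention(token_dim, num_heads, batch_first=True)

    def forward(self, ref_embedding):
        # ref_embedding: (B, ref_dim), e.g. the final state of a reference-encoder GRU
        q = self.query_proj(ref_embedding).unsqueeze(1)               # (B, 1, token_dim)
        kv = torch.tanh(self.tokens).unsqueeze(0).expand(q.size(0), -1, -1)
        style, _ = self.attn(q, kv, kv)                               # (B, 1, token_dim)
        return style.squeeze(1)                                       # prosody/style embedding
```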
Thanks for your kind help! I've read that GST project before; it only uses a single-speaker dataset. My issue is that I need a multi_speaker Tacotron where the speaker embedding is given explicitly 🤣 Anyway, I'll try to implement it.
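If it helps, this is roughly what the explicit-speaker-embedding variant could look like (a sketch under my own assumptions; `SpeakerConditionedEncoder` and the dimensions are illustrative, not from this repository): a learned speaker vector is broadcast over time and concatenated with the text-encoder outputs before attention and decoding, so speaker identity no longer has to be baked into the symbol IDs.

```python
import torch
import torch.nn as nn

class SpeakerConditionedEncoder(nn.Module):
    """Wraps any Tacotron-style text encoder and appends an explicit speaker embedding."""
    def __init__(self, encoder, num_speakers, speaker_dim=64):
        super().__init__()
        self.encoder = encoder                                   # assumed to return (B, T, enc_dim)
        self.speaker_emb = nn.Embedding(num_speakers, speaker_dim)

    def forward(self, text_ids, speaker_ids):
        enc_out = self.encoder(text_ids)                         # (B, T, enc_dim)
        spk = self.speaker_emb(speaker_ids)                      # (B, speaker_dim)
        spk = spk.unsqueeze(1).expand(-1, enc_out.size(1), -1)   # broadcast over time
        # the attention/decoder stack then consumes the concatenated features
        return torch.cat([enc_out, spk], dim=-1)                 # (B, T, enc_dim + speaker_dim)
```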