That is a good question. We would like to see how this model works on Chinese, but the core problem is not the model or the dataset; it is the speech tokens. In the paper we use vq-wav2vec, which is trained only on the English LibriSpeech corpus, so we don't expect it to generalize well to Chinese. We would need to find another type of speech token that carries little timbre information but enough prosody information for Chinese, which seems a bit hard. Training a vq-wav2vec model on a Chinese dataset is also a larger project. Hence, we will not train this on Chinese unless a satisfactory speech token is ready to use.
Nevertheless, the language restriction applies only to the source speech. For the target reference, any language is feasible (e.g. converting English content to a Chinese speaker's voice is no problem).
Would you consider supporting Chinese?