Hi @anirpipi, sorry for the late reply, and thank you for reporting the issue.
It may be a bug, so I would like to check this problem.
It seems you are using your own trained model. Can you confirm that the issue also occurs with the published models? If it is reproducible, I will download the model and investigate.
Hi, thanks for the response.
It's the same with the pre-trained models.
VITS works fine, but the problem occurs with FastSpeech2 + PWG.
Could you please look into it?
Thanks in advance.
Hi,
I am trying to convert a pretrained LJSpeech TTS model, based on kan-bayashi/ljspeech_fastspeech2 and parallel_wavegan/ljspeech_parallel_wavegan.v1, using the code below:
########################### ONNX Conversion ############################
from espnet2.bin.tts_inference import Text2Speech
from espnet_onnx.export import TTSModelExport

m = TTSModelExport()

# Acoustic model (FastSpeech2): checkpoint and training config
tag_exp = "exp/tts_train_fastspeech2_raw_phn_tacotron_g2p_en_no_space/train.loss.ave_5best.pth"
train_config = "exp/tts_train_fastspeech2_raw_phn_tacotron_g2p_en_no_space/config.yaml"

# Vocoder (Parallel WaveGAN): checkpoint and config
vocoder_tag = 'parallel_wavegan.v1/checkpoint-400000steps.pkl'
vocoder_config = 'parallel_wavegan.v1/config.yml'

text2speech = Text2Speech.from_pretrained(
    train_config=train_config,
    model_file=tag_exp,
    vocoder_file=vocoder_tag,
    vocoder_config=vocoder_config,
    speed_control_alpha=1.0,
    always_fix_seed=False,
)

# Export the PyTorch model to ONNX under this tag (also writes a quantized copy)
tag_name = 'ljspeech_pretrained'
m.export(text2speech, tag_name, quantize=True)
########################### Inference ############################
from espnet_onnx import Text2Speech
import soundfile

text2speech = Text2Speech(tag_name)
text = 'hello world!'
output = text2speech(text)  # run inference with the ONNX model; returns a dict
wav = output['wav']
soundfile.write("ljspeech_pretrained_test.wav", wav, 22050, "PCM_16")
######################################################################
The synthesized audio quality is very low.
I noticed that the converted ONNX folder does not contain the stats.h5 file from the PWG vocoder folder:
~/.cache/espnet_onnx/ljspeech_pretrained/: config.yaml feats_stats.npz full quantize
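To make the check above reproducible, here is a minimal sketch that lists what the export wrote and reports whether a stats file is present. The helper names `list_exported_files` and `has_vocoder_stats` are hypothetical (they are not part of espnet_onnx), and the cache location is assumed from the path shown above.

```python
from pathlib import Path
from typing import List

# Assumed default cache location, taken from the path in the listing above.
CACHE_ROOT = Path.home() / ".cache" / "espnet_onnx"


def list_exported_files(tag_name: str, cache_root: Path = CACHE_ROOT) -> List[str]:
    """Return the sorted names of the entries in the exported model directory."""
    model_dir = cache_root / tag_name
    return sorted(p.name for p in model_dir.iterdir())


def has_vocoder_stats(tag_name: str, cache_root: Path = CACHE_ROOT,
                      stats_name: str = "stats.h5") -> bool:
    """Check whether the vocoder statistics file was copied next to the export."""
    return stats_name in list_exported_files(tag_name, cache_root)


if __name__ == "__main__":
    tag = "ljspeech_pretrained"
    # Only inspect the directory if the export has actually been run here.
    if (CACHE_ROOT / tag).is_dir():
        print(list_exported_files(tag))
        print("stats.h5 present:", has_vocoder_stats(tag))
```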
Could anyone please explain how to include stats.h5 during inference with espnet_onnx?
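For context on why the missing file might matter: my understanding (an assumption, not confirmed by the maintainers) is that the PWG vocoder expects mel features standardized with the per-bin mean and scale stored in stats.h5. The sketch below shows only that arithmetic, using made-up values instead of the real statistics; `normalize_mel` is a hypothetical helper, not an espnet_onnx function.

```python
import numpy as np


def normalize_mel(mel: np.ndarray, mean: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Standardize a (frames, n_mels) mel-spectrogram per mel bin."""
    return (mel - mean) / scale


# Made-up statistics standing in for the contents of stats.h5 (assumption).
rng = np.random.default_rng(0)
n_mels = 80
mean = rng.normal(-5.0, 1.0, size=n_mels)
scale = rng.uniform(0.5, 2.0, size=n_mels)

# A fake "raw" mel-spectrogram with 10 frames, as an acoustic model might emit.
raw_mel = rng.normal(size=(10, n_mels)) * scale + mean

# What the vocoder would see if the statistics were applied before synthesis.
normalized = normalize_mel(raw_mel, mean, scale)
```

If this step is skipped at inference time, the vocoder receives features on a different scale from the one it was trained on, which would be consistent with the degraded audio.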