How to include stats.h5 of PWG Vocoder during ONNX conversion for TTS #94

anirpipi · 2023-07-01T07:32:20Z

Hi,
I am trying to convert a pretrained LJSpeech TTS model, based on kan-bayashi/ljspeech_fastspeech2 and parallel_wavegan/ljspeech_parallel_wavegan.v1, to ONNX using the code below:

########################### ONNX Conversion ############################

from espnet2.bin.tts_inference import Text2Speech
from espnet_onnx.export import TTSModelExport

m = TTSModelExport()

tag_exp = "exp/tts_train_fastspeech2_raw_phn_tacotron_g2p_en_no_space/train.loss.ave_5best.pth"
train_config = "exp/tts_train_fastspeech2_raw_phn_tacotron_g2p_en_no_space/config.yaml"

vocoder_tag = 'parallel_wavegan.v1/checkpoint-400000steps.pkl'
vocoder_config = 'parallel_wavegan.v1/config.yml'

text2speech = Text2Speech.from_pretrained(
    train_config=train_config,
    model_file=tag_exp,
    vocoder_file=vocoder_tag,
    vocoder_config=vocoder_config,
    speed_control_alpha=1.0,
    always_fix_seed=False
)

tag_name = 'ljspeech_pretrained'
m.export(text2speech, tag_name, quantize=True)

########################### Inference ############################

from espnet_onnx import Text2Speech
import soundfile
import numpy as np
import time

text2speech = Text2Speech(tag_name)

text = 'hello world!'
output_dict = text2speech(text)  # run inference with the exported ONNX model
wav = output_dict['wav']

soundfile.write("ljspeech_pretrained_test.wav", wav, 22050, "PCM_16")

######################################################################

After synthesis, the audio quality is very low.
I noticed that the exported ONNX folder does not contain the stats.h5 file from the PWG vocoder folder:
~/.cache/espnet_onnx/ljspeech_pretrained/: config.yaml feats_stats.npz full quantize

Can anyone please help me include stats.h5 during inference with espnet_onnx?
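
For readers unfamiliar with the file in question: in parallel_wavegan, stats.h5 holds the per-bin mean and scale that the vocoder uses to normalize the mel spectrogram before generating audio. The sketch below shows what that normalization looks like when done by hand. It is only an illustration under assumptions: the dataset names "mean" and "scale" follow parallel_wavegan's convention, the stats.h5 path is assumed to sit next to the checkpoint, and the helper names `load_pwg_stats` and `normalize_mel` are hypothetical. Whether a missing normalization step is actually the cause of the low quality is not confirmed in this thread, and how to feed the result into espnet_onnx is not covered here.

```python
import numpy as np


def load_pwg_stats(stats_path):
    """Read mean and scale from a parallel_wavegan stats.h5 file.

    Assumes the file stores datasets named "mean" and "scale".
    h5py is imported lazily so the rest of the sketch runs without it.
    """
    import h5py

    with h5py.File(stats_path, "r") as f:
        mean = f["mean"][()].reshape(-1)
        scale = f["scale"][()].reshape(-1)
    return mean, scale


def normalize_mel(mel, mean, scale):
    """Normalize a (frames, n_mels) mel spectrogram per mel bin."""
    mel = np.asarray(mel, dtype=np.float32)
    return (mel - mean) / scale


# Hypothetical usage; the path is assumed to be inside the vocoder folder.
# mean, scale = load_pwg_stats("parallel_wavegan.v1/stats.h5")
# mel_norm = normalize_mel(mel, mean, scale)
```

Comparing a mel spectrogram before and after this step is one way to check whether the exported vocoder is receiving unnormalized input.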

Masao-Someki · 2023-07-20T14:09:38Z

Hi @anirpipi, sorry for the late reply, and thank you for reporting the issue.
It may be a bug, so I would like to look into it.
It seems you are using a model you trained yourself. Can you confirm that the issue also happens with the published models? If it is reproducible, I will download the model and investigate.

anirpipi · 2023-07-27T07:13:31Z

Hi, thanks for the response.
The same problem occurs with the pretrained models.
VITS works fine, but the problem occurs with FastSpeech2 + PWG.
Could you please look into it?
Thanks in advance.
