Can't run pretrained librimix SOT pengcheng_librimix_asr_train_sot_asr_conformer_raw_en_char_sp model #81

Closed
SaddamAnnais opened this issue Dec 10, 2024 · 4 comments


@SaddamAnnais

Hi ESPnet team,

Thank you so much for creating this amazing package; it has been a huge help for my studies.

I'm having trouble running a pretrained model from the ESPnet model zoo, specifically one from the recipe at https://github.com/espnet/espnet/tree/master/egs2/librimix/sot_asr1. The model I'm trying to use is https://huggingface.co/espnet/pengcheng_librimix_asr_train_sot_asr_conformer_raw_en_char_sp.

I've tried two approaches to run the model:

Approach 1: Using the method from the espnet_model_zoo README.md

I've matched the environment to the one mentioned on Hugging Face:

  • python: 3.8.13
  • espnet: 202211
  • pytorch: 1.12.1

Then, I ran:

import soundfile
from espnet2.bin.asr_inference import Speech2Text

# Load the pretrained SOT model from Hugging Face
model = Speech2Text.from_pretrained(
    "espnet/pengcheng_librimix_asr_train_sot_asr_conformer_raw_en_char_sp"
)

# Decode an overlapped-speech recording
speech, rate = soundfile.read("1_overlapped_sound.wav")
text, *_ = model(speech)[0]
print(text)

Approach 2: Using the method from the Hugging Face model page

I've followed the instructions on the Hugging Face model page:

cd espnet
git checkout fe824770250485b77c68e8ca041922b8779b5c94
pip install -e .
cd egs2/librimix/sot_asr1
./run.sh --skip_data_prep false --skip_train true --download_model espnet/pengcheng_librimix_asr_train_sot_asr_conformer_raw_en_char_sp

I modified this slightly by setting --skip_data_prep to true, since I only want to run and test the model and don't need the data preparation step.

Then, I ran:

config = "exp/espnet/pengcheng_librimix_asr_train_sot_asr_conformer_raw_en_char_sp/config.yaml"
ckpt = "exp/espnet/pengcheng_librimix_asr_train_sot_asr_conformer_raw_en_char_sp/valid.acc.ave_10best.pth"
model = Speech2Text(config, ckpt)

speech, rate = soundfile.read("1_overlapped_sound.wav")
text, *_ = model(speech)[0]
print(text)

Both approaches result in the same error:

/usr/local/envs/espnet_env/lib/python3.8/site-packages/espnet2/layers/stft.py:164: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  olens = (ilens - self.n_fft) // self.hop_length + 1
Traceback (most recent call last):
  File "main.py", line 9, in <module>
    text, *_ = model(speech)[0]
  File "/usr/local/envs/espnet_env/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/envs/espnet_env/lib/python3.8/site-packages/espnet2/bin/asr_inference.py", line 377, in __call__
    results = self._decode_single_sample(enc[0])
  File "/usr/local/envs/espnet_env/lib/python3.8/site-packages/espnet2/bin/asr_inference.py", line 415, in _decode_single_sample
    nbest_hyps = self.beam_search(
  File "/usr/local/envs/espnet_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/envs/espnet_env/lib/python3.8/site-packages/espnet/nets/beam_search.py", line 361, in forward
    running_hyps = self.init_hyp(x)
  File "/usr/local/envs/espnet_env/lib/python3.8/site-packages/espnet/nets/batch_beam_search.py", line 119, in init_hyp
    init_states[k] = d.batch_init_state(x)
  File "/usr/local/envs/espnet_env/lib/python3.8/site-packages/espnet/nets/scorers/ctc.py", line 96, in batch_init_state
    logp = self.ctc.log_softmax(x.unsqueeze(0))  # assuming batch_size = 1
AttributeError: 'NoneType' object has no attribute 'log_softmax'

I'm not sure what's causing this error. Any help would be greatly appreciated!

@sw005320
Collaborator

Thanks for the report.
Hmm, this looks strange, and we may have some compatibility issues.

@pengchengguo, can you take a look at this?

@pengchengguo

Hi @SaddamAnnais,

We do not use CTC loss during the SOT model training, as indicated in the configuration file: https://github.com/espnet/espnet/blob/master/egs2/librimix/sot_asr1/conf/tuning/train_sot_asr_conformer.yaml#L35.

Therefore, self.ctc should be None when loading the pre-trained model and running inference.
I am not sure why it still tries to compute the log_softmax; it may be a compatibility issue. I am looking into it.
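
A quick way to confirm this (assuming the standard espnet2 Speech2Text layout, which keeps the loaded model in an asr_model attribute) is:

from espnet2.bin.asr_inference import Speech2Text

# Illustrative check only: for an SOT model trained with ctc_weight 0.0,
# the underlying ASR model is expected to have no CTC head at all.
model = Speech2Text.from_pretrained(
    "espnet/pengcheng_librimix_asr_train_sot_asr_conformer_raw_en_char_sp"
)
print(model.asr_model.ctc)  # expected: None, which is what the CTC scorer trips over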

@pengchengguo

pengchengguo commented Dec 11, 2024

Hi @SaddamAnnais,

As I mentioned earlier, the SOT model does not include a CTC module.
Therefore, the CTC- and LM-related decoding weights should be set to zero when initializing a Speech2Text instance (https://github.com/espnet/espnet/blob/master/espnet2/bin/asr_inference.py#L69), for example:

model_tag = "espnet/pengcheng_librimix_asr_train_sot_asr_conformer_raw_en_char_sp"
speech2text = Speech2Text.from_pretrained(
    model_tag=model_tag,
    ctc_weight=0.0,
    lm_weight=0.0,
    ngram_weight=0.0,
    penalty=0.0,
)
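
With that instance, decoding the overlapped recording from the first post should then look roughly like the sketch below (the "<sc>" speaker-change separator is an assumption based on the SOT recipes; please check it against the model's token list):

import soundfile

# Sketch only: run inference and split the serialized SOT output per speaker.
speech, rate = soundfile.read("1_overlapped_sound.wav")
text, *_ = speech2text(speech)[0]
for i, utt in enumerate(text.split("<sc>")):
    print(f"speaker {i + 1}: {utt.strip()}")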

If you train an SOT model from scratch, these parameters are assigned automatically through the inference configuration file; see: https://github.com/espnet/espnet/blob/master/egs2/librimix/sot_asr1/run_whisper_sot.sh#L12.
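
If you prefer to keep the same options in a file when calling the Python API directly, something along these lines should also work (the config path is hypothetical, and this assumes every key in the decode config is a valid Speech2Text argument, e.g. beam_size, ctc_weight, lm_weight, penalty):

import yaml
from espnet2.bin.asr_inference import Speech2Text

# Hypothetical decode config whose keys mirror Speech2Text's arguments.
with open("conf/decode_sot.yaml") as f:
    decode_conf = yaml.safe_load(f)

speech2text = Speech2Text.from_pretrained(
    "espnet/pengcheng_librimix_asr_train_sot_asr_conformer_raw_en_char_sp",
    **decode_conf,
)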

@SaddamAnnais
Author

Hi team. Sorry for the late response. Yes, I can run it now. Thank you very much!
