Can't run pretrained librimix SOT pengcheng_librimix_asr_train_sot_asr_conformer_raw_en_char_sp model #81

Closed
SaddamAnnais opened this issue Dec 10, 2024 · 4 comments


@SaddamAnnais

Hi ESPnet team,

Thank you so much for creating this amazing package; it has been a huge help for my studies.

I'm having trouble running a pretrained model from the ESPnet model zoo, specifically one from the recipe at https://github.com/espnet/espnet/tree/master/egs2/librimix/sot_asr1. The model I'm trying to use is https://huggingface.co/espnet/pengcheng_librimix_asr_train_sot_asr_conformer_raw_en_char_sp.

I've tried two approaches to run the model:

Approach 1: Using the method from the espnet_model_zoo README.md

I've matched the environment to the one mentioned on Hugging Face:

  • python: 3.8.13
  • espnet: 202211
  • pytorch: 1.12.1

Then, I ran:

import soundfile
from espnet2.bin.asr_inference import Speech2Text

# Load the pretrained SOT model from Hugging Face
model = Speech2Text.from_pretrained(
    "espnet/pengcheng_librimix_asr_train_sot_asr_conformer_raw_en_char_sp"
)

# Decode an overlapped-speech recording
speech, rate = soundfile.read("1_overlapped_sound.wav")
text, *_ = model(speech)[0]
print(text)

Approach 2: Using the method from the Hugging Face model page

I've followed the instructions on the Hugging Face model page:

cd espnet
git checkout fe824770250485b77c68e8ca041922b8779b5c94
pip install -e .
cd egs2/librimix/sot_asr1
./run.sh --skip_data_prep false --skip_train true --download_model espnet/pengcheng_librimix_asr_train_sot_asr_conformer_raw_en_char_sp

I modified this slightly by setting --skip_data_prep to true, since I only want to run and test the model and don't need the data preparation step.

Then, I ran:

config = "exp/espnet/pengcheng_librimix_asr_train_sot_asr_conformer_raw_en_char_sp/config.yaml"
ckpt = "exp/espnet/pengcheng_librimix_asr_train_sot_asr_conformer_raw_en_char_sp/valid.acc.ave_10best.pth"
model = Speech2Text(config, ckpt)

speech, rate = soundfile.read("1_overlapped_sound.wav")
text, *_ = model(speech)[0]
print(text)

Both approaches result in the same error:

/usr/local/envs/espnet_env/lib/python3.8/site-packages/espnet2/layers/stft.py:164: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  olens = (ilens - self.n_fft) // self.hop_length + 1
Traceback (most recent call last):
  File "main.py", line 9, in <module>
    text, *_ = model(speech)[0]
  File "/usr/local/envs/espnet_env/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/envs/espnet_env/lib/python3.8/site-packages/espnet2/bin/asr_inference.py", line 377, in __call__
    results = self._decode_single_sample(enc[0])
  File "/usr/local/envs/espnet_env/lib/python3.8/site-packages/espnet2/bin/asr_inference.py", line 415, in _decode_single_sample
    nbest_hyps = self.beam_search(
  File "/usr/local/envs/espnet_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/envs/espnet_env/lib/python3.8/site-packages/espnet/nets/beam_search.py", line 361, in forward
    running_hyps = self.init_hyp(x)
  File "/usr/local/envs/espnet_env/lib/python3.8/site-packages/espnet/nets/batch_beam_search.py", line 119, in init_hyp
    init_states[k] = d.batch_init_state(x)
  File "/usr/local/envs/espnet_env/lib/python3.8/site-packages/espnet/nets/scorers/ctc.py", line 96, in batch_init_state
    logp = self.ctc.log_softmax(x.unsqueeze(0))  # assuming batch_size = 1
AttributeError: 'NoneType' object has no attribute 'log_softmax'

I'm not sure what's causing this error. Any help would be greatly appreciated!

@sw005320
Collaborator

Thanks for the report.
Hmm, this looks strange, and we may have some compatibility issues.

@pengchengguo, can you take a look at this?

@pengchengguo

Hi @SaddamAnnais,

We do not use CTC loss during the SOT model training, as indicated in the configuration file: https://github.com/espnet/espnet/blob/master/egs2/librimix/sot_asr1/conf/tuning/train_sot_asr_conformer.yaml#L35.

Therefore, self.ctc should be None when loading the pre-trained model and running inference.
I am not sure why it still tries to compute the log_softmax; it may be a compatibility issue. I am looking into it.
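
A quick way to confirm this (assuming the standard espnet2 Speech2Text layout, which keeps the loaded model in an asr_model attribute) is:

from espnet2.bin.asr_inference import Speech2Text

# Illustrative check only: for an SOT model trained with ctc_weight 0.0,
# the underlying ASR model is expected to have no CTC head at all.
model = Speech2Text.from_pretrained(
    "espnet/pengcheng_librimix_asr_train_sot_asr_conformer_raw_en_char_sp"
)
print(model.asr_model.ctc)  # expected: None, which is what the CTC scorer trips over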

@pengchengguo

pengchengguo commented Dec 11, 2024

Hi @SaddamAnnais,

As I mentioned earlier, the SOT model does not include a CTC module.
Therefore, the CTC- and LM-related decoding weights should be set to zero when initializing a Speech2Text instance (https://github.com/espnet/espnet/blob/master/espnet2/bin/asr_inference.py#L69), for example:

model_tag = "espnet/pengcheng_librimix_asr_train_sot_asr_conformer_raw_en_char_sp"
speech2text = Speech2Text.from_pretrained(
    model_tag=model_tag,
    ctc_weight=0.0,
    lm_weight=0.0,
    ngram_weight=0.0,
    penalty=0.0,
)
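
With that instance, decoding the overlapped recording from the first post should then look roughly like the sketch below (the "<sc>" speaker-change separator is an assumption based on the SOT recipes; please check it against the model's token list):

import soundfile

# Sketch only: run inference and split the serialized SOT output per speaker.
speech, rate = soundfile.read("1_overlapped_sound.wav")
text, *_ = speech2text(speech)[0]
for i, utt in enumerate(text.split("<sc>")):
    print(f"speaker {i + 1}: {utt.strip()}")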

If you train an SOT model from scratch, these parameters are assigned automatically through the inference configuration file; see: https://github.com/espnet/espnet/blob/master/egs2/librimix/sot_asr1/run_whisper_sot.sh#L12.
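
If you prefer to keep the same options in a file when calling the Python API directly, something along these lines should also work (the config path is hypothetical, and this assumes every key in the decode config is a valid Speech2Text argument, e.g. beam_size, ctc_weight, lm_weight, penalty):

import yaml
from espnet2.bin.asr_inference import Speech2Text

# Hypothetical decode config whose keys mirror Speech2Text's arguments.
with open("conf/decode_sot.yaml") as f:
    decode_conf = yaml.safe_load(f)

speech2text = Speech2Text.from_pretrained(
    "espnet/pengcheng_librimix_asr_train_sot_asr_conformer_raw_en_char_sp",
    **decode_conf,
)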

@SaddamAnnais
Author

Hi team. Sorry for the late response. Yes, I can run it now. Thank you very much!
