ONNX model takes too long to run #7129

Closed
divyansh2681 opened this issue Jul 28, 2023 · 26 comments

Comments

@divyansh2681

I have exported a NeMo ASR model to ONNX. When I run the ONNX model for inference, it takes much more time compared to the original .nemo model. I am running both of them on the same machine in the same conda environment.

@titu1994
Collaborator

What model are you trying? If it's RNNT, then ONNX is indeed a bit slower.

@divyansh2681
Author

divyansh2681 commented Jul 31, 2023

I am using stt_en_fastconformer_transducer_xlarge. Yes, it's RNNT.

@divyansh2681
Author

> What model are you trying? If it's RNNT, then ONNX is indeed a bit slower.

Also, does the model performance change after exporting to ONNX?

@titu1994
Collaborator

ONNX inference does not have bound inputs on the GPU, so there is a lot of CPU-GPU transfer, which slows it down. The exported model itself is not slower by any significant measure.

PyTorch/TorchScript can keep all tensors on the GPU, so it appears faster.
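
For anyone hitting this: ONNX Runtime's IOBinding API lets you keep inputs and outputs on the GPU between calls, which avoids most of that transfer. A minimal sketch, assuming the exported encoder file and the tensor names below (check session.get_inputs() / get_outputs() for the real ones):

```python
import numpy as np
import onnxruntime as ort

# Keep I/O on the GPU with IOBinding so tensors don't bounce between host and
# device on every call. File and tensor names here are placeholders.
sess = ort.InferenceSession("encoder-model.onnx", providers=["CUDAExecutionProvider"])

features = np.random.randn(1, 80, 1600).astype(np.float32)  # dummy mel features
length = np.array([1600], dtype=np.int64)

binding = sess.io_binding()
# Copy the inputs to the GPU once, up front.
binding.bind_ortvalue_input("audio_signal", ort.OrtValue.ortvalue_from_numpy(features, "cuda", 0))
binding.bind_ortvalue_input("length", ort.OrtValue.ortvalue_from_numpy(length, "cuda", 0))
# Let ORT allocate every output on the GPU as well.
for out in sess.get_outputs():
    binding.bind_output(out.name, "cuda")

sess.run_with_iobinding(binding)
encoded = binding.get_outputs()[0].numpy()  # copy back to host only when needed
```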

@divyansh2681
Author

Okay, that makes sense. Also, I found out that using the TensorRT provider was making inference slower, so I switched to the CUDA provider. When I export a large model from NeMo to ONNX, I get separate files for the encoder, decoder, and weights. Is there any way to get a single file? I asked this question on another issue (#6759) as well, but I couldn't figure out how to resolve it.
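
For reference, the provider switch is just the providers argument when building the session; a minimal sketch (the file name is a placeholder):

```python
import onnxruntime as ort

# Use the CUDA execution provider explicitly, with a CPU fallback,
# instead of the TensorRT provider. "encoder-model.onnx" is a placeholder.
sess = ort.InferenceSession(
    "encoder-model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
print(sess.get_providers())  # confirm which providers are actually active
```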

@titu1994
Collaborator

For RNNT, it's not possible to get a single ONNX file. The RNNT encoder needs to run only once for a given sample, while the decoder and joint network need to run autoregressively multiple times to produce tokens.

@divyansh2681
Author

Oh okay, so to use the ONNX models of the encoder and decoder for inference, how do we figure out the number of times the decoder should run? Also, exporting NeMo to ONNX generates weight and bias files as well; how are they to be used with onnxruntime?

@titu1994
Collaborator

A NeMo model, when exported, should generate two ONNX files, not weight (PT) files. As for how many times to run the decoder: it's dynamic, and it takes some logic to figure out the stopping condition. You'll need to look at the code in the export script to find out which token to stop at.
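
To make the stopping condition concrete, here is a rough sketch of the greedy RNNT loop over the two exported graphs. It is not the NeMo script itself; the file names, tensor names, blank index, output ordering, and state shapes are assumptions, so read them off the actual ONNX graphs and the transducer export example before relying on this:

```python
import numpy as np
import onnxruntime as ort

enc_sess = ort.InferenceSession("encoder-model.onnx")
dec_sess = ort.InferenceSession("decoder_joint-model.onnx")

BLANK_ID = 1024            # assumption: blank is the last token in the vocab
MAX_SYMBOLS_PER_FRAME = 5  # safety cap on the inner autoregressive loop

def greedy_decode(features: np.ndarray, feature_len: np.ndarray) -> list[int]:
    # Encoder runs exactly once per sample.
    encoded, encoded_len = enc_sess.run(
        None, {"audio_signal": features, "length": feature_len}
    )
    # Initial prediction-network state; shape is model dependent (assumed here).
    state1 = np.zeros((1, 1, 640), dtype=np.float32)
    state2 = np.zeros((1, 1, 640), dtype=np.float32)
    last_token = np.array([[BLANK_ID]], dtype=np.int32)
    hypothesis: list[int] = []

    for t in range(int(encoded_len[0])):          # loop over encoder frames
        frame = encoded[:, :, t : t + 1]
        for _ in range(MAX_SYMBOLS_PER_FRAME):    # decoder + joint run repeatedly
            # Output count/order below is an assumption; inspect the graph.
            logits, new_s1, new_s2 = dec_sess.run(
                None,
                {
                    "encoder_outputs": frame,
                    "targets": last_token,
                    "target_length": np.array([1], dtype=np.int32),
                    "input_states_1": state1,
                    "input_states_2": state2,
                },
            )
            token = int(logits.argmax())
            if token == BLANK_ID:                 # blank token = stop, move to next frame
                break
            hypothesis.append(token)
            last_token = np.array([[token]], dtype=np.int32)
            state1, state2 = new_s1, new_s2       # advance state only on non-blank
    return hypothesis
```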

@divyansh2681
Author

I used the export script on the NeMo model. I got around 370 files in total; the encoder and decoder are two of them. I have attached a picture of how my file directory looks after exporting.
[screenshots of the exported file directory]
Do you think I am doing something wrong while exporting?

@titu1994
Collaborator

RNNT models aren't exported with that script. You should use the one in examples/asr/export.

@titu1994
Collaborator

Actually, this script should also work. @borisfom, any idea what's going on here?
Still, for RNNT you should use only the following script: https://github.com/NVIDIA/NeMo/tree/main/examples/asr/export/transducer

Either script (onnx or transducer) should work, but not for hybrid models; we still have to figure out how to export hybrid models properly.
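
For completeness, the Python-side export amounts to something like the sketch below (model name taken from earlier in the thread; the exact output file names may differ, so treat them as assumptions):

```python
import nemo.collections.asr as nemo_asr

# Export an RNNT/transducer model; for RNNT this produces two ONNX graphs
# (roughly encoder-model.onnx and decoder_joint-model.onnx), never a single file.
model = nemo_asr.models.ASRModel.from_pretrained("stt_en_fastconformer_transducer_xlarge")
model.export("model.onnx")
```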

@divyansh2681
Author

Thank you for your help @titu1994!

I have another question: can I fine-tune an existing model to recognize a set of specific words/terms? If so, how much data is needed? I want the model to recognize some healthcare/medical terms that are not used as frequently as others.

@titu1994
Collaborator

titu1994 commented Aug 5, 2023

You can do that with very low LR fine-tuning, or with adapters. See the adapter tutorial in the ASR tutorials section.
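
A rough sketch of the low-LR route, assuming standard NeMo config fields; the manifest paths, epochs, and LR value are placeholders, and the adapter tutorial covers the lighter-weight alternative:

```python
import pytorch_lightning as pl
import nemo.collections.asr as nemo_asr
from omegaconf import open_dict

# Low-LR fine-tuning sketch: start from the pretrained checkpoint and nudge it
# on domain audio. Manifests and hyperparameters below are placeholders.
model = nemo_asr.models.ASRModel.from_pretrained("stt_en_fastconformer_transducer_xlarge")

trainer = pl.Trainer(devices=1, accelerator="gpu", max_epochs=10)
model.set_trainer(trainer)

with open_dict(model.cfg):
    model.cfg.train_ds.manifest_filepath = "medical_train_manifest.json"
    model.cfg.validation_ds.manifest_filepath = "medical_val_manifest.json"
    model.cfg.optim.lr = 1e-5  # very low LR so the pretrained weights move only slightly

model.setup_training_data(model.cfg.train_ds)
model.setup_validation_data(model.cfg.validation_ds)
model.setup_optimization(model.cfg.optim)

trainer.fit(model)
```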

@divyansh2681
Author

divyansh2681 commented Aug 7, 2023

I will check it out. Thank you!

@divyansh2681
Author

Hello @titu1994, does NeMo have a C++ API or something similar? I want to deploy NeMo models in a C++-based production environment for inference. I have tried converting a NeMo model to ONNX and then using it for inference, but in that case I had to reimplement the preprocessing/postprocessing classes/code stubs from the NeMo source code in C++, and it wasn't very efficient.

@titu1994
Collaborator

titu1994 commented Aug 23, 2023

Actually, the NeMo preprocessor can be exported to ONNX via the Torchaudio backend. It requires a few extra steps, but it should be supported quite well, with 1:1 input/output correspondence.

It was contributed by a user here: #5512

It seems we never documented this in the NeMo docs for some reason. I will fix that soon.

Once the preprocessor is exportable, you can simply run the full pipeline in C++.

@titu1994
Collaborator

Ah, it's not ONNX but TorchScript export. Still, there is a C++ API for that, so it should be OK, I think?

@divyansh2681
Author

divyansh2681 commented Aug 23, 2023

> Ah, it's not ONNX but TorchScript export. Still, there is a C++ API for that, so it should be OK, I think?

You mean converting the preprocessor from TorchScript to ONNX and then using it? Or is it something else?

@divyansh2681
Author

Okay, this makes sense, I'll check it out. Thank you!

@titu1994
Collaborator

I meant using the TorchScript C++ backend for the preprocessor, and then the ONNX or TorchScript backend for the model.
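
A sketch of the preprocessor half of that split, assuming the torchaudio-based featurizer from #5512 is in use (the default featurizer may not script cleanly); the parity check at the end is just a sanity test before moving to the C++ side:

```python
import torch
import nemo.collections.asr as nemo_asr

# Export the preprocessor to TorchScript for use from the libtorch C++ API.
# Scriptability depends on the torchaudio-based featurizer from #5512.
model = nemo_asr.models.ASRModel.from_pretrained("stt_en_fastconformer_transducer_xlarge")
model.preprocessor.eval()

scripted = torch.jit.script(model.preprocessor)
scripted.save("preprocessor.ts")

# Quick parity check against the eager module with a dummy 1-second waveform.
waveform = torch.randn(1, 16000)
length = torch.tensor([16000])
feats_ts, feat_len_ts = scripted(input_signal=waveform, length=length)
feats_py, feat_len_py = model.preprocessor(input_signal=waveform, length=length)
print(torch.allclose(feats_ts, feats_py, atol=1e-5))
```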

@divyansh2681
Author

divyansh2681 commented Aug 31, 2023

I used the TorchScript C++ backend for the preprocessor and the ONNX backend for inference. The preprocessor outputs are the signal data and the sample rate. In 50% of cases, the signal data is all NaN, but the preprocessor runs perfectly when used from Python.

I am loading audio using torchaudio in Python and using AudioFile (https://github.com/adamstark/AudioFile) in C++.

I tried using torchaudio::sox::load_audio_file (https://github.com/pytorch/audio/blob/main/torchaudio/csrc/sox/io.cpp), but this gives 0s in the preprocessor output (when reading the signal data as a 2D vector, alternate rows have all 0 elements).

Is this error related to the preprocessor?

@titu1994
Collaborator

Hmm, I don't know about that. From the tests in that PR, the preprocessor gives exactly the same output as the Torchaudio preprocessor.

@divyansh2681
Author

Okay, I'll figure something out. Thank you!

@divyansh2681
Author

I found the issue: the output of torchaudio.load / torchaudio::sox::load_audio_file is (signal data, sample rate). I had interpreted it as (signal data, signal length). It works properly now. Thank you.
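
For anyone who hits the same thing, the return convention is easy to check from Python ("sample.wav" is a placeholder path):

```python
import torch
import torchaudio

# torchaudio.load returns (waveform, sample_rate), not (waveform, length);
# the signal length has to be derived from the waveform tensor itself.
waveform, sample_rate = torchaudio.load("sample.wav")
length = torch.tensor([waveform.shape[1]])  # samples per channel, for the preprocessor
```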

titu1994 closed this as completed Sep 1, 2023
@nabil6391

@divyansh2681 Could you please suggest which files you needed to convert to C++, or share the steps you followed? That would greatly help me. Thank you.

@divyansh2681
Author

Hey @nabil6391, NeMo has some Python scripts to extract the preprocessor and vocabulary files for the model. Once you have the preprocessor and postprocessor, you can use the ONNX Runtime C++ library to run inference.
