Support Wav2vec2 for Transformers optimizer (fusion) #10622
base: main
Conversation
@philschmid Thanks for your contribution! Great job!
Thanks for the response @wangyems. So if I understand you correctly, you suggest creating a …
I suggest handling these small differences you mentioned in onnx_model_bart, so that both Bart and Wav2Vec2 are supported in that one class.
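The pattern the reviewer suggests could look roughly like the following. This is a hedged sketch only: the class below is a stand-in for the fusion class in `onnx_model_bart`, and the flag name and method body are assumptions, not the actual onnxruntime code.

```python
# Stand-in sketch of the reviewer's suggestion: instead of a separate
# onnx_model_wav2vec2.py, branch on a flag inside the Bart fusion class
# for the few places where Wav2Vec2 differs. Names are assumptions.

class BartOnnxModel:  # stand-in for the class in onnx_model_bart
    def __init__(self, is_wav2vec2: bool = False):
        self.is_wav2vec2 = is_wav2vec2

    def fuse_attention(self) -> str:
        if self.is_wav2vec2:
            # apply the Wav2Vec2-specific checks on the EncoderAttention subgraph
            return "wav2vec2 attention fusion"
        return "bart attention fusion"
```

The design trade-off is a single class with a small amount of branching versus two near-duplicate files that must be kept in sync.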
@wangyems I added it to the …
What do we need to do to get the CI running and merged?
/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux Nuphar CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, Windows CPU CI Pipeline, Windows GPU CI Pipeline
Azure Pipelines successfully started running 10 pipeline(s).
/azp run Windows GPU TensorRT CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, onnxruntime-python-checks-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed
Azure Pipelines successfully started running 6 pipeline(s).
@philschmid Thanks a lot for this contribution! https://github.com/microsoft/onnxruntime/blob/8255ecbfb4511d5d998a9edac814dfbadf3bb13f/onnxruntime/python/tools/transformers/README.md should probably be updated before merging.
I added …
I can confirm attention fusion works for, e.g., wav2vec2-base-960h. I did some performance measurements on CPU and I am not seeing any notable performance difference between the unoptimized and optimized model variants. I do see a much reduced node graph in Netron when comparing the two models, though. Is the purpose of attention fusion purely to reduce the complexity of the graph, or should there be a measurable performance improvement during inference?
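A CPU comparison like the one described above could be done with a small timing harness such as the one below. This is a hedged sketch, not the commenter's actual benchmark: the model paths, input shape, and session setup in the commented usage are assumptions.

```python
import time


def benchmark(run_fn, warmup=3, runs=30):
    """Time a callable and return its mean latency in milliseconds."""
    for _ in range(warmup):          # warm-up iterations are excluded
        run_fn()
    start = time.perf_counter()
    for _ in range(runs):
        run_fn()
    return (time.perf_counter() - start) / runs * 1000.0


# Hypothetical usage against the two exported models (paths and input
# shape are assumptions, not from the PR):
#
# import numpy as np
# import onnxruntime as ort
# audio = np.random.randn(1, 16000).astype(np.float32)
# for path in ("wav2vec2.onnx", "wav2vec2_optimized.onnx"):
#     sess = ort.InferenceSession(path, providers=["CPUExecutionProvider"])
#     name = sess.get_inputs()[0].name
#     print(path, benchmark(lambda: sess.run(None, {name: audio})), "ms")
```

Averaging over many runs after a warm-up matters here, since attention fusion changes per-inference overhead that single-shot timings can easily hide in noise.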
Attention fusion fails when enabling extended optimizations, though (…). Not sure if that is to be expected.
Hi all, is there any update on this?
What does this PR do?
This PR adds an `onnx_model_wav2vec2.py` file to enable fusion optimization support for Wav2Vec2 models in Hugging Face Transformers. It also updates `optimizer.py` and adds the `model_type` to it. I made the changes based on the latest PR for turing:
To enable support for `wav2vec2`, I copied `onnx_model_bart` and made the required changes for Wav2Vec2. I wasn't sure if that's the right way to add support for a new model or not. Please let me know if we should do it differently, since `onnx_model_wav2vec2` and `onnx_model_bart` are pretty similar except for some checks in the `EncoderAttention`. See below the diffs from `onnx_model_bart` [Line 28-...].
Here is how I tested it.
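The author's actual test steps are cut off above. As a hedged sketch of how the new `model_type` would presumably be exercised, one could call the transformers optimizer's `optimize_model` entry point; the ONNX file name and the head/hidden sizes below are assumptions (the sizes match the wav2vec2-base configuration).

```python
# Hedged sketch, not the PR author's actual test script: invoking the
# onnxruntime transformers optimizer with the model_type this PR registers.

NUM_HEADS = 12     # wav2vec2-base: 12 encoder attention heads (assumption)
HIDDEN_SIZE = 768  # wav2vec2-base: hidden size 768 (assumption)

# from onnxruntime.transformers import optimizer
# opt_model = optimizer.optimize_model(
#     "wav2vec2.onnx",           # hypothetical exported model path
#     model_type="wav2vec2",     # the new model_type added by this PR
#     num_heads=NUM_HEADS,
#     hidden_size=HIDDEN_SIZE,
# )
# opt_model.save_model_to_file("wav2vec2_optimized.onnx")
```

Inspecting the optimized graph in Netron should then show fused Attention nodes in place of the original multi-op attention subgraphs.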