Failure to reproduce sparsification of MobileBERT-oBERT-SQuAD #1534
Comments
Hi @absol13, is there a warning about dynamic shapes? It looks like no operations are being run in the engine. Please run the deepsparse benchmark with a static shape for this test, such as the sketch below.
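A minimal static-shape benchmark sketch (assuming the exported model takes the standard three BERT inputs at the exported sequence length of 384, with model.onnx as a placeholder for your exported file):

deepsparse.benchmark model.onnx --batch_size 1 --input_shapes "[1,384],[1,384],[1,384]"

The benchmark output also reports fraction_of_supported_ops, which should be close to 1 for a healthy export.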
Hello, thanks for the fast response, but this does not seem to be a dynamic-shape problem. Here is another odd benchmark result for the model converted by sparseml.transformers.export_onnx:
As displayed above, fraction_of_supported_ops is 0.0, in contrast to its value close to 1 for the officially released model. It seems the deepsparse engine cannot interpret models generated by sparseml.transformers.export_onnx.
Thanks for the additional detail @absol13! I was able to replicate and isolate the issue in the ONNX post-processing step of the export process. We are working on a fix. In this image, the ONNX from the SparseZoo is on the left and the broken exported model is on the right. There is a single MatMul that wasn't folded properly during the ONNX export.
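For anyone who wants to inspect their own export in the meantime, here is a quick sketch that lists each MatMul node and flags which of its inputs are constant initializers, so the exported graph can be compared against the SparseZoo ONNX (model.onnx is a placeholder path; this is not part of the official tooling):

python - model.onnx <<'PY'
# List each MatMul and flag which inputs are constant initializers,
# to diff the exported graph against the reference SparseZoo model.
import sys
import onnx

model = onnx.load(sys.argv[1])
inits = {init.name for init in model.graph.initializer}
for node in model.graph.node:
    if node.op_type == "MatMul":
        print(node.name, [(inp, inp in inits) for inp in node.input])
PY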
Thanks for your fast support. |
Describe the bug
I am trying to reproduce the sparsification process of the MobileBERT-oBERT-SQuAD model (zoo:nlp/question_answering/mobilebert-none/pytorch/huggingface/squad/14layer_pruned50_quant-none-vnni).
I trained the base model (zoo:nlp/question_answering/mobilebert-none/pytorch/huggingface/squad/base-none) with the 32-epoch recipe from SparseZoo. However, the step for converting the sparsified model is missing from the model description, so I used the sparseml.transformers.export_onnx tool to convert it to ONNX format; a rough sketch of my training invocation is below.
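My training invocation looked roughly like the following (entrypoint and flags are approximate, following SparseML's Hugging Face-style CLI; the output directory is a placeholder):

sparseml.transformers.question_answering \
  --model_name_or_path zoo:nlp/question_answering/mobilebert-none/pytorch/huggingface/squad/base-none \
  --dataset_name squad \
  --recipe zoo:nlp/question_answering/mobilebert-none/pytorch/huggingface/squad/14layer_pruned50_quant-none-vnni \
  --do_train --do_eval \
  --output_dir ./mobilebert_obert_squad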
However, our newly sparsified model is much slower than the released ONNX model: in our environment, its inference speed is about 7 times slower. What confused me more is that the throughput of the model converted from the uploaded MobileBERT-oBERT-SQuAD checkpoint was also slower than the released model.
I also discovered that the deepsparse.analyze result for our new model differs from that of the released model. The analysis of the released model gives a detailed per-layer wall-time breakdown, while that of our model shows no model structure or follow-up analysis. I wonder if some step is missing when converting a sparsity-aware trained model to ONNX format.
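Both analyses were produced by running the CLI directly on each ONNX file, along the lines of (model.onnx standing in for either file):

deepsparse.analyze model.onnx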
I also found another issue reporting a case similar to mine: #1364.
Expected behavior
The 14layer_pruned50_quant-none-vnni model converted by sparseml.transformers.export_onnx from the officially released SparseZoo checkpoint should show throughput similar to the released ONNX model.
To Reproduce
sparseml.transformers.export_onnx --task question-answering --model_path ./ --sequence_length 384
(You can reproduce this using the checkpoint at zoo:nlp/question_answering/mobilebert-none/pytorch/huggingface/squad/14layer_pruned50_quant-none-vnni.)
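The checkpoint itself can be fetched with the SparseZoo CLI, e.g. (the downloaded directory is then passed as --model_path; exact CLI behavior may vary by version):

sparsezoo.download zoo:nlp/question_answering/mobilebert-none/pytorch/huggingface/squad/14layer_pruned50_quant-none-vnni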
Errors
Analysis result of the newly converted model
Analysis result of the officially released model by deepsparse.analyze