Error while converting ONNX model with explicit quantization to TensorRT engine #2604

Status: Closed
cricossc opened this issue Jan 16, 2023 · 9 comments
Labels: triaged (Issue has been triaged by maintainers)

@cricossc commented Jan 16, 2023

Description

Hello,

while trying to convert an already quantized ONNX model with Q/DQ nodes to an int8 TensorRT engine, I get the following error:

[E] Error[2]: [weightConvertors.cpp::transformWeightsIfFP::537] Error Code 2: Internal Error (Assertion w.type() == type || (w.type() == DataType::kINT8 || w.type() == DataType::kHALF || w.type() == DataType::kFLOAT) failed. )
[E] Error[2]: [builder.cpp::buildSerializedNetwork::636] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
[E] Engine could not be created from network
[E] Building engine failed
[E] Failed to create engine from model or file.
[E] Engine set up failed

Can you tell me the origin of this error?

Environment

TensorRT Version: 8.5.1
NVIDIA GPU: GeForce RTX 2070
NVIDIA Driver Version: 510.47.03
CUDA Version: 11.6
CUDNN Version: 8.4.1
Operating System: Ubuntu 20.04

Relevant Files

mobilenetv2-trt-ready.zip

trtexec_logs.txt

Steps To Reproduce

trtexec --onnx=mobilenetv2-trt-ready.onnx --int8 --saveEngine=mobilenetv2.trt
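
For reference, a minimal sketch of the equivalent INT8 build through the TensorRT Python API (file names mirror the trtexec call above). With explicit Q/DQ quantization, no calibrator is needed; only the INT8 flag.

import tensorrt as trt

logger = trt.Logger(trt.Logger.ERROR)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("mobilenetv2-trt-ready.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise SystemExit("ONNX parsing failed")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)      # explicit Q/DQ: no calibrator required

engine = builder.build_serialized_network(network, config)
if engine is None:
    raise SystemExit("Engine build failed")
with open("mobilenetv2.trt", "wb") as f:
    f.write(engine)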
@zerollzeng (Collaborator)

I think you don't need these Q/DQ nodes for the output tensor; can you try removing them and building again? @ttyio please correct me if I'm wrong :-D
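
A minimal sketch of that suggestion with onnx-graphsurgeon, assuming the graph output is produced by a trailing QuantizeLinear/DequantizeLinear pair (inspect the graph in Netron first; the wiring below is an assumption):

import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load("mobilenetv2-trt-ready.onnx"))

out = graph.outputs[0]
dq = out.inputs[0] if out.inputs else None              # node producing the graph output
if dq is not None and dq.op == "DequantizeLinear":
    q = dq.inputs[0].inputs[0] if dq.inputs[0].inputs else None
    if q is not None and q.op == "QuantizeLinear":
        new_out = q.inputs[0]                           # tensor feeding the Q/DQ pair
        new_out.dtype = out.dtype                       # graph outputs need an explicit dtype
        graph.outputs = [new_out]

graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "mobilenetv2-no-output-qdq.onnx")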

@zerollzeng self-assigned this Jan 17, 2023
@zerollzeng added the triaged label Jan 17, 2023
@cricossc (Author)

Hi @zerollzeng, I removed the suggested Q/DQ nodes at the end of the network, but it still raises the same error.

The convolution weights are already quantized to int8 values but stored as FP32, as described in the documentation. Should I do it differently?
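
One quick way to check how the quantized constants are actually stored is to list the data type of every initializer feeding a DequantizeLinear node; a minimal sketch with the onnx Python API (the file name matches the attachment above):

import onnx
from onnx import TensorProto

model = onnx.load("mobilenetv2-trt-ready.onnx")
inits = {t.name: t for t in model.graph.initializer}

for node in model.graph.node:
    if node.op_type == "DequantizeLinear":
        x = node.input[0]            # quantized tensor: weight/bias constant or activation
        if x in inits:
            dtype = TensorProto.DataType.Name(inits[x].data_type)
            print(f"{node.name or node.output[0]}: input '{x}' stored as {dtype}")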

@zerollzeng (Collaborator)

This ONNX model is somehow invalid according to onnxruntime; can you fix that first?

onnxruntime.capi.onnxruntime_pybind11_state.InvalidGraph: [ONNXRuntimeError] : 10 : INVALID_GRAPH : Load model from mobilenetv2-trt-ready.onnx failed:This is an invalid model. Type Error: Type 'tensor(int32)' of input parameter
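
A minimal sketch for reproducing this check locally with onnx.checker and an onnxruntime session (an invalid model should fail at one of these two steps):

import onnx
import onnxruntime as ort

model = onnx.load("mobilenetv2-trt-ready.onnx")
onnx.checker.check_model(model)          # structural/type validation
sess = ort.InferenceSession("mobilenetv2-trt-ready.onnx",
                            providers=["CPUExecutionProvider"])
print("model loads fine in onnxruntime")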


@cricossc (Author)

Many thanks @zerollzeng. I fixed the issue by storing the biases as FP32, and I can now convert the ONNX model to a TensorRT engine.

However, when evaluating the resulting engine on ImageNet samples, the model's accuracy drops to 0: the output values are outside the expected range and almost equal to each other. Running inference with the ONNX model gives the same inconsistent outputs.

Should I keep the Q/DQ nodes after the convolution layers or remove them as suggested in the documentation? I would be glad to have your opinions on the matter.
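
A minimal sketch of the bias fix described above, assuming the Conv biases sit behind a DequantizeLinear whose input, scale and zero-point are constants: fold them into plain FP32 initializers with onnx-graphsurgeon. This is an illustration under those assumptions, not necessarily the exact steps used.

import numpy as np
import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load("mobilenetv2-trt-ready.onnx"))

for conv in [n for n in graph.nodes if n.op == "Conv" and len(n.inputs) > 2]:
    bias = conv.inputs[2]
    dq = bias.inputs[0] if bias.inputs else None        # producer of the bias tensor
    if dq is None or dq.op != "DequantizeLinear":
        continue
    b, scale = dq.inputs[0], dq.inputs[1]
    zp = dq.inputs[2] if len(dq.inputs) > 2 else None
    if not (isinstance(b, gs.Constant) and isinstance(scale, gs.Constant)):
        continue
    z = zp.values.astype(np.float32) if isinstance(zp, gs.Constant) else 0.0
    fp32_bias = (b.values.astype(np.float32) - z) * scale.values.astype(np.float32)
    conv.inputs[2] = gs.Constant(b.name + "_fp32", fp32_bias)
    dq.outputs.clear()                                  # detach the DQ so cleanup() drops it

graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "mobilenetv2-fp32-bias.onnx")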

@zerollzeng (Collaborator)

You need to make sure the exported ONNX model has good accuracy first; then you can tell whether the drop is caused by TRT.
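
A minimal sketch of such a sanity check with onnxruntime. The preprocessing below is the usual ImageNet recipe and the image path is a placeholder; match whatever the original training pipeline used.

import numpy as np
import onnxruntime as ort
from PIL import Image

def preprocess(path):
    # Standard ImageNet preprocessing: resize, scale to [0, 1], normalize, NCHW layout.
    img = Image.open(path).convert("RGB").resize((224, 224))
    x = np.asarray(img, dtype=np.float32) / 255.0
    x = (x - [0.485, 0.456, 0.406]) / [0.229, 0.224, 0.225]
    return x.transpose(2, 0, 1)[None].astype(np.float32)

sess = ort.InferenceSession("mobilenetv2-trt-ready.onnx",
                            providers=["CPUExecutionProvider"])
inp = sess.get_inputs()[0].name
logits = sess.run(None, {inp: preprocess("sample.jpg")})[0]   # placeholder image path
print("top-5 class ids:", np.argsort(logits[0])[::-1][:5])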

@cricossc (Author)

I managed to make it work with weights and biases stored as FP32 dequantized values.
Many thanks for your help.
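
For readers wondering how to reach this state, here is a minimal onnx-graphsurgeon sketch that rewrites "INT8 weight constant -> DequantizeLinear" into "FP32 weight constant -> QuantizeLinear -> DequantizeLinear", i.e. weights stored as FP32 dequantized values while keeping explicit Q/DQ. It assumes a zero or scalar zero-point and constant scales, and is an illustration, not necessarily the exact procedure used here.

import numpy as np
import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load("mobilenetv2-trt-ready.onnx"))

for dq in [n for n in graph.nodes if n.op == "DequantizeLinear"]:
    w, scale = dq.inputs[0], dq.inputs[1]
    if not (isinstance(w, gs.Constant) and w.values.dtype == np.int8
            and isinstance(scale, gs.Constant)):
        continue                                   # only rewrite INT8 weight constants
    zp = dq.inputs[2] if len(dq.inputs) > 2 else None
    s = scale.values.astype(np.float32)
    if s.size > 1:                                 # per-channel scale: broadcast along `axis`
        shape = [1] * w.values.ndim
        shape[dq.attrs.get("axis", 1)] = s.size
        s = s.reshape(shape)
    z = zp.values.astype(np.float32) if isinstance(zp, gs.Constant) else 0.0   # assumes zero or scalar zero-point
    fp32_w = gs.Constant(w.name + "_fp32", (w.values.astype(np.float32) - z) * s)

    q_out = gs.Variable(w.name + "_requant", dtype=np.int8)
    q = gs.Node(op="QuantizeLinear",
                inputs=[fp32_w, scale] + ([zp] if zp is not None else []),
                outputs=[q_out], attrs=dict(dq.attrs))
    graph.nodes.append(q)
    dq.inputs[0] = q_out                           # the DQ now reads from the new Q node

graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "mobilenetv2-fp32-weights.onnx")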

@zerollzeng (Collaborator)

You are welcome. I'm closing this issue; feel free to reopen it if you have any further questions.

@ninono12345

Hello @cricossc,
I am new to ONNX. Could you please share what you did to fix the model's accuracy? I am also having accuracy problems. How did you change the data from int32 to FP32? Did you modify the PyTorch model and then convert it to ONNX, or did you use ONNX GraphSurgeon or some other tool?

Thank you

@thakur-sachin

(quoting @cricossc's earlier comment about the accuracy drop after storing the biases as FP32)

Hi, I am struggling with a similar issue: I pre-processed my model, converted it to a QDQ model in ONNX, and want to convert this quantized model to an int8 TRT engine for inference. I am getting the following error: A DequantizeLayer can only run in DataType::kINT8, DataType::kFP8 or DataType::kINT4 precision. I am not sure how I can edit the layers and change the data types during the ONNX quantization process to fix this.
