Error while converting ONNX model with explicit quantization to TensorRT engine #2604

Status: Closed
cricossc opened this issue Jan 16, 2023 · 9 comments
Labels: triaged (Issue has been triaged by maintainers)

@cricossc commented Jan 16, 2023

Description

Hello,

while trying to convert an already quantized ONNX model with Q/DQ nodes to an int8 TensorRT engine, I get the following error:

[E] Error[2]: [weightConvertors.cpp::transformWeightsIfFP::537] Error Code 2: Internal Error (Assertion w.type() == type || (w.type() == DataType::kINT8 || w.type() == DataType::kHALF || w.type() == DataType::kFLOAT) failed. )
[E] Error[2]: [builder.cpp::buildSerializedNetwork::636] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
[E] Engine could not be created from network
[E] Building engine failed
[E] Failed to create engine from model or file.
[E] Engine set up failed

Can you tell me the origin of this error?

Environment

TensorRT Version: 8.5.1
NVIDIA GPU: GeForce RTX 2070
NVIDIA Driver Version: 510.47.03
CUDA Version: 11.6
CUDNN Version: 8.4.1
Operating System: Ubuntu 20.04

Relevant Files

mobilenetv2-trt-ready.zip

trtexec_logs.txt

Steps To Reproduce

trtexec --onnx=mobilenetv2-trt-ready.onnx --int8 --saveEngine=mobilenetv2.trt
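
For reference, a minimal sketch of the equivalent INT8 build through the TensorRT Python API (file names mirror the trtexec call above). With explicit Q/DQ quantization, no calibrator is needed; only the INT8 flag.

import tensorrt as trt

logger = trt.Logger(trt.Logger.ERROR)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("mobilenetv2-trt-ready.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise SystemExit("ONNX parsing failed")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)      # explicit Q/DQ: no calibrator required

engine = builder.build_serialized_network(network, config)
if engine is None:
    raise SystemExit("Engine build failed")
with open("mobilenetv2.trt", "wb") as f:
    f.write(engine)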
@zerollzeng (Collaborator)

I think you don't need these Q/DQ nodes for the output tensor; can you try removing them and building again? @ttyio please correct me if I'm wrong :-D
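
A minimal sketch of that suggestion with onnx-graphsurgeon, assuming the graph output is produced by a trailing QuantizeLinear/DequantizeLinear pair (inspect the graph in Netron first; the wiring below is an assumption):

import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load("mobilenetv2-trt-ready.onnx"))

out = graph.outputs[0]
dq = out.inputs[0] if out.inputs else None              # node producing the graph output
if dq is not None and dq.op == "DequantizeLinear":
    q = dq.inputs[0].inputs[0] if dq.inputs[0].inputs else None
    if q is not None and q.op == "QuantizeLinear":
        new_out = q.inputs[0]                           # tensor feeding the Q/DQ pair
        new_out.dtype = out.dtype                       # graph outputs need an explicit dtype
        graph.outputs = [new_out]

graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "mobilenetv2-no-output-qdq.onnx")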

@zerollzeng self-assigned this Jan 17, 2023
@zerollzeng added the triaged label Jan 17, 2023
@cricossc (Author)

Hi @zerollzeng, I removed the suggested Q/DQ nodes at the end of the network, but it still raises the same error.

The convolution weights are already quantized to int8 values but stored as FP32, as described in the documentation. Should I do it differently?
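
One quick way to check how the quantized constants are actually stored is to list the data type of every initializer feeding a DequantizeLinear node; a minimal sketch with the onnx Python API (the file name matches the attachment above):

import onnx
from onnx import TensorProto

model = onnx.load("mobilenetv2-trt-ready.onnx")
inits = {t.name: t for t in model.graph.initializer}

for node in model.graph.node:
    if node.op_type == "DequantizeLinear":
        x = node.input[0]            # quantized tensor: weight/bias constant or activation
        if x in inits:
            dtype = TensorProto.DataType.Name(inits[x].data_type)
            print(f"{node.name or node.output[0]}: input '{x}' stored as {dtype}")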

@zerollzeng (Collaborator)

This ONNX model is somehow invalid according to onnxruntime; can you fix that first?

onnxruntime.capi.onnxruntime_pybind11_state.InvalidGraph: [ONNXRuntimeError] : 10 : INVALID_GRAPH : Load model from mobilenetv2-trt-ready.onnx failed:This is an invalid model. Type Error: Type 'tensor(int32)' of input parameter
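
A minimal sketch for reproducing this check locally with onnx.checker and an onnxruntime session (an invalid model should fail at one of these two steps):

import onnx
import onnxruntime as ort

model = onnx.load("mobilenetv2-trt-ready.onnx")
onnx.checker.check_model(model)          # structural/type validation
sess = ort.InferenceSession("mobilenetv2-trt-ready.onnx",
                            providers=["CPUExecutionProvider"])
print("model loads fine in onnxruntime")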


@cricossc (Author)

Many thanks @zerollzeng. I fixed the issue by storing the biases as FP32, and I can now convert the ONNX model to a TensorRT engine.

However, when evaluating the resulting engine on ImageNet samples, the model's accuracy drops to 0: the output values are outside the expected range and almost equal to each other. Running inference with the ONNX model gives the same inconsistent outputs.

Should I keep the Q/DQ nodes after the convolution layers or remove them as suggested in the documentation? I would be glad to have your opinions on the matter.
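
A minimal sketch of the bias fix described above, assuming the Conv biases sit behind a DequantizeLinear whose input, scale and zero-point are constants: fold them into plain FP32 initializers with onnx-graphsurgeon. This is an illustration under those assumptions, not necessarily the exact steps used.

import numpy as np
import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load("mobilenetv2-trt-ready.onnx"))

for conv in [n for n in graph.nodes if n.op == "Conv" and len(n.inputs) > 2]:
    bias = conv.inputs[2]
    dq = bias.inputs[0] if bias.inputs else None        # producer of the bias tensor
    if dq is None or dq.op != "DequantizeLinear":
        continue
    b, scale = dq.inputs[0], dq.inputs[1]
    zp = dq.inputs[2] if len(dq.inputs) > 2 else None
    if not (isinstance(b, gs.Constant) and isinstance(scale, gs.Constant)):
        continue
    z = zp.values.astype(np.float32) if isinstance(zp, gs.Constant) else 0.0
    fp32_bias = (b.values.astype(np.float32) - z) * scale.values.astype(np.float32)
    conv.inputs[2] = gs.Constant(b.name + "_fp32", fp32_bias)
    dq.outputs.clear()                                  # detach the DQ so cleanup() drops it

graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "mobilenetv2-fp32-bias.onnx")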

@zerollzeng (Collaborator)

You need to make sure the exported ONNX model has good accuracy first; then you can tell whether the drop is caused by TRT.
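
A minimal sketch of such a sanity check with onnxruntime. The preprocessing below is the usual ImageNet recipe and the image path is a placeholder; match whatever the original training pipeline used.

import numpy as np
import onnxruntime as ort
from PIL import Image

def preprocess(path):
    # Standard ImageNet preprocessing: resize, scale to [0, 1], normalize, NCHW layout.
    img = Image.open(path).convert("RGB").resize((224, 224))
    x = np.asarray(img, dtype=np.float32) / 255.0
    x = (x - [0.485, 0.456, 0.406]) / [0.229, 0.224, 0.225]
    return x.transpose(2, 0, 1)[None].astype(np.float32)

sess = ort.InferenceSession("mobilenetv2-trt-ready.onnx",
                            providers=["CPUExecutionProvider"])
inp = sess.get_inputs()[0].name
logits = sess.run(None, {inp: preprocess("sample.jpg")})[0]   # placeholder image path
print("top-5 class ids:", np.argsort(logits[0])[::-1][:5])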

@cricossc (Author)

I managed to make it work with weights and biases stored as FP32 dequantized values.
Many thanks for your help.
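
For readers wondering how to reach this state, here is a minimal onnx-graphsurgeon sketch that rewrites "INT8 weight constant -> DequantizeLinear" into "FP32 weight constant -> QuantizeLinear -> DequantizeLinear", i.e. weights stored as FP32 dequantized values while keeping explicit Q/DQ. It assumes a zero or scalar zero-point and constant scales, and is an illustration, not necessarily the exact procedure used here.

import numpy as np
import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load("mobilenetv2-trt-ready.onnx"))

for dq in [n for n in graph.nodes if n.op == "DequantizeLinear"]:
    w, scale = dq.inputs[0], dq.inputs[1]
    if not (isinstance(w, gs.Constant) and w.values.dtype == np.int8
            and isinstance(scale, gs.Constant)):
        continue                                   # only rewrite INT8 weight constants
    zp = dq.inputs[2] if len(dq.inputs) > 2 else None
    s = scale.values.astype(np.float32)
    if s.size > 1:                                 # per-channel scale: broadcast along `axis`
        shape = [1] * w.values.ndim
        shape[dq.attrs.get("axis", 1)] = s.size
        s = s.reshape(shape)
    z = zp.values.astype(np.float32) if isinstance(zp, gs.Constant) else 0.0   # assumes zero or scalar zero-point
    fp32_w = gs.Constant(w.name + "_fp32", (w.values.astype(np.float32) - z) * s)

    q_out = gs.Variable(w.name + "_requant", dtype=np.int8)
    q = gs.Node(op="QuantizeLinear",
                inputs=[fp32_w, scale] + ([zp] if zp is not None else []),
                outputs=[q_out], attrs=dict(dq.attrs))
    graph.nodes.append(q)
    dq.inputs[0] = q_out                           # the DQ now reads from the new Q node

graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "mobilenetv2-fp32-weights.onnx")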

@zerollzeng (Collaborator)

You are welcome. I'm closing this issue; feel free to reopen it if you have any further questions.

@ninono12345

Hello @cricossc,
I am new to ONNX. Could you please share what you did to fix the model's accuracy? I am also having accuracy problems. How did you change the data from int32 to FP32? Did you modify the PyTorch model and then convert it to ONNX, or did you use ONNX GraphSurgeon or some other tool?

Thank you

@thakur-sachin

(quoting @cricossc's earlier comment about the accuracy drop after storing the biases as FP32)

Hi, I am struggling with a similar issue: I pre-processed my model, converted it to a QDQ model in ONNX, and want to convert this quantized model to an int8 TRT engine for inference. I am getting the following error: A DequantizeLayer can only run in DataType::kINT8, DataType::kFP8 or DataType::kINT4 precision. I am not sure how I can edit the layers and change the data types during the ONNX quantization process to fix this.
