Error while converting ONNX model with explicit quantization to TensorRT engine #2604
Comments
I think you don't need these Q/DQ nodes for the output tensor. Can you try removing them and running the conversion again? @ttyio please correct me if I'm wrong :-D
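For anyone following along, a trailing Q/DQ pair on a graph output can be bypassed with onnx-graphsurgeon. This is only a minimal sketch, assuming the last two nodes before each output form a QuantizeLinear/DequantizeLinear pair; the file names are placeholders, not the attachments from this issue.

```python
import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load("model-qdq.onnx"))  # placeholder path

new_outputs = []
for out in graph.outputs:
    dq = out.inputs[0] if out.inputs else None  # node producing this graph output
    if dq is not None and dq.op == "DequantizeLinear":
        q = dq.inputs[0].inputs[0] if dq.inputs[0].inputs else None
        if q is not None and q.op == "QuantizeLinear":
            # Promote the tensor feeding the trailing Q node to a graph output,
            # so the Q/DQ pair no longer contributes to the outputs.
            new_outputs.append(q.inputs[0])
            continue
    new_outputs.append(out)

graph.outputs = new_outputs
graph.cleanup()  # drops the now-dangling Q/DQ nodes
onnx.save(gs.export_onnx(graph), "model-no-output-qdq.onnx")
```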
Hi @zerollzeng, I removed the suggested Q/DQ nodes at the end of the network, but it still raises the same error. The convolution weights are already quantized to int8 values but stored as FP32, as mentioned in the documentation. Should I do it differently?
Many thanks @zerollzeng, I fixed the issue by storing the biases as FP32 and I am now able to convert the ONNX model to a TensorRT engine. However, when evaluating the produced engine with TensorRT on ImageNet samples, the model's accuracy drops to 0. Should I keep the Q/DQ nodes after the convolution layers or remove them as suggested in the documentation? I would be glad to have your opinion on the matter.
You need to make sure the exported ONNX model has good accuracy first; then you can tell whether the drop is caused by TRT.
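As a concrete way to do that check, the quantized ONNX model can be evaluated directly with ONNX Runtime before building the engine. A rough sketch; `load_imagenet_samples` is a hypothetical helper standing in for whatever preprocessing and label pipeline was used for quantization, and the model path is a placeholder.

```python
import numpy as np
import onnxruntime as ort

# Hypothetical helper: yields (images, labels) batches preprocessed exactly
# as during quantization. Not part of this issue's attachments.
from my_eval_utils import load_imagenet_samples

session = ort.InferenceSession("model-qdq.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name

correct = total = 0
for images, labels in load_imagenet_samples(batch_size=32):
    logits = session.run(None, {input_name: images.astype(np.float32)})[0]
    correct += int((logits.argmax(axis=1) == labels).sum())
    total += labels.shape[0]

print(f"ONNX Runtime top-1 accuracy: {correct / total:.4f}")
```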
I managed to make it work with weights and biases stored as FP32 dequantized values. |
You are welcome. I'm closing this issue; feel free to reopen it if you have any further questions.
Hello @cricossc, thank you.
Hi, I am struggling with a similar issue: I pre-processed my model, converted it to a QDQ model in ONNX, and now want to convert this quantized model to an int8 TRT engine for inference. I am getting the following error: A DequantizeLayer can only run in DataType::kINT8, DataType::kFP8 or DataType::kINT4 precision. I am not sure how I can edit the layers and change the data type in the ONNX quantization process to fix this.
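A common cause of that error is Q/DQ tensors quantized to uint8 (the ONNX Runtime quantizer's default for activations), which TensorRT's DequantizeLayer does not accept. One hedged way to check is to look at the element type feeding each DequantizeLinear node. This sketch only flags non-INT8 inputs (the usual working case for QDQ models), assumes the model carries type annotations (shape inference is run first to fill them in), and uses a placeholder path.

```python
import onnx
from onnx import TensorProto, shape_inference

model = shape_inference.infer_shapes(onnx.load("model-qdq.onnx"))  # placeholder path

# Collect element types for initializers and annotated tensors.
elem_types = {init.name: init.data_type for init in model.graph.initializer}
for vi in list(model.graph.value_info) + list(model.graph.input) + list(model.graph.output):
    elem_types[vi.name] = vi.type.tensor_type.elem_type

for node in model.graph.node:
    if node.op_type == "DequantizeLinear":
        dtype = elem_types.get(node.input[0])
        if dtype is not None and dtype != TensorProto.INT8:
            print(f"{node.name or node.output[0]}: input '{node.input[0]}' is "
                  f"{TensorProto.DataType.Name(dtype)}, but TensorRT expects INT8 here")
```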
Description
Hello,
While trying to convert an already quantized ONNX model with Q/DQ nodes to an int8 TensorRT engine, I get the following error (see the attached trtexec_logs.txt):
Can you tell me the origin of this error?
Environment
TensorRT Version: 8.5.1
NVIDIA GPU: GeForce RTX 2070
NVIDIA Driver Version: 510.47.03
CUDA Version: 11.6
CUDNN Version: 8.4.1
Operating System: Ubuntu 20.04
Relevant Files
mobilenetv2-trt-ready.zip
trtexec_logs.txt
Steps To Reproduce
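The attached trtexec_logs.txt suggests the conversion was driven with trtexec; for completeness, roughly the same build can be expressed with the TensorRT Python API. This is only a sketch under that assumption (paths are placeholders), not the exact command from the logs.

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("mobilenetv2-trt-ready.onnx", "rb") as f:  # placeholder path
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise SystemExit("ONNX parse failed")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)  # int8 mode; scales come from the Q/DQ nodes

engine = builder.build_serialized_network(network, config)
with open("mobilenetv2-int8.engine", "wb") as f:
    f.write(engine)
```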