TensorRT engine with FP32 precision loses accuracy compared with ONNX #3700
Comments
TRT doesn't guarantee bit-wise accuracy with other frameworks due to optimizations and floating-point error. I guess in your case the accuracy drop happens early and is amplified by the Pow/MatMul layers, then propagates through the network.
It would be great if you could:
I tested the trt-fp32 engine generated by Polygraphy (mark all), which breaks all of the op fusions, and the resulting mAP is still 66.1, so I think the accuracy drop is due to float64 precision truncation, but that situation is very uncommon.
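For anyone wanting to reproduce this kind of comparison, below is a minimal Polygraphy sketch (the model path and tolerances are placeholders; the per-layer mark-all comparison is normally done with the CLI, e.g. polygraphy run model.onnx --onnxrt --trt --onnx-outputs mark all --trt-outputs mark all):
from polygraphy.backend.onnxrt import OnnxrtRunner, SessionFromOnnx
from polygraphy.backend.trt import CreateConfig, EngineFromNetwork, NetworkFromOnnxPath, TrtRunner
from polygraphy.comparator import Comparator, CompareFunc

onnx_path = "rtdetr_resnet101.onnx"  # placeholder path

# Build an FP32 TensorRT engine and compare its outputs against ONNX Runtime.
build_engine = EngineFromNetwork(NetworkFromOnnxPath(onnx_path), config=CreateConfig(fp16=False))
runners = [OnnxrtRunner(SessionFromOnnx(onnx_path)), TrtRunner(build_engine)]

results = Comparator.run(runners)
# Bit-exact matches are not expected across frameworks, so use small tolerances.
passed = Comparator.compare_accuracy(results, compare_func=CompareFunc.simple(atol=1e-4, rtol=1e-4))
print("PASSED" if passed else "FAILED")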
The model is large (400 MB); how can I provide it to you?
from ultralytics import RTDETR
model = RTDETR('rtdetr-l.pt')
model.export(format='torchscript')
model.export(format='onnx')
Are you using an A10? @chinakook, have you tried disabling TF32?
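(For reference, a minimal sketch of disabling TF32 at engine build time; the ONNX path is a placeholder, and trtexec also accepts --noTF32.)
import tensorrt as trt

# Minimal sketch: build an FP32 engine with TF32 disabled (TF32 is on by default on Ampere).
logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("rtdetr_resnet101.onnx", "rb") as f:  # placeholder path
    assert parser.parse(f.read())

config = builder.create_builder_config()
config.clear_flag(trt.BuilderFlag.TF32)  # force true FP32 math instead of TF32
engine_bytes = builder.build_serialized_network(network, config)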
I'll try that.
@ttyio No effect on Ampere device.
I have already tested the 4 combinations below on an Ampere device (all with fp16 on):
None of these results match fp32 on an Ampere device. Therefore, I believe there may be some bugs when using Ampere GPUs to run TensorRT and build FP16 rtdetr engines (especially in the transformer and layernorm layers). This leads to poor precision in the FP16 results on this card, which deviates significantly from the performance observed in FP32.
I have also encountered an accuracy drop with dinov2-rtdetr at fp16 precision on the Orin platform, and found that the problematic layers are the self-attentions of the RTDETR decoder; see #3657. But in this issue, my resnet-rtdetr model at fp32 precision has an accuracy drop of 4.6 compared with onnx, and I have no idea how to solve it.
@miraiaroha I used opset 17 (the ultralytics default version), and the fp32 precision matches. The onnx model must be deployed in fp32 mode (without amp and without model.half()); otherwise it will cause the parse error you mentioned in #3567.
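(As a sketch of what I mean, assuming the standard ultralytics export arguments opset and half:)
from ultralytics import RTDETR

# Export in pure fp32 with opset 17; half=False keeps the exported weights in fp32.
model = RTDETR('rtdetr-l.pt')
model.export(format='onnx', opset=17, half=False)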
Usually fp32 is quite slow, so why specify that certain layers use fp32 and others use fp16, considering accuracy and speed?
It's a balance between accuracy and speed. Usually a -4% accuracy drop is unacceptable.
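(For illustration, a minimal sketch of pinning suspect layers to fp32 inside an fp16 build; the layer-name filter and ONNX path are only placeholders.)
import tensorrt as trt

# Minimal sketch: build in fp16 overall, but keep selected layers (e.g. layernorm /
# attention MatMuls) in fp32 via per-layer precision constraints.
logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("rtdetr_resnet101.onnx", "rb") as f:  # placeholder path
    assert parser.parse(f.read())

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)
config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)  # honor the per-layer settings below

for i in range(network.num_layers):
    layer = network.get_layer(i)
    if "LayerNorm" in layer.name or "MatMul" in layer.name:  # illustrative name filter
        layer.precision = trt.float32
        layer.set_output_type(0, trt.float32)

engine_bytes = builder.build_serialized_network(network, config)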
@miraiaroha you can mark all onnx nodes except the real output nodes as outputs (add output nodes for them) to disable trt fusion.
I have tried marking all nodes as output nodes to disable the op fusion, and the mAP is still 66.1 at fp32 precision.
Hi, we just released TRT 10 EA; could you please try the new version? If the accuracy is still bad, you can provide a repro and we can take a further look. Thanks!
Yes, I have tried TRT 10 EA, and I found that it loses even more accuracy...
I have found the cause of the problem: the image preprocessing was not aligned. It is not related to quantization. Thank you for your patience.
I think you are not using an Ampere card. My Turing cards are all OK at fp32/fp16.
I think we can close this issue since @miraiaroha has found a solution. Please reopen #3652 to track the Ampere accuracy loss issue.
Description
I evaluated my detection model (resnet101-rtdetr) and found a significant loss of accuracy at fp32, as below:
model | onnx | trt-fp32 | trt-fp16 | trt-fp16-int8
--- | --- | --- | --- | ---
mAP | 70.7 | 66.1 | 65.8 | 65.4
I used Polygraphy (mark all) to compare all layer outputs; the FAILED layers are shown below:
step3-op16-fp32.txt
The problematic layers are mostly MatMul_output and Pow_output.
Are there any methods to recover the accuracy?
Environment
TensorRT Version: 8.6
Polygraphy Version: 0.49