
terrible TensorRT accuracy when running inference on an object tracking algorithm #3609

Closed
ninono12345 opened this issue Jan 18, 2024 · 5 comments
Labels: triaged (Issue has been triaged by maintainers)


ninono12345 commented Jan 18, 2024

Description

I am trying to convert the inference part of the pytracking tomp101 tracker to TensorRT.

I've converted it to ONNX, and inference seems fine: the bounding box locks on correctly. However, the output tensors of the original model and the ONNX model differ by quite a lot when compared with the code below (the tensor values differ, yet the tracker still follows objects just fine :D ). The model's inputs and outputs:

{sample_x [dtype=float32, shape=(1, 1024, 18, 18)],
train_samples [dtype=float32, shape=(1, 1024, 18, 18)],
target_labels [dtype=float32, shape=(1, 1, 18, 18)],
train_ltrb [dtype=float32, shape=(1, 4, 18, 18)]}
[I] trt-runner-N0-01/18/24-04:45:38
---- Inference Output(s) ----
{bbreg_test_feat_enc [dtype=float32, shape=(1, 1, 256, 18, 18)],
bbreg_weights [dtype=float32, shape=(1, 256, 1, 1)],
target_scores [dtype=float32, shape=(1, 1, 18, 18)]}

import torch

# avg11/avg12/avg13 accumulate the mean absolute difference between the
# PyTorch and ONNX Runtime outputs over 30 runs (initialized to 0 beforehand).
r1 = original_model(inputs)      # PyTorch outputs (CUDA tensors)
r2 = session.run(None, inputs)   # ONNX Runtime outputs (numpy arrays); None fetches all outputs

avg11 = avg11 + torch.mean(torch.abs(r1[0] - torch.from_numpy(r2[0]).cuda()))
avg12 = avg12 + torch.mean(torch.abs(r1[1] - torch.from_numpy(r2[1]).cuda()))
avg13 = avg13 + torch.mean(torch.abs(r1[2] - torch.from_numpy(r2[2]).cuda()))

print(avg11 / 30)
print(avg12 / 30)
print(avg13 / 30)

BUT when the model is converted to TensorRT, the accuracy drops badly; inference results are terrible.

Does anybody have any suggestions on how to improve it? Should I modify the ONNX model with ONNX GraphSurgeon? Is there some Polygraphy tool I could use?

Maybe there is a trtexec conversion option that preserves accuracy?

THANK YOU

Environment

TensorRT Version: 8.6

NVIDIA GPU: GTX 1660 Ti

NVIDIA Driver Version: 546.01

CUDA Version: 12.1

CUDNN Version: 8.9.7

Operating System:

Python Version (if applicable): 3.10.13

PyTorch Version (if applicable): 2.1.2+cu121

Baremetal or Container (if so, version): no environment

Relevant Files

Model link: https://drive.google.com/file/d/1rKmrrktevdtL9Namevg3XdpMXWjTc3Gv/view?usp=sharing

@zerollzeng (Collaborator)

Could you please check this quickly with Polygraphy? The usage would be like: polygraphy run model.onnx --trt --onnxrt

You can feed real input too; see https://github.com/NVIDIA/TensorRT/tree/main/tools/Polygraphy/examples/cli/run/05_comparing_with_custom_input_data
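
For reference, a minimal data-loader sketch in the style of that example, using the input names and shapes from the log below; the random data is a stand-in for real frames and is only illustrative:

# data_loader.py -- supplies custom inputs to `polygraphy run`
import numpy as np

def load_data():
    # Yield one feed_dict per iteration; keys must match the ONNX input names.
    yield {
        "sample_x":      np.random.rand(1, 1024, 18, 18).astype(np.float32),
        "train_samples": np.random.rand(1, 1024, 18, 18).astype(np.float32),
        "target_labels": np.random.rand(1, 1, 18, 18).astype(np.float32),
        "train_ltrb":    np.random.rand(1, 4, 18, 18).astype(np.float32),
    }

It would then be passed as: polygraphy run model.onnx --trt --onnxrt --data-loader-script data_loader.py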

zerollzeng self-assigned this Jan 19, 2024
zerollzeng added the triaged label Jan 19, 2024
@ninono12345 (Author)

Thank you @zerollzeng, here is the result:

D:\pyth\pytracking-master>polygraphy run modified_latest3_sanitized2.onnx --trt --onnxrt
[I] RUNNING | Command: C:\Users\Tomas\AppData\Local\Programs\Python\Python310\Scripts\polygraphy run modified_latest3_sanitized2.onnx --trt --onnxrt
[I] trt-runner-N0-01/19/24-12:30:50 | Activating and starting inference
[W] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[W] onnx2trt_utils.cpp:400: One or more weights outside the range of INT32 was clamped
[W] Input tensor: sample_x (dtype=DataType.FLOAT, shape=(-1, 1024, 18, 18)) | No shapes provided; Will use shape: [1, 1024, 18, 18] for min/opt/max in profile.
[W] This will cause the tensor to have a static shape. If this is incorrect, please set the range of shapes for this input tensor.
[W] Input tensor: train_samples (dtype=DataType.FLOAT, shape=(-1, 1024, 18, 18)) | No shapes provided; Will use shape: [1, 1024, 18, 18] for min/opt/max in profile.
[W] Input tensor: target_labels (dtype=DataType.FLOAT, shape=(-1, 1, 18, 18)) | No shapes provided; Will use shape: [1, 1, 18, 18] for min/opt/max in profile.
[W] Input tensor: train_ltrb (dtype=DataType.FLOAT, shape=(-1, 4, 18, 18)) | No shapes provided; Will use shape: [1, 4, 18, 18] for min/opt/max in profile.
[I] Configuring with profiles:[
Profile 0:
{sample_x [min=[1, 1024, 18, 18], opt=[1, 1024, 18, 18], max=[1, 1024, 18, 18]],
train_samples [min=[1, 1024, 18, 18], opt=[1, 1024, 18, 18], max=[1, 1024, 18, 18]],
target_labels [min=[1, 1, 18, 18], opt=[1, 1, 18, 18], max=[1, 1, 18, 18]],
train_ltrb [min=[1, 4, 18, 18], opt=[1, 4, 18, 18], max=[1, 4, 18, 18]]}
]
[I] Building engine with configuration:
Flags | []
Engine Capability | EngineCapability.DEFAULT
Memory Pools | [WORKSPACE: 6143.69 MiB, TACTIC_DRAM: 6143.69 MiB]
Tactic Sources | [CUBLAS, CUBLAS_LT, CUDNN, EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
Profiling Verbosity | ProfilingVerbosity.DETAILED
Preview Features | [FASTER_DYNAMIC_SHAPES_0805, DISABLE_EXTERNAL_TACTIC_SOURCES_FOR_CORE_0805]
[I] Finished engine building in 39.848 seconds
[I] trt-runner-N0-01/19/24-12:30:50
---- Inference Input(s) ----
{sample_x [dtype=float32, shape=(1, 1024, 18, 18)],
train_samples [dtype=float32, shape=(1, 1024, 18, 18)],
target_labels [dtype=float32, shape=(1, 1, 18, 18)],
train_ltrb [dtype=float32, shape=(1, 4, 18, 18)]}
[I] trt-runner-N0-01/19/24-12:30:50
---- Inference Output(s) ----
{bbreg_test_feat_enc [dtype=float32, shape=(1, 1, 256, 18, 18)],
bbreg_weights [dtype=float32, shape=(1, 256, 1, 1)],
target_scores [dtype=float32, shape=(1, 1, 18, 18)]}
[I] trt-runner-N0-01/19/24-12:30:50 | Completed 1 iteration(s) in 185.7 ms | Average inference time: 185.7 ms.
[I] onnxrt-runner-N0-01/19/24-12:30:50 | Activating and starting inference
[I] Creating ONNX-Runtime Inference Session with providers: ['CPUExecutionProvider']
[I] onnxrt-runner-N0-01/19/24-12:30:50
---- Inference Input(s) ----
{sample_x [dtype=float32, shape=(1, 1024, 18, 18)],
train_samples [dtype=float32, shape=(1, 1024, 18, 18)],
target_labels [dtype=float32, shape=(1, 1, 18, 18)],
train_ltrb [dtype=float32, shape=(1, 4, 18, 18)]}
[I] onnxrt-runner-N0-01/19/24-12:30:50
---- Inference Output(s) ----
{target_scores [dtype=float32, shape=(1, 1, 18, 18)],
bbreg_test_feat_enc [dtype=float32, shape=(1, 1, 256, 18, 18)],
bbreg_weights [dtype=float32, shape=(1, 256, 1, 1)]}
[I] onnxrt-runner-N0-01/19/24-12:30:50 | Completed 1 iteration(s) in 242.9 ms | Average inference time: 242.9 ms.
[I] Accuracy Comparison | trt-runner-N0-01/19/24-12:30:50 vs. onnxrt-runner-N0-01/19/24-12:30:50
[I] Comparing Output: 'bbreg_test_feat_enc' (dtype=float32, shape=(1, 1, 256, 18, 18)) with 'bbreg_test_feat_enc' (dtype=float32, shape=(1, 1, 256, 18, 18))
[I] Tolerance: [abs=1e-05, rel=1e-05] | Checking elemwise error
[I] trt-runner-N0-01/19/24-12:30:50: bbreg_test_feat_enc | Stats: mean=0.012191, std-dev=0.45428, var=0.20637, median=-0.011814, min=-3.197 at (0, 0, 92, 0, 17), max=5.5255 at (0, 0, 125, 14, 9), avg-magnitude=0.11202
[I] onnxrt-runner-N0-01/19/24-12:30:50: bbreg_test_feat_enc | Stats: mean=0.012191, std-dev=0.45428, var=0.20637, median=-0.011814, min=-3.197 at (0, 0, 92, 0, 17), max=5.5255 at (0, 0, 125, 14, 9), avg-magnitude=0.11202
[I] Error Metrics: bbreg_test_feat_enc
[I] Minimum Required Tolerance: elemwise error | [abs=2.861e-06] OR [rel=1.0297] (requirements may be lower if both abs/rel tolerances are set)
[I] Absolute Difference | Stats: mean=7.755e-08, std-dev=1.0897e-07, var=1.1875e-14, median=5.5879e-08, min=0 at (0, 0, 0, 0, 0), max=2.861e-06 at (0, 0, 125, 6, 1), avg-magnitude=7.755e-08
[I] Relative Difference | Stats: mean=2.8524e-05, std-dev=0.0041397, var=1.7137e-05, median=1.1497e-06, min=0 at (0, 0, 0, 0, 0), max=1.0297 at (0, 0, 163, 14, 12), avg-magnitude=2.8524e-05
[I] PASSED | Output: 'bbreg_test_feat_enc' | Difference is within tolerance (rel=1e-05, abs=1e-05)
[I] Comparing Output: 'bbreg_weights' (dtype=float32, shape=(1, 256, 1, 1)) with 'bbreg_weights' (dtype=float32, shape=(1, 256, 1, 1))
[I] Tolerance: [abs=1e-05, rel=1e-05] | Checking elemwise error
[I] trt-runner-N0-01/19/24-12:30:50: bbreg_weights | Stats: mean=0.016638, std-dev=0.27469, var=0.075457, median=-6.1898e-05, min=-0.029308 at (0, 75, 0, 0), max=4.4027 at (0, 232, 0, 0), avg-magnitude=0.019857
[I] onnxrt-runner-N0-01/19/24-12:30:50: bbreg_weights | Stats: mean=0.016638, std-dev=0.27469, var=0.075457, median=-6.1894e-05, min=-0.029308 at (0, 75, 0, 0), max=4.4027 at (0, 232, 0, 0), avg-magnitude=0.019857
[I] Error Metrics: bbreg_weights
[I] Minimum Required Tolerance: elemwise error | [abs=2.7765e-08] OR [rel=0.00084576] (requirements may be lower if both abs/rel tolerances are set)
[I] Absolute Difference | Stats: mean=1.0353e-08, std-dev=5.4175e-09, var=2.9349e-17, median=1.071e-08, min=0 at (0, 232, 0, 0), max=2.7765e-08 at (0, 58, 0, 0), avg-magnitude=1.0353e-08
[I] Relative Difference | Stats: mean=1.8881e-05, std-dev=5.9611e-05, var=3.5535e-09, median=5.0967e-06, min=0 at (0, 232, 0, 0), max=0.00084576 at (0, 228, 0, 0), avg-magnitude=1.8881e-05
[I] PASSED | Output: 'bbreg_weights' | Difference is within tolerance (rel=1e-05, abs=1e-05)
[I] Comparing Output: 'target_scores' (dtype=float32, shape=(1, 1, 18, 18)) with 'target_scores' (dtype=float32, shape=(1, 1, 18, 18))
[I] Tolerance: [abs=1e-05, rel=1e-05] | Checking elemwise error
[I] trt-runner-N0-01/19/24-12:30:50: target_scores | Stats: mean=0.016897, std-dev=0.016446, var=0.00027048, median=0.010701, min=0.00054304 at (0, 0, 13, 9), max=0.092922 at (0, 0, 2, 15), avg-magnitude=0.016897
[I] onnxrt-runner-N0-01/19/24-12:30:50: target_scores | Stats: mean=0.016897, std-dev=0.016446, var=0.00027048, median=0.010701, min=0.00054308 at (0, 0, 13, 9), max=0.092922 at (0, 0, 2, 15), avg-magnitude=0.016897
[I] Error Metrics: target_scores
[I] Minimum Required Tolerance: elemwise error | [abs=1.9372e-07] OR [rel=0.00015655] (requirements may be lower if both abs/rel tolerances are set)
[I] Absolute Difference | Stats: mean=6.085e-08, std-dev=3.7659e-08, var=1.4182e-15, median=5.9372e-08, min=0 at (0, 0, 4, 0), max=1.9372e-07 at (0, 0, 2, 13), avg-magnitude=6.085e-08
[I] Relative Difference | Stats: mean=9.9251e-06, std-dev=1.6575e-05, var=2.7473e-10, median=4.1957e-06, min=0 at (0, 0, 4, 0), max=0.00015655 at (0, 0, 17, 10), avg-magnitude=9.9251e-06
[I] PASSED | Output: 'target_scores' | Difference is within tolerance (rel=1e-05, abs=1e-05)
[I] PASSED | All outputs matched | Outputs: ['bbreg_test_feat_enc', 'bbreg_weights', 'target_scores']
[I] Accuracy Summary | trt-runner-N0-01/19/24-12:30:50 vs. onnxrt-runner-N0-01/19/24-12:30:50 | Passed: 1/1 iterations | Pass Rate: 100.0%
[I] PASSED | Runtime: 60.464s | Command: C:\Users\Tomas\AppData\Local\Programs\Python\Python310\Scripts\polygraphy run modified_latest3_sanitized2.onnx --trt --onnxrt

Accuracy seems to pass the tests, but when I insert the TensorRT model into my code, the tracking box is all over the place on the webcam feed.
I assume that this particular model needs to be extremely accurate to function...

When I run the model with onnxruntime inside my code, the tracking seems fine, but when I switch to running the engine, everything falls apart... If you want, I can record a video sample of how the model runs on PyTorch, on onnxruntime, and on TensorRT.

I ran polygraphy inspect model modified_latest3_sanitized2.onnx --show layers attrs weights and noticed that many, many layers use int64, BUT those layers have 0 tensors; all layers that have weights are float32. Can this be an accuracy issue?

Polygraphy inspect log file modified_latest_3_sanitized2_inspect.txt

@zerollzeng (Collaborator)

Accuracy seems to pass the tests, but when I insert the TensorRT model into my code, the tracking box is all over the place on the webcam feed.
I assume that this particular model needs to be extremely accurate to function...

Did you synchronize the CUDA stream after each inference? Or could there be a bug in the pre- or post-processing?
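
(For context: TensorRT's Python API runs inference asynchronously, so outputs must not be read before the stream is synchronized. A minimal sketch of where the sync belongs, assuming a pycuda-based loop with hypothetical buffer names:)

import pycuda.driver as cuda

# Hypothetical TensorRT 8.x inference step; `context`, `bindings`,
# `host_output`, `device_output`, and `stream` are assumed to be set up already.
context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
cuda.memcpy_dtoh_async(host_output, device_output, stream)
stream.synchronize()  # without this, host_output may still hold stale data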

I ran polygraphy inspect model modified_latest3_sanitized2.onnx --show layers attrs weights and noticed that many, many layers use int64, BUT those layers have 0 tensors; all layers that have weights are float32. Can this be an accuracy issue?

int64 weights should be fine; as the warnings in the log above show, TensorRT casts them down to int32 during parsing.

@ninono12345 (Author)

@zerollzeng What do you mean by this question?

Did you synchronize the CUDA stream after each inference? Or could there be a bug in the pre- or post-processing?

@ninono12345 (Author)

I was using Polygraphy's inference. Everything was fixed when I transferred the tensors from GPU to CPU before inference!
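
(A minimal sketch of that fix, assuming the inputs are a dict of CUDA torch tensors named `inputs` and a Polygraphy TrtRunner is used for inference; the variable names are illustrative:)

from polygraphy.backend.trt import EngineFromNetwork, NetworkFromOnnxPath, TrtRunner

# Move CUDA tensors to host memory before handing them to the runner.
feed_dict = {name: t.detach().cpu().numpy() for name, t in inputs.items()}

with TrtRunner(EngineFromNetwork(NetworkFromOnnxPath("modified_latest3_sanitized2.onnx"))) as runner:
    outputs = runner.infer(feed_dict)  # numpy inputs avoid feeding stale GPU data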
