Just one element of a batch is correct in TensorRT 8.6.1.6 #3689

Closed
Beshkent opened this issue Feb 29, 2024 · 11 comments
Assignees: zerollzeng
Labels: triaged (Issue has been triaged by maintainers)

Comments

@Beshkent commented Feb 29, 2024

Description

Hello!
I have a pipeline that builds a TRT engine from a torch checkpoint, and it works fine with CUDA 11.4 && TensorRT-7.2.3.4-1.cuda11.1. When I tried to upgrade the GPU libraries (and rebuild the TRT engine), I ran into a strange error during inference with the TRT engine: when batch-size > 1, only one element of the batch produces a correct result (AFAIK the first element). Here are the versions used.

Installed versions:
tensorrt-8.6.1.6-1.cuda12.0
cuda-toolkit-12-0-12.0.1-1
libcudnn8-devel-8.9.7.29-1.cuda12.2

Upgraded the pip packages up to the following versions, but it didn't help:
nvidia-cublas-cu12        12.3.4.1 (also checked with 12.1.0.26)
onnx                      1.13.1
onnxruntime               1.11.1
torch                     2.0.0

Converting the same ONNX file that is used with TensorRT 7 (i.e. skipping the torch->onnx step) also didn't help.

Environment

TensorRT Version: 8.6.1.6-1.cuda12.0
NVIDIA GPU: Tesla V100S
NVIDIA Driver Version: 525.147.05
CUDA Version: 12.0
CUDNN Version: 8.9.7

Operating System: rhel8
Python Version: 3.8
PyTorch Version: 2.0.0

Only the versions of the tools/libs were changed; the conversion code (in Python) and the inference code (in C++) are the same for both TensorRT 7 and TensorRT 8.

Could you help with the following questions, please?

  • Is there some change in the representation of the TRT engine input (way of padding, transposing, and so on)?
  • Prerequisites: what are the minimal versions of the tools used in the torch->onnx->trt conversion chain?
@zerollzeng (Collaborator)

  1. Does the model have dynamic shape inputs?
  2. Can it be reproduced with Polygraphy? Usage would look like polygraphy run model.onnx --trt --onnxrt to compare the outputs between TensorRT and ONNX Runtime.

Thanks!

zerollzeng self-assigned this Mar 4, 2024
zerollzeng added the triaged label Mar 4, 2024
@Beshkent (Author) commented Mar 4, 2024

Does the model have dynamic shape inputs?

Yes, dimension 0. It is marked as dynamic via minShapes && optShapes && maxShapes.

Can it be reproduced with Polygraphy? Usage would look like polygraphy run model.onnx --trt --onnxrt to compare the outputs between TensorRT and ONNX Runtime.

That needs the tensorrt pip package and the polygraphy binary, which I don't have. I will try to install them and post the result.

@OliviaSnail commented Mar 6, 2024

I also encountered the same error! My TRT engine works well in TensorRT 7.1/TensorRT 8.5, but not in TensorRT 8.6... I also use dynamic shape inputs, and I use multiple contexts in different threads. When batch = 1, the result is correct. When batch > 1, all the results are wrong.

@Beshkent (Author) commented Apr 8, 2024

@zerollzeng
polygraphy run model.onnx --trt --onnxrt returns Difference exceeds tolerance (rel=1e-05, abs=1e-05). Attaching the log: polygraphy.log

@zerollzeng (Collaborator)

The diff looks good (< 1e-5) to me; the reason it fails is that polygraphy uses a strict default tolerance for the output diff (rel=1e-05, abs=1e-05):

[I]         Error Metrics: qf0_logits
[I]             Minimum Required Tolerance: elemwise error | [abs=5.4359e-05] OR [rel=0.00030196] (requirements may be lower if both abs/rel tolerances are set)
[I]             Absolute Difference | Stats: mean=4.0124e-06, std-dev=5.6803e-06, var=3.2265e-11, median=2.1458e-06, min=0 at (0, 2, 3, 2), max=5.4359e-05 at (3, 14, 5, 0), avg-magnitude=4.0124e-06
[I]                 ---- Histogram ----
                    Bin Range            |  Num Elems | Visualization
                    (0       , 5.44e-06) |       2795 | ########################################
                    (5.44e-06, 1.09e-05) |        516 | #######
                    (1.09e-05, 1.63e-05) |        163 | ##
                    (1.63e-05, 2.17e-05) |         62 | 
                    (2.17e-05, 2.72e-05) |         26 | 
                    (2.72e-05, 3.26e-05) |         14 | 
                    (3.26e-05, 3.81e-05) |          4 | 
                    (3.81e-05, 4.35e-05) |          5 | 
                    (4.35e-05, 4.89e-05) |          8 | 
                    (4.89e-05, 5.44e-05) |          7 | 

@Beshkent (Author)

So, do you have any ideas why batching doesn't work? Could randomness in the network cause such an error? In our nets we have randomness, which is moved to an input in this polygraphy run.

@Beshkent (Author)

This problem exists in the inference of two different networks. Attaching the ONNX file of one of them and the command we used to convert it:

trtexec --onnx=full_text.onnx --saveEngine=full_text.trt  --minShapes=text_emb:1x1x192,text_mask:1x1,q_labels:1x1x5,bert_emb:1x1x768,bert_mask:1x1,speaker_ids:1,noise_scale_w:1,length_scale:1x1 \
    --optShapes=text_emb:16x400x192,text_mask:16x400,q_labels:16x400x5,bert_emb:16x200x768,bert_mask:16x200,speaker_ids:16,noise_scale_w:16,length_scale:16x400 \
    --maxShapes=text_emb:16x400x192,text_mask:16x400,q_labels:16x400x5,bert_emb:16x200x768,bert_mask:16x200,speaker_ids:16,noise_scale_w:16,length_scale:16x400 \
    --fp16

full_text_random.onnx.zip

@Beshkent (Author)

@zerollzeng
Here is the graph of the second NN. I tried to attach the model itself, but its size is larger than the limit (getting "File size too big: 25 MB are allowed, 41 MB were attempted to upload.").

(Graph image: decoder_random.onnx)

How we prepare the inputs for inference (a sketch is shown after this list):
- shape[1] of the following inputs may differ for each element of the batch, so they are all padded (with 0) up to kMaxAxisValue: attention_weights, attention_weights_cum, processed_memory, encoder_outputs
- padding_mask_zeros_inf[i] is filled with 0 for the first batch[i].shape[1] positions, and with inf for the rest (kMaxAxisValue - shape[1])
- padding_mask_ones_zeros[i]: the same as above, but the pad values are 1s and 0s
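
For illustration only, here is a minimal C++ sketch of the padding scheme described above. The names (kMaxAxisValue, seqLens, the two mask buffers) and the row-major [batch, kMaxAxisValue] layout are assumptions that mirror the description, not the actual inference code:

```cpp
#include <limits>
#include <vector>

constexpr int kMaxAxisValue = 400;  // assumed padded length of axis 1

// seqLens[i] is the unpadded shape[1] of batch element i.
void fillPaddingMasks(const std::vector<int>& seqLens,
                      std::vector<float>& maskZerosInf,    // 0 for valid positions, inf for padding
                      std::vector<float>& maskOnesZeros)   // 1 for valid positions, 0 for padding
{
    const float inf = std::numeric_limits<float>::infinity();
    const int batch = static_cast<int>(seqLens.size());
    maskZerosInf.assign(static_cast<size_t>(batch) * kMaxAxisValue, 0.f);
    maskOnesZeros.assign(static_cast<size_t>(batch) * kMaxAxisValue, 0.f);

    for (int i = 0; i < batch; ++i)
    {
        for (int j = 0; j < kMaxAxisValue; ++j)
        {
            const bool valid = j < seqLens[i];
            maskZerosInf[i * kMaxAxisValue + j] = valid ? 0.f : inf;
            maskOnesZeros[i * kMaxAxisValue + j] = valid ? 1.f : 0.f;
        }
    }
}
```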

@zerollzeng (Collaborator)

This problem exists in the inference of two different networks. Attaching the ONNX file of one of them and the command we used to convert it:

trtexec --onnx=full_text.onnx --saveEngine=full_text.trt  --minShapes=text_emb:1x1x192,text_mask:1x1,q_labels:1x1x5,bert_emb:1x1x768,bert_mask:1x1,speaker_ids:1,noise_scale_w:1,length_scale:1x1 \
    --optShapes=text_emb:16x400x192,text_mask:16x400,q_labels:16x400x5,bert_emb:16x200x768,bert_mask:16x200,speaker_ids:16,noise_scale_w:16,length_scale:16x400 \
    --maxShapes=text_emb:16x400x192,text_mask:16x400,q_labels:16x400x5,bert_emb:16x200x768,bert_mask:16x200,speaker_ids:16,noise_scale_w:16,length_scale:16x400 \
    --fp16

full_text_random.onnx.zip

I did a quick check with this model, and it passed with polygraphy:

[I]     Comparing Output: 'attn_mask' (dtype=int64, shape=(16, 1, 371, 400)) with 'attn_mask' (dtype=int64, shape=(16, 1, 371, 400))
[I]         Tolerance: [abs=0.0001, rel=0.0001] | Checking elemwise error
/home/scratch.zeroz_sw/miniconda3/lib/python3.9/site-packages/polygraphy/util/array.py:677: RuntimeWarning: invalid value encountered in divide
  "numpy": lambda lhs, rhs: lhs / rhs,
[I]         trt-runner-N0-04/27/24-14:05:22: attn_mask | Stats: mean=0.33135, std-dev=0.4707, var=0.22156, median=0, min=0 at (0, 0, 0, 0), max=1 at (0, 0, 0, 2), avg-magnitude=0.33135
[I]         onnxrt-runner-N0-04/27/24-14:05:22: attn_mask | Stats: mean=0.33135, std-dev=0.4707, var=0.22156, median=0, min=0 at (0, 0, 0, 0), max=1 at (0, 0, 0, 2), avg-magnitude=0.33135
[I]         Error Metrics: attn_mask
[I]             Minimum Required Tolerance: elemwise error | [abs=0] OR [rel=nan] (requirements may be lower if both abs/rel tolerances are set)
[I]             Absolute Difference | Stats: mean=0, std-dev=0, var=0, median=0, min=0 at (0, 0, 0, 0), max=0 at (0, 0, 0, 0), avg-magnitude=0
[I]             Relative Difference | Stats: mean=nan, std-dev=nan, var=nan, median=nan, min=nan at (0, 0, 0, 0), max=nan at (0, 0, 0, 0), avg-magnitude=nan
[I]         PASSED | Output: 'attn_mask' | Difference is within tolerance (rel=0.0001, abs=0.0001)
[I]     PASSED | All outputs matched | Outputs: ['m_p', 'logs_p', 'w_ceil', 'attn_mask']
[I] Accuracy Summary | trt-runner-N0-04/27/24-14:05:22 vs. onnxrt-runner-N0-04/27/24-14:05:22 | Passed: 1/1 iterations | Pass Rate: 100.0%
[I] PASSED | Runtime: 147.548s | Command: /home/scratch.zeroz_sw/miniconda3/bin/polygraphy run full_text_random.onnx --trt --trt-opt-shapes text_emb:[16,400,192] text_mask:[16,400] q_labels:[16,400,5] bert_emb:[16,200,768] bert_mask:[16,200] speaker_ids:[16] noise_scale_w:[16] length_scale:[16,400] --trt-min-shapes text_emb:[1,1,192] text_mask:[1,1] q_labels:[1,1,5] bert_emb:[1,1,768] bert_mask:[1,1] speaker_ids:[1] noise_scale_w:[1] length_scale:[1,1] --trt-max-shapes text_emb:[16,400,192] text_mask:[16,400] q_labels:[16,400,5] bert_emb:[16,200,768] bert_mask:[16,200] speaker_ids:[16] noise_scale_w:[16] length_scale:[16,400] --onnxrt --input-shapes text_emb:[16,400,192] text_mask:[16,400] q_labels:[16,400,5] bert_emb:[16,200,768] bert_mask:[16,200] speaker_ids:[16] noise_scale_w:[16] length_scale:[16,400] --atol 1e-4 --rtol 1e-4

@zerollzeng (Collaborator)

Since these work fine with TRT 7, I guess some API usage error may be leading to this; maybe check the TRT 8 release notes?
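
Not confirmed as the cause here, but one detail worth double-checking on the C++ side when moving a dynamic-shape engine from TRT 7 to TRT 8: the runtime input dimensions must be set on the execution context before every enqueue, for every dynamic input; if a stale batch size is left on the context, only part of the batch may come out correct. A rough sketch against the TensorRT 8 C++ API (binding index 0, the 192-wide last dimension, and the bindings/stream arguments are placeholders, not taken from the issue):

```cpp
#include <NvInfer.h>
#include <cuda_runtime_api.h>

// Sketch only: assumes an explicit-batch engine with dynamic shapes and
// `bindings` already pointing at device buffers sized for maxShapes.
bool runBatch(nvinfer1::IExecutionContext* context, void** bindings,
              cudaStream_t stream, int batchSize, int seqLen)
{
    // Set the actual batch/sequence dims before every inference;
    // repeat for each dynamic input binding of the engine.
    if (!context->setBindingDimensions(0, nvinfer1::Dims3(batchSize, seqLen, 192)))
        return false;

    // Refuse to run until every dynamic input shape has been specified.
    if (!context->allInputDimensionsSpecified())
        return false;

    return context->enqueueV2(bindings, stream, nullptr);
}
```

(In TRT 8.5+, the name-based setInputShape()/enqueueV3() calls are the non-deprecated equivalents; the binding-index variants above still exist in 8.6.)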

@ttyio (Collaborator) commented Jul 2, 2024

Closing since there has been no activity for more than 3 weeks. Please reopen if you still have questions. Thanks all!

ttyio closed this as completed Jul 2, 2024