Stuck during inference: error at `return mImpl->deserializeCudaEngine(blob, size, nullptr)` #1329
Unanswered · brilliant-soilder asked this question in Q&A
Hello! I converted an ONNX model to a TensorRT engine on this computer, but the same computer cannot read the engine file back for inference. Very puzzled; thanks for any help.
The project runs fine on another computer with CUDA 10.2. When I copied it to this computer, it shows the deserialization problem above.
I think the problem comes from this:

```cpp
mEngine = std::shared_ptr<nvinfer1::ICudaEngine>(
    runtime->deserializeCudaEngine(engineData.data() + ADD_PARAS_NUM, fsize - ADD_PARAS_NUM, nullptr),
    samplesCommon::InferDeleter());
```
When inferring, it first showed an error saying cudart64_102.dll is needed, so I renamed cudart64_110.dll to cudart64_102.dll, and likewise cublas64_11.dll to cublas64_10.dll and cublasLt64_11.dll to cublasLt64_10.dll. After that it fails with an error at `return mImpl->deserializeCudaEngine(blob, size, nullptr);`.
Environment
**TensorRT Version**: 8.2.3.0
**NVIDIA GPU**: RTX 3060
**NVIDIA Driver Version**: 512
**CUDA Version**: 11.2
**CUDNN Version**: 8.2.1
**Operating System**: Windows
**Python Version (if applicable)**: 3.9
**Tensorflow Version (if applicable)**:
**PyTorch Version (if applicable)**: 1.8
**Baremetal or Container (if so, version)**:
[11/07/2022-17:08:48] [I] [TRT] [MemUsageChange] Init CUDA: CPU +416, GPU +0, now: CPU 4141, GPU 2149 (MiB)
[11/07/2022-17:08:48] [I] [TRT] Loaded engine size: 13 MiB
[11/07/2022-17:08:49] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 10.2.2 but loaded cuBLAS/cuBLAS LT 11.3.1
[11/07/2022-17:08:49] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +819, GPU +474, now: CPU 4978, GPU 2636 (MiB)
[11/07/2022-17:08:49] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +687, GPU +266, now: CPU 5665, GPU 2902 (MiB)
TensorRT-8.2.3.0\bin>trtexec.exe --onnx=3_4_512.onnx --saveEngine=3_4_512.engine --workspace=4096 --fp16
&&&& RUNNING TensorRT.trtexec [TensorRT v8203] # trtexec.exe --onnx=3_4_512.onnx --saveEngine=3_4_512.engine --workspace=4096 --fp16
[11/08/2022-15:23:13] [I] === Model Options ===
[11/08/2022-15:23:13] [I] Format: ONNX
[11/08/2022-15:23:13] [I] Model: 3_4_512.onnx
[11/08/2022-15:23:13] [I] Output:
[11/08/2022-15:23:13] [I] === Build Options ===
[11/08/2022-15:23:13] [I] Max batch: explicit batch
[11/08/2022-15:23:13] [I] Workspace: 4096 MiB
[11/08/2022-15:23:13] [I] minTiming: 1
[11/08/2022-15:23:13] [I] avgTiming: 8
[11/08/2022-15:23:13] [I] Precision: FP32+FP16
[11/08/2022-15:23:13] [I] Calibration:
[11/08/2022-15:23:13] [I] Refit: Disabled
[11/08/2022-15:23:13] [I] Sparsity: Disabled
[11/08/2022-15:23:13] [I] Safe mode: Disabled
[11/08/2022-15:23:13] [I] DirectIO mode: Disabled
[11/08/2022-15:23:13] [I] Restricted mode: Disabled
[11/08/2022-15:23:13] [I] Save engine: 3_4_512.engine
[11/08/2022-15:23:13] [I] Load engine:
[11/08/2022-15:23:13] [I] Profiling verbosity: 0
[11/08/2022-15:23:13] [I] Tactic sources: Using default tactic sources
[11/08/2022-15:23:13] [I] timingCacheMode: local
[11/08/2022-15:23:13] [I] timingCacheFile:
[11/08/2022-15:23:13] [I] Input(s)s format: fp32:CHW
[11/08/2022-15:23:13] [I] Output(s)s format: fp32:CHW
[11/08/2022-15:23:13] [I] Input build shapes: model
[11/08/2022-15:23:13] [I] Input calibration shapes: model
[11/08/2022-15:23:13] [I] === System Options ===
[11/08/2022-15:23:13] [I] Device: 0
[11/08/2022-15:23:13] [I] DLACore:
[11/08/2022-15:23:13] [I] Plugins:
[11/08/2022-15:23:13] [I] === Inference Options ===
[11/08/2022-15:23:13] [I] Batch: Explicit
[11/08/2022-15:23:13] [I] Input inference shapes: model
[11/08/2022-15:23:13] [I] Iterations: 10
[11/08/2022-15:23:13] [I] Duration: 3s (+ 200ms warm up)
[11/08/2022-15:23:13] [I] Sleep time: 0ms
[11/08/2022-15:23:13] [I] Idle time: 0ms
[11/08/2022-15:23:13] [I] Streams: 1
[11/08/2022-15:23:13] [I] ExposeDMA: Disabled
[11/08/2022-15:23:13] [I] Data transfers: Enabled
[11/08/2022-15:23:13] [I] Spin-wait: Disabled
[11/08/2022-15:23:13] [I] Multithreading: Disabled
[11/08/2022-15:23:13] [I] CUDA Graph: Disabled
[11/08/2022-15:23:13] [I] Separate profiling: Disabled
[11/08/2022-15:23:13] [I] Time Deserialize: Disabled
[11/08/2022-15:23:13] [I] Time Refit: Disabled
[11/08/2022-15:23:13] [I] Skip inference: Disabled
[11/08/2022-15:23:13] [I] Inputs:
[11/08/2022-15:23:13] [I] === Reporting Options ===
[11/08/2022-15:23:13] [I] Verbose: Disabled
[11/08/2022-15:23:13] [I] Averages: 10 inferences
[11/08/2022-15:23:13] [I] Percentile: 99
[11/08/2022-15:23:13] [I] Dump refittable layers:Disabled
[11/08/2022-15:23:13] [I] Dump output: Disabled
[11/08/2022-15:23:13] [I] Profile: Disabled
[11/08/2022-15:23:13] [I] Export timing to JSON file:
[11/08/2022-15:23:13] [I] Export output to JSON file:
[11/08/2022-15:23:13] [I] Export profile to JSON file:
[11/08/2022-15:23:13] [I]
[11/08/2022-15:23:13] [I] === Device Information ===
[11/08/2022-15:23:13] [I] Selected Device: NVIDIA GeForce RTX 3060
[11/08/2022-15:23:13] [I] Compute Capability: 8.6
[11/08/2022-15:23:13] [I] SMs: 28
[11/08/2022-15:23:13] [I] Compute Clock Rate: 1.837 GHz
[11/08/2022-15:23:13] [I] Device Global Memory: 12287 MiB
[11/08/2022-15:23:13] [I] Shared Memory per SM: 100 KiB
[11/08/2022-15:23:13] [I] Memory Bus Width: 192 bits (ECC disabled)
[11/08/2022-15:23:13] [I] Memory Clock Rate: 7.501 GHz
[11/08/2022-15:23:13] [I]
[11/08/2022-15:23:13] [I] TensorRT version: 8.2.3
[11/08/2022-15:23:13] [I] [TRT] [MemUsageChange] Init CUDA: CPU +575, GPU +0, now: CPU 8281, GPU 2261 (MiB)
[11/08/2022-15:23:14] [I] [TRT] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 8348 MiB, GPU 2261 MiB
[11/08/2022-15:23:14] [I] [TRT] [MemUsageSnapshot] End constructing builder kernel library: CPU 8509 MiB, GPU 2305 MiB
[11/08/2022-15:23:14] [I] Start parsing network model
[11/08/2022-15:23:14] [I] [TRT] ----------------------------------------------------------------
[11/08/2022-15:23:14] [I] [TRT] Input filename: 3_4_512.onnx
[11/08/2022-15:23:14] [I] [TRT] ONNX IR version: 0.0.6
[11/08/2022-15:23:14] [I] [TRT] Opset version: 11
[11/08/2022-15:23:14] [I] [TRT] Producer name: pytorch
[11/08/2022-15:23:14] [I] [TRT] Producer version: 1.8
[11/08/2022-15:23:14] [I] [TRT] Domain:
[11/08/2022-15:23:14] [I] [TRT] Model version: 0
[11/08/2022-15:23:14] [I] [TRT] Doc string:
[11/08/2022-15:23:14] [I] [TRT] ----------------------------------------------------------------
[11/08/2022-15:23:14] [W] [TRT] onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[11/08/2022-15:23:14] [W] [TRT] Tensor DataType is determined at build time for tensors not marked as input or output.
[11/08/2022-15:23:14] [I] Finish parsing network model
[11/08/2022-15:23:15] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.3.1
[11/08/2022-15:23:15] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +631, GPU +268, now: CPU 9123, GPU 2573 (MiB)
[11/08/2022-15:23:16] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +641, GPU +264, now: CPU 9764, GPU 2837 (MiB)
[11/08/2022-15:23:16] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
Unexpected Internal Error: [virtualMemoryBuffer.cpp::nvinfer1::StdVirtualMemoryBufferImpl::~StdVirtualMemoryBufferImpl::121] Error Code 1: Cuda Runtime (driver shutting down)
^C
C:\Users\tz\Desktop\win-cu11.2\TensorRT-8.2.3.0\bin>^Z^Z^Z
C:\Users\tz\Desktop\win-cu11.2\TensorRT-8.2.3.0\bin>trtexec.exe --onnx=3_4_512.onnx --saveEngine=3_4_512.engine --workspace=2048 --fp16
&&&& RUNNING TensorRT.trtexec [TensorRT v8203] # trtexec.exe --onnx=3_4_512.onnx --saveEngine=3_4_512.engine --workspace=2048 --fp16
[11/08/2022-15:25:47] [I] === Model Options ===
[11/08/2022-15:25:47] [I] Format: ONNX
[11/08/2022-15:25:47] [I] Model: 3_4_512.onnx
[11/08/2022-15:25:47] [I] Output:
[11/08/2022-15:25:47] [I] === Build Options ===
[11/08/2022-15:25:47] [I] Max batch: explicit batch
[11/08/2022-15:25:47] [I] Workspace: 2048 MiB
[11/08/2022-15:25:47] [I] minTiming: 1
[11/08/2022-15:25:47] [I] avgTiming: 8
[11/08/2022-15:25:47] [I] Precision: FP32+FP16
[11/08/2022-15:25:47] [I] Calibration:
[11/08/2022-15:25:47] [I] Refit: Disabled
[11/08/2022-15:25:47] [I] Sparsity: Disabled
[11/08/2022-15:25:47] [I] Safe mode: Disabled
[11/08/2022-15:25:47] [I] DirectIO mode: Disabled
[11/08/2022-15:25:47] [I] Restricted mode: Disabled
[11/08/2022-15:25:47] [I] Save engine: 3_4_512.engine
[11/08/2022-15:25:47] [I] Load engine:
[11/08/2022-15:25:47] [I] Profiling verbosity: 0
[11/08/2022-15:25:47] [I] Tactic sources: Using default tactic sources
[11/08/2022-15:25:47] [I] timingCacheMode: local
[11/08/2022-15:25:47] [I] timingCacheFile:
[11/08/2022-15:25:47] [I] Input(s)s format: fp32:CHW
[11/08/2022-15:25:47] [I] Output(s)s format: fp32:CHW
[11/08/2022-15:25:47] [I] Input build shapes: model
[11/08/2022-15:25:47] [I] Input calibration shapes: model
[11/08/2022-15:25:47] [I] === System Options ===
[11/08/2022-15:25:47] [I] Device: 0
[11/08/2022-15:25:47] [I] DLACore:
[11/08/2022-15:25:47] [I] Plugins:
[11/08/2022-15:25:47] [I] === Inference Options ===
[11/08/2022-15:25:47] [I] Batch: Explicit
[11/08/2022-15:25:47] [I] Input inference shapes: model
[11/08/2022-15:25:47] [I] Iterations: 10
[11/08/2022-15:25:47] [I] Duration: 3s (+ 200ms warm up)
[11/08/2022-15:25:47] [I] Sleep time: 0ms
[11/08/2022-15:25:47] [I] Idle time: 0ms
[11/08/2022-15:25:47] [I] Streams: 1
[11/08/2022-15:25:47] [I] ExposeDMA: Disabled
[11/08/2022-15:25:47] [I] Data transfers: Enabled
[11/08/2022-15:25:47] [I] Spin-wait: Disabled
[11/08/2022-15:25:47] [I] Multithreading: Disabled
[11/08/2022-15:25:47] [I] CUDA Graph: Disabled
[11/08/2022-15:25:47] [I] Separate profiling: Disabled
[11/08/2022-15:25:47] [I] Time Deserialize: Disabled
[11/08/2022-15:25:47] [I] Time Refit: Disabled
[11/08/2022-15:25:47] [I] Skip inference: Disabled
[11/08/2022-15:25:47] [I] Inputs:
[11/08/2022-15:25:47] [I] === Reporting Options ===
[11/08/2022-15:25:47] [I] Verbose: Disabled
[11/08/2022-15:25:47] [I] Averages: 10 inferences
[11/08/2022-15:25:47] [I] Percentile: 99
[11/08/2022-15:25:47] [I] Dump refittable layers:Disabled
[11/08/2022-15:25:47] [I] Dump output: Disabled
[11/08/2022-15:25:47] [I] Profile: Disabled
[11/08/2022-15:25:47] [I] Export timing to JSON file:
[11/08/2022-15:25:47] [I] Export output to JSON file:
[11/08/2022-15:25:47] [I] Export profile to JSON file:
[11/08/2022-15:25:47] [I]
[11/08/2022-15:25:47] [I] === Device Information ===
[11/08/2022-15:25:47] [I] Selected Device: NVIDIA GeForce RTX 3060
[11/08/2022-15:25:47] [I] Compute Capability: 8.6
[11/08/2022-15:25:47] [I] SMs: 28
[11/08/2022-15:25:47] [I] Compute Clock Rate: 1.837 GHz
[11/08/2022-15:25:47] [I] Device Global Memory: 12287 MiB
[11/08/2022-15:25:47] [I] Shared Memory per SM: 100 KiB
[11/08/2022-15:25:47] [I] Memory Bus Width: 192 bits (ECC disabled)
[11/08/2022-15:25:47] [I] Memory Clock Rate: 7.501 GHz
[11/08/2022-15:25:47] [I]
[11/08/2022-15:25:47] [I] TensorRT version: 8.2.3
[11/08/2022-15:25:48] [I] [TRT] [MemUsageChange] Init CUDA: CPU +583, GPU +0, now: CPU 7315, GPU 2261 (MiB)
[11/08/2022-15:25:48] [I] [TRT] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 7378 MiB, GPU 2261 MiB
[11/08/2022-15:25:48] [I] [TRT] [MemUsageSnapshot] End constructing builder kernel library: CPU 7544 MiB, GPU 2305 MiB
[11/08/2022-15:25:48] [I] Start parsing network model
[11/08/2022-15:25:48] [I] [TRT] ----------------------------------------------------------------
[11/08/2022-15:25:48] [I] [TRT] Input filename: 3_4_512.onnx
[11/08/2022-15:25:48] [I] [TRT] ONNX IR version: 0.0.6
[11/08/2022-15:25:48] [I] [TRT] Opset version: 11
[11/08/2022-15:25:48] [I] [TRT] Producer name: pytorch
[11/08/2022-15:25:48] [I] [TRT] Producer version: 1.8
[11/08/2022-15:25:48] [I] [TRT] Domain:
[11/08/2022-15:25:48] [I] [TRT] Model version: 0
[11/08/2022-15:25:48] [I] [TRT] Doc string:
[11/08/2022-15:25:48] [I] [TRT] ----------------------------------------------------------------
[11/08/2022-15:25:48] [W] [TRT] onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[11/08/2022-15:25:48] [W] [TRT] Tensor DataType is determined at build time for tensors not marked as input or output.
[11/08/2022-15:25:48] [I] Finish parsing network model
[11/08/2022-15:25:49] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.3.1
[11/08/2022-15:25:49] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +668, GPU +268, now: CPU 8192, GPU 2573 (MiB)
[11/08/2022-15:25:50] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +599, GPU +264, now: CPU 8791, GPU 2837 (MiB)
[11/08/2022-15:25:50] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[11/08/2022-15:26:12] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[11/08/2022-15:28:06] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[11/08/2022-15:28:06] [I] [TRT] Total Host Persistent Memory: 105936
[11/08/2022-15:28:06] [I] [TRT] Total Device Persistent Memory: 12862976
[11/08/2022-15:28:06] [I] [TRT] Total Scratch Memory: 25165824
[11/08/2022-15:28:06] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 11 MiB, GPU 1140 MiB
[11/08/2022-15:28:06] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 24.2226ms to assign 10 blocks to 111 nodes requiring 76486656 bytes.
[11/08/2022-15:28:06] [I] [TRT] Total Activation Memory: 76486656
[11/08/2022-15:28:06] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.3.1
[11/08/2022-15:28:06] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 9921, GPU 3257 (MiB)
[11/08/2022-15:28:06] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 9921, GPU 3267 (MiB)
[11/08/2022-15:28:06] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +11, GPU +13, now: CPU 11, GPU 13 (MiB)
[11/08/2022-15:28:06] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 9925, GPU 3221 (MiB)
[11/08/2022-15:28:06] [I] [TRT] Loaded engine size: 13 MiB
[11/08/2022-15:28:06] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.3.1
[11/08/2022-15:28:06] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 9925, GPU 3242 (MiB)
[11/08/2022-15:28:06] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 9925, GPU 3250 (MiB)
[11/08/2022-15:28:06] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +12, now: CPU 0, GPU 12 (MiB)
[11/08/2022-15:28:06] [I] Engine built in 138.683 sec.
[11/08/2022-15:28:06] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.3.1
[11/08/2022-15:28:06] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 9779, GPU 3202 (MiB)
[11/08/2022-15:28:06] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 9779, GPU 3210 (MiB)
[11/08/2022-15:28:06] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +85, now: CPU 0, GPU 97 (MiB)
[11/08/2022-15:28:06] [I] Using random values for input input
[11/08/2022-15:28:06] [I] Created input binding for input with dimensions 3x4x512x512
[11/08/2022-15:28:06] [I] Using random values for output output
[11/08/2022-15:28:06] [I] Created output binding for output with dimensions 3x512x512
[11/08/2022-15:28:06] [I] Starting inference
[11/08/2022-15:28:09] [I] Warmup completed 30 queries over 200 ms
[11/08/2022-15:28:09] [I] Timing trace has 448 queries over 3.01528 s
[11/08/2022-15:28:09] [I]
[11/08/2022-15:28:09] [I] === Trace details ===
[11/08/2022-15:28:09] [I] Trace averages of 10 runs:
[11/08/2022-15:28:09] [I] Average on 10 runs - GPU latency: 6.70958 ms - Host latency: 7.98414 ms (end to end 13.4338 ms, enqueue 0.449416 ms)
[11/08/2022-15:28:09] [I] Average on 10 runs - GPU latency: 6.62781 ms - Host latency: 7.90262 ms (end to end 13.2403 ms, enqueue 0.43143 ms)
[11/08/2022-15:28:09] [I] Average on 10 runs - GPU latency: 6.69272 ms - Host latency: 7.97968 ms (end to end 13.1704 ms, enqueue 0.437268 ms)
[11/08/2022-15:28:09] [I] Average on 10 runs - GPU latency: 6.78547 ms - Host latency: 8.08582 ms (end to end 13.5735 ms, enqueue 0.44975 ms)
[11/08/2022-15:28:09] [I] Average on 10 runs - GPU latency: 6.65985 ms - Host latency: 7.95659 ms (end to end 13.2849 ms, enqueue 0.456281 ms)
[11/08/2022-15:28:09] [I] Average on 10 runs - GPU latency: 6.75233 ms - Host latency: 8.08286 ms (end to end 13.3064 ms, enqueue 0.5026 ms)
[11/08/2022-15:28:09] [I] Average on 10 runs - GPU latency: 6.49004 ms - Host latency: 7.85455 ms (end to end 13.5516 ms, enqueue 0.465564 ms)
[11/08/2022-15:28:09] [I] Average on 10 runs - GPU latency: 6.55192 ms - Host latency: 7.88926 ms (end to end 12.9293 ms, enqueue 0.474048 ms)
[11/08/2022-15:28:09] [I] Average on 10 runs - GPU latency: 6.66555 ms - Host latency: 7.95674 ms (end to end 13.3547 ms, enqueue 0.481604 ms)
[11/08/2022-15:28:09] [I] Average on 10 runs - GPU latency: 6.7468 ms - Host latency: 8.03611 ms (end to end 13.6591 ms, enqueue 0.522687 ms)
[11/08/2022-15:28:09] [I] Average on 10 runs - GPU latency: 6.6114 ms - Host latency: 7.88512 ms (end to end 13.1572 ms, enqueue 0.432147 ms)
[11/08/2022-15:28:09] [I] Average on 10 runs - GPU latency: 6.76632 ms - Host latency: 8.0464 ms (end to end 13.446 ms, enqueue 0.436206 ms)
[11/08/2022-15:28:09] [I] Average on 10 runs - GPU latency: 6.66252 ms - Host latency: 7.95161 ms (end to end 13.3631 ms, enqueue 0.459363 ms)
[11/08/2022-15:28:09] [I] Average on 10 runs - GPU latency: 6.60012 ms - Host latency: 7.86749 ms (end to end 13.1617 ms, enqueue 0.430664 ms)
[11/08/2022-15:28:09] [I] Average on 10 runs - GPU latency: 6.65146 ms - Host latency: 8.0037 ms (end to end 13.4007 ms, enqueue 0.468982 ms)
[11/08/2022-15:28:09] [I] Average on 10 runs - GPU latency: 6.69885 ms - Host latency: 8.04579 ms (end to end 13.3164 ms, enqueue 0.487793 ms)
[11/08/2022-15:28:09] [I] Average on 10 runs - GPU latency: 6.50627 ms - Host latency: 7.78765 ms (end to end 12.971 ms, enqueue 0.457983 ms)
[11/08/2022-15:28:09] [I] Average on 10 runs - GPU latency: 6.9022 ms - Host latency: 8.25731 ms (end to end 13.7503 ms, enqueue 0.542029 ms)
[11/08/2022-15:28:09] [I] Average on 10 runs - GPU latency: 6.74595 ms - Host latency: 8.02666 ms (end to end 13.4747 ms, enqueue 0.467566 ms)
[11/08/2022-15:28:09] [I] Average on 10 runs - GPU latency: 6.46039 ms - Host latency: 7.76008 ms (end to end 13.033 ms, enqueue 0.437585 ms)
[11/08/2022-15:28:09] [I] Average on 10 runs - GPU latency: 6.6533 ms - Host latency: 7.92222 ms (end to end 13.34 ms, enqueue 0.422009 ms)
[11/08/2022-15:28:09] [I] Average on 10 runs - GPU latency: 6.76389 ms - Host latency: 8.02856 ms (end to end 13.4407 ms, enqueue 0.441272 ms)
[11/08/2022-15:28:09] [I] Average on 10 runs - GPU latency: 6.52844 ms - Host latency: 7.80363 ms (end to end 12.9902 ms, enqueue 0.426746 ms)
[11/08/2022-15:28:09] [I] Average on 10 runs - GPU latency: 6.71169 ms - Host latency: 7.98873 ms (end to end 13.4217 ms, enqueue 0.462207 ms)
[11/08/2022-15:28:09] [I] Average on 10 runs - GPU latency: 6.60992 ms - Host latency: 7.88285 ms (end to end 13.2237 ms, enqueue 0.459961 ms)
[11/08/2022-15:28:09] [I] Average on 10 runs - GPU latency: 6.56495 ms - Host latency: 7.8339 ms (end to end 13.0414 ms, enqueue 0.424841 ms)
[11/08/2022-15:28:09] [I] Average on 10 runs - GPU latency: 6.74716 ms - Host latency: 8.02372 ms (end to end 13.4816 ms, enqueue 0.444177 ms)
[11/08/2022-15:28:09] [I] Average on 10 runs - GPU latency: 6.65258 ms - Host latency: 7.92474 ms (end to end 13.2523 ms, enqueue 0.438989 ms)
[11/08/2022-15:28:09] [I] Average on 10 runs - GPU latency: 6.5282 ms - Host latency: 7.8012 ms (end to end 12.9826 ms, enqueue 0.431714 ms)
[11/08/2022-15:28:09] [I] Average on 10 runs - GPU latency: 6.89451 ms - Host latency: 8.17075 ms (end to end 13.7673 ms, enqueue 0.442261 ms)
[11/08/2022-15:28:09] [I] Average on 10 runs - GPU latency: 6.66895 ms - Host latency: 8.00969 ms (end to end 13.2777 ms, enqueue 0.486743 ms)
[11/08/2022-15:28:09] [I] Average on 10 runs - GPU latency: 6.50308 ms - Host latency: 7.83325 ms (end to end 13.1489 ms, enqueue 0.433154 ms)
[11/08/2022-15:28:09] [I] Average on 10 runs - GPU latency: 6.81689 ms - Host latency: 8.19438 ms (end to end 13.5765 ms, enqueue 0.424048 ms)
[11/08/2022-15:28:09] [I] Average on 10 runs - GPU latency: 6.63203 ms - Host latency: 7.9093 ms (end to end 13.1829 ms, enqueue 0.443628 ms)
[11/08/2022-15:28:09] [I] Average on 10 runs - GPU latency: 6.49385 ms - Host latency: 7.82375 ms (end to end 12.9902 ms, enqueue 0.439258 ms)
[11/08/2022-15:28:09] [I] Average on 10 runs - GPU latency: 6.77444 ms - Host latency: 8.10166 ms (end to end 13.5164 ms, enqueue 0.423999 ms)
[11/08/2022-15:28:09] [I] Average on 10 runs - GPU latency: 6.65237 ms - Host latency: 7.9302 ms (end to end 13.2338 ms, enqueue 0.433447 ms)
[11/08/2022-15:28:09] [I] Average on 10 runs - GPU latency: 6.53174 ms - Host latency: 7.88328 ms (end to end 13.1508 ms, enqueue 0.430054 ms)
[11/08/2022-15:28:09] [I] Average on 10 runs - GPU latency: 6.74668 ms - Host latency: 8.02556 ms (end to end 13.4534 ms, enqueue 0.432739 ms)
[11/08/2022-15:28:09] [I] Average on 10 runs - GPU latency: 6.64275 ms - Host latency: 7.91931 ms (end to end 13.2074 ms, enqueue 0.437378 ms)
[11/08/2022-15:28:09] [I] Average on 10 runs - GPU latency: 6.44941 ms - Host latency: 7.74397 ms (end to end 13.0299 ms, enqueue 0.42041 ms)
[11/08/2022-15:28:09] [I] Average on 10 runs - GPU latency: 6.71943 ms - Host latency: 8.01567 ms (end to end 13.5259 ms, enqueue 0.436865 ms)
[11/08/2022-15:28:09] [I] Average on 10 runs - GPU latency: 6.58608 ms - Host latency: 7.87527 ms (end to end 13.1407 ms, enqueue 0.440112 ms)
[11/08/2022-15:28:09] [I] Average on 10 runs - GPU latency: 6.5209 ms - Host latency: 7.7957 ms (end to end 13.0232 ms, enqueue 0.407813 ms)
[11/08/2022-15:28:09] [I]
[11/08/2022-15:28:09] [I] === Performance summary ===
[11/08/2022-15:28:09] [I] Throughput: 148.576 qps
[11/08/2022-15:28:09] [I] Latency: min = 7.0957 ms, max = 10.2825 ms, mean = 7.95778 ms, median = 7.9342 ms, percentile(99%) = 9.98694 ms
[11/08/2022-15:28:09] [I] End-to-End Host Latency: min = 11.6857 ms, max = 15.6642 ms, mean = 13.3058 ms, median = 13.1685 ms, percentile(99%) = 15.4863 ms
[11/08/2022-15:28:09] [I] Enqueue Time: min = 0.322632 ms, max = 0.979004 ms, mean = 0.448781 ms, median = 0.45163 ms, percentile(99%) = 0.662964 ms
[11/08/2022-15:28:09] [I] H2D Latency: min = 0.99585 ms, max = 1.50366 ms, mean = 1.0336 ms, median = 1.00809 ms, percentile(99%) = 1.43237 ms
[11/08/2022-15:28:09] [I] GPU Compute Time: min = 5.83472 ms, max = 9.02856 ms, mean = 6.65907 ms, median = 6.57874 ms, percentile(99%) = 8.62817 ms
[11/08/2022-15:28:09] [I] D2H Latency: min = 0.252686 ms, max = 0.439941 ms, mean = 0.265108 ms, median = 0.256592 ms, percentile(99%) = 0.429443 ms
[11/08/2022-15:28:09] [I] Total Host Walltime: 3.01528 s
[11/08/2022-15:28:09] [I] Total GPU Compute Time: 2.98326 s
[11/08/2022-15:28:09] [I] Explanations of the performance metrics are printed in the verbose logs.
[11/08/2022-15:28:09] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8203] # trtexec.exe --onnx=3_4_512.onnx --saveEngine=3_4_512.engine --workspace=2048 --fp16