
int8/uint8/bool as input not supported in TRT plugin? #3959

Closed
lix19937 opened this issue Jun 24, 2024 · 8 comments

Comments

@lix19937

lix19937 commented Jun 24, 2024

Description

If one of the model inputs is int8 (used as an index, not obtained by calibration) and the other inputs are float or int32, when I use

trtexec --onnx=./sca.onnx --plugins=./libplugin_custom.so --verbose --inputIOFormats=fp32:chw,fp32:chw,int8:chw \
--outputIOFormats=int32:chw,int32:chw,fp32:chw,fp32:chw,fp32:chw  

or  

trtexec --onnx=./sca.onnx --plugins=./libplugin_custom.so --verbose --inputIOFormats=fp16:chw,fp16:chw,int8:chw \
--outputIOFormats=int32:chw,int32:chw,fp16:chw,fp16:chw,fp16:chw  --fp16    

trtexec reports the following errors:

[06/22/2024-21:01:54] [E] Error[9]: [pluginV2Builder.cpp::reportPluginError::23] Error Code 9: Internal Error (/SCA_IndexRebatch_TRT: could not find any supported formats consistent with input/output data types)
[06/22/2024-21:01:54] [E] Error[2]: [builder.cpp::buildSerializedNetwork::743] Error Code 2: Internal Error (Assertion engine != nullptr failed. )

Also, I found that as long as there is an int8 type among the inputs, calibration is triggered and the user has to override it with a scale and zero-point. But in real cases, some inputs are originally int8 and do not require quantization.
Here I use int8 as the index data type just for vectorized access, to improve bandwidth utilization and reduce the number of threads launched. So are int8/uint8/bool inputs not supported in TRT plugins? If they are not supported, I think that is not very reasonable.
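
For context, a rough CUDA sketch of the kind of vectorized index access I mean (a hypothetical kernel, not the actual plugin code; the kernel name and gather semantics are only illustrative):

    #include <cstdint>
    #include <cuda_runtime.h>

    // Hypothetical gather kernel: reading the int8 indices as char4 lets one
    // thread consume four indices per 32-bit load, which improves bandwidth
    // utilization and reduces the number of threads launched.
    __global__ void gatherInt8Idx(const int8_t* __restrict__ idx,
                                  const float* __restrict__ src,
                                  float* __restrict__ dst,
                                  int n4)  // number of char4 groups (= index count / 4)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n4) return;

        char4 v = reinterpret_cast<const char4*>(idx)[i];  // 4 int8 indices in one load
        dst[4 * i + 0] = src[v.x];
        dst[4 * i + 1] = src[v.y];
        dst[4 * i + 2] = src[v.z];
        dst[4 * i + 3] = src[v.w];
    }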

From https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/classnvinfer1_1_1_i_plugin_v2.html#af502120d140358f8d8139ade5005b9f5

Warning

for the format field, the values PluginFormat::kCHW4, PluginFormat::kCHW16, and PluginFormat::kCHW32 will not be passed in, this is to keep backward compatibility with TensorRT 5.x series. Use PluginV2IOExt or PluginV2DynamicExt for other PluginFormats.
DataType:kBOOL and DataType::kUINT8 are not supported.

@ttyio @zerollzeng

Environment

TensorRT Version: 8611

NVIDIA GPU: Orin-X

NVIDIA Driver Version:

CUDA Version: 11.4

CUDNN Version: 11.6

Operating System: Ubuntu 20.04

PyTorch Version (if applicable): 1.13

@lix19937
Author

Looping in @brb-nv

@ttyio
Collaborator

ttyio commented Jun 24, 2024

@lix19937 , TRT only has quantized INT8 today, no vanilla INT8. And you are right, only quantized int8 is supported in plugins; bool/uint8/vanilla int8 are not supported in plugins today.

For your case, besides calling setDynamicRange(-128, 127), you can also use Q/DQ in your network, with the pattern below:

      Q (scale 1) -> plugin(int8 input)

No calibration is needed for either workaround.
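
A minimal sketch of the setDynamicRange route (the helper name and the assumption that input 2 is the int8 index tensor are illustrative, not taken from your model):

    #include "NvInfer.h"

    // Mark the raw int8 index input with an identity dynamic range so the
    // builder accepts int8 I/O for the plugin without running calibration.
    void markInt8IndexInput(nvinfer1::INetworkDefinition* network,
                            nvinfer1::IBuilderConfig* config)
    {
        nvinfer1::ITensor* idx = network->getInput(2);  // assumed: input 2 is the int8 index
        idx->setDynamicRange(-128.0f, 127.0f);          // identity range, values pass through unchanged

        config->setFlag(nvinfer1::BuilderFlag::kINT8);  // int8 still has to be enabled on the config
    }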

@ttyio
Collaborator

ttyio commented Jun 25, 2024

@lix19937 , since you have the pattern:

   input -> plugin

And the plugin implementation is a black box to TRT, so we could also work around this by packing your INT8/UINT8 input as kINT32, feeding it to TRT as a kINT32 input, and reading it as INT8/UINT8 inside your plugin implementation.
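
For example, in an IPluginV2DynamicExt plugin it could look like the sketch below (the class name MyIndexPlugin and the position of the index input are assumptions):

    // Member of a hypothetical IPluginV2DynamicExt subclass: advertise the packed
    // index input as kINT32; the bytes it carries are still the original int8
    // indices, which enqueue reads back as int8_t.
    bool MyIndexPlugin::supportsFormatCombination(
        int32_t pos, nvinfer1::PluginTensorDesc const* inOut,
        int32_t nbInputs, int32_t nbOutputs) noexcept
    {
        auto const& desc = inOut[pos];
        if (desc.format != nvinfer1::TensorFormat::kLINEAR)
            return false;
        if (pos == 2)  // index input: declared kINT32, actually packed int8
            return desc.type == nvinfer1::DataType::kINT32;
        return desc.type == nvinfer1::DataType::kFLOAT;  // other I/O stays float
    }

With this packing, the kINT32 index tensor carries one quarter as many elements, but the underlying bytes are unchanged.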

@lix19937
Author

lix19937 commented Jun 25, 2024

Thanks very much @ttyio

> use Q/DQ in your network

This need add one mul layer before plugin

> pack your INT8/UINT8 input as kINT32, feed it to TRT as a kINT32 input, and read it as INT8/UINT8 inside your plugin implementation

Just cast the const void* const* inputs from void* to int8_t* in enqueue; that is maybe more efficient.
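
Something like this in enqueue (IPluginV2DynamicExt signature; the class name and the input position are assumptions):

    // The binding is declared kINT32, but the bytes are the original int8 indices.
    int32_t MyIndexPlugin::enqueue(nvinfer1::PluginTensorDesc const* inputDesc,
                                   nvinfer1::PluginTensorDesc const* outputDesc,
                                   void const* const* inputs, void* const* outputs,
                                   void* workspace, cudaStream_t stream) noexcept
    {
        auto const* indices = static_cast<int8_t const*>(inputs[2]);  // read packed bytes as int8
        // ... launch kernels that consume `indices` on `stream` ...
        return 0;
    }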

@lix19937
Author

lix19937 commented Jun 25, 2024

Similar case #1792 (comment)

@ttyio
Collaborator

ttyio commented Jun 25, 2024

> Just cast the const void* const* inputs from void* to int8_t* in enqueue; that is maybe more efficient.

This should work! It is similar to what the BERT plugin does: the mask input of the fused MHA is actually not kINT32, but we use kINT32 because the unfused version has a kINT32 mask.

https://github.com/NVIDIA/TensorRT/blob/release/10.1/plugin/embLayerNormPlugin/embLayerNormPlugin.cpp#L408
https://github.com/NVIDIA/TensorRT/blob/release/10.1/plugin/bertQKVToContextPlugin/qkvToContext.cu#L706

@lix19937
Author

Thanks @ttyio. Closing now.

@lix19937
Author

int8/uint8 output #3026
