
int8/uint8/bool as input not supported in TRT plugin? #3959

Closed
lix19937 opened this issue Jun 24, 2024 · 8 comments

Comments

@lix19937

lix19937 commented Jun 24, 2024

Description

If one of the model inputs is int8 (used as an index, not obtained by calibration) and the other inputs are float or int32, when I use

trtexec --onnx=./sca.onnx --plugins=./libplugin_custom.so --verbose --inputIOFormats=fp32:chw,fp32:chw,int8:chw \
--outputIOFormats=int32:chw,int32:chw,fp32:chw,fp32:chw,fp32:chw  

or  

trtexec --onnx=./sca.onnx --plugins=./libplugin_custom.so --verbose --inputIOFormats=fp16:chw,fp16:chw,int8:chw \
--outputIOFormats=int32:chw,int32:chw,fp16:chw,fp16:chw,fp16:chw  --fp16    

trtexec reports the following errors:

[06/22/2024-21:01:54] [E] Error[9]: [pluginV2Builder.cpp::reportPluginError::23] Error Code 9: Internal Error (/SCA_IndexRebatch_TRT: could not find any supported formats consistent with input/output data types)
[06/22/2024-21:01:54] [E] Error[2]: [builder.cpp::buildSerializedNetwork::743] Error Code 2: Internal Error (Assertion engine != nullptr failed. )

Also, I found that as long as there is an int8 type among the inputs, calibration is triggered and the user has to override it with a scale and zero-point. But in real cases, some inputs are originally int8 and do not require quantization.
Here I use int8 as the index data type just for vectorized access, to improve bandwidth utilization and reduce the number of threads launched. So are int8/uint8/bool inputs not supported in TRT plugins? If they are not supported, I think that is not very reasonable.
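
For context, a rough CUDA sketch of the kind of vectorized index access I mean (a hypothetical kernel, not the actual plugin code; the kernel name and gather semantics are only illustrative):

    #include <cstdint>
    #include <cuda_runtime.h>

    // Hypothetical gather kernel: reading the int8 indices as char4 lets one
    // thread consume four indices per 32-bit load, which improves bandwidth
    // utilization and reduces the number of threads launched.
    __global__ void gatherInt8Idx(const int8_t* __restrict__ idx,
                                  const float* __restrict__ src,
                                  float* __restrict__ dst,
                                  int n4)  // number of char4 groups (= index count / 4)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n4) return;

        char4 v = reinterpret_cast<const char4*>(idx)[i];  // 4 int8 indices in one load
        dst[4 * i + 0] = src[v.x];
        dst[4 * i + 1] = src[v.y];
        dst[4 * i + 2] = src[v.z];
        dst[4 * i + 3] = src[v.w];
    }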

From https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/classnvinfer1_1_1_i_plugin_v2.html#af502120d140358f8d8139ade5005b9f5

Warning

for the format field, the values PluginFormat::kCHW4, PluginFormat::kCHW16, and PluginFormat::kCHW32 will not be passed in, this is to keep backward compatibility with TensorRT 5.x series. Use PluginV2IOExt or PluginV2DynamicExt for other PluginFormats.
DataType:kBOOL and DataType::kUINT8 are not supported.

@ttyio @zerollzeng

Environment

TensorRT Version: 8611

NVIDIA GPU: Orin-X

NVIDIA Driver Version:

CUDA Version: 11.4

CUDNN Version: 11.6

Operating System: Ubuntu 20.04

PyTorch Version (if applicable): 1.13

@lix19937
Author

Looping in @brb-nv

@ttyio
Collaborator

ttyio commented Jun 24, 2024

@lix19937 , TRT only has quantized INT8 today, no vanilla INT8. And you are right, only quantized int8 is supported in plugins; bool/uint8/vanilla int8 are not supported in plugins today.

For your case, besides calling setDynamicRange(-128, 127), you can also use Q/DQ in your network, with the pattern below:

      Q (scale 1) -> plugin(int8 input)

No calibration is needed for either workaround.
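
A minimal sketch of the setDynamicRange route (the helper name and the assumption that input 2 is the int8 index tensor are illustrative, not taken from your model):

    #include "NvInfer.h"

    // Mark the raw int8 index input with an identity dynamic range so the
    // builder accepts int8 I/O for the plugin without running calibration.
    void markInt8IndexInput(nvinfer1::INetworkDefinition* network,
                            nvinfer1::IBuilderConfig* config)
    {
        nvinfer1::ITensor* idx = network->getInput(2);  // assumed: input 2 is the int8 index
        idx->setDynamicRange(-128.0f, 127.0f);          // identity range, values pass through unchanged

        config->setFlag(nvinfer1::BuilderFlag::kINT8);  // int8 still has to be enabled on the config
    }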

@ttyio
Collaborator

ttyio commented Jun 25, 2024

@lix19937 , since you have the pattern:

   input -> plugin

And the plugin implementation is a black box to TRT, so we could also work around this by packing your INT8/UINT8 input as kINT32, feeding it to TRT as a kINT32 input, and reading it as INT8/UINT8 inside your plugin implementation.
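
For example, in an IPluginV2DynamicExt plugin it could look like the sketch below (the class name MyIndexPlugin and the position of the index input are assumptions):

    // Member of a hypothetical IPluginV2DynamicExt subclass: advertise the packed
    // index input as kINT32; the bytes it carries are still the original int8
    // indices, which enqueue reads back as int8_t.
    bool MyIndexPlugin::supportsFormatCombination(
        int32_t pos, nvinfer1::PluginTensorDesc const* inOut,
        int32_t nbInputs, int32_t nbOutputs) noexcept
    {
        auto const& desc = inOut[pos];
        if (desc.format != nvinfer1::TensorFormat::kLINEAR)
            return false;
        if (pos == 2)  // index input: declared kINT32, actually packed int8
            return desc.type == nvinfer1::DataType::kINT32;
        return desc.type == nvinfer1::DataType::kFLOAT;  // other I/O stays float
    }

With this packing, the kINT32 index tensor carries one quarter as many elements, but the underlying bytes are unchanged.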

@lix19937
Author

lix19937 commented Jun 25, 2024

Thanks very much @ttyio

> use Q/DQ in your network

This need add one mul layer before plugin

> pack your INT8/UINT8 input as kINT32, feed it to TRT as a kINT32 input, and read it as INT8/UINT8 inside your plugin implementation

Just cast the const void* const* inputs from void* to int8_t* in enqueue; that is maybe more efficient.
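
Something like this in enqueue (IPluginV2DynamicExt signature; the class name and the input position are assumptions):

    // The binding is declared kINT32, but the bytes are the original int8 indices.
    int32_t MyIndexPlugin::enqueue(nvinfer1::PluginTensorDesc const* inputDesc,
                                   nvinfer1::PluginTensorDesc const* outputDesc,
                                   void const* const* inputs, void* const* outputs,
                                   void* workspace, cudaStream_t stream) noexcept
    {
        auto const* indices = static_cast<int8_t const*>(inputs[2]);  // read packed bytes as int8
        // ... launch kernels that consume `indices` on `stream` ...
        return 0;
    }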

@lix19937
Author

lix19937 commented Jun 25, 2024

Similar case #1792 (comment)

@ttyio
Collaborator

ttyio commented Jun 25, 2024

> Just cast the const void* const* inputs from void* to int8_t* in enqueue; that is maybe more efficient.

This should work! It is similar to what the BERT plugin does: the mask input of the fused MHA is actually not kINT32, but we use kINT32 because the unfused version has a kINT32 mask.

https://github.com/NVIDIA/TensorRT/blob/release/10.1/plugin/embLayerNormPlugin/embLayerNormPlugin.cpp#L408
https://github.com/NVIDIA/TensorRT/blob/release/10.1/plugin/bertQKVToContextPlugin/qkvToContext.cu#L706

@lix19937
Author

Thanks @ttyio. Closing now.

@lix19937
Author

int8/uint8 output #3026
