int8/uint8/bool as input not supported in TRT plugin? #3959
@lix19937 , TRT only has quantized INT8 today; there is no vanilla INT8. You are right: only quantized INT8 is supported in plugins. bool/uint8/vanilla int8 are not supported in plugins today. For your case, besides call
No calibration is needed for either workaround.
@lix19937 , since you have the pattern:
And since the plugin implementation is a black box to TRT, we could also work around this by packing your INT8/UINT8 input as kINT32, feeding TRT a kINT32 input, and then reading it as INT8/UINT8 inside your plugin implementation.
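The packing workaround relies on the fact that the bytes are untouched: the host packs the int8 data into an int32 buffer (padded to a multiple of 4 bytes, with the tensor's last dimension shrunk by 4x), and the plugin simply reinterprets the kINT32 pointer as int8. A minimal host-side sketch of this byte view, using Python's `struct` module with made-up index values (in the actual plugin, the equivalent step is a `reinterpret_cast<const int8_t*>` on the kINT32 input pointer in `enqueue`):

```python
import struct

# Hypothetical int8 index data the user wants to feed to the plugin.
indices = [0, 3, -1, 7, 42, 5, 9, 1]

# Pack: pad the raw int8 bytes to a multiple of 4, then view them as int32
# words -- this is what gets declared to TRT as a kINT32 input.
raw = struct.pack(f"{len(indices)}b", *indices)
raw += b"\x00" * (-len(raw) % 4)
packed_int32 = list(struct.unpack(f"{len(raw) // 4}i", raw))

# Plugin side: reinterpret the same bytes back as int8 -- no values change,
# only the declared element type differs.
unpacked = list(struct.unpack(f"{len(raw)}b",
                              struct.pack(f"{len(raw) // 4}i", *packed_int32)))
assert unpacked[:len(indices)] == indices
```

Note that the network's input dimensions must describe the int32 element count (original byte count divided by 4), and the plugin must know the true int8 length if padding was added.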
Thanks very much @ttyio
That would need an extra Mul layer added before the plugin.
Just CAST.
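A Cast (or the Mul-by-1 trick) is value-preserving widening: each int8 element becomes a full 4-byte int32, so the tensor grows 4x in bytes. The packing workaround instead keeps 1 byte per element and only changes the declared type. A small sketch of the difference, again with `struct` and made-up values:

```python
import struct

vals = [-1, 7, 42, 9]

# Cast workaround: each int8 value is widened to its own int32 -> 4x the bytes.
cast_bytes = struct.pack("4i", *vals)      # 16 bytes for 4 elements

# Packing workaround: the same 4 bytes, declared as a single int32 element.
packed_bytes = struct.pack("4b", *vals)    # 4 bytes for 4 elements
(as_one_int32,) = struct.unpack("1i", packed_bytes)

assert len(cast_bytes) == 4 * len(vals)
assert len(packed_bytes) == len(vals)
# Reinterpreting the int32 word back as 4 int8 recovers the original values.
assert list(struct.unpack("4b", struct.pack("1i", as_one_int32))) == vals
```

This is why the packing approach better serves the bandwidth goal stated in the issue description, at the cost of a reshaped input and a reinterpret inside the plugin.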
Similar case: #1792 (comment)
This should work! Similarly, in the BERT plugin, the mask input to the fused MHA is actually not of type kINT32, but we use kINT32 because the unfused version has a kINT32 mask. https://github.com/NVIDIA/TensorRT/blob/release/10.1/plugin/embLayerNormPlugin/embLayerNormPlugin.cpp#L408
Thanks @ttyio. Closing now.
int8/uint8 output #3026
Description
If one of the model's inputs is int8 (e.g. an index, not produced by calibration) and the other inputs are float or int32, when I use
Error from trtexec:
Also, I found that as long as there is an int8 input, calibration is triggered, and the user needs to override it with a scale and zero-point. But in real cases, some inputs are natively int8 and do not require quantization.
Here I use int8 as the index data type purely for vectorized access, to improve bandwidth utilization and reduce the number of threads launched. So are int8/uint8/bool not supported as plugin inputs in TRT? If they are not, I think that is not very reasonable.
From https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/classnvinfer1_1_1_i_plugin_v2.html#af502120d140358f8d8139ade5005b9f5
@ttyio @zerollzeng
Environment
TensorRT Version: 8611
NVIDIA GPU: Orin-X
NVIDIA Driver Version:
CUDA Version: 11.4
CUDNN Version: 11.6
Operating System: Ubuntu 20.04
PyTorch Version (if applicable): 1.13