The way to interface the INT8 input #1792
Comments
You should feed the model with FP32 data as usual. TensorRT will convert the input tensor to INT8.
@spivakoa I've searched the examples, including sampleINT8 and sampleUffPluginV2, but every example copies or delivers the data in FP32 format, even when the model produces INT8 output as in sampleUffPluginV2. In that example the model's INT8 output is exposed as FP32 data, which is copied (device to host) at 4 bytes per output element and then cast to INT8. My main problem is bandwidth, because the amount of data being handled is huge. Is there any way to avoid this?
@seongkyun The memcpy process is not related to the neural network; it always copies chunks of 32-bit data. Even if the model is quantized to 8 bit, that just means it carries scales to convert the data from 32 bit to 8 bit for inference. The conversion is done by the model itself; the data is still read from GPU global memory with 32-bit accesses. The output of the model is 32 bit too, as it uses a pre-calculated scale to convert the data back to 32-bit form.
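For illustration, such a scale works roughly like this (plain C++, not the TensorRT API; the scale and value here are made up):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>

// Illustrative symmetric quantization: the per-tensor "scale" maps fp32
// values into the signed 8-bit range and back. Numbers are hypothetical.
int main() {
    const float scale = 0.1f;   // hypothetical per-tensor scale
    const float x = 3.14f;      // fp32 activation

    // quantize: fp32 -> int8, clamped to [-128, 127]
    int q = static_cast<int>(std::lround(x / scale));
    int8_t xQ = static_cast<int8_t>(std::min(127, std::max(-128, q)));

    // dequantize: int8 -> fp32, using the same scale
    float xBack = static_cast<float>(xQ) * scale;

    std::printf("fp32 %.2f -> int8 %d -> fp32 %.2f\n", x, static_cast<int>(xQ), xBack);
    return 0;
}
```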
We have a sample, https://github.com/NVIDIA/TensorRT/tree/main/samples/sampleIOFormats, that demonstrates how to control the network's input/output formats. Hope it helps!
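Roughly, the approach that sample demonstrates looks like the sketch below (assuming the TensorRT 8.x C++ API and a network already populated by the ONNX parser, so the existing input is fetched with getInput rather than re-added with addInput; error handling omitted):

```cpp
// Sketch only: mark the network's I/O tensors as INT8 so the engine accepts
// int8 buffers at enqueue time, instead of converting on the host.
#include "NvInfer.h"

void markIOAsInt8(nvinfer1::INetworkDefinition* network,
                  nvinfer1::IBuilderConfig* config) {
    // Allow INT8 kernels when building the engine.
    config->setFlag(nvinfer1::BuilderFlag::kINT8);

    // First network input: INT8 type, linear (NCHW) layout.
    nvinfer1::ITensor* input = network->getInput(0);
    input->setType(nvinfer1::DataType::kINT8);
    input->setAllowedFormats(1U << static_cast<int>(nvinfer1::TensorFormat::kLINEAR));

    // First network output: INT8 type, linear layout.
    nvinfer1::ITensor* output = network->getOutput(0);
    output->setType(nvinfer1::DataType::kINT8);
    output->setAllowedFormats(1U << static_cast<int>(nvinfer1::TensorFormat::kLINEAR));
}
```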
Closing since there has been no activity for more than 3 weeks. Please reopen if you still have an issue, thanks!
@seongkyun
@Dxye Let's ignore the formula in the sample; even the dynamic range there is faked to -127~127 instead of being calibrated from real data.
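That faked range corresponds to calls along these lines (sketch only; a real deployment should use calibrated ranges rather than this placeholder value):

```cpp
// Sketch: the sample's "faked" dynamic range, set per tensor instead of
// running a calibrator. Not representative of real accuracy.
#include "NvInfer.h"

void fakeDynamicRanges(nvinfer1::INetworkDefinition* network) {
    // Network inputs
    for (int i = 0; i < network->getNbInputs(); ++i) {
        network->getInput(i)->setDynamicRange(-127.0f, 127.0f);
    }
    // Intermediate and output tensors, layer by layer
    for (int i = 0; i < network->getNbLayers(); ++i) {
        nvinfer1::ILayer* layer = network->getLayer(i);
        for (int j = 0; j < layer->getNbOutputs(); ++j) {
            layer->getOutput(j)->setDynamicRange(-127.0f, 127.0f);
        }
    }
}
```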
I also ran into the bandwidth problem. Do you have a way to solve it?
Sometimes one input of the net ("x") is int8 or uint8 (it stores indices, so no calibration is needed) and the others are fp32. When I use
@lix19937 Sorry for the delayed response. Yes, you are right, we only have quantized int8 in TRT today. We require you to add the --int8 flag when you use the int8 datatype, and when TRT cannot determine the quantization scale, we will ask you to do calibration.
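For the mixed-precision case above, one option is to provide the scale for the int8 input yourself so only that tensor skips calibration (a sketch assuming the TensorRT 8.x C++ API and that the index-like input is network input 0):

```cpp
// Sketch: keep a raw-index input in int8 with a hand-set range while the
// remaining inputs stay fp32.
#include "NvInfer.h"

void keepIndexInputInt8(nvinfer1::INetworkDefinition* network,
                        nvinfer1::IBuilderConfig* config) {
    config->setFlag(nvinfer1::BuilderFlag::kINT8);   // corresponds to trtexec --int8

    nvinfer1::ITensor* idx = network->getInput(0);   // the int8 "x" input
    idx->setType(nvinfer1::DataType::kINT8);
    // Known range of the stored indices, so no calibration is needed for it.
    idx->setDynamicRange(-127.0f, 127.0f);
    // Other inputs are left untouched and remain fp32.
}
```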
Thanks. @ttyio |
I am trying to feed INT8 inputs.
Every example I can find online uses float (FP32) inputs, not INT8 ones.
I have an ONNX model that was trained on float (FP32) data, and a calibration DB.
I've calibrated the ONNX model using the ONNX parser and Int8EntropyCalibrator2, and the code ran successfully (TensorRT C++ API).
But I still have to feed FP32 inputs and get FP32 outputs.
Is there any way to feed INT8 input to the TensorRT model?
(Meaning that when I run the "context->enqueue" call, every value in the input buffer is a signed char in the range [-128, 127].)
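Concretely, something like this is what I'd like to be able to do (a rough sketch; it assumes an engine whose first binding is an INT8 input):

```cpp
// Rough sketch: hand int8 data straight to enqueue.
#include "NvInfer.h"
#include <cuda_runtime_api.h>
#include <cassert>
#include <cstdint>
#include <vector>

void runInt8Input(nvinfer1::ICudaEngine* engine,
                  nvinfer1::IExecutionContext* context,
                  const std::vector<int8_t>& hostInput,   // values in [-128, 127]
                  void* deviceInput, void* deviceOutput,
                  cudaStream_t stream) {
    // Only works if the engine was built with an INT8 input binding.
    assert(engine->getBindingDataType(0) == nvinfer1::DataType::kINT8);

    // Copy 1 byte per element instead of 4 bytes for fp32 -- this is the
    // bandwidth saving I am after.
    cudaMemcpyAsync(deviceInput, hostInput.data(),
                    hostInput.size() * sizeof(int8_t),
                    cudaMemcpyHostToDevice, stream);

    void* bindings[] = {deviceInput, deviceOutput};
    context->enqueueV2(bindings, stream, nullptr);
    cudaStreamSynchronize(stream);
}
```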
Also, I've tried adding network->addInput("input_node_name", DataType::kINT8, nvinfer::Dims4{N, C, H, W}) to the code, but it prints the error "[TRT] [network.cpp::addInput::1507] Error Code 3: API Usage Error (Parameter check failed at: optimizer/api/network.cpp::addInput::1507, condition: inName != knownInput->getName()".
I think the ONNX parser doesn't support addInput; only the Caffe/UFF parsers, or building the network from scratch, support it.
Please help me guys :)