The way to interface the INT8 input #1792

Closed
seongkyun opened this issue Feb 10, 2022 · 13 comments
Labels: question (Further information is requested), triaged (Issue has been triaged by maintainers)

Comments

@seongkyun

seongkyun commented Feb 10, 2022

I am trying to feed INT8 inputs.

Every example I can find on Google uses float (FP32) inputs, not INT8 ones.

I have an ONNX model that was trained on float (FP32) data, and a calibration DB.
I've calibrated the ONNX model using the ONNX parser and Int8EntropyCalibrator2, and the code ran successfully (TensorRT C++ API).
But I still have to feed FP32 inputs and get FP32 outputs.

Is there any way to feed INT8 input to the TensorRT model?
(Meaning that when I run the "context->enqueue" code, every element in the input buffer is a signed char in the range [-128, 127].)

I also tried adding "network->addInput("input_node_name", DataType::kINT8, nvinfer1::Dims4{N, C, H, W})" to the code, but it prints the error "[TRT] [network.cpp::addInput::1507] Error Code 3: API Usage Error (Parameter check failed at: optimizer/api/network.cpp::addInput::1507, condition: inName != knownInput->getName()".

I think the ONNX parser doesn't support addInput; only the Caffe/UFF parsers, or building the network from scratch, support it.
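(For reference, a minimal sketch of the usual alternative when the network comes from the ONNX parser: modify the input tensor the parser already created instead of calling addInput. The input index and the dynamic range below are assumptions, not values from this thread.)

```cpp
// Sketch only: instead of network->addInput(...), retarget the existing input.
// Assumes `network` is the INetworkDefinition produced by the ONNX parser and
// that input 0 is the image input.
nvinfer1::ITensor* input = network->getInput(0);
input->setType(nvinfer1::DataType::kINT8);
// TensorRT still needs a quantization scale for this tensor: either keep using
// Int8EntropyCalibrator2, or set an explicit range, e.g.
input->setDynamicRange(-128.0f, 127.0f);  // example range, not from the thread
```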

Please help me guys :)

@spivakoa

You should feed the model with FP32 data as usual. TensorRT will convert the input tensor to INT8.

@seongkyun
Author

seongkyun commented Feb 14, 2022

@spivakoa
Is there any way to feed INT8 input?
The problem is the memcpy step: the copies between host and device always move the data in FP32 format, even when the model is quantized to INT8.

I've looked through the examples, including sampleINT8 and sampleUffPluginV2, but every example copies or delivers the data in FP32 format, even though the model produces INT8 output in the sampleUffPluginV2 example.

In the sampleUffPluginV2 example, the model's INT8 outputs are stored as FP32 data, and the device-to-host copy moves 4 bytes per output and then casts the values to INT8.

My main problem is bandwidth, because the amount of data I handle is huge.
(If I could use an INT8 interface, the I/O bandwidth would drop from 4 bytes to 1 byte per pixel.)

Is there any way to do it?
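(To make the bandwidth arithmetic above concrete, here is a tiny standalone sketch; the batch and image dimensions are made up for illustration.)

```cpp
// Illustration of the 4x I/O reduction: bytes moved per copy for an
// N x C x H x W batch in FP32 vs. INT8 input format (sizes are hypothetical).
#include <cstddef>
#include <cstdint>
#include <cstdio>

int main()
{
    const std::size_t n = 8, c = 3, h = 1080, w = 1920;
    const std::size_t elems = n * c * h * w;

    const std::size_t fp32Bytes = elems * sizeof(float);   // 4 bytes per value
    const std::size_t int8Bytes = elems * sizeof(int8_t);  // 1 byte per value

    std::printf("FP32 copy: %zu bytes, INT8 copy: %zu bytes\n", fp32Bytes, int8Bytes);
    return 0;
}
```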

@spivakoa

@seongkyun The memcpy process is not related to the neural network; it always copies chunks of 32-bit data. Even if the model is quantized to 8 bit, that just means it has scales to convert the data from 32-bit to 8-bit for inference. The conversion is done by the model itself; the data is still read from GPU global memory with 32-bit accesses. The output of the model is 32-bit too, since it uses a pre-calculated scale to convert the data back to 32-bit form.

@ttyio
Collaborator

ttyio commented Mar 15, 2022

We have the sample https://github.com/NVIDIA/TensorRT/tree/main/samples/sampleIOFormats that demonstrates how to control the network's input/output formats, hope it helps!
You can also experiment with trtexec using options like --inputIOFormats=int8:cdhw32 --outputIOFormats=int8:cdhw32, thanks!
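(For readers landing here: below is a rough sketch of what those trtexec flags correspond to in the C++ builder API. It assumes `network` is a parsed INetworkDefinition and `config` is the IBuilderConfig used for the build, and it uses the plain kLINEAR format for a 4-D input rather than the vectorized cdhw32 layout shown in the trtexec example.)

```cpp
// Sketch: request INT8 input/output bindings when building the engine.
config->setFlag(nvinfer1::BuilderFlag::kINT8);

nvinfer1::ITensor* input = network->getInput(0);
input->setType(nvinfer1::DataType::kINT8);
input->setAllowedFormats(1U << static_cast<int>(nvinfer1::TensorFormat::kLINEAR));

nvinfer1::ITensor* output = network->getOutput(0);
output->setType(nvinfer1::DataType::kINT8);
output->setAllowedFormats(1U << static_cast<int>(nvinfer1::TensorFormat::kLINEAR));
```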

ttyio added the question (Further information is requested), Precision: INT8, and triaged (Issue has been triaged by maintainers) labels Mar 15, 2022
@ttyio
Collaborator

ttyio commented Apr 26, 2022

Closing since there has been no activity for more than 3 weeks; please reopen if you still have an issue, thanks!

ttyio closed this as completed Apr 26, 2022
@Dxye

Dxye commented Jul 25, 2023

@seongkyun
Have you figured out how to feed INT8 input to an engine converted with --inputIOFormats=int8:chw?
I normalized a uint8 image to [-1, 1], fed it to the FP32-input engine, and got correct results.
I tried converting the uint8 image to int8_t in [-128, 127] and got wrong results.
@ttyio
https://github.com/NVIDIA/TensorRT/tree/main/samples/sampleIOFormats converts FP32 to int8_t as follows:
tmp[i] = static_cast<int8_t>(1 - ((1.0F - golden[i]) * 255.0F - 128) / 255.0F);
I tried it but still cannot get correct results, and I cannot understand the formula either.

@ttyio
Collaborator

ttyio commented Jul 25, 2023

@Dxye Let's ignore the formula in the sample; the dynamic range there is even faked to -127~127 instead of being calibrated from real data.
For your case, did you call setDynamicRange(-1, 1) on your network input before building the engine? Thanks
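(A sketch of the conversion this implies, assuming symmetric quantization with the input dynamic range set to [-1, 1], i.e. scale = 1/127; the helper name is made up.)

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical helper: map a uint8 pixel to the INT8 value an engine built
// with setDynamicRange(-1.0f, 1.0f) on its input would expect.
int8_t quantizePixel(uint8_t p)
{
    const float x = static_cast<float>(p) / 255.0f * 2.0f - 1.0f;  // normalize to [-1, 1]
    const float q = std::round(x * 127.0f);                        // divide by scale = 1/127
    return static_cast<int8_t>(std::min(127.0f, std::max(-127.0f, q)));
}
```

In other words, feeding raw [-128, 127] values only matches the engine if the input's dynamic range was set to the corresponding scale, which may explain the wrong results described above.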

@jax11235

(quoting @seongkyun's earlier comment about feeding INT8 input and the FP32 memcpy bandwidth problem)

I also ran into this bandwidth problem. Have you found a way to solve it?

@lix19937

Sometimes one input of the net ("x") is INT8 or uint8 (it stores indices, so it doesn't need calibration) while the others are FP32; when I use trtexec --fp16, the build errors out.

@lix19937

@ttyio

@ttyio
Collaborator

ttyio commented Jun 25, 2024

(quoting @lix19937's comment above about one INT8/uint8 input among FP32 inputs failing to build with trtexec --fp16)

@lix19937 Sorry for the delayed response. Yes, you are right, we only have quantized INT8 in TRT today. We require you to add the --int8 flag when you use the INT8 data type, and when TRT cannot tell the quantization scale, it will ask you to do calibration.
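(For the mixed-precision case above, a sketch of one way to avoid calibration for the index-like input by giving it an explicit dynamic range; the variable names and the choice of input index are assumptions.)

```cpp
// Enable FP16 for the bulk of the network and INT8 for the index-like input.
config->setFlag(nvinfer1::BuilderFlag::kFP16);
config->setFlag(nvinfer1::BuilderFlag::kINT8);

nvinfer1::ITensor* x = network->getInput(0);  // assuming input 0 is "x"
x->setType(nvinfer1::DataType::kINT8);
x->setDynamicRange(-127.0f, 127.0f);          // scale of 1: index values pass through unchanged
```

With trtexec, the analogous combination would presumably be passing both --fp16 and --int8 (plus the --inputIOFormats option mentioned earlier), though I have not verified that for this exact case.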

@lix19937

Thanks. @ttyio
