The way to interface the INT8 input #1792
Comments
You should feed the model with FP32 data as usual. TensorRT will convert the input tensor to INT8.
@spivakoa I've searched the examples, including sampleINT8 and sampleUffPluginV2, but every example copies or delivers the data in FP32 format, even when the model produces INT8 output as in sampleUffPluginV2. In that example the model's INT8 output is exposed as FP32 data, which is copied (device to host) at 4 bytes per output element and then cast to INT8. My main problem is bandwidth, because the amount of data being handled is huge. Is there any way to avoid this?
@seongkyun The memcpy process is not related to the neural network; it always copies chunks of 32-bit data. Even if the model is quantized to 8 bit, that just means it carries scales to convert the data from 32 bit to 8 bit for inference. The conversion is done by the model itself; the data is still read from GPU global memory with 32-bit accesses. The output of the model is 32 bit too, as it uses a pre-calculated scale to convert the data back to 32-bit form.
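For illustration, such a scale works roughly like this (plain C++, not the TensorRT API; the scale and value here are made up):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>

// Illustrative symmetric quantization: the per-tensor "scale" maps fp32
// values into the signed 8-bit range and back. Numbers are hypothetical.
int main() {
    const float scale = 0.1f;   // hypothetical per-tensor scale
    const float x = 3.14f;      // fp32 activation

    // quantize: fp32 -> int8, clamped to [-128, 127]
    int q = static_cast<int>(std::lround(x / scale));
    int8_t xQ = static_cast<int8_t>(std::min(127, std::max(-128, q)));

    // dequantize: int8 -> fp32, using the same scale
    float xBack = static_cast<float>(xQ) * scale;

    std::printf("fp32 %.2f -> int8 %d -> fp32 %.2f\n", x, static_cast<int>(xQ), xBack);
    return 0;
}
```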
We have a sample, https://github.com/NVIDIA/TensorRT/tree/main/samples/sampleIOFormats, that demonstrates how to control the network's input/output formats. Hope it helps!
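Roughly, the approach that sample demonstrates looks like the sketch below (assuming the TensorRT 8.x C++ API and a network already populated by the ONNX parser, so the existing input is fetched with getInput rather than re-added with addInput; error handling omitted):

```cpp
// Sketch only: mark the network's I/O tensors as INT8 so the engine accepts
// int8 buffers at enqueue time, instead of converting on the host.
#include "NvInfer.h"

void markIOAsInt8(nvinfer1::INetworkDefinition* network,
                  nvinfer1::IBuilderConfig* config) {
    // Allow INT8 kernels when building the engine.
    config->setFlag(nvinfer1::BuilderFlag::kINT8);

    // First network input: INT8 type, linear (NCHW) layout.
    nvinfer1::ITensor* input = network->getInput(0);
    input->setType(nvinfer1::DataType::kINT8);
    input->setAllowedFormats(1U << static_cast<int>(nvinfer1::TensorFormat::kLINEAR));

    // First network output: INT8 type, linear layout.
    nvinfer1::ITensor* output = network->getOutput(0);
    output->setType(nvinfer1::DataType::kINT8);
    output->setAllowedFormats(1U << static_cast<int>(nvinfer1::TensorFormat::kLINEAR));
}
```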
Closing since there has been no activity for more than 3 weeks. Please reopen if you still have an issue, thanks!
@seongkyun
@Dxye Let's ignore the formula in the sample; even the dynamic range there is faked to -127~127 instead of being calibrated from real data.
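That faked range corresponds to calls along these lines (sketch only; a real deployment should use calibrated ranges rather than this placeholder value):

```cpp
// Sketch: the sample's "faked" dynamic range, set per tensor instead of
// running a calibrator. Not representative of real accuracy.
#include "NvInfer.h"

void fakeDynamicRanges(nvinfer1::INetworkDefinition* network) {
    // Network inputs
    for (int i = 0; i < network->getNbInputs(); ++i) {
        network->getInput(i)->setDynamicRange(-127.0f, 127.0f);
    }
    // Intermediate and output tensors, layer by layer
    for (int i = 0; i < network->getNbLayers(); ++i) {
        nvinfer1::ILayer* layer = network->getLayer(i);
        for (int j = 0; j < layer->getNbOutputs(); ++j) {
            layer->getOutput(j)->setDynamicRange(-127.0f, 127.0f);
        }
    }
}
```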
I also ran into the bandwidth problem. Do you have a way to solve it?
Sometimes one input of the net ("x") is int8 or uint8 (it stores indices, so no calibration is needed) and the others are fp32. When I use
@lix19937 Sorry for the delayed response. Yes, you are right, we only have quantized int8 in TRT today. We require you to add the --int8 flag when you use the int8 datatype, and when TRT cannot determine the quantization scale, we will ask you to do calibration.
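For the mixed-precision case above, one option is to provide the scale for the int8 input yourself so only that tensor skips calibration (a sketch assuming the TensorRT 8.x C++ API and that the index-like input is network input 0):

```cpp
// Sketch: keep a raw-index input in int8 with a hand-set range while the
// remaining inputs stay fp32.
#include "NvInfer.h"

void keepIndexInputInt8(nvinfer1::INetworkDefinition* network,
                        nvinfer1::IBuilderConfig* config) {
    config->setFlag(nvinfer1::BuilderFlag::kINT8);   // corresponds to trtexec --int8

    nvinfer1::ITensor* idx = network->getInput(0);   // the int8 "x" input
    idx->setType(nvinfer1::DataType::kINT8);
    // Known range of the stored indices, so no calibration is needed for it.
    idx->setDynamicRange(-127.0f, 127.0f);
    // Other inputs are left untouched and remain fp32.
}
```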
Thanks. @ttyio |
I am trying to feed INT8 inputs.
Every example I can find online uses float (FP32) inputs, not INT8 ones.
I have an ONNX model that was trained on float (FP32) data, and a calibration DB.
I've calibrated the ONNX model using the ONNX parser and Int8EntropyCalibrator2, and the code ran successfully (TensorRT C++ API).
But I still have to feed FP32 inputs and get FP32 outputs.
Is there any way to feed INT8 input to the TensorRT model?
(Meaning that when I run the "context->enqueue" call, every value in the input buffer is a signed char in the range [-128, 127].)
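Concretely, something like this is what I'd like to be able to do (a rough sketch; it assumes an engine whose first binding is an INT8 input):

```cpp
// Rough sketch: hand int8 data straight to enqueue.
#include "NvInfer.h"
#include <cuda_runtime_api.h>
#include <cassert>
#include <cstdint>
#include <vector>

void runInt8Input(nvinfer1::ICudaEngine* engine,
                  nvinfer1::IExecutionContext* context,
                  const std::vector<int8_t>& hostInput,   // values in [-128, 127]
                  void* deviceInput, void* deviceOutput,
                  cudaStream_t stream) {
    // Only works if the engine was built with an INT8 input binding.
    assert(engine->getBindingDataType(0) == nvinfer1::DataType::kINT8);

    // Copy 1 byte per element instead of 4 bytes for fp32 -- this is the
    // bandwidth saving I am after.
    cudaMemcpyAsync(deviceInput, hostInput.data(),
                    hostInput.size() * sizeof(int8_t),
                    cudaMemcpyHostToDevice, stream);

    void* bindings[] = {deviceInput, deviceOutput};
    context->enqueueV2(bindings, stream, nullptr);
    cudaStreamSynchronize(stream);
}
```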
Also, I've tried adding network->addInput("input_node_name", DataType::kINT8, nvinfer::Dims4{N, C, H, W}) to the code, but it prints the error "[TRT] [network.cpp::addInput::1507] Error Code 3: API Usage Error (Parameter check failed at: optimizer/api/network.cpp::addInput::1507, condition: inName != knownInput->getName()".
I think the ONNX parser doesn't support addInput; only the Caffe/UFF parsers, or building the network from scratch, support it.
Please help me guys :)