INT8 Precision for higher FPS #55

Ironclad17 · 2023-08-30T16:01:36Z

Considering the fps gains fp16 precision inference provides: https://github.com/AmusementClub/vs-mlrt/wiki/RealESRGANv2
Has anyone attempted int8 precision? The parameter appears to be accessible: https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#trtexec-flags
It should also be accessible in ort: https://cloudblogs.microsoft.com/opensource/2022/05/02/optimizing-and-deploying-transformer-int8-inference-with-onnx-runtime-tensorrt-on-nvidia-gpus/

Apologies, I'm out of my depth. I have no idea what level of precision is necessary for a useful result, but a lot of research seems to be focusing on lower precision inference and training for higher throughput.
https://on-demand.gputechconf.com/gtc/2017/presentation/s7310-8-bit-inference-with-tensorrt.pdf
https://developer.nvidia.com/blog/int4-for-ai-inference/
https://youtu.be/hCxvS1dVufs?si=hqpzognE2-hCT8Gz&t=224

WolframRhodium · 2023-08-30T17:07:24Z

Maintainer

Hi, int8 is a hidden advanced feature that may provide additional acceleration.

However, applying int8 inference acceleration to neural networks is not easy. It usually requires sophisticated calibration to limit the loss of accuracy, and the problem is much more severe for image processing tasks.

Since the feature cannot be used out of the box, advanced users who want it will have to follow guides elsewhere manually.
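
To illustrate where the accuracy loss comes from, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. It is not vs-mlrt or TensorRT code; the function names and the pixel-like sample data are assumptions made up for this example. It only shows that every value is rounded onto a 255-level grid, so the round-trip error is bounded by half the scale.

```python
def quantize_int8(values, scale):
    """Map floats to integers in [-127, 127] using one scale for the whole tensor."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def dequantize(qvalues, scale):
    """Map int8 codes back to approximate float values."""
    return [q * scale for q in qvalues]


def max_abs_error(values, scale):
    """Largest absolute difference after a quantize/dequantize round trip."""
    restored = dequantize(quantize_int8(values, scale), scale)
    return max(abs(a - b) for a, b in zip(values, restored))


# Hypothetical pixel-like activations in [0, 1] (illustrative data only).
pixels = [i / 999 for i in range(1000)]

# Simplest possible choice of scale: cover the full observed range.
scale = max(abs(v) for v in pixels) / 127

err = max_abs_error(pixels, scale)
print(f"scale={scale:.6f} max abs error={err:.6f}")
```

With a well-chosen scale the error per value is small, but in an image model it accumulates across layers, and a badly chosen scale makes it much larger, which is why calibration matters.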

1 reply

Ironclad17 Aug 31, 2023
Author

From the documentation above, it looks like the quantization from fp32 to int8 is done with a calibration table, and the best way to get one is to use the cache made during training. In practice, that means that to use int8 precision you need to train new models.
https://huggingface.co/docs/optimum/concept_guides/quantization
https://github.com/rmccorm4/tensorrt-utils/tree/master/int8/calibration
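
As a rough sketch of what calibration decides, the plain-Python example below picks the int8 scale from a batch of sample values in two ways: from the absolute maximum, and from a high percentile that clips outliers. This is an illustrative stand-in only. The helper names, the synthetic data, and the 99.9 percentile are assumptions for this example, and the calibrators in TensorRT or ONNX Runtime use their own algorithms.

```python
import random


def quantize_dequantize(values, scale):
    """Round each value onto the int8 grid defined by `scale`, then map it back."""
    return [max(-127, min(127, round(v / scale))) * scale for v in values]


def mean_abs_error(values, scale):
    """Average absolute round-trip error for a given scale."""
    restored = quantize_dequantize(values, scale)
    return sum(abs(a - b) for a, b in zip(values, restored)) / len(values)


def max_scale(samples):
    """Scale that covers the full observed range (no clipping)."""
    return max(abs(v) for v in samples) / 127


def percentile_scale(samples, pct=99.9):
    """Scale that covers only the given percentile of magnitudes (clips outliers)."""
    ordered = sorted(abs(v) for v in samples)
    idx = min(len(ordered) - 1, int(len(ordered) * pct / 100))
    return ordered[idx] / 127


random.seed(0)
# Hypothetical calibration batch: mostly small activations plus two large outliers.
samples = [random.gauss(0.0, 1.0) for _ in range(10000)] + [50.0, -60.0]

err_max = mean_abs_error(samples, max_scale(samples))
err_pct = mean_abs_error(samples, percentile_scale(samples))
print(f"max-based scale:        mean abs error = {err_max:.4f}")
print(f"percentile-based scale: mean abs error = {err_pct:.4f}")
```

The percentile-based scale gives a lower average error here because the grid is not stretched to fit two outliers, at the cost of clipping those outliers badly. Which trade-off is acceptable depends on the model and on data that resembles real inputs, which is the reason a calibration set is needed at all.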

Also, it may actually be slower for smaller models unless you free up enough VRAM for more streams.
https://huggingface.co/blog/hf-bitsandbytes-integration
