Support UINT8 input / output and casting from UINT8 to FP16 and back #3026
Comments
I've tried with
I think it's just that NVIDIA didn't update trtexec after updating TensorRT. Even the documentation of trtexec does not include uint8 among the valid input / output types.
@zerollzeng could we make sure the trtexec CLI is updated and conforms to the supported types in TensorRT?
Let me check this internally and see if we can improve this :-)
Thank you! Maybe we should label this issue as a bug? The CLI is clearly not behaving as it should, right?
Filed internal bug 4146128 to support uint8 in trtexec. To work around this you can try polygraphy; we should support it in the polygraphy tools.
Thank you. To be more precise, you can use uint8 with trtexec if the ONNX graph has uint8 inputs and/or outputs, by adding the following arguments to trtexec:
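Assuming the standard trtexec I/O-format options and a placeholder `model.onnx` (the exact flags from the comment are a reconstruction, not a quote), such an invocation looks roughly like:

```
trtexec --onnx=model.onnx \
        --inputIOFormats=uint8:chw \
        --outputIOFormats=uint8:chw \
        --fp16
```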
We added support for uint8 in trtexec. I'm closing this now, feel free to reopen if you have any further questions.
Is this shipped yet?
Yes, it should be in the latest 8.6 or 9.0 release.
@zerollzeng Thanks
@zerollzeng How can I do this with the Python API?
You can also take the polygraphy source code as a reference, since it's open-source.
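As a rough sketch (not the exact polygraphy code), the same can be done through the TensorRT Python API by setting the dtype of the network's I/O tensors before building the engine. TensorRT >= 8.6 and the `model.onnx` / `model.engine` paths are assumptions here:

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

# Parse the ONNX model (placeholder path).
with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)

# Mark the bound I/O tensors as UINT8, mirroring trtexec's
# --inputIOFormats=uint8:chw / --outputIOFormats=uint8:chw.
network.get_input(0).dtype = trt.DataType.UINT8
network.get_output(0).dtype = trt.DataType.UINT8

# Build and save the serialized engine (placeholder path).
engine_bytes = builder.build_serialized_network(network, config)
with open("model.engine", "wb") as f:
    f.write(engine_bytes)
```

Depending on the version and the model, the graph may also need an explicit cast from uint8 to the compute precision right after the input.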
Good job to the developers!
Hello,
Thanks for the great work,
I am currently having trouble serving TensorRT models that take very large image inputs and return very large image outputs in Triton Inference Server, because I need to send them as Float16.
Ideally I would like to send them as UINT8 as that is what my image decoder outputs.
The only workaround I have found is to have an INT8 input / output and do the conversion to Float16 inside the model.
To account for the UINT8 -> INT8 sign bit I need to run a conversion on the input, and the inverse conversion on the output (see the sketch below). But this is not compatible with TensorRT due to an error in ScatterND.
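As a hedged illustration of the conversion in question (not the exact code from the report), assuming a PyTorch model exported to ONNX: a masked in-place assignment such as `x[x < 0] += 256` is the pattern that typically exports to ScatterND, while an element-wise `torch.where` expresses the same reinterpretation:

```python
import torch

def int8_to_fp16(x_int8: torch.Tensor) -> torch.Tensor:
    # UINT8 pixels transported through an INT8 binding: values >= 128
    # arrive as negative INT8, so add 256 back after casting to FP16.
    x = x_int8.to(torch.float16)
    return torch.where(x < 0, x + 256.0, x)

def fp16_to_int8(y_fp16: torch.Tensor) -> torch.Tensor:
    # Inverse mapping: wrap 128..255 back into the negative INT8 range
    # before handing the tensor to the INT8 output binding.
    y = torch.clamp(torch.round(y_fp16), 0, 255)
    return torch.where(y > 127, y - 256.0, y).to(torch.int8)
```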
We've found a way around this by implementing a CUDA plugin that does these transformations for us; however, we think it would make sense to support UINT8 inputs and outputs, since that is what image decoders usually output.
It would also make sense to support casting UINT8 to FP16 and FP16 back to UINT8.
On a side note, it would be great to be able to use int8 for inputs / outputs only, without expanding the set of available tactics to int8-based tactics.