[Build] v1.14.x TensorRT EP is broken on TensorRT sdk 8.4.1.5; uses datatype that does not exist #15118

Closed
diablodale opened this issue Mar 20, 2023 · 10 comments
Labels
build build issues; typically submitted using template ep:CUDA issues related to the CUDA execution provider ep:TensorRT issues related to TensorRT execution provider platform:windows issues related to the Windows platform


@diablodale
Contributor

diablodale commented Mar 20, 2023

Describe the issue

Sometime after ORT v1.13.1, the TensorRT EP added code that uses nvinfer1::DataType::kUINT8.
The datatypes kUINT8 and kFP8 do not exist in TensorRT SDK 8.4.1.5, so the build breaks in many places.

The onnxruntime build docs state that TensorRT SDK 8.4.1.5 is fully supported. This is no longer true.
The likely options are to change the ORT code or to require a newer TensorRT SDK.

Urgency

No response

Target platform

Windows (likely all platforms that support TensorRT)

Build script

.\build.bat --update --build --skip_tests ^
--cmake_generator "Visual Studio 16 2019" --config Release --build_shared_lib --parallel --use_dml --use_cuda --cuda_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.4" --cuda_version 11.4 --cudnn_home "C:\repos-nobackup\cudnn-windows-x86_64-8.4.1.50_cuda11.6-archive" --use_tensorrt --tensorrt_home "C:\repos-nobackup\TensorRT-8.4.1.5" --cmake_extra_defines CMAKE_INSTALL_PREFIX=C:/repos-nobackup/onnxruntime/.install/Release onnxruntime_USE_AVX=ON

Error / output

Too many to list. Here are a few:

C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-src\TensorOrWeights.hpp(115,42): error C2838: 'kUINT8': illegal qualified name in member declaration (compiling source file C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-src\builtin_op_importers.cpp) [C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-build\nvonnxparser_static.vcxproj]
C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-src\TensorOrWeights.hpp(115,1): error C2065: 'kUINT8': undeclared identifier (compiling source file C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-src\builtin_op_importers.cpp) [C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-build\nvonnxparser_static.vcxproj]
C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-src\TensorOrWeights.hpp(115,1): error C2051: case expression not constant (compiling source file C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-src\builtin_op_importers.cpp) [C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-build\nvonnxparser_static.vcxproj]
C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-src\trt_utils.hpp(24,30): error C2838: 'kUINT8': illegal qualified name in member declaration (compiling source file C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-src\builtin_op_importers.cpp) [C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-build\nvonnxparser_static.vcxproj]
C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-src\trt_utils.hpp(24,36): error C2065: 'kUINT8': undeclared identifier (compiling source file C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-src\builtin_op_importers.cpp) [C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-build\nvonnxparser_static.vcxproj]
C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-src\trt_utils.hpp(24,1): error C2051: case expression not constant (compiling source file C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-src\builtin_op_importers.cpp) [C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-build\nvonnxparser_static.vcxproj]
C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-src\trt_utils.hpp(158,30): error C2838: 'kUINT8': illegal qualified name in member declaration (compiling source file C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-src\builtin_op_importers.cpp) [C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-build\nvonnxparser_static.vcxproj]
C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-src\trt_utils.hpp(158,36): error C2065: 'kUINT8': undeclared identifier (compiling source file C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-src\builtin_op_importers.cpp) [C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-build\nvonnxparser_static.vcxproj]
C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-src\TensorOrWeights.hpp(115,42): error C2838: 'kUINT8': illegal qualified name in member declaration (compiling source file C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-src\onnxErrorRecorder.cpp) [C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-build\nvonnxparser_static.vcxproj]
C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-src\trt_utils.hpp(158,1): error C2051: case expression not constant (compiling source file C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-src\builtin_op_importers.cpp) [C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-build\nvonnxparser_static.vcxproj]
C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-src\TensorOrWeights.hpp(115,1): error C2065: 'kUINT8': undeclared identifier (compiling source file C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-src\onnxErrorRecorder.cpp) [C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-build\nvonnxparser_static.vcxproj]
C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx-src\onnx\defs\tensor_proto_util.cc(128,1): warning C4244: '*=': conversion from 'int64_t' to 'int', possible loss of data [C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx-build\onnx.vcxproj]
  convert.cc
C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx-src\onnx\defs\tensor_proto_util.cc(129,1): warning C4244: '*=': conversion from 'int64_t' to 'int', possible loss of data [C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx-build\onnx.vcxproj]
C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-src\TensorOrWeights.hpp(115,42): error C2838: 'kUINT8': illegal qualified name in member declaration (compiling source file C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-src\OnnxAttrs.cpp) [C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-build\nvonnxparser_static.vcxproj]
C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-src\TensorOrWeights.hpp(115,1): error C2051: case expression not constant (compiling source file C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-src\onnxErrorRecorder.cpp) [C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-build\nvonnxparser_static.vcxproj]
C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-src\TensorOrWeights.hpp(115,42): error C2838: 'kUINT8': illegal qualified name in member declaration (compiling source file C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-src\ShapedWeights.cpp) [C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-build\nvonnxparser_static.vcxproj]
C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-src\onnx2trt_utils.hpp(75,30): error C2838: 'kUINT8': illegal qualified name in member declaration (compiling source file C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-src\builtin_op_importers.cpp) [C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-build\nvonnxparser_static.vcxproj]
C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-src\TensorOrWeights.hpp(115,1): error C2065: 'kUINT8': undeclared identifier (compiling source file C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-src\OnnxAttrs.cpp) [C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-build\nvonnxparser_static.vcxproj]
C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-src\trt_utils.hpp(24,30): error C2838: 'kUINT8': illegal qualified name in member declaration (compiling source file C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-src\onnxErrorRecorder.cpp) [C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-build\nvonnxparser_static.vcxproj]
C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-src\TensorOrWeights.hpp(115,1): error C2065: 'kUINT8': undeclared identifier (compiling source file C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-src\ShapedWeights.cpp) [C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-build\nvonnxparser_static.vcxproj]
C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-src\onnx2trt_utils.hpp(75,36): error C2065: 'kUINT8': undeclared identifier (compiling source file C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-src\builtin_op_importers.cpp) [C:\repos-nobackup\onnxruntime\build\Windows\Release\_deps\onnx_tensorrt-build\nvonnxparser_static.vcxproj]

Visual Studio Version

VS2019 v16.11.25

GCC / Compiler Version

No response

@diablodale diablodale added the build build issues; typically submitted using template label Mar 20, 2023
@github-actions github-actions bot added ep:CUDA issues related to the CUDA execution provider ep:TensorRT issues related to TensorRT execution provider platform:windows issues related to the Windows platform labels Mar 20, 2023
@diablodale
Contributor Author

More diagnostics.
Today I followed this process:

  1. Fetched and fast-forward merged main.
  2. Fetched and switched to v1.14.1.
  3. Ran rm -rf on the entire build directory.
  4. Ran the build.bat command above.
  5. Waited, and the build was broken.

If I go to onnxruntime\cmake\external\onnx-tensorrt then...

git status

HEAD detached at 87c7a70
nothing to commit, working tree clean

git log --oneline
87c7a70 (HEAD) TensorRT 8.4.1.5 updates (#849)
e3286f5 TensorRT 8.4-GA ONNX Parser Release
c5cf64e (origin/8.4-EA) missed other sections for supporting >= protobuf 3.11 (#817)

That directory seems to be on the 8.4.1.5 codebase.
However, the files there do not match the files under _deps in the build directory.

diff TensorOrWeights.hpp ../../../build/Windows/Release/_deps/onnx_tensorrt-src/TensorOrWeights.hpp 
88a89,93
>     bool isFp16() const
>     {
>         return is_tensor() ? _tensor->getType() == nvinfer1::DataType::kHALF
>                     : _weights.type == ::ONNX_NAMESPACE::TensorProto_DataType_FLOAT16;
>     }
109a115
>                 case nvinfer1::DataType::kUINT8: return "UINT8";
117c123
<             switch(_weights.type)
---
>             switch (_weights.type)
119,126c125,133
<                 case ::ONNX_NAMESPACE::TensorProto::DOUBLE: return "DOUBLE -> FLOAT";
<                 case ::ONNX_NAMESPACE::TensorProto::FLOAT: return "FLOAT";
<                 case ::ONNX_NAMESPACE::TensorProto::INT8: return "INT8";
<                 case ::ONNX_NAMESPACE::TensorProto::FLOAT16: return "HALF";
<                 case ::ONNX_NAMESPACE::TensorProto::BOOL: return "BOOL";
<                 case ::ONNX_NAMESPACE::TensorProto::INT32: return "INT32";
<                 case ::ONNX_NAMESPACE::TensorProto::INT64: return "INT64 -> INT32";
<                 default: return "UNKNOWN TYPE";
---
>             case ::ONNX_NAMESPACE::TensorProto::DOUBLE: return "FLOAT";
>             case ::ONNX_NAMESPACE::TensorProto::FLOAT: return "FLOAT";
>             case ::ONNX_NAMESPACE::TensorProto::INT8: return "INT8";
>             case ::ONNX_NAMESPACE::TensorProto::UINT8: return "UINT8";
>             case ::ONNX_NAMESPACE::TensorProto::FLOAT16: return "HALF";
>             case ::ONNX_NAMESPACE::TensorProto::BOOL: return "BOOL";
>             case ::ONNX_NAMESPACE::TensorProto::INT32: return "INT32";
>             case ::ONNX_NAMESPACE::TensorProto::INT64: return "INT32";
>             default: return "UNKNOWN TYPE";

That is unexpected. The build directory was completely deleted.
How are these files not identical? What put an incorrect version of the files into _deps instead of taking them from cmake\external\onnx-tensorrt?

@skottmckay
Contributor

There's no onnx-tensorrt directory under \cmake\external in the latest main.

https://github.com/microsoft/onnxruntime/tree/main/cmake/external

I believe we have been removing things from \cmake\external as submodules are converted to use FetchContent and be fetched/built under _deps in the build output directory. The contents of _deps are not a copy from cmake\external though. Is there a chance any old directory is hanging around in \cmake\external and throwing things off? What does git status report?

@diablodale
Contributor Author

diablodale commented Mar 20, 2023

Yes, there are some dirs in cmake/external. I remember seeing a warning during some git work "can't remove xxx dirs because things are in them". This usually isn't a big problem since directories with things that are never used or referenced are just bits on an SSD.

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        cmake/external/FP16/
        cmake/external/SafeInt/
        cmake/external/XNNPACK/
        cmake/external/cub/
        cmake/external/cxxopts/
        cmake/external/date/
        cmake/external/dlpack/
        cmake/external/flatbuffers/
        cmake/external/googlebenchmark/
        cmake/external/googletest/
        cmake/external/json/
        cmake/external/mimalloc/
        cmake/external/mp11/
        cmake/external/nsync/
        cmake/external/onnx-tensorrt/
        cmake/external/pthreadpool/
        cmake/external/pytorch_cpuinfo/
        cmake/external/re2/
        cmake/external/tensorboard/
        cmake/external/wil/

Issue 1

The documentation at https://onnxruntime.ai/docs/build/eps.html#prerequisites-1 is now outdated and incorrect:
the newer v1.14+ ORT code seems to ignore many of those dirs,
I suspect the TensorRT version given in the docs ("v8.4.1.5") is incorrect,
and the documented steps that use git to update remotes and check out a different version are also incorrect.

Issue 2

I suspect v1.14.1 hardcodes the specific download URL and SHA of the onnx-tensorrt backend in deps.txt:

onnx_tensorrt;https://github.com/onnx/onnx-tensorrt/archive/369d6676423c2a6dbf4a5665c4b5010240d99d3c.zip;62119892edfb78689061790140c439b111491275

and also at

"component": {
  "type": "git",
  "git": {
    "commitHash": "369d6676423c2a6dbf4a5665c4b5010240d99d3c",
    "repositoryUrl": "https://github.com/onnx/onnx-tensorrt.git"
  },
  "comments": "onnx_tensorrt"
}

Then something (helpers.ps1?) loads that information into CMake variables, and FetchContent uses them to download the archive and make it available in the build/_deps directory tree.

FetchContent_Declare(
onnx_tensorrt
URL ${DEP_URL_onnx_tensorrt}
URL_HASH SHA1=${DEP_SHA1_onnx_tensorrt}
)
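As a rough illustration of what the deps.txt entry and the URL_HASH check amount to, here is a minimal Python sketch. It assumes the `name;url;sha1` layout of the line quoted above; the helper names (`parse_deps_line`, `pinned_commit`, `archive_matches`) are hypothetical and not part of ORT or CMake.

```python
import hashlib
from typing import NamedTuple


class DepEntry(NamedTuple):
    name: str
    url: str
    sha1: str


def parse_deps_line(line: str) -> DepEntry:
    """Split one 'name;url;sha1' line (layout assumed from the quoted entry)."""
    name, url, sha1 = line.strip().split(";")
    return DepEntry(name, url, sha1.lower())


def pinned_commit(entry: DepEntry) -> str:
    """Extract the commit hash from a GitHub '/archive/<commit>.zip' URL."""
    tail = entry.url.rsplit("/", 1)[-1]
    return tail[:-4] if tail.endswith(".zip") else tail


def archive_matches(entry: DepEntry, archive_bytes: bytes) -> bool:
    """Compare the SHA1 of downloaded archive bytes with the pinned hash,
    similar in spirit to a URL_HASH SHA1=... check."""
    return hashlib.sha1(archive_bytes).hexdigest() == entry.sha1


entry = parse_deps_line(
    "onnx_tensorrt;https://github.com/onnx/onnx-tensorrt/archive/"
    "369d6676423c2a6dbf4a5665c4b5010240d99d3c.zip;"
    "62119892edfb78689061790140c439b111491275"
)
print(entry.name, pinned_commit(entry))
```

Printing the pinned commit this way makes it easy to compare against the commit checked out in an old cmake/external/onnx-tensorrt directory.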

The archive commit in deps.txt above, 369d6676423c2a6dbf4a5665c4b5010240d99d3c, uses very new features of TensorRT SDK 8.5 (and nearly 8.6). It is from Dec 2022, only 3 months ago.
https://github.com/onnx/onnx-tensorrt/tree/369d6676423c2a6dbf4a5665c4b5010240d99d3c

Contrast that with the onnx-tensorrt backend commit needed for 8.4.1.5, 87c7a70688fd98fb355b8976f41425b40e4fe52f, from 14 June 2022 (9 months ago).
https://github.com/onnx/onnx-tensorrt/tree/87c7a70688fd98fb355b8976f41425b40e4fe52f

This makes me think that ORT has significantly changed its dependency behavior.

  1. It seems to force a specific version of the TensorRT SDK, because a specific version of the onnx-tensorrt backend is forced.
  2. The ability to change onnx-tensorrt backend versions, as described in the docs, is no longer available.

@diablodale
Contributor Author

A workaround may be to use the --use_tensorrt_builtin_parser build param and distribute nvonnxparser.dll from the TensorRT SDK. With that param the build succeeded, and with that DLL shipped alongside my app, the app works. I haven't yet seen a need to distribute nvparsers.dll.
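A minimal sketch of how the workaround changes the build invocation, written as a Python helper that only assembles the argument list. The flags are the ones that appear in this issue; the helper name `build_args` is hypothetical, and the CUDA, cuDNN, and DirectML options from the original script are left out for brevity.

```python
from typing import List


def build_args(tensorrt_home: str, use_builtin_parser: bool = True) -> List[str]:
    """Assemble build.bat arguments for a TensorRT EP build (trimmed example)."""
    args = [
        "--update", "--build", "--skip_tests",
        "--config", "Release",
        "--build_shared_lib", "--parallel",
        "--use_tensorrt", "--tensorrt_home", tensorrt_home,
    ]
    if use_builtin_parser:
        # Use the parser shipped with the TensorRT SDK instead of building
        # the onnx-tensorrt sources fetched into _deps.
        args.append("--use_tensorrt_builtin_parser")
    return args


print(" ".join(build_args(r"C:\repos-nobackup\TensorRT-8.4.1.5")))
```

With the built-in parser, nvonnxparser.dll from the SDK has to be distributed with the application, as described above.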

@chilo-ms
Contributor

chilo-ms commented Mar 21, 2023

Hi,

ORT TRT 1.14 needs TRT 8.5, which is why you encountered the build error.
Thanks for pointing out that the build doc is misleading; we will update it.
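A pre-build check along these lines could catch the mismatch before the compiler does. This is a sketch only: it assumes the SDK's NvInferVersion.h header defines NV_TENSORRT_MAJOR, NV_TENSORRT_MINOR, NV_TENSORRT_PATCH, and NV_TENSORRT_BUILD as plain integer macros (verify against your own install), and the function names are hypothetical.

```python
import re
from typing import Tuple


def parse_trt_version(header_text: str) -> Tuple[int, int, int, int]:
    """Read the NV_TENSORRT_* version macros from header text."""
    parts = []
    for key in ("MAJOR", "MINOR", "PATCH", "BUILD"):
        match = re.search(
            r"^\s*#define\s+NV_TENSORRT_%s\s+(\d+)" % key,
            header_text,
            re.MULTILINE,
        )
        if match is None:
            raise ValueError("NV_TENSORRT_%s not found" % key)
        parts.append(int(match.group(1)))
    return (parts[0], parts[1], parts[2], parts[3])


def meets_minimum(version: Tuple[int, ...], minimum: Tuple[int, int] = (8, 5)) -> bool:
    """True if (major, minor) is at least the minimum stated above for ORT 1.14."""
    return tuple(version[:2]) >= minimum


# Sample text standing in for the header of a TensorRT 8.4.1.5 install.
sample_header = """
#define NV_TENSORRT_MAJOR 8 //!< TensorRT major version.
#define NV_TENSORRT_MINOR 4 //!< TensorRT minor version.
#define NV_TENSORRT_PATCH 1 //!< TensorRT patch version.
#define NV_TENSORRT_BUILD 5 //!< TensorRT build number.
"""
version = parse_trt_version(sample_header)
print(version, "ok" if meets_minimum(version) else "too old for ORT 1.14")
```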

As Scott mentioned, the onnx-tensorrt directory was removed from cmake\external starting with ORT 1.14. If you want to manually switch to a specific onnx-tensorrt commit, either modify cmake\deps.txt or directly modify the onnx-tensorrt sources under _deps in the build output directory.
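The deps.txt route could look roughly like the sketch below. It assumes each entry has the `name;url;sha1` form quoted earlier in this thread and that the URL ends in `/archive/<commit>.zip`; `repin_dependency` is a hypothetical helper, and the SHA1 must be the hash of the downloaded zip archive, so the value used here is only a placeholder. Whether an older onnx-tensorrt commit actually builds against ORT 1.14 is not established in this thread.

```python
def repin_dependency(deps_text: str, name: str, commit: str, sha1: str) -> str:
    """Return deps.txt text with the entry for `name` pointing at another commit."""
    out = []
    found = False
    for line in deps_text.splitlines():
        parts = line.split(";")
        if len(parts) == 3 and parts[0] == name:
            # Keep the '.../archive' prefix of the URL and swap in the new commit.
            base = parts[1].rsplit("/", 1)[0]
            line = "%s;%s/%s.zip;%s" % (name, base, commit, sha1)
            found = True
        out.append(line)
    if not found:
        raise KeyError(name)
    return "\n".join(out) + "\n"


original = (
    "onnx_tensorrt;https://github.com/onnx/onnx-tensorrt/archive/"
    "369d6676423c2a6dbf4a5665c4b5010240d99d3c.zip;"
    "62119892edfb78689061790140c439b111491275\n"
)
# Placeholder hash: replace with the real SHA1 of the downloaded archive.
placeholder_sha1 = "0" * 40
print(repin_dependency(
    original,
    "onnx_tensorrt",
    "87c7a70688fd98fb355b8976f41425b40e4fe52f",
    placeholder_sha1,
))
```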

As you may have noticed, you can use --use_tensorrt_builtin_parser. In the next release, ORT 1.15, we will use the built-in parser that comes with the TRT libraries by default.

@diablodale
Contributor Author

Thanks for confirming my research above.
When the TensorRT EP docs are updated, reference this issue and I can confirm the changes so that this issue can be closed.

I recommend a quick review of other parts of the docs that might be outdated due to this changed dependency management (for example, anything tied to the 20 dirs listed above that are no longer under cmake/external/).

@yf711
Contributor

yf711 commented Jun 28, 2023

@diablodale
Contributor Author

I looked at the updates. I see unclear changes. I've reviewed with comments at #16465

@yf711
Contributor

yf711 commented Jun 29, 2023

I looked at the updates. I see unclear changes. I've reviewed with comments at #16465

Thanks for your comments! I've opened another PR to address them: #16537

yf711 added a commit that referenced this issue Jul 7, 2023
### Description
* Clarify TRTEP version info


### Motivation and Context
#15118: issue to request
#16465: previous PR to update trtep doc
@diablodale
Contributor Author

I can see the updates live at https://onnxruntime.ai/docs/build/eps.html
Closing as fixed by the doc update.
