Add TensorRT timing cache feature #10297
Conversation
…vider options struct
Will add a unit test for this feature.

Yes, we definitely need some test cases here.

Test cases for the timing cache have been added.

This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.
### Description
This will enable a user to use a TensorRT timing cache, based on #10297, to accelerate build times on a device with the same compute capability. It works across models, as it simply stores kernel runtimes for specific configurations. The cache files are usually very small (only a few MB), which makes them easy to ship with an application to accelerate build time on the user's end.

### Motivation and Context
Especially for workstation use cases, TRT build times can be a roadblock. With a few models from the ONNX Model Zoo, I evaluated the speedup when a timing cache is present:

`./build/onnxruntime_perf_test -e tensorrt -I -t 5 -i "trt_timing_cache_enable|true" <onnx_path>`

| Model | No cache | With cache |
| ------------- | ------------- | ------------- |
| efficientnet-lite4-11 | 34.6 s | 7.7 s |
| yolov4 | 108.62 s | 9.4 s |

To capture this, I had to modify onnxruntime_perf_test: the time is sometimes not captured within "Session creation time cost:", which is why I introduced "First inference time cost:".

---------

Co-authored-by: Chi Lo <[email protected]>
I think we can close this due to #14767, right?

Yes, we can.
TensorRT provides a timing cache feature that reduces builder time by keeping layer profiling information from the builder phase. This PR adds the timing cache feature to ORT-TRT.

Also, please note that ORT no longer uses the `OrtTensorRTProviderOptions` struct for the TRT EP when adding additional provider options. Instead, it uses the opaque struct `OrtTensorRTProviderOptionsV2` as the internal struct for setting provider options, which can be converted to a string. Please see #7808 and #10188 for more details and context.
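For illustration, here is a minimal sketch of how a user might enable the timing cache through the Python API's provider-options mechanism, which mirrors the string-convertible `OrtTensorRTProviderOptionsV2` options described above. The option name `trt_timing_cache_enable` comes from this PR; the session-creation lines are commented out because they require an onnxruntime build with the TensorRT EP and a real model file.

```python
# Sketch: passing TensorRT EP options as a (name, options-dict) pair,
# the form accepted by onnxruntime's InferenceSession `providers` argument.
trt_provider_options = {
    "trt_timing_cache_enable": True,  # option name from this PR
}
providers = [("TensorrtExecutionProvider", trt_provider_options)]

# Requires an onnxruntime build with the TensorRT EP:
# import onnxruntime as ort
# sess = ort.InferenceSession("model.onnx", providers=providers)
```

With the cache enabled, repeated engine builds for the same compute capability can reuse the stored kernel timings, which is where the speedups in the table above come from.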