
Add TensorRT timing cache feature #10297

Closed (wanted to merge 26 commits)
Conversation

@chilo-ms (Contributor) commented Jan 15, 2022

TensorRT provides a timing cache feature that reduces builder time by preserving layer profiling information gathered during the builder phase. This PR adds the timing cache feature to ORT-TRT.

Also, please note that ORT will no longer use the OrtTensorRTProviderOptions struct for the TRT EP when adding additional provider options. Instead, it uses the opaque OrtTensorRTProviderOptionsV2 struct internally for setting provider options, which can be converted to a string.
Please see #7808 and #10188 for more details and context.
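As a hedged illustration (not code from this PR): with the V2 options path, string-convertible provider options such as `trt_timing_cache_enable` can be passed from the Python API as a plain key/value dict. The helper below is a sketch; the exact option keys and accepted values are assumptions based on this PR and may differ across onnxruntime versions.

```python
# Sketch: building TensorRT EP provider options as string-convertible
# key/value pairs, in the spirit of the OrtTensorRTProviderOptionsV2
# approach described above. Option keys are assumptions and may differ
# between onnxruntime versions.

def trt_provider_options(enable_timing_cache=True, enable_engine_cache=False):
    """Return the (provider name, options dict) pair for InferenceSession."""
    options = {
        # All values are strings, since V2 options are string-convertible.
        "trt_timing_cache_enable": str(enable_timing_cache).lower(),
        "trt_engine_cache_enable": str(enable_engine_cache).lower(),
    }
    return ("TensorrtExecutionProvider", options)

providers = [trt_provider_options(), "CPUExecutionProvider"]

# With onnxruntime installed and a TRT-capable build, this would be used as:
# import onnxruntime as ort
# sess = ort.InferenceSession("model.onnx", providers=providers)
```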

@chilo-ms (Contributor, Author) commented Jan 15, 2022

Will add unit test for testing this feature.

@jywu-msft (Member) commented Jan 19, 2022

> Will add unit test for testing this feature.

Yes, we definitely need some test cases here. We also need to test in conjunction with the engine cache enabled/disabled.

@chilo-ms (Contributor, Author) commented Feb 5, 2022

> Will add unit test for testing this feature.
>
> yes, we definitely need some test cases here. need to test in conjunction with engine cache enabled/disabled as well.

Test cases for the timing cache have been added.

@stale stale bot commented Apr 16, 2022

This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

@stale stale bot added the stale issues that have not been addressed in a while; categorized by a bot label Apr 16, 2022
chilo-ms added a commit that referenced this pull request Mar 10, 2023
### Description

This will enable a user to use a TensorRT timing cache, based on #10297, to accelerate build times on a device with the same compute capability. It works across models, as it simply stores kernel runtimes for specific configurations. These files are usually very small (only a few MB), which makes them easy to ship with an application to accelerate build time on the user's end.

### Motivation and Context
Especially for workstation use cases, TRT build times can be a roadblock. With a few models from the ONNX model zoo, I evaluated the speedups when a timing cache is present:
`./build/onnxruntime_perf_test -e tensorrt -I -t 5 -i "trt_timing_cache_enable|true" <onnx_path>`

|Model | No cache | With cache|
| ------------- | ------------- | ------------- |
|efficientnet-lite4-11 | 34.6 s | 7.7 s|
|yolov4 | 108.62 s | 9.4 s|

To capture this, I had to modify onnxruntime_perf_test. The time is sometimes not captured within "Session creation time cost:", which is why I introduced "First inference time cost:".

---------

Co-authored-by: Chi Lo <[email protected]>
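The "First inference time cost:" idea can be sketched as follows. This is an illustrative helper, not the actual onnxruntime_perf_test change: since TensorRT may defer the engine build past session creation, the clock spans both session creation and the first run.

```python
# Sketch: measure "first inference time cost" -- session creation plus the
# first run together -- so that a deferred TensorRT engine build is always
# included. The helper and stub callables are illustrative assumptions,
# not onnxruntime_perf_test's real code.
import time

def first_inference_seconds(create_session, run_once):
    """Time session creation and the first inference as one interval."""
    start = time.perf_counter()
    session = create_session()  # may trigger the TensorRT engine build
    run_once(session)           # or the build may happen here instead
    return time.perf_counter() - start

# Stub usage in place of a real InferenceSession and run:
elapsed = first_inference_seconds(lambda: object(), lambda s: None)
```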
@gedoensmax (Contributor)

I think we can close this due to #14767, right?

@stale stale bot removed the stale issues that have not been addressed in a while; categorized by a bot label Mar 16, 2023
@chilo-ms (Contributor, Author)

> I think we can close this due to #14767, right?

Yes, we can.

@chilo-ms chilo-ms closed this Mar 16, 2023