TensorRT EP - timing cache #14767
Conversation
…vider options struct
# Conflicts:
#	include/onnxruntime/core/providers/tensorrt/tensorrt_provider_options.h
#	include/onnxruntime/core/session/onnxruntime_cxx_api.h
#	onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc
#	onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.h
#	onnxruntime/core/providers/tensorrt/tensorrt_execution_provider_info.cc
#	onnxruntime/core/providers/tensorrt/tensorrt_execution_provider_info.h
#	onnxruntime/core/providers/tensorrt/tensorrt_provider_factory.cc
#	onnxruntime/core/session/provider_bridge_ort.cc
#	onnxruntime/python/onnxruntime_pybind_state.cc
#	onnxruntime/test/perftest/ort_test_session.cc
#	onnxruntime/test/providers/cpu/model_tests.cc
#	onnxruntime/test/providers/tensorrt/tensorrt_basic_test.cc
/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux Nuphar CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, onnxruntime-python-checks-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed
You have several pipelines (over 10) configured to build pull requests in this repository. Specify which pipelines you would like to run by using /azp run [pipelines] command. You can specify multiple pipelines using a comma separated list.
/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux Nuphar CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, Windows CPU CI Pipeline, Windows GPU CI Pipeline
/azp run Windows GPU TensorRT CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, onnxruntime-python-checks-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed
Azure Pipelines successfully started running 6 pipeline(s).
Azure Pipelines successfully started running 9 pipeline(s).
The CIs failed.
@chilo-ms could you run the pipeline again?
/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux Nuphar CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, Windows CPU CI Pipeline, Windows GPU CI Pipeline
/azp run Windows GPU TensorRT CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, onnxruntime-python-checks-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed
Azure Pipelines successfully started running 6 pipeline(s).
Azure Pipelines successfully started running 9 pipeline(s).
/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux Nuphar CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, Windows CPU CI Pipeline, Windows GPU CI Pipeline
/azp run Windows GPU TensorRT CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, onnxruntime-python-checks-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed
Azure Pipelines successfully started running 6 pipeline(s).
Azure Pipelines successfully started running 9 pipeline(s).
/azp run Linux QNN CI Pipeline
Azure Pipelines successfully started running 1 pipeline(s).
### Description
Patch #14767 so that the two provider options `force_timing_cache` and `detailed_build_log` can actually be updated; previously they always fell back to their default values. `timing_cache_enable` already works correctly.
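As a quick illustration, here is a minimal Python sketch of passing all three options as TensorRT EP provider options. The option spellings and the behavior noted in the comments follow this PR's discussion and should be verified against your onnxruntime release; `model.onnx` is a placeholder path.

```python
import onnxruntime as ort

# Hedged sketch: with the patch, all three options below are honored when
# passed as TensorRT EP provider options (previously force_timing_cache and
# detailed_build_log silently kept their defaults).
trt_options = {
    "trt_timing_cache_enable": True,  # load/save a timing cache (worked before the patch)
    "trt_force_timing_cache": True,   # now updatable: use the cache even if the device profile differs
    "trt_detailed_build_log": True,   # now updatable: verbose engine-build logging
}

sess = ort.InferenceSession(
    "model.onnx",  # placeholder path
    providers=[("TensorrtExecutionProvider", trt_options)],
)
```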
Description
This will enable a user to use a TensorRT timing cache, based on #10297, to accelerate build times on a device with the same compute capability. This works across models, as it simply stores kernel runtimes for specific configurations. Those files are usually very small (only a few MB), which makes them easy to ship with an application to accelerate build times on the user's end.
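As a rough illustration of the cross-model reuse, here is a hedged Python sketch. The model paths are placeholders, and using `trt_engine_cache_path` as the location for the timing cache file is an assumption that may differ by release.

```python
import time
import onnxruntime as ort

# Hedged sketch: reuse one timing cache across two different models on the
# same GPU. The second engine build should be faster once the cache exists.
trt_options = {
    "trt_timing_cache_enable": True,         # persist kernel timing results
    "trt_engine_cache_path": "./trt_cache",  # assumed location of the cache file
}

for model in ("model_a.onnx", "model_b.onnx"):  # placeholder models
    start = time.perf_counter()
    ort.InferenceSession(model, providers=[("TensorrtExecutionProvider", trt_options)])
    print(f"{model}: engine build took {time.perf_counter() - start:.2f}s")
```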
Motivation and Context
Especially for workstation use cases, TRT build times can be a roadblock. With a few models from the ONNX Model Zoo, I evaluated the speedup when a timing cache is present.
./build/onnxruntime_perf_test -e tensorrt -I -t 5 -i "trt_timing_cache_enable|true" <onnx_path>
To capture this, I had to modify onnxruntime_perf_test. The build time is sometimes not captured within "Session creation time cost:", which is why I introduced "First inference time cost:".
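For reference, a hedged Python sketch of the same measurement outside the perf test: since TensorRT engine building may not finish during session construction, the first run is timed separately, mirroring the new "First inference time cost:" metric. The model path and the fixed-shape float32 input are assumptions for illustration.

```python
import time
import numpy as np
import onnxruntime as ort

providers = [("TensorrtExecutionProvider", {"trt_timing_cache_enable": True})]

t0 = time.perf_counter()
sess = ort.InferenceSession("model.onnx", providers=providers)  # placeholder model
t1 = time.perf_counter()

# Build a dummy feed; assumes a float32 input, with dynamic dims set to 1.
inp = sess.get_inputs()[0]
shape = [d if isinstance(d, int) else 1 for d in inp.shape]
feed = {inp.name: np.zeros(shape, dtype=np.float32)}

sess.run(None, feed)  # engine build may happen here rather than above
t2 = time.perf_counter()

print(f"Session creation time cost: {t1 - t0:.2f}s")
print(f"First inference time cost: {t2 - t1:.2f}s")
```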