Enable support of multi-level nested control flow ops model for TRT EP #12147
Conversation
const std::vector<NodeIndex>& node_index = graph.GetNodesInTopologicalOrder();

// We currently exclude "If" and "Loop" control flow ops from original node vector before calling TensorRT parser.
// The reason is, these control flow ops have subgraph which might contain TRT fused node after ORT partition.
What about cases where the subgraph doesn't contain fused nodes? i.e., presumably TRT can handle some Loop/If cases (those without multi-level nesting?), so we may lose perf in those cases?
re: TRT non-recognized fused node, it's because the TRT parser is not properly handling all cases of Loop/If (i.e. it says it can support the If/Loop when there are portions of the body that it cannot handle)
shouldn't we raise the root issue to nvidia?
> what about cases where the subgraph don't contain fused nodes? i.e. presumably TRT can handle some loop/if cases (which don't have multi-level nesting?) we may lose perf in those cases?
In that case, TRT can handle the Loop/If ops as well as their subgraphs, and yes, TRT might have better perf.
But, due to the bottom-up approach of graph partitioning in ORT, ORT will first fuse the supported nodes in the subgraph into one "TRT fused" node and remove the original nodes. At that point, it's hard for the TRT EP to tell ORT that we don't want the nodes fused, especially when there are multiple levels of nested control flow ops.
> re: TRT non-recognized fused node, it's because the TRT parser is not properly handling all cases of Loop/If (i.e. it says it can support the If/Loop when there are portions of the body that it cannot handle) shouldn't we raise the root issue to nvidia?
It's because the fused node is created by ORT and is not a standard ONNX node, so the TRT parser doesn't recognize it. I don't think this is an issue on Nvidia's side.
> re: TRT non-recognized fused node, it's because the TRT parser is not properly handling all cases of Loop/If (i.e. it says it can support the If/Loop when there are portions of the body that it cannot handle) shouldn't we raise the root issue to nvidia?
> It's because the fused node is created by ORT and it's not the standard ONNX node, so TRT parser doesn't recognize. I think this is not an issue from Nvidia.
I think I understand now. You're saying the parser doesn't recognize any ops that are not in the official ONNX namespace? But I thought we have been able to support MS-domain ops and other custom CUDA ops along with the TRT EP. Let's discuss more offline.
// If this is the case, TensorRT parser will complain the non-recognized TRT fused node and fail.
for (const auto& index : nodes_vector) {
  const auto& node = graph.GetNode(node_index[index]);
  if (node->OpType() == "If" || node->OpType() == "Loop" || node->OpType() == "Scan") {
Not sure how long the real fix will take to work out, but ruling out some ops based on OpType seems like a nice option to have. Can we generalize this by getting the excluded OpTypes from provider options?
Enable support of multi-level nested control flow ops model for TRT EP (#12147)
* Make multiple-level nested control flow op model work
* find correct input index
* find correct input index (cont.)
* enable nested layer unit tests for TRT EP
* add comment
* add Scan op to current workaround support of control flow op
* update package version
* Prevent unbounded growth of command allocator memory (#12114)
* Update supported ops md for NNAPI/CoreML EP (#12245)
  * update supported ops md
  * address pr comments
  * wording
* Change native folder name for java macos arm64 (#12335)
* Bump async from 2.6.3 to 2.6.4 in /js/react_native/e2e (#11280)
  Bumps [async](https://github.com/caolan/async) from 2.6.3 to 2.6.4 ([Release notes](https://github.com/caolan/async/releases), [Changelog](https://github.com/caolan/async/blob/v2.6.4/CHANGELOG.md), [Commits](caolan/async@v2.6.3...v2.6.4)). Signed-off-by: dependabot[bot] <[email protected]>
* [js/rn] upgrade dependencies for e2e test (#11863)
  * use JDK11 only for gradle
  * expand variable
* [js/rn] upgrade package react-native@^0.69.1 (#12155)
  * upgrade compile sdk to v31
  * update ios version requirement
  * update pod path for onnxruntime-react-native
* add missing build_java in Android testing stage (#12187)
* Use specific Android NDK version in CI builds (#12350)
  Current builds use an NDK version that happens to be on the build machine, and the build machine environment may change in ways outside of our control. This change installs and uses a specific NDK version (the current LTS version, 25.0.8775105).
* Remove preview keyword from DirectML package (#12368)
* Scope CreateFileMapping2 to valid API partitions (#12374)
* Fix TRT custom op issue (#12283)
  * Pass schema registry on CreateModel
  * Fix ORT_MINIMAL_BUILD
  * Fix build issue
* Manually add optimization flag for Android Release builds (#12390)
  With recent NDK versions (since 23), the `-O` optimization level compile flag is not passed when building in the "Release" configuration (details: android/ndk#1740). Our "Release" Android builds have been built without the optimization flag since we upgraded from NDK 21; this change is a workaround that manually adds `-O3` for "Release" Android builds.
* resolve conflicts in tensorRT related changes
* Enable support of multi-level nested control flow ops model for TRT EP (#12147)
  * Make multiple-level nested control flow op model work
  * find correct input index
  * find correct input index (cont.)
  * enable nested layer unit tests for TRT EP
  * add comment
  * add Scan op to current workaround support of control flow op

Co-authored-by: Jeff Bloomfield <[email protected]>
Co-authored-by: Rachel Guo <[email protected]>
Co-authored-by: Changming Sun <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Yulong Wang <[email protected]>
Co-authored-by: Yi Zhang <[email protected]>
Co-authored-by: Edward Chen <[email protected]>
Co-authored-by: sumitsays <[email protected]>
Co-authored-by: Sumit Agarwal <[email protected]>
Co-authored-by: Justin Stoecker <[email protected]>
Co-authored-by: Yateng Hong <[email protected]>
Co-authored-by: Chi Lo <[email protected]>
One of the reasons the TRT EP can't run a multi-level nested control flow ops model is that the subgraph of a control flow op might contain a fused TRT node after ORT partitioning. If that happens, the TRT parser will complain about the unrecognized fused TRT node and fail. Here we exclude those control flow ops before calling the TRT parser.
Also, outer-scope values need to be handled in order to run a multi-level nested control flow ops model.
Note: this is a workaround; a real fix will come in another PR.