[ORT 1.18.2] Cherry Pick Pad Optimizations + Update DML to 1.15.1 #21670
Conversation
Our macOS pipelines are failing because of a build error in absl, and the bug fix we need is not available in the latest ABSL release. Here is the issue: abseil/abseil-cpp#1536, and here is the fix: abseil/abseil-cpp@779a356. GTest uses ABSL, but this ABSL target also depends on GTest, so it is a circular dependency. We should be able to avoid that by not building ABSL's tests. However, the version we are using has a problem with that: it has a cmake target that still depends on GTest even when testing is disabled. It's strange that we suddenly hit this problem, and that it only happens on macOS.
### Description
This extends the existing pad_fusion to the AveragePool operator, i.e., Pad is fused if it is followed by an AveragePool operator.
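To make the targeted pattern concrete, here is a minimal sketch (using the `onnx` Python package) that builds the Pad → AveragePool subgraph this fusion looks for. The model is illustrative only and not taken from the test suite; note that for a zero-padding Pad to be absorbed into AveragePool, the pool typically needs `count_include_pad=1` so the padded zeros contribute to the average the same way before and after fusion.

```python
# Illustrative only: builds the Pad -> AveragePool pattern that the fusion
# targets. Not taken from the onnxruntime test suite.
import numpy as np
import onnx
from onnx import TensorProto, helper, numpy_helper

# Pad one pixel on each spatial edge of an NCHW input (constant/zero padding).
pads = numpy_helper.from_array(
    np.array([0, 0, 1, 1, 0, 0, 1, 1], dtype=np.int64), name="pads")

pad_node = helper.make_node("Pad", ["X", "pads"], ["padded"], mode="constant")
pool_node = helper.make_node(
    "AveragePool", ["padded"], ["Y"],
    kernel_shape=[3, 3],
    # count_include_pad=1 keeps the result identical once the Pad is folded
    # into the pool's own 'pads' attribute (an assumption of this sketch).
    count_include_pad=1,
)

graph = helper.make_graph(
    [pad_node, pool_node], "pad_averagepool_pattern",
    [helper.make_tensor_value_info("X", TensorProto.FLOAT, [1, 3, 8, 8])],
    [helper.make_tensor_value_info("Y", TensorProto.FLOAT, [1, 3, 8, 8])],
    initializer=[pads],
)
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 18)])
onnx.checker.check_model(model)
```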
(1) Run onnxruntime/cgmanifests/generate_cgmanifest.py
### Description
This change enhances the existing Pad Fusion to fuse Pad even if a Cast operator is present between Pad and Conv/MaxPool/AveragePool. It keeps the Cast as it is.
<pre>
/*
 * Before Fusion:
 *   Pad
 *    |
 *   Cast (Optional)
 *    |
 *   Conv/MaxPool/AveragePool
 *
 * After Fusion:
 *   Cast (Optional)
 *    |
 *   Conv/MaxPool/AveragePool
 */
</pre>
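As a rough illustration of the bookkeeping such a fusion involves, below is a small Python sketch of how a Pad operator's `pads` (which cover every input dimension) could be folded into a Conv/MaxPool/AveragePool `pads` attribute (which covers only the spatial dimensions). The helper name `merge_pads` and its assumptions (NCHW layout, constant-mode zero padding) are hypothetical and are not taken from the onnxruntime source.

```python
# Hypothetical sketch: fold an ONNX Pad 'pads' list into a Conv/MaxPool/
# AveragePool 'pads' attribute. Assumes NCHW layout and zero (constant-mode)
# padding; not the actual onnxruntime implementation.

def merge_pads(pad_pads, conv_pads, rank=4, spatial_dims=2):
    # ONNX Pad format: [x1_begin, x2_begin, ..., x1_end, x2_end, ...] over all dims.
    pad_begin = pad_pads[:rank]
    pad_end = pad_pads[rank:]

    # Fusion is only valid when batch/channel dims are not padded.
    if any(pad_begin[:rank - spatial_dims]) or any(pad_end[:rank - spatial_dims]):
        raise ValueError("cannot fuse: non-spatial dimensions are padded")

    # Conv/Pool format: [d1_begin, d2_begin, ..., d1_end, d2_end, ...] over spatial dims.
    begin = [p + c for p, c in zip(pad_begin[-spatial_dims:], conv_pads[:spatial_dims])]
    end = [p + c for p, c in zip(pad_end[-spatial_dims:], conv_pads[spatial_dims:])]
    return begin + end

# Example: Pad adds 1 pixel on each spatial edge, Conv already has pads of 1.
print(merge_pads([0, 0, 1, 1, 0, 0, 1, 1], [1, 1, 1, 1]))  # -> [2, 2, 2, 2]
```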
* Fix migraphx build error caused by #21598: add a conditional compile on the code block that depends on ROCm >= 6.2. Note that the pipeline uses ROCm 6.0.

Unblock the orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed, and orttraining-amd-gpu-ci-pipeline pipelines:
* Disable a model test in the Linux GPU training CI pipelines, caused by #19470: sometimes the cudnn frontend throws an exception that the cudnn graph does not support a Conv node of the keras_lotus_resnet3D model on V100 GPUs. Note that the same test does not throw an exception in other GPU pipelines. The failure might be related to cuDNN 8.9 and the V100 GPUs used in the pipeline (Ampere GPUs and cuDNN 9.x do not have the issue). The actual fix requires fallback logic, which will take time to implement, so we temporarily disable the test in training pipelines.
* Force install torch for CUDA 11.8. (The docker image has torch 2.4.0 for CUDA 12.1 to build the torch extension, which is not compatible with CUDA 11.8.) Note that this is a temporary workaround; a more elegant fix is to make sure the right torch version is used in the docker build step, which might need updates to install_python_deps.sh and the corresponding requirements.txt.
* Skip test_gradient_correctness_conv1d since it causes a segmentation fault. The root cause needs more investigation (maybe due to the cudnn frontend as well).
* Skip test_aten_attention since it causes an assert failure. The root cause needs more investigation (maybe due to the torch version).
* Skip orttraining_ortmodule_distributed_tests.py since it fails with an error that the compiler for the torch extension does not support C++17. One possible fix is to set the following compile argument inside setup.py of the fused_adam extension: extra_compile_args['cxx'] = ['-std=c++17'] (see the sketch after this list). However, due to the urgency of unblocking the pipelines, just disable the test for now.
* Skip test_softmax_bf16_large. For some reason, torch.cuda.is_bf16_supported() returns True on V100 with torch 2.3.1, so the test was run in CI, but V100 does not support bf16 natively.
* Fix typo of "deterministic".
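For reference, here is a minimal sketch of the compile-argument change mentioned above, assuming a setuptools/torch cpp_extension based setup.py for the fused_adam extension. The extension name and source file names are assumptions for illustration, not the actual files.

```python
# Hypothetical setup.py sketch for a fused_adam torch extension; module and
# source names are assumptions, not the actual project files.
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name="fused_adam",
    ext_modules=[
        CUDAExtension(
            name="fused_adam_cuda",
            sources=["fused_adam_frontend.cpp", "fused_adam_cuda_kernel.cu"],
            # Force C++17 so the host compiler accepts C++17-only code paths.
            extra_compile_args={
                "cxx": ["-std=c++17"],
                "nvcc": ["-std=c++17"],
            },
        )
    ],
    cmdclass={"build_ext": BuildExtension},
)
```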
Pad changes look fine to me. Deferring to others for the non-pad changes (manifest, YAML, QNN).
Description
This change cherry-picks two Pad fusion optimizations: #21640 and #21556.
It also cherry-picks two extra changes to unblock pipeline and dependency failures: #21300 and #21662 (the tests that are part of the 1.18.1 payload were not included).
It also uploads a new version of onnxruntime_build_dependencies (10.177) and updates the reference in download-deps.yml. Additionally, it updates the DML binary to 1.15.1.
Motivation and Context