
[ORT 1.18.2] Cherry Pick Pad Optimizations + Update DML to 1.15.1 #21670

Merged

merged 21 commits into rel-1.18.2 from user/sumita/cherrypick-pad on Aug 12, 2024

Conversation

@sumitsays (Contributor) commented Aug 8, 2024

Description

This change cherry-picks two Pad fusion optimizations: #21640 and #21556.

It also cherry-picks two extra changes to unblock pipeline and dependency failures: #21300 and #21662 (excluding the tests, which are part of the 1.18.1 payload).

A new version of onnxruntime_build_dependencies (1.0.177) was also uploaded, and download-deps.yml updated to reference it.

Additionally, it updates the DML binary to 1.15.1.

Motivation and Context

snnn and others added 11 commits August 7, 2024 21:07
Our macOS pipelines are failing because of a build error in ABSL; however, the bug fix we need is not available in the latest ABSL release.

Here is the issue: abseil/abseil-cpp#1536
And here is the fix:
abseil/abseil-cpp@779a356

GTest uses ABSL, but this ABSL target also depends on GTest, so it is a circular dependency. We should be able to avoid that by not building tests for ABSL; however, the version we are using has a problem with that: it has a CMake target that still depends on GTest even when testing is disabled.

It's strange that we suddenly hit this problem and that it only happens on macOS.
### Description
This extends the existing Pad fusion to the AveragePool operator, i.e. Pad is fused if it is followed by an AveragePool operator.
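
To make the rewrite concrete, here is a minimal, runnable sketch (using the onnx Python helpers, not ORT's internal optimizer API) of the structural change: a constant-mode Pad feeding an AveragePool disappears, and its spatial padding is folded into the pool's `pads` attribute. The shapes and pad amounts are illustrative.

```python
import numpy as np
import onnx
from onnx import TensorProto, helper, numpy_helper

X = helper.make_tensor_value_info("X", TensorProto.FLOAT, [1, 3, 32, 32])
Y = helper.make_tensor_value_info("Y", TensorProto.FLOAT, None)

# Pad 1 element on each side of the two spatial dims (NCHW layout). Pad's
# pads input is [N_begin, C_begin, H_begin, W_begin, N_end, C_end, H_end, W_end].
pads = numpy_helper.from_array(
    np.array([0, 0, 1, 1, 0, 0, 1, 1], dtype=np.int64), name="pads")

# Before fusion: Pad -> AveragePool.
before = helper.make_graph(
    [
        helper.make_node("Pad", ["X", "pads"], ["P"], mode="constant"),
        helper.make_node("AveragePool", ["P"], ["Y"], kernel_shape=[3, 3]),
    ],
    "before_fusion", [X], [Y], initializer=[pads])

# After fusion: the Pad is gone and its spatial padding has moved into the
# pool's own `pads`. count_include_pad=1 keeps the padded zeros in the
# averaging denominator, matching the explicit zero-Pad arithmetic.
after = helper.make_graph(
    [helper.make_node("AveragePool", ["X"], ["Y"], kernel_shape=[3, 3],
                      pads=[1, 1, 1, 1], count_include_pad=1)],
    "after_fusion", [X], [Y])

for graph in (before, after):
    model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 13)])
    onnx.checker.check_model(model)  # both graphs are valid ONNX
```
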
@sumitsays sumitsays requested review from a team as code owners August 8, 2024 04:11
@snnn (Member) commented Aug 8, 2024

(1) Run onnxruntime/cgmanifests/generate_cgmanifest.py.
(2) Find the latest version at https://dev.azure.com/onnxruntime/onnxruntime/_artifacts/feed/onnxruntime/UPack/onnxruntime_build_dependencies/overview/1.0.176 and increment the version number for the next step.
(3) Follow the instructions in cmake/deps_update_and_upload.py to make changes to cmake/deps.txt, then run the script to download the dependencies and upload them to Azure.

@snnn snnn closed this Aug 8, 2024
@snnn snnn reopened this Aug 8, 2024
@sumitsays sumitsays changed the title Expand Pad fusion [ORT 1.18.2] Cherry Pick Pad Optimizations Aug 9, 2024
sumitsays and others added 3 commits August 9, 2024 07:05
### Description
This change enhances the existing Pad fusion to fuse Pad even when a Cast operator is present between Pad and Conv/MaxPool/AveragePool. The Cast is kept as-is:
<pre>
/*
 * Before Fusion:
 *     Pad
 *      |
 *    Cast (Optional)
 *      |
 *   Conv/MaxPool/AveragePool
 * 
 * After Fusion:
 *    Cast (Optional)
 *      |
 *   Conv/MaxPool/AveragePool
 */
</pre>


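The same pattern in a runnable sketch (again using the onnx Python helpers rather than ORT's optimizer API; the op choice and pad amounts are illustrative): the Cast survives the fusion and simply reads Pad's original input, while the padding moves into the pooling node.

```python
import numpy as np
import onnx
from onnx import TensorProto, helper, numpy_helper

X = helper.make_tensor_value_info("X", TensorProto.FLOAT, [1, 3, 32, 32])
Y = helper.make_tensor_value_info("Y", TensorProto.FLOAT16, None)
pads = numpy_helper.from_array(
    np.array([0, 0, 1, 1, 0, 0, 1, 1], dtype=np.int64), name="pads")

# Before fusion: Pad -> Cast -> MaxPool.
before = [
    helper.make_node("Pad", ["X", "pads"], ["P"], mode="constant"),
    helper.make_node("Cast", ["P"], ["C"], to=TensorProto.FLOAT16),
    helper.make_node("MaxPool", ["C"], ["Y"], kernel_shape=[3, 3]),
]

# After fusion: the Cast is kept but now consumes Pad's input directly, and
# the padding is folded into MaxPool's `pads` attribute. (ORT checks the
# numeric preconditions for the fusion, e.g. the constant pad value; those
# checks are omitted from this structural sketch.)
after = [
    helper.make_node("Cast", ["X"], ["C"], to=TensorProto.FLOAT16),
    helper.make_node("MaxPool", ["C"], ["Y"], kernel_shape=[3, 3],
                     pads=[1, 1, 1, 1]),
]

for name, nodes, inits in (("before", before, [pads]), ("after", after, [])):
    graph = helper.make_graph(nodes, name, [X], [Y], initializer=inits)
    model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 13)])
    onnx.checker.check_model(model)
```
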
* Fix the migraphx build error caused by
#21598:
add a conditional compile around the code block that depends on ROCm >= 6.2.
Note that the pipeline uses ROCm 6.0.

Unblock the orttraining-linux-gpu-ci-pipeline,
orttraining-ortmodule-distributed, and orttraining-amd-gpu-ci-pipeline
pipelines:
* Disable a model test in the Linux GPU training CI pipelines, broken by
#19470:
sometimes the cuDNN frontend throws an exception that the cuDNN graph does
not support a Conv node of the keras_lotus_resnet3D model on V100 GPUs.
Note that the same test does not throw in other GPU pipelines. The failure
might be related to the cuDNN 8.9 and V100 GPUs used in this pipeline
(Ampere GPUs and cuDNN 9.x do not have the issue).
The actual fix requires fallback logic, which will take time to implement,
so we temporarily disable the test in training pipelines.
* Force-install torch for CUDA 11.8. (The docker image has torch 2.4.0 for
CUDA 12.1 to build the torch extension, which is not compatible with CUDA
11.8.) Note that this is a temporary workaround; a more elegant fix is to
make sure the right torch version is used in the docker build step, which
might require updating install_python_deps.sh and the corresponding
requirements.txt.
* Skip test_gradient_correctness_conv1d since it causes a segmentation
fault. The root cause needs more investigation (maybe the cuDNN frontend
as well).
* Skip test_aten_attention since it causes an assertion failure. The root
cause needs more investigation (maybe the torch version).
* Skip orttraining_ortmodule_distributed_tests.py since it fails because
the compiler for the torch extension does not support C++17. One possible
fix is to set the following compile argument inside the setup.py of the
fused_adam extension: extra_compile_args['cxx'] = ['-std=c++17'] (see the
sketch after this commit message). However, given the urgency of
unblocking the pipelines, the test is just disabled for now.
* Skip test_softmax_bf16_large. For some reason,
torch.cuda.is_bf16_supported() returns True on V100 with torch 2.3.1, so
the test ran in CI, but V100 does not support bf16 natively.
* Fix a typo of "deterministic".

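For reference, here is a minimal sketch of the fused_adam fix suggested above, not the change actually made in this PR (the test was disabled instead): a setup.py for a torch C++/CUDA extension that passes -std=c++17 explicitly. The source file names are illustrative assumptions.

```python
# setup.py for the fused_adam extension -- a sketch of the suggested fix.
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name="fused_adam",
    ext_modules=[
        CUDAExtension(
            name="fused_adam",
            # Illustrative file names; the real extension's sources differ.
            sources=["fused_adam.cpp", "fused_adam_cuda_kernel.cu"],
            extra_compile_args={
                "cxx": ["-std=c++17"],   # the flag quoted in the commit message
                "nvcc": ["-std=c++17"],  # assumption: nvcc typically needs it too
            },
        )
    ],
    cmdclass={"build_ext": BuildExtension},
)
```
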
@sumitsays sumitsays force-pushed the user/sumita/cherrypick-pad branch from 0ef68f2 to b8670e0 Compare August 9, 2024 14:27
@sumitsays sumitsays requested a review from prathikr August 9, 2024 14:37
fdwr previously approved these changes Aug 9, 2024

@fdwr (Contributor) left a comment:
Pad changes look fine to me. Deferring to others for the non-pad changes (manifest, YAML, QNN).

snnn previously approved these changes Aug 9, 2024
@sumitsays sumitsays dismissed stale reviews from snnn and fdwr via 46da823 August 9, 2024 23:33
@sumitsays sumitsays requested a review from pranavsharma August 9, 2024 23:38
@sumitsays sumitsays changed the title [ORT 1.18.2] Cherry Pick Pad Optimizations [ORT 1.18.2] Cherry Pick Pad Optimizations + Update DML to 1.15.1 Aug 9, 2024
snnn previously approved these changes Aug 9, 2024
@sumitsays sumitsays merged commit f4f4953 into rel-1.18.2 Aug 12, 2024
88 of 103 checks passed
@sumitsays sumitsays deleted the user/sumita/cherrypick-pad branch August 12, 2024 14:02