[ORT 1.18.2] Cherry Pick Pad Optimizations + Update DML to 1.15.1 #21670
Conversation
Our macOS pipelines are failing because of a build error in absl, and the bug fix we need is not available in the latest ABSL release. Here is the issue: abseil/abseil-cpp#1536, and here is the fix: abseil/abseil-cpp@779a356. GTest uses ABSL, but this ABSL target also depends on GTest, so it is a circular dependency. We should be able to avoid that by not building ABSL's tests. However, the version we are using has a problem with that: it has a cmake target that still depends on GTest even when testing is disabled. It's strange that we suddenly hit this problem, and that it only happens on macOS.
### Description
This extends the existing pad_fusion to the AveragePool operator, i.e., Pad is fused if it is followed by an AveragePool operator.
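To make the targeted pattern concrete, here is a minimal sketch (using the `onnx` Python package) that builds the Pad → AveragePool subgraph this fusion looks for. The model is illustrative only and not taken from the test suite; note that for a zero-padding Pad to be absorbed into AveragePool, the pool typically needs `count_include_pad=1` so the padded zeros contribute to the average the same way before and after fusion.

```python
# Illustrative only: builds the Pad -> AveragePool pattern that the fusion
# targets. Not taken from the onnxruntime test suite.
import numpy as np
import onnx
from onnx import TensorProto, helper, numpy_helper

# Pad one pixel on each spatial edge of an NCHW input (constant/zero padding).
pads = numpy_helper.from_array(
    np.array([0, 0, 1, 1, 0, 0, 1, 1], dtype=np.int64), name="pads")

pad_node = helper.make_node("Pad", ["X", "pads"], ["padded"], mode="constant")
pool_node = helper.make_node(
    "AveragePool", ["padded"], ["Y"],
    kernel_shape=[3, 3],
    # count_include_pad=1 keeps the result identical once the Pad is folded
    # into the pool's own 'pads' attribute (an assumption of this sketch).
    count_include_pad=1,
)

graph = helper.make_graph(
    [pad_node, pool_node], "pad_averagepool_pattern",
    [helper.make_tensor_value_info("X", TensorProto.FLOAT, [1, 3, 8, 8])],
    [helper.make_tensor_value_info("Y", TensorProto.FLOAT, [1, 3, 8, 8])],
    initializer=[pads],
)
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 18)])
onnx.checker.check_model(model)
```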
(1) Run onnxruntime/cgmanifests/generate_cgmanifest.py
### Description
This change enhances the existing Pad Fusion to fuse Pad even if a Cast operator is present between Pad and Conv/MaxPool/AveragePool. It keeps the Cast as it is.
<pre>
/*
 * Before Fusion:
 *   Pad
 *    |
 *   Cast (Optional)
 *    |
 *   Conv/MaxPool/AveragePool
 *
 * After Fusion:
 *   Cast (Optional)
 *    |
 *   Conv/MaxPool/AveragePool
 */
</pre>
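As a rough illustration of the bookkeeping such a fusion involves, below is a small Python sketch of how a Pad operator's `pads` (which cover every input dimension) could be folded into a Conv/MaxPool/AveragePool `pads` attribute (which covers only the spatial dimensions). The helper name `merge_pads` and its assumptions (NCHW layout, constant-mode zero padding) are hypothetical and are not taken from the onnxruntime source.

```python
# Hypothetical sketch: fold an ONNX Pad 'pads' list into a Conv/MaxPool/
# AveragePool 'pads' attribute. Assumes NCHW layout and zero (constant-mode)
# padding; not the actual onnxruntime implementation.

def merge_pads(pad_pads, conv_pads, rank=4, spatial_dims=2):
    # ONNX Pad format: [x1_begin, x2_begin, ..., x1_end, x2_end, ...] over all dims.
    pad_begin = pad_pads[:rank]
    pad_end = pad_pads[rank:]

    # Fusion is only valid when batch/channel dims are not padded.
    if any(pad_begin[:rank - spatial_dims]) or any(pad_end[:rank - spatial_dims]):
        raise ValueError("cannot fuse: non-spatial dimensions are padded")

    # Conv/Pool format: [d1_begin, d2_begin, ..., d1_end, d2_end, ...] over spatial dims.
    begin = [p + c for p, c in zip(pad_begin[-spatial_dims:], conv_pads[:spatial_dims])]
    end = [p + c for p, c in zip(pad_end[-spatial_dims:], conv_pads[spatial_dims:])]
    return begin + end

# Example: Pad adds 1 pixel on each spatial edge, Conv already has pads of 1.
print(merge_pads([0, 0, 1, 1, 0, 0, 1, 1], [1, 1, 1, 1]))  # -> [2, 2, 2, 2]
```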
* Fix migraphx build error caused by #21598: add a conditional compile on the code block that depends on ROCm >= 6.2. Note that the pipeline uses ROCm 6.0.

Unblock the orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed, and orttraining-amd-gpu-ci-pipeline pipelines:
* Disable a model test in the Linux GPU training CI pipelines, caused by #19470: sometimes the cudnn frontend throws an exception that the cudnn graph does not support a Conv node of the keras_lotus_resnet3D model on V100 GPUs. Note that the same test does not throw an exception in other GPU pipelines. The failure might be related to cuDNN 8.9 and the V100 GPUs used in the pipeline (Ampere GPUs and cuDNN 9.x do not have the issue). The actual fix requires fallback logic, which will take time to implement, so we temporarily disable the test in training pipelines.
* Force install torch for CUDA 11.8. (The docker image has torch 2.4.0 for CUDA 12.1 to build the torch extension, which is not compatible with CUDA 11.8.) Note that this is a temporary workaround; a more elegant fix is to make sure the right torch version is used in the docker build step, which might need updates to install_python_deps.sh and the corresponding requirements.txt.
* Skip test_gradient_correctness_conv1d since it causes a segmentation fault. The root cause needs more investigation (maybe due to the cudnn frontend as well).
* Skip test_aten_attention since it causes an assert failure. The root cause needs more investigation (maybe due to the torch version).
* Skip orttraining_ortmodule_distributed_tests.py since it fails with an error that the compiler for the torch extension does not support C++17. One possible fix is to set the following compile argument inside setup.py of the fused_adam extension: extra_compile_args['cxx'] = ['-std=c++17'] (see the sketch after this list). However, due to the urgency of unblocking the pipelines, just disable the test for now.
* Skip test_softmax_bf16_large. For some reason, torch.cuda.is_bf16_supported() returns True on V100 with torch 2.3.1, so the test was run in CI, but V100 does not support bf16 natively.
* Fix typo of "deterministic".
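For reference, here is a minimal sketch of the compile-argument change mentioned above, assuming a setuptools/torch cpp_extension based setup.py for the fused_adam extension. The extension name and source file names are assumptions for illustration, not the actual files.

```python
# Hypothetical setup.py sketch for a fused_adam torch extension; module and
# source names are assumptions, not the actual project files.
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name="fused_adam",
    ext_modules=[
        CUDAExtension(
            name="fused_adam_cuda",
            sources=["fused_adam_frontend.cpp", "fused_adam_cuda_kernel.cu"],
            # Force C++17 so the host compiler accepts C++17-only code paths.
            extra_compile_args={
                "cxx": ["-std=c++17"],
                "nvcc": ["-std=c++17"],
            },
        )
    ],
    cmdclass={"build_ext": BuildExtension},
)
```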
Pad changes look fine to me. Deferring to others for the non-pad changes (manifest, YAML, QNN).
Description
This change cherry-picks two Pad fusion optimizations: #21640 and #21556.
It also cherry-picks two extra changes to unblock pipeline and dependency failures: #21300 and #21662 (the tests that are part of the 1.18.1 payload were not included).
It also uploads a new version of onnxruntime_build_dependencies (10.177) and updates the reference in download-deps.yml. Additionally, it updates the DML binary to 1.15.1.
Motivation and Context