Forward convolution does not end #63
Comments
@aryamazaheri what is your environment?
I have an AMD RX580 GPU with the ROCm backend installed. Do you need further information?
@aryamazaheri can you try this again with the latest MIOpen and ROCm versions? 1.9.2 is the latest ROCm.
I pulled the latest version from github and compiled again. The first time that I ran the command, I got the following error.
I ran it again and the same issue still exists: the execution doesn't end. Does this command run for you without any problem?
@aryamazaheri Can you give us a description of your system? What are your OS and its version, and which ROCm version are you running? We are investigating this issue.
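For anyone gathering the details requested above, the following is a minimal sketch rather than an official MIOpen or ROCm tool. It collects basic OS information using only the Python standard library and runs a command under a timeout so that a non-terminating run is reported instead of blocking forever. The function names `describe_system` and `run_with_timeout` are hypothetical, and the version file path `/opt/rocm/.info/version` is an assumption about a typical ROCm install that may not match your system. The command in the demo is a stand-in sleeping process, not the actual command from this issue.

```python
import platform
import subprocess
import sys
from pathlib import Path


def describe_system(rocm_version_file="/opt/rocm/.info/version"):
    """Collect OS details and, if the file exists, the ROCm version string."""
    # Assumed location of the ROCm version file; adjust for your install.
    path = Path(rocm_version_file)
    rocm = path.read_text().strip() if path.is_file() else "unknown"
    return {
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "rocm_version": rocm,
    }


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (finished, returncode).

    finished is False (and returncode is None) when the command did not
    complete within timeout_s seconds; the child process is killed then.
    """
    try:
        done = subprocess.run(cmd, timeout=timeout_s, capture_output=True)
        return True, done.returncode
    except subprocess.TimeoutExpired:
        return False, None


if __name__ == "__main__":
    print(describe_system())
    # Stand-in for a hanging command: a child process that sleeps too long.
    hanging = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_timeout(hanging, 0.5))
```

To use it on a real run, replace the stand-in command list with the command that hangs and choose a timeout well above its expected duration.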
Inactivity.
Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Find 2.0 must not autoreset buffers (#2836) * Remove legacy prng leftover (#2853) * TunaNetv2.0 for MI300 (#2835) * Add alignment to the workspace pointer passed to the reduction kernel (#2822) * Add alignment to the workspace pointer passed to the reduction kernel * Use cacheline size for pointer alignment and uintptr_t for portable integer/pointer conversion * Reformat using clang-format-12 * Use std::align to align the workspace pointer * Helping to resolve @atamazov comments as soon as possible * missing check --------- Co-authored-by: Shurale-nkn <[email protected]> * [MHA] Implement MIOPEN_BACKEND_RNG_DESCRIPTOR (#2861) * Rename files removing unnecessary graphapi_ prefix * Add missing enums for MIOPEN_BACKEND_RNG_DESCRIPTOR * Add common header for Graph API tests * Add GTest executer for Graph API * Implement MIOPEN_BACKEND_RNG_DESCRIPTOR Builder * Implement MIOPEN_BACKEND_RNG_DESCRIPTOR API class * Fix missing pragma once * [MI210][Tuning] UNet3D (#2859) * [Tests] Convert three regression tests to gTests (#2810) * [Windows] use standard C++ streams to access files (#2807) * Forward MHA find2.0 interface and implementation (#2819) * [MHA] Implement MIOPEN_BACKEND_POINTWISE_DESCRIPTOR (#2854) * [Workaround][Issue #2867] Disable iGEMM kernels for corner configuration (#2869) * Find 2.0 scalar run-time parameters (#2826) * Find 2.0 scalar run-time parameters * Update include/miopen/miopen.h Co-authored-by: Artem Tamazov <[email protected]> * Added miopenTensorArgumentIsScalar value serialization * Fixed build after renaming enum field * format * tidy fix --------- Co-authored-by: Artem Tamazov <[email protected]> * [MHA] Implement convolution descriptors in graph (backend) API (#2792) * [Windows] Use std::vector<char> for binary blobs (#2805) * use std::vector<char> for binary blobs * fix clang-format issues * incorporate 
review feedback * incorporate review feedback * Update submodule FIN * suppress warning in clang-tidy --------- Co-authored-by: Jun Liu <[email protected]> * Skip fusions tests when xnack is enabled (#2870) * [MHA] Implement MIOPEN_BACKEND_REDUCTION_DESCRIPTOR (#2862) * [MHA] CPU multi head attention (#2563) * lwpmiopen-230 : first attempt to cpu implementation of multi head attention fwd * lwpmiopen-230 : fix indexing issue * bg/lwpmiopen-230_cpu_multi_head_attention : fix clang format * bg/lwpmiopen-230_cpu_multi_head_attention : output M and Z_inv * bg/lwpmiopen-230_cpu_multi_head_attention : added gtest, used tensor * bg/lwpmiopen-230_cpu_multi_head_attention: fix review comments and change function names * bg/lwpmiopen-230_cpu_multi_head_attention : now able to have result exact as pytorch * bg/lwpmiopen-230_cpu_multi_head_attention: move helper functions to mha_helper.hpp * create helper filer for mha * bg/lwpmiopen-230_cpu_multi_head_attention: f32 and fp8 mha computed * bg/lwpmiopen-230_cpu_multi_head_attention: cleanup * minor cleanups * bg/lwpmiopen-230_cpu_multi_head_attention: comment cleanups * bg/lwpmiopen-230_cpu_multi_head_attention: fix santizer * bg/lwpmiopen-230_cpu_multi_head_attention: fix santizer * bg/lwpmiopen-230_cpu_multi_head_attention: add softmax function * bg/lwpmiopen-230_cpu_multi_head_attention: add attention json golden data * bg/lwpmiopen-230_cpu_multi_head_attention: fix CI issue * bg/lwpmiopen-230_cpu_multi_head_attention: test passing * bg/lwpmiopen-230_cpu_multi_head_attention: fixed clang format * bg/lwpmiopen-230_cpu_multi_head_attention: fix clang format * bg/lwpmiopen-230_cpu_multi_head_attention: fix path of attention_golden.json * bg/lwpmiopen-230_cpu_multi_head_attention: moved test data from json to hpp * Initial commit. 
solver infrastructure's classes are introduced * some raw code added * mha descriptor file added * format clang run * remove homegrown bitcast * use fp32 functions explicitly * add atomic final reduction step * add dropout part * add final scaling * add mha solver (no dropout initialization) * return scaling back * clang run + some changes * enum values change * tidy check fixes * format run * cpp check fix, cmakelist fix * comment fix * properly use rocblas * fix clang-tidy * bg/lwpmiopen-230_cpu_multi_head_attention: increase tolerance * scalars changed to tensors * warning fix * compilation fix after merge * compilation (after merge) fix * buffers removed from desks struct * tidy fix * fix descaling for softmax * use miopen gemm * use MultiBufferWorkspaceTraits * cpu code refactoring * fix format * remove legacy prng leftover * Find 2.0 must not autoreset buffers * try to fix clang-tidy false-positve * make cpu_mha more consistent with the docs and fix operation order * fix format * make cpu_mha 30% faster, remove unused headers * use std::max for cpu mha instead of explicit conditions * bg/lwpmiopen-230_cpu_multi_head_attention : remove typo --------- Co-authored-by: Bibek Ghimire <[email protected]> Co-authored-by: Vsevolod Golovko <[email protected]> Co-authored-by: Aleksandr Eremin <[email protected]> * MHA Forward Find 2.0 Wrapper Test (#2872) * [MHA] Implement MIOPEN_BACKEND_VARIANT_PACK_DESCRIPTOR API Class (#2858) * [Driver][NFC] Modular: Split MIOpenDriver to improve build time (#2856) * [Windows] add filesystem utility functions (#2823) Co-authored-by: Alex Eremin <[email protected]> * [Tests] Convert reduce tests to gTests (#2848) * Convert reduce tests to gTests * Refactor with initialization list to disable warnings * [MHA] Implement MIOPEN_BACKEND_OPERATION_POINTWISE_DESCRIPTOR Builder (#2879) * Keep source attribute types for Pointwise without converting to double * Add swish beta to pointwise attributes * Apply naming rules * Implement 
MIOPEN_BACKEND_OPERATION_POINTWISE_DESCRIPTOR Builder * Remove extra db path quotes introduced in #2823 (#2884) * Bump rocm-docs-core[api_reference] from 0.38.0 to 0.38.1 in /docs/sphinx (#2887) Bumps [rocm-docs-core[api_reference]](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.38.0 to 0.38.1. - [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases) - [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md) - [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.38.0...v0.38.1) --- updated-dependencies: - dependency-name: rocm-docs-core[api_reference] dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * [MHA] Added forward numeric test. Fix some bugs in cpu and gpu implementations. Resolved some post-merge comments. (#2875) * [MHA] Implement MIOPEN_BACKEND_OPERATION_RNG_DESCRIPTOR (#2873) * Bump idna from 3.6 to 3.7 in /docs/sphinx (#2889) Bumps [idna](https://github.com/kjd/idna) from 3.6 to 3.7. - [Release notes](https://github.com/kjd/idna/releases) - [Changelog](https://github.com/kjd/idna/blob/master/HISTORY.rst) - [Commits](https://github.com/kjd/idna/compare/v3.6...v3.7) --- updated-dependencies: - dependency-name: idna dependency-type: indirect ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * [conv][FP32] Extend applicability of GemmBwdRest and GemmFwdRest for big WS sizes (#2811) * gemm_fwd_bwd_rest_fp32_ws_size_limit_increase(01) Reorganized MaxMemAllocSz() code. Formalized WORKAROUND_MLOPEN_ISSUE_1430. Added MIOPEN_WORKAROUND_ISSUE_2808, MIOPEN_WORKAROUND_ISSUE_2809. * gemm_fwd_bwd_rest_fp32_ws_size_limit_increase(02) [conv][gemm] Common code from GEMM solvers moved to the solver/gemm_common module. 
* gemm_fwd_bwd_rest_fp32_ws_size_limit_increase(05) [driver][debugging] Add logging of hipMalloc/Free * gemm_fwd_bwd_rest_fp32_ws_size_limit_increase(06) [conv][gemm] Removed MIOPEN_WORKAROUND_ISSUE_2808/2809. Introduced MIOPEN_WORKAROUND_ISSUE_2789 that affects GemmFwd/BwdRest solvers only, and only for FP32. * gemm_fwd_bwd_rest_fp32_ws_size_limit_increase(08) Fix tidy issues * Update Depends with correct HIP Runtime package name (#2871) * [MHA] Implement MIOPEN_BACKEND_OPERATION_REDUCTION_DESCRIPTOR (#2880) * Apply member naming rule * Implement MIOPEN_BACKEND_OPERATION_REDUCTION_DESCRIPTOR Builder * Introduce checkPtr common function * Implement MIOPEN_BACKEND_OPERATION_REDUCTION_DESCRIPTOR C API Class * [CK] Update requirements.txt for next staging (#2877) * [CK] Update requirements.txt for next staging * update CK commit hash * update CK commit hash * [MHA] Implement Matmul descriptor for MIOpen Backend API (#2882) * [MHA] Implement MIOPEN_BACKEND_OPERATION_POINTWISE_DESCRIPTOR API Class (#2886) * Implement MIOPEN_BACKEND_OPERATION_POINTWISE_DESCRIPTOR C API Class * Introduce checkPtr common function * Use checkPtr common function * Implement graph node signatures * Graph API: Operation Graph creation and interface (#2818) * [CI] removing gfx908 and vega builds node from smoke tests (#2876) * Unify 'include half.hpp' between Windows and Linux (#2892) * [MHA] backward pass (#2895) * KernelTuningNet for MI300/200 ConvHipIGemmGrouped Solvers (#2898) * [Windows] Unify 'include amd_comgr.h' between Windows and Linux (#2899) * ConvProblemDescription: fix GetInSize(), GetOutSize() and GetWeightsSize() (#2896) * [Windows] make rocMLIR required package on Windows (#2903) * [NFC] Fix leftover of #2251 (Remove src/kernels/MIOpenCheckNumerics.cl) (#2901) * Graph API: Operation Graph matching (#2855) * WIP: graph creation and interface * WIP: add a test for op graph * WIP: tests for op graph * initial test works * Cleanup * formatting fixes * address comments * fix build * 
address comments * combine duplication of OpNode and fix up Convolution Operation classes * fix formatting * use `copy_n` instead of `copy`. Co-authored-by: Alex Eremin <[email protected]> * address comments * fix copy_n * Graph Matching algorithms and tests Squash commits WIP: implement matching tests for op graphs rebase on parent branch WIP: move helper functions out WiP: fix build initial tests for graph matching are passing. Some bug fixes to OpGraph class * fix tidy warnings * more matching tests and a dummy graph generator * fix hip tidy warnings * add throw for tensors names that exceed 8 chars * add inline to avoid duplicate function warning --------- Co-authored-by: Alex Eremin <[email protected]> * [MAH] [test] mha CPU backward test (#2829) * lwpmiopen-230 : first attempt to cpu implementation of multi head attention fwd * lwpmiopen-230 : fix indexing issue * bg/lwpmiopen-230_cpu_multi_head_attention : fix clang format * bg/lwpmiopen-230_cpu_multi_head_attention : output M and Z_inv * bg/lwpmiopen-230_cpu_multi_head_attention : added gtest, used tensor * bg/lwpmiopen-230_cpu_multi_head_attention: fix review comments and change function names * bg/lwpmiopen-230_cpu_multi_head_attention : now able to have result exact as pytorch * bg/lwpmiopen-230_cpu_multi_head_attention: move helper functions to mha_helper.hpp * create helper filer for mha * bg/lwpmiopen-230_cpu_multi_head_attention: f32 and fp8 mha computed * bg/lwpmiopen-230_cpu_multi_head_attention: cleanup * minor cleanups * bg/lwpmiopen-230_cpu_multi_head_attention: comment cleanups * bg/lwpmiopen-230_cpu_multi_head_attention: fix santizer * bg/lwpmiopen-230_cpu_multi_head_attention: fix santizer * bg/lwpmiopen-230_cpu_multi_head_attention: add softmax function * bg/lwpmiopen-230_cpu_multi_head_attention: add attention json golden data * bg/lwpmiopen-230_cpu_multi_head_attention: fix CI issue * bg/lwpmiopen-230_cpu_multi_head_attention: test passing * bg/lwpmiopen-230_cpu_multi_head_attention: fixed 
clang format * bg/lwpmiopen-230_cpu_multi_head_attention: fix clang format * bg/lwpmiopen-230_cpu_multi_head_attention: fix path of attention_golden.json * bg/lwpmiopen-230_cpu_multi_head_attention: moved test data from json to hpp * bg/lwpmiopen-230_cpu_multi_head_attention: increase tolerance * bg/mha_back_fp8_lwp-502: mha back * bg/mha_back_fp8_lwp-502: create function check * bg/mha_back_fp8_lwp-502 : fix indentation * bg/mha_back_fp8_lwp-502: remove unwanted if check * bg/mha_back_fp8_lwp-502: remove unused variable * bg/mha_back_fp8_lwp-502: fix function name * bg/mha_back_fp8_lwp-502: implement mha bwackward fp8 * bg/mha_back_fp8_lwp-502: minor fix on args * bg/mha_back_fp8_lwp-502: fix CI issue * match implementation with the graph * fix typo in scaling tensor name --------- Co-authored-by: Bibek Ghimire <[email protected]> Co-authored-by: Aleksandr Eremin <[email protected]> * [NFC] Removed WORKAROUND_SWDEV_227826 macro and MIOPEN_DEBUG_IMPLICIT_GEMM_FIND_ALL_SOLUTIONS envvar (#2816) * remove-wa-swdev-227826(01) Removed WORKAROUND_SWDEV_227826 macro and MIOPEN_DEBUG_IMPLICIT_GEMM_FIND_ALL_SOLUTIONS envvar * remove-wa-swdev-227826(02) Removed leftover of MIOPEN_DEBUG_IMPLICIT_GEMM_FIND_ALL_SOLUTIONS --------- Co-authored-by: Jun Liu <[email protected]> * [Windows] fix test include_inliner on Windows (#2908) * Fixes to support huge tensors. Enable huge tensors in ConvDirectNaive*. miopenSetTensorDescriptorV2 (BETA). 
(#2838) * Consider workspace constraints when loading solutions from DB (#2888) * [Tests] Make would fail with no device error without GPUs (#2909) * Fix make failed with no device error without GPUs * Add DISCOVERY_MODE PRE_TEST option in gtest_discover_tests so test binary will execute during runtime to discover the tests before actually running them * Set DISCOVERY_MODE to PRE_TEST in gtest_discover_tests() so test binary will execute during runtime to discover the tests before actually running them * Remove duplicated DISCOVERY_MODE option * [MHA] Implement MIOPEN_BACKEND_OPERATION_MATMUL_DESCRIPTOR (#2902) * Update CI docker and bump CK commit hash for staging (#2900) * [Windows] fix execution of a HIP compiler on Windows (#2905) * [MHA] Implement MIOPEN_BACKEND_OPERATIONGRAPH_DESCRIPTOR C API Interface (#2894) * [HOTFIX] Fix typo introduced by #2894 and #2902. (#2934) * Adjustments for the latest assembler (e.g. latest changes in the upstream clang) (#2891) * gcnasm-noxnack-etc(01) Remove -mxnack/mno-xnack from COMgr assembler * gcnasm-noxnack-etc(02) Added WORKAROUND_ROCMCOMPILERSUPPORT_ISSUE_67 for the "-nogpulib" warning during assembly via COMgr * gcnasm-noxnack-etc(03) Removed "-mno-xnack" from the offline (clang) amdgcn assembly path. * [Tests] Remove extra - in paramter to fix reduce tests (#2935) * [Doc] Fix extra space in doc link (#2937) * [Doc] Fix docs structure and broken link in Log & Debug and Argmax (#2944) * Fix link to rocBLAS programmer guide * Fix Argmax docs in doxygen. Update reference/index.rst and remove unused argmax.rst. - Adds Argmax (experimental) to the list of all modules in documentation. - Gives Argmax documentation formatting consistent with other API docs. 
* [gfx11][Solvers][Winograd] ConvWinoFuryRxS v2.4 (#2778) * [MHA] add test for the backward pass (#2929) * [Doc] Update LICENSE.txt to reflect all licenses used (#2758) - Fixes #2757 - Updates LICENSE.txt to reflect the following files which diverge, at least partially, from the repo's indicated MIT license BSD-2-Clause driver/mloSoftmaxHost.hpp BSD-2-Clause and MIT src/include/miopen/mlo_internal.hpp Apache-2.0 and MIT src/include/miopen/kernel_cache.hpp src/kernel_cache.cpp Public Domain (and MIT) src/md5.cpp * Bump tqdm from 4.66.2 to 4.66.3 in /docs/sphinx (#2949) Bumps [tqdm](https://github.com/tqdm/tqdm) from 4.66.2 to 4.66.3. - [Release notes](https://github.com/tqdm/tqdm/releases) - [Commits](https://github.com/tqdm/tqdm/compare/v4.66.2...v4.66.3) --- updated-dependencies: - dependency-name: tqdm dependency-type: indirect ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * [Tests] Fixed test_perfdb and test_sqlite_perfdb to propertly use mutex (#2907) * Bump jinja2 from 3.1.3 to 3.1.4 in /docs/sphinx (#2951) Bumps [jinja2](https://github.com/pallets/jinja) from 3.1.3 to 3.1.4. - [Release notes](https://github.com/pallets/jinja/releases) - [Changelog](https://github.com/pallets/jinja/blob/main/CHANGES.rst) - [Commits](https://github.com/pallets/jinja/compare/3.1.3...3.1.4) --- updated-dependencies: - dependency-name: jinja2 dependency-type: indirect ... 
Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * [MHA] Implement several graph API descriptors (#2919) * Implement MIOPEN_BACKEND_OPERATIONGRAPH_DESCRIPTOR C API Interface without tests * Fix empty-graph API violation * Add tests for MIOPEN_BACKEND_OPERATIONGRAPH_DESCRIPTOR C API Interface * Fix tests for MIOPEN_BACKEND_OPERATIONGRAPH_DESCRIPTOR C API * Revert "Fix empty-graph API violation" This reverts commit 3e5092a6cdb823d5f9f3f555348f1f7b3aa77bb5. * Resolve the resulted list's TODO * Rename files enginefinder* to operationgraph_descriptor* * Fix a memory leak * Add builder for C++ MHA Forward end-to-end test * Fix a typo * Define ctors explicitly * Implement MIOPEN_BACKEND_ENGINE_DESCRIPTOR without tests * Implement MIOPEN_BACKEND_ENGINECFG_DESCRIPTOR without tests * Combine OpGraph and OperationGraph * Implement part of MIOPEN_BACKEND_EXECUTION_PLAN_DESCRIPTOR without tests * Fix tests for opgraph * Fix tidy issues * [gTest] Reduce_custom_fp32 skips on MI200/gfx90a (#2948) * Fix reduce_custom_fp32 skips MI200 * Fix test_reduce_custom_fp32 skipped for test all * [Windows] Workaround conflicting definitions of std::min() MSVC and HIP Clang (#2952) * opgraph: fix compilation on Windows * Implement addlayernorm, T5layernorm (#2833) * [CK] Bump CK commit hash by updating requirements.txt (#2940) * [CK] Bump CK commit hash by updating requirements.txt * update CK commit hash for staging * [MHA] Implement MIOPEN_BACKEND_ENGINEHEUR_DESCRIPTOR (#2932) * Add adam and amp adam optimizer (#2868) * [Windows] fix NOGPU backend compilation (#2953) * fix NOGPU backend compilation on Windows * fix tidy format issue * incorporate review feedback --------- Co-authored-by: Jun Liu <[email protected]> * [Windows] unblock gtest tests discovering (#2904) * Reduce extreme (argmin, argmax, min, max etc) enhancement in case of inner dim (#2766) * Fix MIOpen throw message when 
MIOPEN_OFFLINE_COMPILER_PATHS_V2 is enabled (#2959) * Fix MIOpen THROW message when nogpu exists and fail to compile * Update src/hip/hip_build_utils.cpp Co-authored-by: Artem Tamazov <[email protected]> * fix clang-format issue --------- Co-authored-by: Artem Tamazov <[email protected]> --------- Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: mentat <[email protected]> Co-authored-by: Dmantri98 <[email protected]> Co-authored-by: JD <[email protected]> Co-authored-by: abhimeda <[email protected]> Co-authored-by: Daming Feng <[email protected]> Co-authored-by: Jun Liu <[email protected]> Co-authored-by: xu-shawn <[email protected]> Co-authored-by: Kamil Nasyrov <[email protected]> Co-authored-by: Alex Eremin <[email protected]> Co-authored-by: Artem Tamazov <[email protected]> Co-authored-by: Artur Wojcik <[email protected]> Co-authored-by: amberhassaan <[email protected]> Co-authored-by: Evgenii Averin <[email protected]> Co-authored-by: carlushuang <[email protected]> Co-authored-by: xinlipn <[email protected]> Co-authored-by: jasberc <[email protected]> Co-authored-by: David Galiffi <[email protected]> Co-authored-by: Vasilii Filippov <[email protected]> Co-authored-by: Seungman Han <[email protected]> Co-authored-by: Artur Wojcik <[email protected]> Co-authored-by: Kyeonghwan Ryu <[email protected]> Co-authored-by: scerzh <[email protected]> Co-authored-by: Vsevolod Golovko <[email protected]> Co-authored-by: Jungkeun Kim <[email protected]> Co-authored-by: Sam Wu <[email protected]> Co-authored-by: samjwu <[email protected]> Co-authored-by: Reid Kawaja <[email protected]> Co-authored-by: Saad Rahim (AMD) <[email protected]> Co-authored-by: M.Emin Ozturk <[email protected]> Co-authored-by: arvindcheru <[email protected]> Co-authored-by: Marek Grzegorek <[email protected]> Co-authored-by: urpetkov-amd <[email protected]> Co-authored-by: M. 
Saud Ul Hassan <[email protected]> Co-authored-by: Christopher Erb <Christopher.Erb@amd> Co-authored-by: raramakr <[email protected]> Co-authored-by: Lisa <[email protected]> Co-authored-by: Qianfeng <[email protected]> Co-authored-by: Bibek Ghimire <[email protected]> Co-authored-by: Alexey Akimov <[email protected]> Co-authored-by: Tal Ben-Nun <[email protected]> Co-authored-by: Seunghoon Lee <[email protected]> Co-authored-by: peter <[email protected]> Co-authored-by: tflink <[email protected]>
I tried running a forward convolution with the following parameters, but the run never finishes on my system:
MIOpenDriver conv -F 1 -P 1 -n 64 -W 28 -H 28 -c 512 -k 512 -x 3 -y 3 -t 1 -p 1 -q 1 -u 1 -v 1 -V 0
Here is the log output:
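For scale, here is a rough sketch of the problem size that the command above describes. The flag meanings are an assumption based on the usual MIOpenDriver conv options and are not stated in the report: -n batch size, -c input channels, -H/-W input height and width, -k output channels, -y/-x filter height and width, -p/-q padding, -u/-v stride. The helper names are hypothetical and only serve the illustration.

```python
# Assumed flag meanings (not confirmed by the report):
#   -n batch, -c input channels, -H/-W input height/width, -k output channels,
#   -y/-x filter height/width, -p/-q padding (h, w), -u/-v stride (h, w).


def conv_output_size(in_size, pad, kernel, stride):
    """Spatial output size of a convolution with dilation 1."""
    return (in_size + 2 * pad - kernel) // stride + 1


def forward_conv_flops(n, c, h, w, k, y, x, pad_h, pad_w, stride_h, stride_w):
    """Return (out_h, out_w, flops) for a direct forward convolution.

    One multiply-accumulate is counted as two floating-point operations.
    """
    out_h = conv_output_size(h, pad_h, y, stride_h)
    out_w = conv_output_size(w, pad_w, x, stride_w)
    macs = n * k * out_h * out_w * c * y * x
    return out_h, out_w, 2 * macs


# Values taken from the command line in the report.
out_h, out_w, flops = forward_conv_flops(
    n=64, c=512, h=28, w=28, k=512, y=3, x=3,
    pad_h=1, pad_w=1, stride_h=1, stride_w=1,
)
print(out_h, out_w, flops)  # prints: 28 28 236760072192
```

Under these assumptions the output stays 28 x 28 and one forward pass is about 237 GFLOP. That is a finite workload, which suggests a run that never returns is more likely a hang than a long computation.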