[CUDA] Update benchmark_mha.py to capture debug info to identify sdpa kernel #21804

Merged: tianleiwu merged 3 commits into main from tlwu/benchmark_mha_kernel_from_debug_info on Aug 22, 2024

@tianleiwu (Contributor) commented Aug 20, 2024

Description

Use debug info to identify the SDPA kernel that is actually used, and show it in the output of benchmark_mha.py. This updated benchmark script was used to produce the benchmark results in #21629.
(1) Change the output format of the debug info so that it prints lines like SdpaKernel=*.
(2) Add a step that captures stdout from the onnxruntime session and uses a regular expression to parse SdpaKernel=* from the captured text.
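The capture-and-parse step in (2) can be sketched as follows. This is a minimal illustration, not the code in benchmark_mha.py: the helper names `capture_fd_stdout` and `parse_sdpa_kernels` are hypothetical, and the sketch assumes the native session writes its debug lines straight to the process's stdout file descriptor. That assumption is why it redirects file descriptor 1 with `os.dup2` instead of using `contextlib.redirect_stdout`, which only swaps the Python-level `sys.stdout` object. The kernel names used below are placeholders.

```python
import os
import re
import sys
import tempfile
from contextlib import contextmanager
from typing import Iterator, List

# Matches debug lines of the form "SdpaKernel=<NAME>" (format from this PR).
SDPA_KERNEL_PATTERN = re.compile(r"SdpaKernel=(\w+)")


@contextmanager
def capture_fd_stdout(result: List[str]) -> Iterator[None]:
    """Hypothetical helper: capture everything written to file descriptor 1,
    including output from native code, and append the decoded text to `result`."""
    sys.stdout.flush()  # Flush pending Python output before swapping the fd.
    saved_fd = os.dup(1)
    with tempfile.TemporaryFile(mode="w+b") as tmp:
        os.dup2(tmp.fileno(), 1)
        try:
            yield
        finally:
            sys.stdout.flush()  # Push any buffered Python prints into tmp.
            os.dup2(saved_fd, 1)  # Restore the original stdout.
            os.close(saved_fd)
            tmp.seek(0)
            result.append(tmp.read().decode("utf-8", errors="replace"))


def parse_sdpa_kernels(text: str) -> List[str]:
    """Hypothetical helper: return distinct kernel names in first-seen order."""
    seen: List[str] = []
    for name in SDPA_KERNEL_PATTERN.findall(text):
        if name not in seen:
            seen.append(name)
    return seen


if __name__ == "__main__":
    captured: List[str] = []
    with capture_fd_stdout(captured):
        # Stand-in for session.run(...): write directly to fd 1 as native code would.
        # KERNEL_A is a placeholder, not a real onnxruntime kernel name.
        os.write(1, b"SdpaKernel=KERNEL_A\n")
    print(parse_sdpa_kernels(captured[0]))
```

In a benchmark, the `with` block would wrap the session run, and the parsed name could be printed next to the timing. If more than one distinct name shows up, the run touched more than one kernel.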

Other minor changes:
(1) Set different default repeat counts for benchmarking: 100 for CPU and 10000 for CUDA.
(2) Fix PrintTensorByDims in the console dumper: if dumping is not enabled, do not dump the tensor.
(3) Update some comments.
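The per-device defaults in (1) amount to a small lookup with an optional override. Only the values 100 and 10000 come from this PR; the function name `resolve_repeats`, the device strings and the override behavior are assumptions made for illustration.

```python
from typing import Optional

# Default repeat counts stated in this PR.
DEFAULT_REPEATS = {"cpu": 100, "cuda": 10000}


def resolve_repeats(device: str, repeats: Optional[int] = None) -> int:
    """Hypothetical helper: return an explicit repeat count if one is given,
    otherwise the per-device default."""
    if repeats is not None:
        if repeats <= 0:
            raise ValueError("repeats must be positive")
        return repeats
    try:
        return DEFAULT_REPEATS[device.lower()]
    except KeyError:
        raise ValueError(f"unknown device: {device!r}") from None
```

An explicit value from the command line would take precedence; otherwise the device picks the default.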

Motivation and Context

Sometimes a requested sdpa_kernel falls back to a different kernel. This can confuse users unless the benchmark shows which kernel was actually used.

@tianleiwu merged commit 25d7a4f into main on Aug 22, 2024
95 of 97 checks passed
@tianleiwu deleted the tlwu/benchmark_mha_kernel_from_debug_info branch on August 22, 2024, 00:30