Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[rocfft][cufft] DFT update host task to use native command #578

Merged
merged 11 commits into from
Oct 14, 2024

Conversation

JackAKirk
Copy link
Contributor

@JackAKirk JackAKirk commented Oct 2, 2024

Description

Similar to #572 (see the discussions in that PR for technical details) except this covers fft backends for both amd and nvidia cases.

Update host task impl to use enqueue_native_command for DFT using the cuda/hip backends.

tests:

test_main_dft_ct_amd.txt
test_main_dft_rt_amd.txt
test_main_dft_rt_nvidia.txt
test_main_dft_ct_nvidia.txt

author: @hjabird

@JackAKirk
Copy link
Contributor Author

@oneapi-src/onemkl-dft-write could you please review this?

Thanks

Copy link
Contributor

@Rbiessy Rbiessy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Note we have used this in 2 GROMACS benchmarks to get between 2% to 7% improvements on MI210.

src/dft/backends/rocfft/execute_helper.hpp Outdated Show resolved Hide resolved
remove whitespace

Co-authored-by: Romain Biessy <[email protected]>
@JackAKirk
Copy link
Contributor Author

@anantsrivastava30 Is this OK to merge?

Thanks

Comment on lines 162 to 167
#ifdef SYCL_EXT_ONEAPI_ENQUEUE_NATIVE_COMMAND
cgh.ext_codeplay_enqueue_native_command([=](sycl::interop_handle ih){
#else
cgh.host_task([=](sycl::interop_handle ih){
#endif
f(std::move(ih));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it needed to duplicate this in both cuFFT and rocFFT backends and in various domains (BLAS, LAPACK, FFT)? Can't we have one wrapper used across all domains and backends?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that the domains have always been separated on purpose. This makes review much easier as any change affecting common code would technically require an approval from every domain owners.
I agree this could be discussed in an issue. To my knowledge there is very little code that could be common across domains, other than the types and exceptions which are already common.

@JackAKirk
Copy link
Contributor Author

Hi @lhuot

I don't know, I am following the existing design. I think that it is a good question and it would be good to open this as an issue. However I don't think it is a good idea to block this PR for this, since this is a critical patch for Gromacs performance, which does not introduce this duplication that already exists in all other backends.

@lhuot
Copy link
Contributor

lhuot commented Oct 11, 2024

Hi @lhuot

I don't know, I am following the existing design. I think that it is a good question and it would be good to open this as an issue. However I don't think it is a good idea to block this PR for this, since this is a critical patch for Gromacs performance, which does not introduce this duplication that already exists in all other backends.

Then, let's fix the code duplication in the DFT domain please.

@JackAKirk
Copy link
Contributor Author

Hi @lhuot
I don't know, I am following the existing design. I think that it is a good question and it would be good to open this as an issue. However I don't think it is a good idea to block this PR for this, since this is a critical patch for Gromacs performance, which does not introduce this duplication that already exists in all other backends.

Then, let's fix the code duplication in the DFT domain please.

OK sure, do you want me to add a new header e.g. "execute_helper_generic.hpp" in oneMKL/src/dft/ with the fft_host_task implementation that is portable across hip and cuda backends, and then only include this header in the rocfft and cufft backends?
I can't include such a generic header in include/oneapi/mkl/detail/ because the enqueue_native_command extension is not available in intel backends and trying to link this header will break intel builds.

@lhuot
Copy link
Contributor

lhuot commented Oct 11, 2024

Hi @lhuot
I don't know, I am following the existing design. I think that it is a good question and it would be good to open this as an issue. However I don't think it is a good idea to block this PR for this, since this is a critical patch for Gromacs performance, which does not introduce this duplication that already exists in all other backends.

Then, let's fix the code duplication in the DFT domain please.

OK sure, do you want me to add a new header e.g. "execute_helper_generic.hpp" in oneMKL/src/dft/ with the fft_host_task implementation that is portable across hip and cuda backends, and then only include this header in the rocfft and cufft backends? I can't include such a generic header in include/oneapi/mkl/detail/ because the enqueue_native_command extension is not available in intel backends and trying to link this header will break intel builds.

Sounds like a reasonable approach to me. Thanks!

@JackAKirk
Copy link
Contributor Author

Hi @lhuot
I don't know, I am following the existing design. I think that it is a good question and it would be good to open this as an issue. However I don't think it is a good idea to block this PR for this, since this is a critical patch for Gromacs performance, which does not introduce this duplication that already exists in all other backends.

Then, let's fix the code duplication in the DFT domain please.

OK sure, do you want me to add a new header e.g. "execute_helper_generic.hpp" in oneMKL/src/dft/ with the fft_host_task implementation that is portable across hip and cuda backends, and then only include this header in the rocfft and cufft backends? I can't include such a generic header in include/oneapi/mkl/detail/ because the enqueue_native_command extension is not available in intel backends and trying to link this header will break intel builds.

Sounds like a reasonable approach to me. Thanks!

I've updated this now as requested. Tests still pass as posted in PR summary on both hip and cuda backends.

Copy link
Contributor

@lhuot lhuot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@Rbiessy Rbiessy merged commit 058ee95 into uxlfoundation:develop Oct 14, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants