Make G2G as default during the compilation... #874

alazzaro · 2024-12-09T20:32:17Z

.. and control it at runtime via DBCSR_USE_ACC_G2G env variable (OFF by default).
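For readers following along, here is a minimal sketch of what "compiled in by default, off unless requested at runtime" can look like. It is illustrative Python, not DBCSR code: only the variable name `DBCSR_USE_ACC_G2G` comes from this PR, while the function name, the `compiled_with_g2g` parameter, and the set of accepted values are assumptions.

```python
import os


def g2g_enabled(environ=os.environ, compiled_with_g2g=True):
    """Return True only if G2G was compiled in AND the user opted in at runtime.

    Hypothetical helper: the accepted spellings below are an assumption,
    not the parsing rules DBCSR actually uses.
    """
    if not compiled_with_g2g:
        # Without compile-time support the runtime flag cannot enable anything.
        return False
    # A missing variable means OFF, matching the default described in this PR.
    raw = environ.get("DBCSR_USE_ACC_G2G", "0").strip().lower()
    return raw in ("1", "on", "true", "yes")
```

Under these assumptions, running with the variable unset leaves G2G off, and exporting `DBCSR_USE_ACC_G2G=1` before launching switches it on.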

alazzaro · 2024-12-09T20:32:43Z

FYI @gsitaram

src/mm/dbcsr_mm.F

hfp · 2024-12-10T08:37:19Z

Quick Q, is G2G only hinging on GPU-aware MPI?

alazzaro · 2024-12-10T08:39:36Z

> Quick Q, is G2G only hinging on GPU-aware MPI?

Yes, exactly.
The feature is now promoted to "a runtime flag", but it is still experimental (there are a few things to consider). It will be officially released in 2025...

hfp · 2024-12-10T08:46:15Z

> > Quick Q, is G2G only hinging on GPU-aware MPI?
>
> Yes, exactly. The feature is now promoted to "a runtime flag", but it is still experimental (there are a few things to consider). It will be officially released in 2025...

Thanks!

I see __DBCSR_ACC_G2G also requires calculating the norms on the GPU (to keep the data in place). In general, I wonder whether G2G would work in every case or whether some transfers are missing. Without norms on the GPU, I can imagine that some transfers are missing, but is there anything else?

I am considering implementing norms on the GPU for OpenCL too. I think all contemporary MPI implementations have GPU support if, say, the pointers are registered, etc.

alazzaro · 2024-12-10T09:37:21Z

> I see __DBCSR_ACC_G2G also requires calculating the norms on the GPU (to keep the data in place). In general, I wonder whether G2G would work in every case or whether some transfers are missing. Without norms on the GPU, I can imagine that some transfers are missing, but is there anything else?

The short answer is: no, it would not work in every case, which is why it is still experimental. For this reason, I've added 31dc51a. Indeed, there can be cases where the kernels are too big (for instance, 50x50), so no kernel is JIT-compiled and the library falls back to the CPU (without host data!) and fails...
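To make the failure mode concrete, here is a rough sketch of the kind of guard described above: if no GPU kernel is available for a block size and G2G is active, stop with a clear error instead of silently running on the CPU with no host copy of the data. This is illustrative Python, not the content of 31dc51a; every name and the `max_kernel_dim` threshold are hypothetical.

```python
class G2GFallbackError(RuntimeError):
    """Raised when a CPU fallback is requested while the data lives only on the GPU."""


def select_backend(m, n, k, use_g2g, max_kernel_dim=32):
    """Pick where a block multiplication of size m x n x k would run.

    max_kernel_dim is a made-up placeholder for "largest size that gets a
    JIT-compiled GPU kernel"; the real limit is not stated in this thread.
    """
    has_gpu_kernel = max(m, n, k) <= max_kernel_dim
    if has_gpu_kernel:
        return "gpu"
    if use_g2g:
        # With G2G the host buffers are not populated, so a CPU fallback
        # would operate on missing data. Fail loudly instead.
        raise G2GFallbackError(
            f"no GPU kernel for {m}x{n}x{k}; CPU fallback is unsafe with G2G enabled"
        )
    # Without G2G the host copy exists, so falling back to the CPU is fine.
    return "cpu"
```

With these placeholder values, `select_backend(50, 50, 50, use_g2g=False)` returns `"cpu"`, while the same call with `use_g2g=True` raises instead of computing on absent host data.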

The norms are only one part of the story (and only relevant when filtering is applied). Another thing @gsitaram introduced: with G2G we move the B-transposed data, so we don't need to run the B-transpose for each step of the multiplication.
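A toy model of the B-transpose point, for illustration only: if the data that moves between steps is already transposed, the receiver can multiply directly and skip the per-step transpose. This is plain Python on small dense blocks, and it is an assumption-laden simplification; the function names are hypothetical and it does not reflect how DBCSR actually stores or communicates its panels.

```python
def transpose(block):
    """Return the transpose of a dense block given as a list of rows."""
    return [list(row) for row in zip(*block)]


def multiply_with_bt(a, bt):
    """Compute C = A * B given A and B-transposed: C[i][j] = sum_k A[i][k] * Bt[j][k]."""
    return [[sum(x * y for x, y in zip(row_a, row_bt)) for row_bt in bt] for row_a in a]


def steps_transposing_each_time(a, received_b_panels):
    """Each step receives B as-is and has to transpose it before multiplying."""
    return [multiply_with_bt(a, transpose(b)) for b in received_b_panels]


def steps_with_pretransposed(a, received_bt_panels):
    """Each step receives B already transposed, so no per-step transpose is needed."""
    return [multiply_with_bt(a, bt) for bt in received_bt_panels]
```

Both variants produce the same products; the second simply moves the transpose out of the step loop, which is the saving described above.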

Overall, the speed-up on LUMI was quite significant for the H2O-DFT-LS...

hfp · 2024-12-10T10:34:34Z

> Overall, the speed-up on LUMI was quite significant for the H2O-DFT-LS...

Good to know! I will try to get my way through it at some point.

Make G2G as default during the compilation

0f960f1

gsitaram reviewed Dec 9, 2024

Add a check when kernel are evaluated on the CPU

31dc51a

alazzaro merged commit a17f5d1 into develop Dec 10, 2024
22 checks passed

alazzaro deleted the compile_g2g branch December 10, 2024 09:44
