Failure of test 02_QP_PPA in LiF/GW-OPTICS with random parallelization using nvfortran #107

Open
sangallidavide opened this issue Jul 7, 2024 · 1 comment

@sangallidavide
Member

The error message is:

 <02s> P4: [ERROR] Allocation of X_par%blc_d failed with code 1
P4: [ERROR] STOP signal received while in[05] Dynamic Dielectric Matrix (PPA)
P4: [ERROR] Not enough memory to allocate 0 bytes
@mikeatm
mikeatm commented Aug 3, 2024

I have hit a similar problem when running BSE calculations with nvsdk 24.3 and CUDA 12.3. When compiled with the slightly older nvfortran (24.3), the zero-size memory error came from a failure of the kernel below, which produced -1,-1,-1 for ln.
src/wf_and_fft/fft_setup.F:99

This seems to be resolved with 24.5 and CUDA 12.4. Instead, the following new error occurs on the bug-fixes (f859a7f) and maintenance-master (3d7b25d) branches:

CUDA Exception: Warp Illegal Address
The exception was triggered at PC 0x1554c5287540

Thread 1 "yambo" received signal CUDA_EXCEPTION_14, Warp Illegal Address.
[Switching focus to CUDA kernel 0, grid 3080883, block (0,0,0), thread (0,0,0), device 0, sm 0, warp 3, lane 0]
0x00001554c52875f0 in x_redux_x_redux_build_kernel_367_gpu
   <<<(2,11,1),(32,4,1)>>> ()

I expect this comes from the following line:

0x00001554c52a04b0 in x_redux_x_redux_build_kernel_367_gpu
   <<<(2,11,1),(32,4,1)>>> (iq=9)
    at /home/max/applications/yambo-5.2.1/src/pol_function/X_redux.F:368

This seems related to #120, and loosely to #76.
