FFT/generic : Replace explicit openacc directives with do concurrent #30

Open
mathrack opened this issue Mar 11, 2022 · 3 comments

Comments

@mathrack
Collaborator

Following commit ef6dfb9, explicit OpenACC directives allow the generic FFT to run on GPU. do concurrent should be used instead of the explicit OpenACC directives.

@mathrack
Collaborator Author

However, tests performed on the Bede cluster during the last GPU hackathon suggest that the NVIDIA compiler cannot currently handle this kind of do concurrent loop. The attached small example shows the issue: the error reported for derxDC is many orders of magnitude larger than the one reported for derxacc.
example.tar.gz
The output on Bede is:

 Compiled with nvfortran 22.1-0
 Compiler options : 
 main.f90 -Mfree -Kieee -Minfo=accel,ftn,inline,loop,vect,opt,stdpar -stdpar=gpu -gpu=cc70,managed,lineinfo,deepcopy -acc -target=gpu -traceback -O3 -Mvect=simd -Mflushz -Mcache_align -Mrecip-div -Mfactorize -Minstrument -g -c -I/opt/software/builder/developers/compilers/nvhpc/22.1/1/default/Linux_ppc64le/22.1/comm_libs/openmpi/openmpi-3.1.5/include -I/opt/software/builder/developers/compilers/nvhpc/22.1/1/default/Linux_ppc64le/22.1/comm_libs/openmpi/openmpi-3.1.5/lib
 Absolute error for derxacc :    6.9501907648138697E-002
 Absolute error for derxDC :     985420388520.4409

@fspiga

fspiga commented Mar 11, 2022

!$acc routine(...) seq cannot be simply removed.

The attached example refers to the der* routines, FFT/2decomp, the very generic ad hoc FFT implementation ... which part exactly?

@mathrack
Collaborator Author

The objective would be to keep the !$acc routine() seq and replace

    !$acc parallel loop gang vector collapse(2) private(buffer)
    do k = 1, nz
       do j = 1, ny
          ...
       enddo
    enddo

with

    do concurrent (k=1:nz, j=1:ny) local(buffer)
    ...
    enddo

or alternatively a 3D shared buffer and

    do concurrent (k=1:nz, j=1:ny)
    ...
    enddo

The example refers to the der* subroutines, but fixing the example should also fix the current fft/generic.
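
For reference, here is a minimal sketch of the three variants side by side. It is not taken from example.tar.gz: the module name dc_sketch, the kernel scale_line and the wrappers with_openacc, with_local and with_3d_buffer are hypothetical stand-ins for the elided loop body, chosen only so that the per-line work needs a scratch buffer. The routine called inside the loops is declared pure, as the standard requires for do concurrent, and keeps its !$acc routine seq. The local() clause needs a compiler with Fortran 2018 locality support.

```fortran
module dc_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains

  ! Hypothetical per-line kernel: doubles the line and reverses it,
  ! going through a scratch buffer. Pure, so it may be called from
  ! inside do concurrent.
  pure subroutine scale_line(n, buffer, line)
    !$acc routine seq
    integer, intent(in) :: n
    real(dp), intent(inout) :: buffer(n)
    real(dp), intent(inout) :: line(n)
    integer :: i
    do i = 1, n
      buffer(i) = 2.0_dp * line(i)
    end do
    do i = 1, n
      line(i) = buffer(n - i + 1)
    end do
  end subroutine scale_line

  ! Variant 1: explicit OpenACC directive with a private buffer.
  ! Without OpenACC enabled the directive is a plain comment.
  subroutine with_openacc(nx, ny, nz, a)
    integer, intent(in) :: nx, ny, nz
    real(dp), intent(inout) :: a(nx, ny, nz)
    real(dp) :: buffer(nx)
    integer :: j, k
    !$acc parallel loop gang vector collapse(2) private(buffer)
    do k = 1, nz
      do j = 1, ny
        call scale_line(nx, buffer, a(:, j, k))
      end do
    end do
  end subroutine with_openacc

  ! Variant 2: do concurrent with a per-iteration local buffer.
  subroutine with_local(nx, ny, nz, a)
    integer, intent(in) :: nx, ny, nz
    real(dp), intent(inout) :: a(nx, ny, nz)
    real(dp) :: buffer(nx)
    integer :: j, k
    do concurrent (k = 1:nz, j = 1:ny) local(buffer)
      call scale_line(nx, buffer, a(:, j, k))
    end do
  end subroutine with_local

  ! Variant 3: do concurrent with a shared 3D buffer, where each
  ! iteration only touches its own (:, j, k) slice.
  subroutine with_3d_buffer(nx, ny, nz, a)
    integer, intent(in) :: nx, ny, nz
    real(dp), intent(inout) :: a(nx, ny, nz)
    real(dp), allocatable :: buffer(:, :, :)
    integer :: j, k
    allocate(buffer(nx, ny, nz))
    do concurrent (k = 1:nz, j = 1:ny)
      call scale_line(nx, buffer(:, j, k), a(:, j, k))
    end do
    deallocate(buffer)
  end subroutine with_3d_buffer

end module dc_sketch
```

Calling the three wrappers on copies of the same array and comparing the results gives the same kind of check as the derxacc / derxDC errors above. The 3D buffer variant trades nx*ny*nz extra memory for not depending on local() support.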
