
add elpa native to solver generate eigen #4969

Merged: 12 commits into deepmodeling:develop on Sep 4, 2024

Conversation

@goodchong (Collaborator)

What's changed?

  • Added the native interface call logic for ELPA's generalized eigenvalue solver (see the sketch after this list).
  • Supports double and complex data types, for both gamma-only and multi-k calculations.
  • Supports CPU and GPU execution, including multiple GPUs.
  • Supports kpar (k-point parallelism).
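
For orientation, here is a minimal sketch of what a native ELPA generalized-eigensolver call looks like through ELPA's C interface. This is not the actual ABACUS implementation: the helper name `solve_generalized`, its argument list, and the requested API version are assumptions for illustration.

```cpp
#include <elpa/elpa.h>

// Hypothetical helper (not ABACUS code): solve a*q = ev*b*q for a
// block-cyclically distributed matrix pair (a, b) using ELPA's native
// generalized eigensolver. mpi_comm must be a Fortran communicator
// handle, e.g. MPI_Comm_c2f(MPI_COMM_WORLD).
void solve_generalized(int na, int nev, int local_nrows, int local_ncols,
                       int nblk, int mpi_comm, int prow, int pcol,
                       double* a, double* b, double* ev, double* q)
{
    int error;
    if (elpa_init(20171201) != ELPA_OK) return;   // request a supported API level

    elpa_t handle = elpa_allocate(&error);
    elpa_set(handle, "na", na, &error);                   // global matrix size
    elpa_set(handle, "nev", nev, &error);                 // number of eigenpairs wanted
    elpa_set(handle, "local_nrows", local_nrows, &error); // rows stored on this rank
    elpa_set(handle, "local_ncols", local_ncols, &error); // columns stored on this rank
    elpa_set(handle, "nblk", nblk, &error);               // block-cyclic block size
    elpa_set(handle, "mpi_comm_parent", mpi_comm, &error);
    elpa_set(handle, "process_row", prow, &error);        // this rank's BLACS grid row
    elpa_set(handle, "process_col", pcol, &error);        // this rank's BLACS grid column
    elpa_setup(handle);

#ifdef __CUDA
    // Assumes ELPA itself was configured with --enable-nvidia-gpu.
    elpa_set(handle, "nvidia-gpu", 1, &error);
#endif

    // Last flag: whether b is already Cholesky-factored from a previous
    // call with the same overlap matrix (0 = factor it now).
    elpa_generalized_eigenvectors(handle, a, b, ev, q, 0, &error);

    elpa_deallocate(handle, &error);
    elpa_uninit(&error);
}
```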

usage:

Edit the INPUT file to contain:

```
ks_solver elpa
device cpu        # use "device gpu" if a GPU is available
```
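
Since kpar support is one of the advertised features, a multi-k GPU run could additionally set kpar. The values below are illustrative assumptions, not recommendations from this PR:

```
ks_solver elpa
device gpu
kpar 2        # split the k-points into two parallel pools
```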

compile the code:

First build ELPA with GPU support:

```bash
./configure --enable-nvidia-gpu --with-NVIDIA-GPU-compute-capability=sm_89 \
    --enable-openmp --disable-sse --disable-avx --disable-avx2 --disable-avx512 \
    --with-cuda-path=/usr/local/cuda/
make -j32
```

Then compile ABACUS against that ELPA:

```bash
cmake -B build_gpu_elpa \
    -DELPA_LINK_LIBRARIES=/home/goodchong/elpa-2024.05.001/.libs/libelpa_openmp.so \
    -DELPA_INCLUDE_DIR=/home/goodchong/elpa-2024.05.001/ \
    -DUSE_CUDA=ON
cmake --build build_gpu_elpa -j32
```

known issues

  1. Avoid using multiple GPUs unless there is a specific reason to: it can be slower than using a single GPU.
  2. When running on GPUs, ELPA prints some internal log messages, and so far there is no way to disable them. The GPU log looks like this:
```
 ---------------------------------------------------------
 Initial plane wave basis and FFT box
 ---------------------------------------------------------
 DONE(0.379743   SEC) : INIT PLANEWAVE
 -------------------------------------------
 SELF-CONSISTENT :
 -------------------------------------------
 START CHARGE      : atomic
 DONE(11.655     SEC) : INIT SCF
 * * * * * *
 << Start SCF iteration.
 Initializing the GPU devices
Found 2 GPUs
MPI rank 11 uses GPU #1
MPI rank 3 uses GPU #1
MPI rank 17 uses GPU #1
MPI rank 16 uses GPU #0
MPI rank 4 uses GPU #0
MPI rank 8 uses GPU #0
MPI rank 2 uses GPU #0
MPI rank 22 uses GPU #0
MPI rank 19 uses GPU #1
MPI rank 9 uses GPU #1
MPI rank 23 uses GPU #1
MPI rank 7 uses GPU #1
MPI rank 5 uses GPU #1
MPI rank 20 uses GPU #0
MPI rank 13 uses GPU #1
MPI rank 0 uses GPU #0
MPI rank 21 uses GPU #1
MPI rank 1 uses GPU #1
MPI rank 15 uses GPU #1
MPI rank 14 uses GPU #0
MPI rank 6 uses GPU #0
MPI rank 12 uses GPU #0
MPI rank 10 uses GPU #0
MPI rank 18 uses GPU #0
 CUBLAS version:       120600
 NVIDIA maxThreadsPerBlock:         1024
 NVIDIA MaxBLockDimX:         1024
 NVIDIA MaxBLockDimY:         1024
 NVIDIA MaxBLockDimZ:           64
 NVIDIA MaxGridDimX:   2147483647
 NVIDIA MaxGridDimY:        65535
 NVIDIA MaxGridDimZ:        65535
 NVIDIA SM count:          128
 To use Cannons algorithm, np_cols must be a multiple of np_rows.
 Switching to elpa Hermitian and scalapack
 ITER       ETOT/eV          EDIFF/eV         DRHO     TIME/s
 EL1     -5.64670929e+04   0.00000000e+00   1.7030e-01  37.46
[the same "Found 2 GPUs / MPI rank N uses GPU #M / CUBLAS version ..." block repeats here]
 EL2     -5.64065892e+04   6.05037024e+01   9.2489e-02  34.79
[and the GPU initialization block repeats again before the next iteration line]
 EL3     -5.64210941e+04  -1.45048101e+01   1.6619e-02  35.27
 >> Leave SCF iteration.
```

Any changes of core modules? (ignore if not applicable)

  • Added the native ELPA solver to HSolver.

@mohanchen mohanchen requested a review from haozhihan August 26, 2024 05:22
@caic99 (Member) commented Aug 26, 2024

Please update the docs for the parameter ks_solver elpa, covering its GPU support as well. It is also recommended to add a GPU test for elpa (either in this PR or a follow-up).

@haozhihan (Collaborator) commented Aug 28, 2024

This part of the docs (docs/advanced/acceleration/cuda.md) also needs to be updated. Please update it. Thank you!

@goodchong (Collaborator, Author)

> This part of the docs (docs/advanced/acceleration/cuda.md) also needs to be updated. Please update it. Thank you!

I have made some updates; please review.

@haozhihan (Collaborator)

https://github.com/deepmodeling/abacus-develop/blob/develop/docs/advanced/input_files/input-main.md#ks_solver

In the input-main section of the docs, there is already a genelpa method under ks_solver. What is the difference between the elpa and genelpa methods? Do we need to update the documentation to let users know? @goodchong @caic99 @mohanchen

@goodchong (Collaborator, Author)

> https://github.com/deepmodeling/abacus-develop/blob/develop/docs/advanced/input_files/input-main.md#ks_solver
>
> In the input-main section of the docs, there is already a genelpa method under ks_solver. What is the difference between the elpa and genelpa methods? Do we need to update the documentation to let users know? @goodchong @caic99 @mohanchen

genelpa is a generalized eigenvalue solver built on top of ELPA, developed by Shen Yu and Xiaohui; it is the solver ABACUS has always used. The newly added elpa option calls ELPA's native interface directly.
My suggestion is to keep genelpa as the default for now, and make elpa the default once it has been fully tested and more widely used.
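
In INPUT terms, the two solvers are selected by the same keyword; a minimal sketch based on this thread (a real INPUT file would contain only one ks_solver line):

```
ks_solver genelpa   # existing GenELPA wrapper (suggested default for now)
ks_solver elpa      # new native ELPA interface added in this PR
```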

@goodchong (Collaborator, Author)

> https://github.com/deepmodeling/abacus-develop/blob/develop/docs/advanced/input_files/input-main.md#ks_solver
>
> In the input-main section of the docs, there is already a genelpa method under ks_solver. What is the difference between the elpa and genelpa methods? Do we need to update the documentation to let users know? @goodchong @caic99 @mohanchen

Well, why did we develop our own distributed generalized eigenvalue solver in the first place? I think one of the main reasons is that when genelpa was developed, ELPA did not yet support generalized eigenvalue problems. For the rest of the story, you can ask those two at the next offline meetup.

@goodchong goodchong self-assigned this Sep 3, 2024
@goodchong goodchong added the GPU & DCU & HPC (GPU, DCU, and HPC related issues) label Sep 3, 2024
@mohanchen (Collaborator)

It seems new features are included. I will accept the PR; more discussion is welcome.

@mohanchen mohanchen merged commit 731388d into deepmodeling:develop Sep 4, 2024
14 checks passed
@Critsium-xy (Collaborator)

Maybe the information that ELPA should be installed with GPU support should also be added to the "Easy Installation" part of the documentation, either where the ELPA package is first mentioned or in the explanation of the "USE_ELPA" parameter.

In "Advanced Installation Options", "Build with CUDA support" may also need to be updated with this information. Right now it is difficult to locate, which may confuse users when they fail to build the CUDA version because of this ELPA setting.

@Critsium-xy (Collaborator)

Or maybe adding a "USE_GPU_ELPA" option is better?

```cpp
elpa_setup(handle);
elpa_set(handle, "solver", ELPA_SOLVER_1STAGE, &success);

#ifdef __CUDA
```
(Member, review comment on the code above)

Maybe it is possible to use ELPA_WITH_NVIDIA_GPU_VERSION from elpa/elpa_configured_options.h to determine whether ELPA was installed with GPU support.
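
A minimal sketch of that suggestion, written as a fragment to slot into the reviewed code above. It assumes ELPA_WITH_NVIDIA_GPU_VERSION expands to a nonzero value in a GPU-enabled ELPA build; the "nvidia-gpu" runtime key and the `success` variable are taken from ELPA's API and the diff above, and their use here is illustrative:

```cpp
#include <elpa/elpa_configured_options.h>  // exposes ELPA_WITH_NVIDIA_GPU_VERSION

// Gate GPU setup on how ELPA itself was configured, instead of on
// ABACUS's own __CUDA macro as in the diff above.
#if ELPA_WITH_NVIDIA_GPU_VERSION
    elpa_set(handle, "nvidia-gpu", 1, &success);
#endif
```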

@goodchong goodchong deleted the elpa_gpu branch December 4, 2024 05:35
@tang070205
@goodchong Hello, I compiled ELPA the way you did (the GPU is an NVIDIA 30-series card), but I get this error on make -j:

```
/usr/bin/ld: ./.libs/libelpa_openmp.so: undefined reference to `std::ios_base::Init::~Init()'
/usr/bin/ld: ./.libs/libelpa_openmp.so: undefined reference to `std::ios_base::Init::Init()'
```

@goodchong (Collaborator, Author)

> @goodchong Hello, I compiled ELPA the way you did (the GPU is an NVIDIA 30-series card), but I get this error on make -j: /usr/bin/ld: ./.libs/libelpa_openmp.so: undefined reference to `std::ios_base::Init::~Init()' /usr/bin/ld: ./.libs/libelpa_openmp.so: undefined reference to `std::ios_base::Init::Init()'

Dear user, you may need to add SCALAPACK_LDFLAGS="-lstdc++" to your configure parameters.

For example:

```bash
FC=mpiifort CC=mpiicc CXX=mpiicpc ./configure --enable-nvidia-gpu \
    --with-NVIDIA-GPU-compute-capability=sm_80 --enable-openmp \
    --prefix=/home/shenyugroup/solomonz1/good_test/elpa-2024.05.001 \
    SCALAPACK_LDFLAGS="-L$MKLROOT/lib/intel64 -lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_lp64 -lpthread -lm -lstdc++ -Wl,-rpath,$MKLROOT/lib/intel64" \
    SCALAPACK_FCFLAGS="-L$MKLROOT/lib/intel64 -lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_lp64 -lpthread -lm -I$MKLROOT/include/intel64/lp64"
```

@tang070205
Thank you for your reply. After I added SCALAPACK_LDFLAGS="-lstdc++" that problem no longer appears, but an OpenMPI-related problem shows up (below). I would also like to ask one more thing: for

```bash
./configure --enable-nvidia-gpu --with-NVIDIA-GPU-compute-capability=sm_89 --enable-openmp --disable-sse --disable-avx --disable-avx2 --disable-avx512 --with-cuda-path=/usr/local/cuda/
```

are the dependencies supplied by oneAPI, or by a direct apt install of openblas, scalapack, and openmpi?

[screenshot of the OpenMPI-related build error]

@goodchong (Collaborator, Author)

> ./configure --enable-nvidia-gpu --with-NVIDIA-GPU-compute-capability=sm_89 --enable-openmp --disable-sse --disable-avx --disable-avx2 --disable-avx512 --with-cuda-path=/usr/local/cuda/

I think those were apt-installed libraries.

If you are compiling with oneAPI, check the official ELPA documentation; it provides a rather complex set of configuration options. Alternatively, you can refer to the example in one of my previous replies.
