
add elpa native to solver generate eigen #4969

Merged: 12 commits into deepmodeling:develop on Sep 4, 2024

Conversation

@goodchong (Collaborator)

What's changed?

  • Added the native interface call logic for ELPA's generalized eigenvalue solver (see the sketch after this list).
  • Supports double and complex data types, for both gamma-only and multi-k calculations.
  • Supports CPU and GPU execution, including multiple GPUs.
  • Supports kpar (k-point parallelism).
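
For orientation, here is a minimal sketch of what a native ELPA generalized-eigensolver call looks like through ELPA's C interface. This is not the actual ABACUS implementation: the helper name `solve_generalized`, its argument list, and the requested API version are assumptions for illustration.

```cpp
#include <elpa/elpa.h>

// Hypothetical helper (not ABACUS code): solve a*q = ev*b*q for a
// block-cyclically distributed matrix pair (a, b) using ELPA's native
// generalized eigensolver. mpi_comm must be a Fortran communicator
// handle, e.g. MPI_Comm_c2f(MPI_COMM_WORLD).
void solve_generalized(int na, int nev, int local_nrows, int local_ncols,
                       int nblk, int mpi_comm, int prow, int pcol,
                       double* a, double* b, double* ev, double* q)
{
    int error;
    if (elpa_init(20171201) != ELPA_OK) return;   // request a supported API level

    elpa_t handle = elpa_allocate(&error);
    elpa_set(handle, "na", na, &error);                   // global matrix size
    elpa_set(handle, "nev", nev, &error);                 // number of eigenpairs wanted
    elpa_set(handle, "local_nrows", local_nrows, &error); // rows stored on this rank
    elpa_set(handle, "local_ncols", local_ncols, &error); // columns stored on this rank
    elpa_set(handle, "nblk", nblk, &error);               // block-cyclic block size
    elpa_set(handle, "mpi_comm_parent", mpi_comm, &error);
    elpa_set(handle, "process_row", prow, &error);        // this rank's BLACS grid row
    elpa_set(handle, "process_col", pcol, &error);        // this rank's BLACS grid column
    elpa_setup(handle);

#ifdef __CUDA
    // Assumes ELPA itself was configured with --enable-nvidia-gpu.
    elpa_set(handle, "nvidia-gpu", 1, &error);
#endif

    // Last flag: whether b is already Cholesky-factored from a previous
    // call with the same overlap matrix (0 = factor it now).
    elpa_generalized_eigenvectors(handle, a, b, ev, q, 0, &error);

    elpa_deallocate(handle, &error);
    elpa_uninit(&error);
}
```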

usage:

Edit the INPUT file to contain:

```
ks_solver elpa
device cpu        # use "device gpu" if a GPU is available
```
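
Since kpar support is one of the advertised features, a multi-k GPU run could additionally set kpar. The values below are illustrative assumptions, not recommendations from this PR:

```
ks_solver elpa
device gpu
kpar 2        # split the k-points into two parallel pools
```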

compile the code:

First build ELPA with GPU support:

```bash
./configure --enable-nvidia-gpu --with-NVIDIA-GPU-compute-capability=sm_89 \
    --enable-openmp --disable-sse --disable-avx --disable-avx2 --disable-avx512 \
    --with-cuda-path=/usr/local/cuda/
make -j32
```

Then compile ABACUS against that ELPA:

```bash
cmake -B build_gpu_elpa \
    -DELPA_LINK_LIBRARIES=/home/goodchong/elpa-2024.05.001/.libs/libelpa_openmp.so \
    -DELPA_INCLUDE_DIR=/home/goodchong/elpa-2024.05.001/ \
    -DUSE_CUDA=ON
cmake --build build_gpu_elpa -j32
```

known issues

  1. Avoid using multiple GPUs unless there is a specific reason to: it can be slower than using a single GPU.
  2. When running on GPUs, ELPA prints some internal log messages, and so far there is no way to disable them. The GPU log looks like this:
```
 ---------------------------------------------------------
 Initial plane wave basis and FFT box
 ---------------------------------------------------------
 DONE(0.379743   SEC) : INIT PLANEWAVE
 -------------------------------------------
 SELF-CONSISTENT :
 -------------------------------------------
 START CHARGE      : atomic
 DONE(11.655     SEC) : INIT SCF
 * * * * * *
 << Start SCF iteration.
 Initializing the GPU devices
Found 2 GPUs
MPI rank 11 uses GPU #1
MPI rank 3 uses GPU #1
MPI rank 17 uses GPU #1
MPI rank 16 uses GPU #0
MPI rank 4 uses GPU #0
MPI rank 8 uses GPU #0
MPI rank 2 uses GPU #0
MPI rank 22 uses GPU #0
MPI rank 19 uses GPU #1
MPI rank 9 uses GPU #1
MPI rank 23 uses GPU #1
MPI rank 7 uses GPU #1
MPI rank 5 uses GPU #1
MPI rank 20 uses GPU #0
MPI rank 13 uses GPU #1
MPI rank 0 uses GPU #0
MPI rank 21 uses GPU #1
MPI rank 1 uses GPU #1
MPI rank 15 uses GPU #1
MPI rank 14 uses GPU #0
MPI rank 6 uses GPU #0
MPI rank 12 uses GPU #0
MPI rank 10 uses GPU #0
MPI rank 18 uses GPU #0
 CUBLAS version:       120600
 NVIDIA maxThreadsPerBlock:         1024
 NVIDIA MaxBLockDimX:         1024
 NVIDIA MaxBLockDimY:         1024
 NVIDIA MaxBLockDimZ:           64
 NVIDIA MaxGridDimX:   2147483647
 NVIDIA MaxGridDimY:        65535
 NVIDIA MaxGridDimZ:        65535
 NVIDIA SM count:          128
 To use Cannons algorithm, np_cols must be a multiple of np_rows.
 Switching to elpa Hermitian and scalapack
 ITER       ETOT/eV          EDIFF/eV         DRHO     TIME/s
 EL1     -5.64670929e+04   0.00000000e+00   1.7030e-01  37.46
[the same "Found 2 GPUs / MPI rank N uses GPU #M / CUBLAS version ..." block repeats here]
 EL2     -5.64065892e+04   6.05037024e+01   9.2489e-02  34.79
[and the GPU initialization block repeats again before the next iteration line]
 EL3     -5.64210941e+04  -1.45048101e+01   1.6619e-02  35.27
 >> Leave SCF iteration.
```

Any changes of core modules? (ignore if not applicable)

  • Added the native ELPA solver to HSolver.

@mohanchen mohanchen requested a review from haozhihan August 26, 2024 05:22
@caic99 (Member) commented Aug 26, 2024

Please update the docs for the parameter ks_solver elpa, covering its GPU support as well. It is also recommended to add a GPU test for elpa (either in this PR or a follow-up).

@haozhihan (Collaborator) commented Aug 28, 2024

This part of the docs (docs/advanced/acceleration/cuda.md) also needs to be updated. Please update it. Thank you!

@goodchong (Collaborator, Author)

> This part of the docs (docs/advanced/acceleration/cuda.md) also needs to be updated. Please update it. Thank you!

I have made some updates; please review.

@haozhihan (Collaborator)

https://github.com/deepmodeling/abacus-develop/blob/develop/docs/advanced/input_files/input-main.md#ks_solver

In the input-main section of the docs, there is already a genelpa method under ks_solver. What is the difference between the elpa and genelpa methods? Do we need to update the documentation to let users know? @goodchong @caic99 @mohanchen

@goodchong (Collaborator, Author)

> https://github.com/deepmodeling/abacus-develop/blob/develop/docs/advanced/input_files/input-main.md#ks_solver
>
> In the input-main section of the docs, there is already a genelpa method under ks_solver. What is the difference between the elpa and genelpa methods? Do we need to update the documentation to let users know? @goodchong @caic99 @mohanchen

genelpa is a generalized eigenvalue solver built on top of ELPA, developed by Shen Yu and Xiaohui; it is the solver ABACUS has always used. The newly added elpa option calls ELPA's native interface directly.
My suggestion is to keep genelpa as the default for now, and make elpa the default once it has been fully tested and more widely used.
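
In INPUT terms, the two solvers are selected by the same keyword; a minimal sketch based on this thread (a real INPUT file would contain only one ks_solver line):

```
ks_solver genelpa   # existing GenELPA wrapper (suggested default for now)
ks_solver elpa      # new native ELPA interface added in this PR
```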

@goodchong (Collaborator, Author)

> https://github.com/deepmodeling/abacus-develop/blob/develop/docs/advanced/input_files/input-main.md#ks_solver
>
> In the input-main section of the docs, there is already a genelpa method under ks_solver. What is the difference between the elpa and genelpa methods? Do we need to update the documentation to let users know? @goodchong @caic99 @mohanchen

Well, why did we develop our own distributed generalized eigenvalue solver in the first place? I think one of the main reasons is that when genelpa was developed, ELPA did not yet support generalized eigenvalue problems. For the rest of the story, you can ask those two at the next offline meetup.

@goodchong goodchong self-assigned this Sep 3, 2024
@goodchong goodchong added the GPU & DCU & HPC (GPU, DCU, and HPC related issues) label Sep 3, 2024
@mohanchen (Collaborator)

It seems new features are included. I will accept the PR; more discussion is welcome.

@mohanchen mohanchen merged commit 731388d into deepmodeling:develop Sep 4, 2024
14 checks passed
@Critsium-xy (Collaborator)

Maybe the information that ELPA should be installed with GPU support should also be added to the "Easy Installation" part of the documentation, either where the ELPA package is first mentioned or in the explanation of the "USE_ELPA" parameter.

In "Advanced Installation Options", "Build with CUDA support" may also need to be updated with this information. Right now it is difficult to locate, which may confuse users when they fail to build the CUDA version because of this ELPA setting.

@Critsium-xy (Collaborator)

Or maybe adding a "USE_GPU_ELPA" option is better?

```cpp
elpa_setup(handle);
elpa_set(handle, "solver", ELPA_SOLVER_1STAGE, &success);

#ifdef __CUDA
```
(Member, review comment on the code above)

Maybe it is possible to use ELPA_WITH_NVIDIA_GPU_VERSION from elpa/elpa_configured_options.h to determine whether ELPA was installed with GPU support.
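
A minimal sketch of that suggestion, written as a fragment to slot into the reviewed code above. It assumes ELPA_WITH_NVIDIA_GPU_VERSION expands to a nonzero value in a GPU-enabled ELPA build; the "nvidia-gpu" runtime key and the `success` variable are taken from ELPA's API and the diff above, and their use here is illustrative:

```cpp
#include <elpa/elpa_configured_options.h>  // exposes ELPA_WITH_NVIDIA_GPU_VERSION

// Gate GPU setup on how ELPA itself was configured, instead of on
// ABACUS's own __CUDA macro as in the diff above.
#if ELPA_WITH_NVIDIA_GPU_VERSION
    elpa_set(handle, "nvidia-gpu", 1, &success);
#endif
```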

@goodchong goodchong deleted the elpa_gpu branch December 4, 2024 05:35
@tang070205
@goodchong Hello, I compiled ELPA the way you did (the GPU is an NVIDIA 30-series card), but I get this error on make -j:

```
/usr/bin/ld: ./.libs/libelpa_openmp.so: undefined reference to `std::ios_base::Init::~Init()'
/usr/bin/ld: ./.libs/libelpa_openmp.so: undefined reference to `std::ios_base::Init::Init()'
```

@goodchong (Collaborator, Author)

> @goodchong Hello, I compiled ELPA the way you did (the GPU is an NVIDIA 30-series card), but I get this error on make -j: /usr/bin/ld: ./.libs/libelpa_openmp.so: undefined reference to `std::ios_base::Init::~Init()' /usr/bin/ld: ./.libs/libelpa_openmp.so: undefined reference to `std::ios_base::Init::Init()'

Dear user, you may need to add SCALAPACK_LDFLAGS="-lstdc++" to your configure parameters.

For example:

```bash
FC=mpiifort CC=mpiicc CXX=mpiicpc ./configure --enable-nvidia-gpu \
    --with-NVIDIA-GPU-compute-capability=sm_80 --enable-openmp \
    --prefix=/home/shenyugroup/solomonz1/good_test/elpa-2024.05.001 \
    SCALAPACK_LDFLAGS="-L$MKLROOT/lib/intel64 -lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_lp64 -lpthread -lm -lstdc++ -Wl,-rpath,$MKLROOT/lib/intel64" \
    SCALAPACK_FCFLAGS="-L$MKLROOT/lib/intel64 -lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_lp64 -lpthread -lm -I$MKLROOT/include/intel64/lp64"
```

@tang070205
Thank you for your reply. After I added SCALAPACK_LDFLAGS="-lstdc++" that problem no longer appears, but an OpenMPI-related problem shows up (below). I would also like to ask one more thing: for

```bash
./configure --enable-nvidia-gpu --with-NVIDIA-GPU-compute-capability=sm_89 --enable-openmp --disable-sse --disable-avx --disable-avx2 --disable-avx512 --with-cuda-path=/usr/local/cuda/
```

are the dependencies supplied by oneAPI, or by a direct apt install of openblas, scalapack, and openmpi?

[screenshot of the OpenMPI-related build error]

@goodchong (Collaborator, Author)

> ./configure --enable-nvidia-gpu --with-NVIDIA-GPU-compute-capability=sm_89 --enable-openmp --disable-sse --disable-avx --disable-avx2 --disable-avx512 --with-cuda-path=/usr/local/cuda/

I think those were apt-installed libraries.

If you are compiling with oneAPI, check the official ELPA documentation; it provides a rather complex set of configuration options. Alternatively, you can refer to the example in one of my previous replies.
