Speed-up hot path of GFN2 geometry optimization #1178

foxtran · 2025-02-05T16:48:45Z

This PR improves parallelization for GFN2 geometry optimization. It also affects other types of calculations.

The achieved speed-up is about 30% in wall time. Gradients are 2x faster!

For a 578-atom structure, before the patch:

 total:
 * wall-time:     0 d,  0 h,  1 min, 11.680 sec
 *  cpu-time:     0 d,  0 h,  3 min,  7.524 sec
 * ratio c/w:     2.616 speedup
 SCF:
 * wall-time:     0 d,  0 h,  0 min, 13.534 sec
 *  cpu-time:     0 d,  0 h,  0 min, 41.212 sec
 * ratio c/w:     3.045 speedup
 ANC optimizer:
 * wall-time:     0 d,  0 h,  0 min, 57.390 sec
 *  cpu-time:     0 d,  0 h,  2 min, 25.103 sec
 * ratio c/w:     2.528 speedup

normal termination of xtb

real 1m11.740s
user 3m5.985s
sys 0m1.572s

After the patch:

 total:
 * wall-time:     0 d,  0 h,  0 min, 48.779 sec
 *  cpu-time:     0 d,  0 h,  2 min, 51.122 sec
 * ratio c/w:     3.508 speedup
 SCF:
 * wall-time:     0 d,  0 h,  0 min, 11.737 sec
 *  cpu-time:     0 d,  0 h,  0 min, 42.249 sec
 * ratio c/w:     3.600 speedup
 ANC optimizer:
 * wall-time:     0 d,  0 h,  0 min, 36.377 sec
 *  cpu-time:     0 d,  0 h,  2 min,  8.025 sec
 * ratio c/w:     3.519 speedup

normal termination of xtb

real 0m48.971s
user 2m50.138s
sys 0m1.012s
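
To make the comparison concrete, the `total` timings above can be converted to seconds and compared. This is a minimal sketch: the helper name `xtb_time_to_seconds` is hypothetical and not part of xtb, and only the four timing lines quoted above are used as input.

```python
import re

def xtb_time_to_seconds(line):
    """Convert an xtb timing line ('... 0 d,  0 h,  1 min, 11.680 sec') to seconds."""
    m = re.search(r"(\d+)\s*d,\s*(\d+)\s*h,\s*(\d+)\s*min,\s*([\d.]+)\s*sec", line)
    if m is None:
        raise ValueError(f"not an xtb timing line: {line!r}")
    days, hours, minutes = (int(x) for x in m.group(1, 2, 3))
    return ((days * 24 + hours) * 60 + minutes) * 60 + float(m.group(4))

# 'total' block before and after the patch, copied from the output above
before = {
    "wall": xtb_time_to_seconds(" * wall-time:     0 d,  0 h,  1 min, 11.680 sec"),
    "cpu": xtb_time_to_seconds(" *  cpu-time:     0 d,  0 h,  3 min,  7.524 sec"),
}
after = {
    "wall": xtb_time_to_seconds(" * wall-time:     0 d,  0 h,  0 min, 48.779 sec"),
    "cpu": xtb_time_to_seconds(" *  cpu-time:     0 d,  0 h,  2 min, 51.122 sec"),
}

wall_speedup = before["wall"] / after["wall"]    # how many times faster
wall_saved = 1.0 - after["wall"] / before["wall"]  # fraction of wall time saved
cpu_saved = 1.0 - after["cpu"] / before["cpu"]    # fraction of CPU time saved

print(f"wall-time speed-up: {wall_speedup:.2f}x ({wall_saved:.1%} less wall time)")
print(f"cpu-time change:    {cpu_saved:.1%} less CPU time")
print(f"ratio c/w before:   {before['cpu'] / before['wall']:.3f}")
print(f"ratio c/w after:    {after['cpu'] / after['wall']:.3f}")
```

The wall time drops by roughly a third while the CPU time changes much less, which is what one would expect if the patch mainly improves how evenly the same work is spread over the threads.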

Command for testing:

xtb --input model.xtb model.xyz -P 4 --gfn 2 --opt vtight --alpb water --cycles 5

model.xtb (nothing interesting, to be honest):

$opt
    engine=rf
$end
$alpb
    kernel=still
    grid=tight
$end

Build type: CMake, Release, gfortran-14, Intel MKL, no -march flag (setting it can give extra speed-up), debug info enabled.

foxtran · 2025-02-05T17:03:33Z

@Albkat, @awvwgk, it is not funny that about 1/20 of all CI runs are failing :(

foxtran · 2025-02-05T20:03:43Z

I have no idea why it does not work with ifort. I will try to debug it tomorrow.

foxtran · 2025-02-05T20:06:06Z

@Albkat, do not try to restart it: it will still fail. Most probably it is a bug in the xtb code with undefined variables. I have a tool for debugging compilers :-)

Albkat · 2025-02-05T20:08:53Z

Thanks, I'll test this.

> @Albkat, @awvwgk, it is not funny that about 1/20 of all CI runs are failing :(

Regarding CI, there are some stochastic failures with PBC, GFN-FF, and CPCM-X that we are trying to identify. It is still a work in progress.

> @Albkat, do not try to restart it: it will still fail. Most probably it is a bug in the xtb code with undefined variables. I have a tool for debugging compilers :-)

Yeah, just double-checking if it is compiler-specific :)

foxtran · 2025-02-05T20:10:43Z

> Regarding CI, there are some stochastic failures with PBC, GFN-FF, and CPCM-X that we are trying to identify. It is still a work in progress.

They are reproducible. I have reproduced a couple of them. :-)

Albkat · 2025-02-05T20:18:12Z

> > Regarding CI, there are some stochastic failures with PBC, GFN-FF, and CPCM-X that we are trying to identify. It is still a work in progress.

> They are reproducible. I have reproduced a couple of them. :-)

I can reproduce them about 30% of the time when running sequentially.
Did you use anything special to freeze the executable state?

foxtran · 2025-02-05T20:41:39Z

@Albkat, see #1182. I have just done it for you :)

foxtran · 2025-02-06T09:26:35Z

@Albkat, it works now for legacy Intel Compilers :-)

foxtran · 2025-02-10T21:34:46Z

@Albkat, can we merge it? :-)

Albkat · 2025-02-10T21:46:24Z

Seems legit to me, I'll test this tomorrow and then merge:)
It would be beneficial to understand what you did there, maybe we could use this in other places too...

foxtran · 2025-02-10T22:16:20Z

> It would be beneficial to understand what you did there, maybe we could use this in other places too...

I just started using the OpenMP collapse(2) clause :-)

Note that most Fortran compilers do not support triangular loop nests with collapse, so one needs to give them rectangular grids. One also needs to be careful about which elements are skipped, for better performance, and about which scheduler is used.

Signed-off-by: Igor S. Gerasimov <[email protected]>
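
The load-balancing argument behind the note above can be illustrated with a small counting model. This is only a sketch in Python, not the Fortran code from this PR: it assumes the original loop was a triangular pair loop (`do i = 1, n; do j = 1, i - 1`) parallelized over the outer index with contiguous static chunks, and it models the collapsed variant as a full n x n grid dealt out round-robin with the `j >= i` iterations skipped. The thread does not state which schedule xtb actually uses, and the function names are hypothetical.

```python
def outer_loop_static(n, nthreads):
    """Useful pair evaluations per thread when only the outer loop of a
    triangular nest is split into contiguous blocks of rows."""
    rows_per_thread = -(-n // nthreads)  # ceiling division
    work = [0] * nthreads
    for i in range(n):
        work[i // rows_per_thread] += i  # row i holds i pairs with j < i
    return work

def collapsed_round_robin(n, nthreads, chunk=1):
    """Useful pair evaluations per thread when the full n x n grid is
    collapsed into one loop, dealt out in round-robin chunks, and
    iterations with j >= i are skipped."""
    work = [0] * nthreads
    for k in range(n * n):
        i, j = divmod(k, n)
        if j < i:
            work[(k // chunk) % nthreads] += 1
    return work

def imbalance(work):
    """Busiest thread relative to the average; 1.0 means perfect balance."""
    return max(work) * len(work) / sum(work)

n, nthreads = 578, 4  # atom count and thread count from the first benchmark
print("outer loop only:", round(imbalance(outer_loop_static(n, nthreads)), 3))
print("collapsed grid: ", round(imbalance(collapsed_round_robin(n, nthreads)), 3))
```

In this model the thread that owns the last block of rows does far more than the average when only the outer loop is split, while the collapsed grid spreads the useful pairs much more evenly. Whether that translates into wall-time gains depends on the real schedule, the cost per pair, and the overhead of the skipped iterations, so it still needs to be measured.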

foxtran · 2025-02-11T09:23:06Z

@Albkat, rebased on current master branch

marvinfriede · 2025-02-11T14:24:04Z

> I just started using the OpenMP collapse(2) clause :-)
>
> Note that most Fortran compilers do not support triangular loop nests with collapse, so one needs to give them rectangular grids. One also needs to be careful about which elements are skipped, for better performance, and about which scheduler is used.

It seems the rules on collapse are rather strict (see here).

I did some testing with the D4 (standalone) code and did not find any improvements: triangular iteration space with only outer loop parallel and rectangular iteration space with collapse(2) yielded the same timings. Just wanted to say that this requires testing and does not always help (but nice find here in xtb) :)

foxtran · 2025-02-11T14:35:51Z

> triangular iteration space with only outer loop parallel and rectangular iteration space with collapse(2) yielded the same timings.

How many atoms does the test have?

marvinfriede · 2025-02-11T14:37:38Z

> > triangular iteration space with only outer loop parallel and rectangular iteration space with collapse(2) yielded the same timings.

> How many atoms does the test have?

I think I tested 500 and 1000 atoms, maybe also 2000.

foxtran · 2025-02-11T14:41:37Z

Hmm... That is interesting. I'm using gfortran for testing.

marvinfriede · 2025-02-11T14:44:30Z

> Hmm... That is interesting. I'm using gfortran for testing.

Same. I will take a look at my test setup again later. Maybe I missed something.

foxtran · 2025-02-11T15:00:38Z

BTW, did you check CPU time or real time? CPU time should not change significantly.

marvinfriede · 2025-02-11T15:03:57Z

> BTW, did you check CPU time or real time? CPU time should not change significantly.

I checked real time.

foxtran · 2025-02-11T16:55:42Z

Ah... I found why you may not see it for your systems: the system I am using for testing has an extremely fast SCF and a slow gradient part, while another system, which I generated, spends 2 min in the SCF and only 10 seconds in the gradients.

foxtran · 2025-02-11T17:03:27Z

My system with 528 atoms, built from c8c187c:

Cycle 1:
     SCC iter.                  ...        0 min,  1.490 sec
     gradient                   ...        0 min,  3.551 sec
Cycle 2:
     SCC iter.                  ...        0 min,  3.293 sec
     gradient                   ...        0 min,  4.337 sec
     
 total:
 * wall-time:     0 d,  0 h,  0 min, 55.900 sec
 *  cpu-time:     0 d,  0 h,  4 min, 36.190 sec
 * ratio c/w:     4.941 speedup
 SCF:
 * wall-time:     0 d,  0 h,  0 min, 12.917 sec
 *  cpu-time:     0 d,  0 h,  1 min,  9.732 sec
 * ratio c/w:     5.399 speedup
 ANC optimizer:
 * wall-time:     0 d,  0 h,  0 min, 42.595 sec
 *  cpu-time:     0 d,  0 h,  3 min, 25.240 sec
 * ratio c/w:     4.818 speedup

Built from 9c78931:

Cycle 1:
     SCC iter.                  ...        0 min,  1.673 sec
     gradient                   ...        0 min,  1.913 sec
Cycle 2:
     SCC iter.                  ...        0 min,  3.203 sec
     gradient                   ...        0 min,  1.524 sec

 total:
 * wall-time:     0 d,  0 h,  0 min, 40.080 sec
 *  cpu-time:     0 d,  0 h,  4 min, 23.575 sec
 * ratio c/w:     6.576 speedup
 SCF:
 * wall-time:     0 d,  0 h,  0 min,  9.735 sec
 *  cpu-time:     0 d,  0 h,  1 min,  5.865 sec
 * ratio c/w:     6.766 speedup
 ANC optimizer:
 * wall-time:     0 d,  0 h,  0 min, 29.952 sec
 *  cpu-time:     0 d,  0 h,  3 min, 16.568 sec
 * ratio c/w:     6.563 speedup

foxtran · 2025-02-11T17:13:45Z

I have also tried to make a script for reproducing my numbers :-)

git clone https://github.com/grimme-lab/xtb.git xtb.opt
cd xtb.opt
cmake -Bbuild_main -DCMAKE_BUILD_TYPE=Release
# -- The C compiler identification is GNU 14.1.0
# -- The Fortran compiler identification is GNU 14.1.0
# -- Cray Programming Environment 2.7.23 C
# -- Found OpenMP_Fortran: -fopenmp (found version "4.5")
# -- Found BLAS: implicitly linked
make -C build_main -j 40
git remote add foxtran https://github.com/foxtran/xtb.git
git fetch foxtran
git checkout foxtran/feature/speedup-gfn2
cmake -Bbuild_opt -D CMAKE_BUILD_TYPE=Release
make -C build_opt -j 40
mkdir -p build_main/check/
mkdir -p build_opt/check/
cat << EOF > gen.py
#!/usr/bin/env python

N = 20
Nat = N * N

out = [f"{Nat}", "Fluorine plane"]
for i in range(0, N):
  for j in range(0, N):
    out.append(f"F {i}.0 {j}.0 0.0")

open("build_main/check/F-plane.xyz", "w").write("\n".join(out))
open("build_opt/check/F-plane.xyz", "w").write("\n".join(out))
EOF
chmod +x gen.py
./gen.py
cd build_main/check/
OMP_NUM_THREADS=8 ../xtb F-plane.xyz -P 8 --gfn 2 --opt --alpb water --cycles 5
#  SCC (total)                   0 d,  0 h,  0 min, 21.385 sec
# .............................. CYCLE    1 ..............................
#      SCC iter.                  ...        0 min,  2.515 sec
#      gradient                   ...        0 min,  1.880 sec
# .............................. CYCLE    2 ..............................
#      SCC iter.                  ...        0 min, 17.930 sec
#      gradient                   ...        0 min,  2.432 sec
# .............................. CYCLE    3 ..............................
#      SCC iter.                  ...        0 min, 14.734 sec
#      gradient                   ...        0 min,  2.539 sec
# 
#  total:
#  * wall-time:     0 d,  0 h,  1 min, 44.294 sec
#  *  cpu-time:     0 d,  0 h, 10 min, 30.204 sec
#  * ratio c/w:     6.043 speedup
#  SCF:
#  * wall-time:     0 d,  0 h,  0 min, 21.394 sec
#  *  cpu-time:     0 d,  0 h,  2 min, 17.542 sec
#  * ratio c/w:     6.429 speedup
#  ANC optimizer:
#  * wall-time:     0 d,  0 h,  1 min, 22.522 sec
#  *  cpu-time:     0 d,  0 h,  8 min, 11.316 sec
#  * ratio c/w:     5.954 speedup
cd ../../
cd build_opt/check/
OMP_NUM_THREADS=8 ../xtb F-plane.xyz -P 8 --gfn 2 --opt --alpb water --cycles 5
#  SCC (total)                   0 d,  0 h,  0 min, 17.003 sec
# .............................. CYCLE    1 ..............................
#      SCC iter.                  ...        0 min,  2.285 sec
#      gradient                   ...        0 min,  1.033 sec
# .............................. CYCLE    2 ..............................
#      SCC iter.                  ...        0 min, 14.642 sec
#      gradient                   ...        0 min,  1.102 sec
# .............................. CYCLE    3 ..............................
#      SCC iter.                  ...        0 min, 11.684 sec
#      gradient                   ...        0 min,  1.213 sec
# 
#  total:
#  * wall-time:     0 d,  0 h,  1 min, 21.159 sec
#  *  cpu-time:     0 d,  0 h,  8 min, 56.798 sec
#  * ratio c/w:     6.614 speedup
#  SCF:
#  * wall-time:     0 d,  0 h,  0 min, 17.014 sec
#  *  cpu-time:     0 d,  0 h,  1 min, 58.051 sec
#  * ratio c/w:     6.939 speedup
#  ANC optimizer:
#  * wall-time:     0 d,  0 h,  1 min,  3.859 sec
#  *  cpu-time:     0 d,  0 h,  6 min, 57.683 sec
#  * ratio c/w:     6.541 speedup

As you can see, the parallelization became a little better. However, if the SCF is the bottleneck, this patch does not provide a significant improvement :(
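
The difference between the two test systems follows from a simple estimate: if only the gradient gets faster, the gain per optimization cycle is limited by the share of time spent in the SCF. The sketch below uses per-cycle timings quoted in this thread and takes the gradient speed-up of 2 from the PR description as an assumption; the function name is hypothetical.

```python
def cycle_speedup(t_scf, t_grad, grad_speedup):
    """Speed-up of one optimization cycle when only the gradient gets faster."""
    return (t_scf + t_grad) / (t_scf + t_grad / grad_speedup)

# Timings in seconds from the pre-patch builds quoted above
gradient_bound = cycle_speedup(t_scf=1.490, t_grad=3.551, grad_speedup=2.0)  # 528 atoms, cycle 1
scf_bound = cycle_speedup(t_scf=17.930, t_grad=2.432, grad_speedup=2.0)      # F-plane, cycle 2

print(f"gradient-dominated cycle: {gradient_bound:.2f}x")
print(f"SCF-dominated cycle:      {scf_bound:.2f}x")
```

With the gradient-dominated timings the estimate is a gain of about one half per cycle, while for the SCF-dominated F-plane cycle it is only a few percent, in line with the observation above.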

Albkat self-requested a review February 5, 2025 19:59

Albkat added the "driver: optimization" label (Related to the geometry optimization driver) Feb 5, 2025

Albkat added this to the v6.7.2 milestone Feb 5, 2025

foxtran force-pushed the feature/speedup-gfn2 branch from 0401508 to d9a5bd9 Compare February 6, 2025 09:18

foxtran force-pushed the feature/speedup-gfn2 branch from d9a5bd9 to 1fbb43d Compare February 10, 2025 21:25

foxtran added 6 commits February 11, 2025 10:22

Speed-up Hamiltonian with collapse statement

c7cc1d5

Signed-off-by: Igor S. Gerasimov <[email protected]>

Speed-up Dispersion with collapse statement

92ca600

Signed-off-by: Igor S. Gerasimov <[email protected]>

Speed-up Coulomb electrostatic with collapse statement

8ad581a

Signed-off-by: Igor S. Gerasimov <[email protected]>

Fix condition and move ri assignment

ab4aa80

Signed-off-by: Igor S. Gerasimov <[email protected]>

workaround for legacy Intel Fortran compilers

19d1aa9

Signed-off-by: Igor S. Gerasimov <[email protected]>

Better parallelization for other hamiltonians

9c78931

Signed-off-by: Igor S. Gerasimov <[email protected]>

foxtran force-pushed the feature/speedup-gfn2 branch from 1fbb43d to 9c78931 Compare February 11, 2025 09:22
