Speed-up hot path of GFN2 geometry optimization #1178
I have no idea why it does not work with ifort. I will try to debug it tomorrow.
@Albkat, do not try to restart it: it will still fail. Most probably it is a bug in the xtb code with undefined variables. I have a tool for debugging compilers :-)
Thanks, I'll test this.
Regarding CI, there are some stochastic failures with PBC, GFN-FF, and CPCM-X that we are trying to identify. It is still a work in progress.
Yeah, just double-checking if it is compiler-specific :)
They are reproducible. I did it for a couple of them. :-)
I can reproduce them about 30% of the time when running sequentially.
@Albkat, it works now for legacy Intel Compilers :-)
@Albkat, can we merge it? :-)
Seems legit to me. I'll test this tomorrow and then merge :)
I just started to use
Note: most Fortran compilers do not support triangular loops, so one needs to give them rectangular grids, and one needs to be careful about which elements are skipped for better performance and about which scheduler is used.
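To make the "rectangular grid with skipped elements" idea concrete, here is a minimal sketch in plain Python. It only stands in for a Fortran loop nest and is not taken from the xtb sources; the function names `triangular_pairs` and `rectangular_pairs` are hypothetical. It checks that a rectangular n x n loop with a skip condition visits exactly the same atom pairs as a triangular loop whose inner bound depends on the outer index.

```python
# Illustrative sketch only: plain Python standing in for a Fortran loop nest.
# Function names are hypothetical and not part of xtb.

def triangular_pairs(n):
    """Pairs (i, j) with j < i: the inner bound depends on the outer index."""
    return [(i, j) for i in range(n) for j in range(i)]


def rectangular_pairs(n):
    """Same pairs taken from a rectangular n x n grid, skipping unwanted elements."""
    pairs = []
    for i in range(n):
        for j in range(n):  # inner bound no longer depends on i
            if j >= i:      # skipped element: costs one iteration, does no work
                continue
            pairs.append((i, j))
    return pairs


n = 6
tri = triangular_pairs(n)
rect = rectangular_pairs(n)
print(len(tri), len(rect), n * n)
```

The rectangular form visits n * n grid elements to do n * (n - 1) / 2 units of real work, which is why the choice of skipped elements and of the scheduler matters.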
Signed-off-by: Igor S. Gerasimov <[email protected]>
@Albkat, rebased on the current master branch.
It seems the rules on collapse are rather strict (see here). I did some testing with the D4 (standalone) code and did not find any improvements: a triangular iteration space with only the outer loop parallel and a rectangular iteration space with collapse(2) yielded the same timings. Just wanted to say that this requires testing and does not always help (but nice find here in xtb) :)
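One possible reason why such a comparison can come out even is the scheduler. The toy model below is only a sketch under loud assumptions: one unit of work per pair with j < i, zero cost for skipped elements, contiguous equal blocks as a rough stand-in for a plain static schedule, and round-robin assignment as a rough stand-in for a static schedule with chunk size 1. It says nothing about the actual D4 or xtb loops, and all names are hypothetical.

```python
# Toy load-balance model; all names and cost assumptions are hypothetical.

def loads_outer_static(n, threads):
    """Outer loop only, contiguous row blocks; row i costs i inner iterations."""
    rows = n // threads  # assumes n is divisible by threads
    return [sum(range(t * rows, (t + 1) * rows)) for t in range(threads)]


def loads_collapsed(n, threads, cyclic):
    """Flattened n*n grid; only elements with j < i cost one unit of work."""
    total = n * n
    block = total // threads  # assumes n*n is divisible by threads
    loads = [0] * threads
    for k in range(total):
        i, j = divmod(k, n)
        owner = k % threads if cyclic else k // block
        if j < i:
            loads[owner] += 1
    return loads


def imbalance(loads):
    """Busiest thread relative to a perfectly even split (1.0 is ideal)."""
    return max(loads) * len(loads) / sum(loads)


n, threads = 16, 4
outer = loads_outer_static(n, threads)
collapsed_block = loads_collapsed(n, threads, cyclic=False)
collapsed_cyclic = loads_collapsed(n, threads, cyclic=True)
for name, loads in [("outer, blocks", outer),
                    ("collapsed, blocks", collapsed_block),
                    ("collapsed, round-robin", collapsed_cyclic)]:
    print(f"{name:24s} {loads} imbalance={imbalance(loads):.2f}")
```

In this model, collapsing alone leaves the per-thread work unchanged when contiguous blocks are used, because each block still covers whole rows; only the interleaved assignment evens it out. Real timings also depend on memory access and per-iteration cost, so this needs measuring on the actual code.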
How many atoms does the test have?
I think I tested 500 and 1000 atoms, maybe also 2000.
Hmm... That is interesting. I'm using gfortran for testing.
Same. I will take a look at my test setup again later. Maybe I missed something.
BTW, do you check CPU time or real time? CPU time should not change significantly.
I checked real time.
Ah... I found why it does not look so for your systems: the system which I'm using for testing has an extremely fast SCF and a slow gradient part, while another system, which I generated, spends 2 min in the SCF and only 10 seconds in gradients.
My system with 528 atoms, built from c8c187c:
Built from 9c78931:
I have also tried to make a script for reproducing my numbers :-)
git clone https://github.com/grimme-lab/xtb.git xtb.opt
cd xtb.opt
cmake -Bbuild_main -DCMAKE_BUILD_TYPE=Release
# -- The C compiler identification is GNU 14.1.0
# -- The Fortran compiler identification is GNU 14.1.0
# -- Cray Programming Environment 2.7.23 C
# -- Found OpenMP_Fortran: -fopenmp (found version "4.5")
# -- Found BLAS: implicitly linked
make -C build_main -j 40
git remote add foxtran https://github.com/foxtran/xtb.git
git fetch foxtran
git checkout foxtran/feature/speedup-gfn2
cmake -Bbuild_opt -D CMAKE_BUILD_TYPE=Release
make -C build_opt -j 40
mkdir -p build_main/check/
mkdir -p build_opt/check/
cat << EOF > gen.py
#!/usr/bin/env python
N = 20
Nat = N * N
out = [f"{Nat}", "Fluorine plane"]
for i in range(0, N):
    for j in range(0, N):
        out.append(f"F {i}.0 {j}.0 0.0")
open("build_main/check/F-plane.xyz", "w").write("\n".join(out))
open("build_opt/check/F-plane.xyz", "w").write("\n".join(out))
EOF
chmod +x gen.py
./gen.py
cd build_main/check/
OMP_NUM_THREADS=8 ../xtb F-plane.xyz -P 8 --gfn 2 --opt --alpb water --cycles 5
# SCC (total) 0 d, 0 h, 0 min, 21.385 sec
# .............................. CYCLE 1 ..............................
# SCC iter. ... 0 min, 2.515 sec
# gradient ... 0 min, 1.880 sec
# .............................. CYCLE 2 ..............................
# SCC iter. ... 0 min, 17.930 sec
# gradient ... 0 min, 2.432 sec
# .............................. CYCLE 3 ..............................
# SCC iter. ... 0 min, 14.734 sec
# gradient ... 0 min, 2.539 sec
#
# total:
# * wall-time: 0 d, 0 h, 1 min, 44.294 sec
# * cpu-time: 0 d, 0 h, 10 min, 30.204 sec
# * ratio c/w: 6.043 speedup
# SCF:
# * wall-time: 0 d, 0 h, 0 min, 21.394 sec
# * cpu-time: 0 d, 0 h, 2 min, 17.542 sec
# * ratio c/w: 6.429 speedup
# ANC optimizer:
# * wall-time: 0 d, 0 h, 1 min, 22.522 sec
# * cpu-time: 0 d, 0 h, 8 min, 11.316 sec
# * ratio c/w: 5.954 speedup
cd ../../
cd build_opt/check/
OMP_NUM_THREADS=8 ../xtb F-plane.xyz -P 8 --gfn 2 --opt --alpb water --cycles 5
# SCC (total) 0 d, 0 h, 0 min, 17.003 sec
# .............................. CYCLE 1 ..............................
# SCC iter. ... 0 min, 2.285 sec
# gradient ... 0 min, 1.033 sec
# .............................. CYCLE 2 ..............................
# SCC iter. ... 0 min, 14.642 sec
# gradient ... 0 min, 1.102 sec
# .............................. CYCLE 3 ..............................
# SCC iter. ... 0 min, 11.684 sec
# gradient ... 0 min, 1.213 sec
#
# total:
# * wall-time: 0 d, 0 h, 1 min, 21.159 sec
# * cpu-time: 0 d, 0 h, 8 min, 56.798 sec
# * ratio c/w: 6.614 speedup
# SCF:
# * wall-time: 0 d, 0 h, 0 min, 17.014 sec
# * cpu-time: 0 d, 0 h, 1 min, 58.051 sec
# * ratio c/w: 6.939 speedup
# ANC optimizer:
# * wall-time: 0 d, 0 h, 1 min, 3.859 sec
# * cpu-time: 0 d, 0 h, 6 min, 57.683 sec
# * ratio c/w: 6.541 speedup
As you can notice, the parallelization became a little bit better. However, if the SCF dominates the run time, this patch does not provide significant improvements :(
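For reference, the ratios implied by the two runs above can be recomputed with a few lines of Python. This is only a convenience sketch: the helper name `seconds` is hypothetical, and the numbers are copied by hand from the output quoted above (total wall-time and cpu-time of both builds, and the cycle 1 gradient timings).

```python
import re


def seconds(text):
    """Convert a duration such as '0 d, 0 h, 1 min, 44.294 sec' to seconds."""
    d, h, m, s = re.search(
        r"(\d+) d,\s+(\d+) h,\s+(\d+) min,\s+([\d.]+) sec", text
    ).groups()
    return int(d) * 86400 + int(h) * 3600 + int(m) * 60 + float(s)


# Values copied from the quoted output (build_main vs. build_opt).
main_wall = seconds("0 d, 0 h, 1 min, 44.294 sec")
main_cpu = seconds("0 d, 0 h, 10 min, 30.204 sec")
opt_wall = seconds("0 d, 0 h, 1 min, 21.159 sec")
opt_cpu = seconds("0 d, 0 h, 8 min, 56.798 sec")
grad_main, grad_opt = 1.880, 1.033  # gradient step of cycle 1, in seconds

print(f"total wall-time ratio (main/opt): {main_wall / opt_wall:.2f}")
print(f"gradient ratio, cycle 1:          {grad_main / grad_opt:.2f}")
print(f"cpu/wall, main build:             {main_cpu / main_wall:.3f}")
print(f"cpu/wall, optimized build:        {opt_cpu / opt_wall:.3f}")
```

The cpu/wall values reproduce the "ratio c/w" lines printed by xtb, which is a quick check that the durations were parsed correctly.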
This PR improves parallelization for GFN2 geometry optimization. It affects other types of calculations too.
The overall speed-up achieved is 30%. Gradients are 2x faster!
For a 578-atom structure, before the patch:
After the patch:
Command for testing:
model.xtb (nothing interesting, to be honest):
Build type: CMake, Release, gfortran-14, Intel MKL, no -march flag (setting it can give extra speed-up), debug info enabled.