The stress calculation is OOM when set kpar > 1 #4432

Closed
9 tasks
pxlxingliang opened this issue Jun 19, 2024 · 6 comments
Labels
Features Needed The features are indeed needed, and developers should have sophisticated knowledge

Comments

@pxlxingliang
Collaborator

Describe the Testing Issue

I tested two alloy cases with different kpar values (1, 2, 4, 6, 8) on a CPU machine with 16 cores and 256 GB of memory.
Both the cg and dav methods were tested.
For dav, because the subspace matrix needs a large amount of memory, the SCF calculation runs out of memory (OOM) when kpar > 1.
For cg, the SCF calculation is OOM when kpar > 4; when kpar is 2 or 4, the SCF completes normally but the stress calculation is OOM.
The OOM in the stress calculation when kpar > 1 seems abnormal.

                      kpar ks_solver  scf_time  scf_steps  normal_end  ibzk
cg/mp-1067451/00000      1        cg  24629.77        1.0        True     8
cg/mp-1067451/00001      2        cg  22206.64        1.0       False     8
cg/mp-1067451/00002      4        cg  17113.58        1.0       False     8
cg/mp-1067451/00003      6        cg       NaN        NaN       False     8
cg/mp-1067451/00004      8        cg       NaN        NaN       False     8
cg/mp-1093567/00000      1        cg   9699.77        1.0        True    14
cg/mp-1093567/00001      2        cg   7458.57        1.0       False    14
cg/mp-1093567/00002      4        cg   8201.89        1.0       False    14
cg/mp-1093567/00003      6        cg       NaN        NaN       False    14
cg/mp-1093567/00004      8        cg       NaN        NaN       False    14
dav/mp-1067451/00000     1       dav   5153.27        1.0        True     8
dav/mp-1067451/00001     2       dav       NaN        NaN       False     8
dav/mp-1067451/00002     4       dav       NaN        NaN       False     8
dav/mp-1067451/00003     6       dav       NaN        NaN       False     8
dav/mp-1067451/00004     8       dav       NaN        NaN       False     8
dav/mp-1093567/00000     1       dav   4400.19        1.0        True    14
dav/mp-1093567/00001     2       dav       NaN        NaN       False    14
dav/mp-1093567/00002     4       dav       NaN        NaN       False    14
dav/mp-1093567/00003     6       dav       NaN        NaN       False    14
dav/mp-1093567/00004     8       dav       NaN        NaN       False    14

Additional Context

No response

Task list for Issue attackers (only for developers)

  • Understand the testing issue described by the developer.
  • Review the specific test case, expected and actual results, and any error messages.
  • Identify the root cause of the test failure or issue.
  • If a possible solution is suggested, evaluate its feasibility and effectiveness.
  • Implement a fix for the test failure or issue, or create a new test case if needed.
  • Verify that the fix resolves the testing issue and the test case passes.
  • Review and update any relevant documentation, such as test plans or user guides.
  • Ensure the testing issue is resolved and close the ticket.
  • Share any lessons learned or best practices with the team to prevent similar issues in the future.
@pxlxingliang pxlxingliang changed the title from "The stress calculation is error when set kpar > 1" to "The stress calculation is OOM when set kpar > 1" Jun 19, 2024
@pxlxingliang
Collaborator Author

As the results show, although kpar speeds up the cg calculation, which also uses less memory, cg is still much slower than dav. Going from kpar=1 to kpar=2 with cg gives a speedup of about 10% for mp-1067451.
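The comparison above can be checked directly against the scf_time column of the table in the issue description. The snippet below is a minimal sketch using only those reported times; the helper name `speedup_percent` is illustrative and not part of any tool used in the test.

```python
# SCF times (s) copied from the table in the issue description.
scf_time = {
    ("cg", "mp-1067451", 1): 24629.77,
    ("cg", "mp-1067451", 2): 22206.64,
    ("cg", "mp-1093567", 1): 9699.77,
    ("cg", "mp-1093567", 2): 7458.57,
    ("dav", "mp-1067451", 1): 5153.27,
    ("dav", "mp-1093567", 1): 4400.19,
}


def speedup_percent(t_ref, t_new):
    """Percentage reduction in wall time relative to t_ref."""
    return 100.0 * (t_ref - t_new) / t_ref


for case in ("mp-1067451", "mp-1093567"):
    # Gain from kpar=1 to kpar=2 with cg.
    gain = speedup_percent(scf_time[("cg", case, 1)], scf_time[("cg", case, 2)])
    # How much slower cg (kpar=2) still is than dav (kpar=1).
    ratio = scf_time[("cg", case, 2)] / scf_time[("dav", case, 1)]
    print(f"{case}: cg kpar 1->2 saves {gain:.1f}%, cg(kpar=2)/dav(kpar=1) = {ratio:.1f}x")
```

Note that the gain differs between the two cases, so a single example should not be taken as representative.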

@pxlxingliang
Collaborator Author

I also ran the test with QE on the example mp-1067451.
For the CG method, kpar slows down the calculation, which seems strange.
For the david method, the calculation runs normally with kpar=2, which indicates that QE uses less memory than ABACUS; this is because pw_diag_ndim is 4 in ABACUS and 2 in QE.
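To illustrate why both pw_diag_ndim and kpar affect memory, here is a rough back-of-envelope sketch. It assumes the Davidson workspace is dominated by complex double-precision arrays of shape (ndim × nbands, npw) for the basis, H·basis and S·basis, and that the plane waves of one k-point are spread over the ncores / kpar processes of a pool. The function name and the values of npw and nbands are hypothetical, not taken from the test cases, and the real ABACUS and QE allocations may differ.

```python
def davidson_mem_per_process_gb(npw, nbands, ndim, ncores, kpar, n_arrays=3):
    """Rough per-process memory (GB) of the Davidson subspace arrays.

    Assumes n_arrays complex128 arrays (basis, H*basis, S*basis), each of
    shape (ndim * nbands, npw), distributed over the ncores // kpar
    processes that share one k-point. This is an estimate only, not the
    actual allocation of any code.
    """
    procs_per_pool = ncores // kpar
    bytes_total = n_arrays * ndim * nbands * npw * 16  # 16 bytes per complex128
    return bytes_total / procs_per_pool / 1024**3


# Hypothetical problem size, for illustration only.
npw, nbands, ncores = 200_000, 2_000, 16

for ndim in (4, 2):
    for kpar in (1, 2, 4):
        mem = davidson_mem_per_process_gb(npw, nbands, ndim, ncores, kpar)
        print(f"ndim={ndim} kpar={kpar}: ~{mem:.1f} GB per process")
```

Under this model, halving ndim halves the workspace, and doubling kpar doubles the per-process share because each pool has half as many processes to hold a full k-point.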

CG:

           kpar ks_solver  scf_time  scf_steps  normal_end  ibzk  scf_time_each_step
qe/qe-nk1  None      None   27244.7          2       False     8   [21035.1, 6209.6]
qe/qe-nk2  None      None   28781.4          2       False     8   [22302.3, 6479.1]
qe/qe-nk4  None      None   40043.1          2       False     8   [31618.5, 8424.6]

DAV:

                             kpar ks_solver  scf_time  scf_steps  normal_end  ibzk  scf_time_each_step
qe-dav-nk1/mp-1067451/00000  None      None    4854.9          2       False     8    [3558.7, 1296.2]
qe-dav-nk2/mp-1067451/00000  None      None    4368.9          2       False     8    [3181.4, 1187.5]
qe-dav-nk4/mp-1067451/00000  None      None       NaN          1       False     8                None

@pxlxingliang
Collaborator Author

I set pw_diag_ndim to 2 in ABACUS and ran the kpar test on mp-1067451 on a larger machine, c64_m520_cpu (MPI parallel with 32 cores).
Compared to kpar=1, kpar=2 speeds up the SCF calculation by about 20%, while kpar=4 is slower than kpar=2.

So a larger kpar does not necessarily mean higher efficiency.
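One possible reason why a larger kpar does not always help, offered here only as a sketch, is load balance: with 8 irreducible k-points (the ibzk column), the pool holding the most k-points sets the pace, while each pool gets fewer cores as kpar grows. The even split of cores and k-points below is an assumption about how pools are formed, not a description of the ABACUS implementation, and `pool_layout` is a hypothetical helper.

```python
import math


def pool_layout(nks, ncores, kpar):
    """Return (max k-points per pool, cores per pool) for an even split."""
    max_k_per_pool = math.ceil(nks / kpar)
    cores_per_pool = ncores // kpar
    return max_k_per_pool, cores_per_pool


nks, ncores = 8, 32  # ibzk and number of MPI processes from this test
for kpar in (1, 2, 4, 6, 8):
    max_k, cores = pool_layout(nks, ncores, kpar)
    print(f"kpar={kpar}: up to {max_k} k-points per pool, {cores} cores per pool")
```

Under these assumptions, kpar=6 leaves the busiest pool with 2 k-points, the same as kpar=4, but with fewer cores per pool, so no gain would be expected from going from 4 to 6.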

As kpar increases, the memory cost increases. In this case, kpar=1/2 can finish the SCF/FORCE/STRESS calculation, but kpar=4/6 can only finish the SCF/FORCE calculation, because the STRESS calculation needs more than 520 GB of memory.

The memory needed by STRESS seems to be about 1.5 times that of the SCF/FORCE calculation.

                      kpar ks_solver  scf_time  scf_steps  normal_end  ibzk  scf_time_each_step
mp-1067451-new/00000     1       dav   5704.52        2.0        True     8   [4708.34, 996.18]
mp-1067451-new/00001     2       dav   4321.94        2.0        True     8   [3541.35, 780.59]
mp-1067451-new/00002     4       dav   4567.33        2.0       False     8   [3726.31, 841.02]
mp-1067451-new/00003     6       dav   4913.62        2.0       False     8   [3997.89, 915.73]
mp-1067451-new/00004     8       dav       NaN        NaN       False     8                None

The memory cost for kpar=1:
[Image: memory cost for kpar=1]

@pxlxingliang
Collaborator Author

Performance cannot be judged from a single example, and the performance of c64_m520_cpu is unstable. I reran mp-1067451-new/00002, and this time the first two SCF steps took 3427.07 s and 764.52 s, which is faster than in the previous test (3726.31 s and 841.02 s).

@WHUweiqingzhou
Collaborator

@pxlxingliang plz double-check it after PR #4047.

@mohanchen mohanchen added the Features Needed The features are indeed needed, and developers should have sophisticated knowledge label Jun 29, 2024
@pxlxingliang
Collaborator Author

I tested the example with ABACUS v3.7.0 (image: registry.dp.tech/dptech/abacus:3.7.0), and the memory shows no obvious increase when calculating force/stress.
[Image: memory usage with ABACUS v3.7.0]
