Function Stress_Func<FPTYPE, Device>::stress_nl needs refactoring to balance memory and performance #4031

Closed
9 tasks
dyzheng opened this issue Apr 20, 2024 · 1 comment · Fixed by #4047

dyzheng commented Apr 20, 2024

Details

https://dptechnology.feishu.cn/wiki/XuLvwPqiCickS8kPZ8Bcxik7nyd?from=from_copylink

The key goal of this refactoring is to separate the functions get_vnl, get_vnl1 and get_vnl2, whose formulas are given in the following image:
[image: formulas for get_vnl, get_vnl1 and get_vnl2]

The code would be:

for(int ik=0;ik<nks;ik++)//loop k points
{
    for(int it=0;it<ntype;it++)//element types
    {
        // prepare the variables needed to calculate vkb, vkb1, vkb2
        // prepare vq and vq', size nq * npwx
        std::vector<double> vq = cal_vq(it);
        std::vector<double> vq_deri = cal_vq_deri(it);
        int lmax = ppcell.lmaxkb;
        // prepare ylm, size (lmax+1)^2 * npwx
        std::vector<double> ylm = cal_ylm(lmax);
        // prepare ylm', size 3 * (lmax+1)^2 * npwx, for the x, y, z axes
        std::vector<double> ylm_deri = cal_ylm_deri(lmax);
        // prepare (-i)^l, size nh
        std::vector<std::complex<double>> pref = cal_pref(it);
        for(int ia=0;ia<na;ia++)
        {
            // pointer to access SK, no memory cost here
            std::complex<double>* sk = get_sk(it, ia);
            // 1. calculate becp
            // 1.a calculate vkb
            cal_vkb(it, ia, vq, ylm, pref, sk, ppcell.vkb);
            // 1.b calculate becp = vkb * psi
            gemm...
            // calculate stress (00, 01, 02, 11, 12, 22)
            for(int ipol=0;ipol<3;ipol++)
            {
                for(int jpol=ipol;jpol<3;jpol++)
                {
                    // 2. calculate dbecp:
                    // 2.a. calculate dbecp_noevc, reuse the memory of ppcell.vkb
                    cal_dbecp_noevc(it, ia, 
                                    vq, vq_deri, 
                                    ylm, ylm_deri, 
                                    pref, sk, 
                                    ppcell.vkb);
                    // 2.b calculate dbecp = dbecp_noevc * psi
                    gemm...
                    // 3. calculate stress(ipol, jpol) += \sum becp * dbecp
                    cal_stress_nl_op(becp, dbecp...);
                }//jpol
            }//ipol
        }//ia
    }//it
}//ik
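
The buffer-reuse idea in the loop above can be illustrated with a minimal, self-contained sketch. Everything in it is hypothetical: `fill_vkb`, `fill_dvkb`, `project` and `toy_stress` are toy stand-ins invented for illustration (they are not ABACUS functions), the "projector" values are arbitrary, and a plain dot product takes the place of the gemm call. It only shows the control flow: one work buffer is filled with vkb, then overwritten for each of the six (ipol, jpol) pairs, so peak memory stays at a single buffer.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Hypothetical stand-in for cal_vkb: fills the work buffer with toy values.
void fill_vkb(std::vector<cplx>& buf)
{
    for (std::size_t ig = 0; ig < buf.size(); ++ig)
    {
        buf[ig] = cplx(1.0 + static_cast<double>(ig), 0.0);
    }
}

// Hypothetical stand-in for cal_dbecp_noevc: overwrites the same buffer
// with a toy "derivative" that depends only on the (ipol, jpol) pair.
void fill_dvkb(int ipol, int jpol, std::vector<cplx>& buf)
{
    for (std::size_t ig = 0; ig < buf.size(); ++ig)
    {
        buf[ig] = cplx(static_cast<double>((ipol + 1) * (jpol + 1)), 0.0);
    }
}

// Toy replacement for the gemm step: <buf|psi> for one projector and one band.
cplx project(const std::vector<cplx>& buf, const std::vector<cplx>& psi)
{
    cplx sum(0.0, 0.0);
    for (std::size_t ig = 0; ig < buf.size(); ++ig)
    {
        sum += std::conj(buf[ig]) * psi[ig];
    }
    return sum;
}

// Returns the six independent components in the order 00, 01, 02, 11, 12, 22.
std::array<double, 6> toy_stress(const std::vector<cplx>& psi)
{
    assert(!psi.empty());
    std::vector<cplx> work(psi.size()); // the only large buffer
    fill_vkb(work);
    const cplx becp = project(work, psi);

    std::array<double, 6> stress{};
    int idx = 0;
    for (int ipol = 0; ipol < 3; ++ipol)
    {
        for (int jpol = ipol; jpol < 3; ++jpol)
        {
            fill_dvkb(ipol, jpol, work); // reuse: vkb is no longer needed
            const cplx dbecp = project(work, psi);
            stress[idx++] += (std::conj(becp) * dbecp).real();
        }
    }
    return stress;
}
```

With four plane-wave coefficients all equal to 1, `toy_stress` returns 40, 80, 120, 160, 240, 360. The trade-off is that the derivative buffer is recomputed for each (ipol, jpol) pair instead of being stored, which costs extra arithmetic in exchange for a smaller memory footprint.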

Task list for Issue attackers (only for developers)

  • Reproduce the performance issue on a similar system or environment.
  • Identify the specific section of the code causing the performance issue.
  • Investigate the issue and determine the root cause.
  • Research best practices and potential solutions for the identified performance issue.
  • Implement the chosen solution to address the performance issue.
  • Test the implemented solution to ensure it improves performance without introducing new issues.
  • Optimize the solution if necessary, considering trade-offs between performance and other factors (e.g., code complexity, readability, maintainability).
  • Review and incorporate any relevant feedback from users or developers.
  • Merge the improved solution into the main codebase and notify the issue reporter.
dyzheng added the "Performance" label (Issues related to fail running ABACUS) on Apr 20, 2024

dyzheng commented Apr 20, 2024

This issue aims to solve #3714
