Cleanup get_oper.F90 #7

Closed
2 tasks done
ponweist opened this issue Aug 20, 2014 · 1 comment

@ponweist
Owner

When re-running the test case from #2 to compare performance after the parallelization of kpath (see #5), the initialization phase is very inefficient again:

[Image: trace-iss7]

Test case parameters

32sm running on 64 processes:

kpath = T
kpath_task = curv
kpath_num_points = 500
kpath_bands_colour = spin

kslice = F

berry = T
berry_task = ahc
berry_kmesh = 48 48 48

Analysis

The inefficiency arises because, with these test case parameters, get_ahc_R is called instead of the optimized routine get_morb_R (see #3), and get_ahc_R still contains the inefficient loop (get_oper.F90, lines 402ff.):

          ! Wannier-gauge overlap matrix S in the projected subspace
          !
          call get_win_min(ik,winmin_q)
          call get_win_min(nnlist(ik,nn),winmin_qb)
          S=cmplx_0
          do m=1,num_wann
             do n=1,num_wann
                do i=1,num_states(ik)
                   ii=winmin_q+i-1
                   do j=1,num_states(nnlist(ik,nn))
                      jj=winmin_qb+j-1
                      S(n,m)=S(n,m)&
                           +conjg(v_matrix(i,n,ik))*S_o(ii,jj)&
                           *v_matrix(j,m,nnlist(ik,nn))
                   end do
                end do
             end do
          end do
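The four nested loops compute S = V_q^H · S_o(window at q, window at q+b) · V_{q+b}, so they can be replaced by two matrix products. The following is a minimal sketch, assuming v_matrix is laid out as (band, Wannier function, k-point), as the loop indexing suggests, and that S_o is indexed from 1. The module and routine names (overlap_sketch, wannier_gauge_overlap) are hypothetical and are not the signature of get_gauge_overlap_matrix. The sketch uses the matmul intrinsic; for large num_wann, a BLAS zgemm call would usually be faster.

```fortran
! Hypothetical sketch, not the wannier90 implementation: the nested loops
! rewritten as S = V_q^H * S_o(sub-block) * V_qb.
module overlap_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! v_q  : (nq,  num_wann) slice of v_matrix at k-point q
  ! v_qb : (nqb, num_wann) slice of v_matrix at neighbour q+b
  ! s_o  : ab-initio overlap matrix; the needed block starts at row
  !        winmin_q and column winmin_qb (assumed 1-based)
  ! s    : (num_wann, num_wann) Wannier-gauge overlap matrix
  subroutine wannier_gauge_overlap(v_q, v_qb, s_o, winmin_q, winmin_qb, s)
    complex(dp), intent(in)  :: v_q(:, :), v_qb(:, :), s_o(:, :)
    integer,     intent(in)  :: winmin_q, winmin_qb
    complex(dp), intent(out) :: s(:, :)
    integer :: nq, nqb

    nq  = size(v_q, 1)
    nqb = size(v_qb, 1)
    ! Cost is O(nq*nqb*num_wann + nq*num_wann**2) multiply-adds,
    ! versus O(num_wann**2 * nq * nqb) for the four nested loops.
    s = matmul(conjg(transpose(v_q)), &
         matmul(s_o(winmin_q:winmin_q+nq-1, winmin_qb:winmin_qb+nqb-1), v_qb))
  end subroutine wannier_gauge_overlap
end module overlap_sketch
```

Under the same assumptions, a call replacing the loop might look like `call wannier_gauge_overlap(v_matrix(1:num_states(ik), 1:num_wann, ik), v_matrix(1:num_states(nnlist(ik,nn)), 1:num_wann, nnlist(ik,nn)), S_o, winmin_q, winmin_qb, S)`.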

TODO

  • Clean up get_oper.F90 and minimize duplicated code.
    • Possible approach: merge the different get_* routines into a single routine that takes logical flags as parameters indicating which matrices need to be initialized.
  • Consistently use get_gauge_overlap_matrix instead of nested loops like the code snippet above.
    • Optional: think of better names for get_gauge_overlap_matrix and its parameters.
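The flag-based approach from the first TODO item can be sketched as a single entry point that performs the shared setup once and then builds only the requested outputs. This is a hypothetical illustration: the routine names (get_oper_all, shared_setup, build_ahc, build_morb) and the two flags are invented for the example, and each step is a stub that only counts its calls instead of building the real operator matrices.

```fortran
! Hypothetical sketch of one get_* routine driven by logical flags.
! The build steps are stubs; counters record which steps actually ran.
module get_oper_flags_sketch
  implicit none
  integer :: n_setup = 0, n_ahc = 0, n_morb = 0
contains
  subroutine get_oper_all(need_ahc, need_morb)
    logical, intent(in) :: need_ahc, need_morb

    ! Nothing requested: skip the expensive setup entirely.
    if (.not. (need_ahc .or. need_morb)) return
    ! Work shared by all outputs (e.g. the Wannier-gauge overlap matrix)
    ! runs once, no matter how many outputs are requested.
    call shared_setup()
    if (need_ahc)  call build_ahc()
    if (need_morb) call build_morb()
  end subroutine get_oper_all

  subroutine shared_setup()
    n_setup = n_setup + 1
  end subroutine shared_setup

  subroutine build_ahc()
    n_ahc = n_ahc + 1
  end subroutine build_ahc

  subroutine build_morb()
    n_morb = n_morb + 1
  end subroutine build_morb
end module get_oper_flags_sketch
```

With this shape, the berry_task = ahc path from the test case would call `get_oper_all(.true., .false.)` and reuse the same optimized setup as the orbital-magnetization path instead of a separate copy of the loop.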
@ponweist
Owner Author

Trace after e6d520d:
[Image: trace-iss7-fix]

Initialization time dropped from 143 s to 8 s.

ponweist pushed a commit to ponweist/wannier90 that referenced this issue Mar 2, 2017