Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changes in temperature not being accurately reported #110

Closed
kslong opened this issue Sep 7, 2014 · 3 comments
Closed

Changes in temperature not being accurately reported #110

kslong opened this issue Sep 7, 2014 · 3 comments

Comments

@kslong
Copy link
Collaborator

kslong commented Sep 7, 2014

Running a fairly standard SV model of a CV, I find afteer 20 cycles of 1e6 pothons that the follwoing is being reported on the last cycle:

!!wind_update: Max change in t_r 13583 at cell 143 (4,23)
!!wind_update: Ave change in t_r 77549 from 37014 to 114563
!!wind_update: Max change in t_e -14924 at cell 238 (7,28)
!!wind_update: Ave change in t_e 16735 from 18764 to 35499
!!Check_converging: 235 (0.983) converged and 170 (0.711) converging of 239 cells
!!Check_convergence_breakdown: t_r 239 t_e(real) 236 t_e(maxed) 0 hc(real) 237

Although the model appears converged, the average temperatures appear to be changing by a large amount. This seemed odd to say the least. The history of the changes for t_e are:

v_2.diag:!!wind_update: Ave change in t_e 53335 from 35868 to 89203
sv_2.diag:!!wind_update: Ave change in t_e 44617 from 30492 to 75109
sv_2.diag:!!wind_update: Ave change in t_e 37179 from 26020 to 63198
sv_2.diag:!!wind_update: Ave change in t_e 31206 from 22732 to 53939
sv_2.diag:!!wind_update: Ave change in t_e 26614 from 20891 to 47505
sv_2.diag:!!wind_update: Ave change in t_e 23187 from 19949 to 43137
sv_2.diag:!!wind_update: Ave change in t_e 20807 from 19490 to 40298
sv_2.diag:!!wind_update: Ave change in t_e 19157 from 19192 to 38349
sv_2.diag:!!wind_update: Ave change in t_e 18245 from 18993 to 37238
sv_2.diag:!!wind_update: Ave change in t_e 17631 from 18989 to 36620
sv_2.diag:!!wind_update: Ave change in t_e 16948 from 19034 to 35982
sv_2.diag:!!wind_update: Ave change in t_e 16815 from 18985 to 35801
sv_2.diag:!!wind_update: Ave change in t_e 16917 from 18979 to 35897
sv_2.diag:!!wind_update: Ave change in t_e 16732 from 19022 to 35753
sv_2.diag:!!wind_update: Ave change in t_e 16799 from 19001 to 35801
sv_2.diag:!!wind_update: Ave change in t_e 16775 from 19005 to 35780
sv_2.diag:!!wind_update: Ave change in t_e 16756 from 18993 to 35750
sv_2.diag:!!wind_update: Ave change in t_e 16766 from 18991 to 35756
sv_2.diag:!!wind_update: Ave change in t_e 16787 from 18996 to 35783
sv_2.diag:!!wind_update: Ave change in t_e 16951 from 18981 to 35932

It seems obvious that something is quite odd. Both the old values and the new values are changing, but what we are not seeing is that the value from one cycle is moving the old value for the next cycle.

I am surprised that this has not been noticed before.

@kslong kslong added this to the Public Release milestone Sep 7, 2014
@kslong
Copy link
Collaborator Author

kslong commented Sep 7, 2014

I have now run the same model in the single processor mode. It does not show the same behavior there. So this is associated with parallel processing. I suspect something got divided by two or thre when running with 3 processors, when is should not have.

@jhmatthews
Copy link
Collaborator

It turns out this is quite a fiddly one but I've figured it out. There are a number of problems here, actually. It all centres around the loop in which information on cells is exchanged between threads.

What happens here

  for (n_mpi=0; n_mpi < np_mpi_global; n_mpi++)     // loop over all processors
    {
      if (rank_global == n_mpi)               // it's this processor. Pack your stuff into memory
    {
           [...do the packing]
        }
     [...some MPI broadcast and barrier commands]
      if (rank_global != n_mpi)               // it's another processor. unpack your stuff from memory
    {
           [...do the unpacking]
               t_r_ave_old += plasmamain[n].t_r;
          t_e_ave_old += plasmamain[n].t_e;
          iave++;

          //Perform checks to see how much temperatures have changed in this iteration
          if ((fabs (t_r_old - plasmamain[n].t_r)) > fabs (dt_r))
        {
          dt_r = plasmamain[n].t_r - t_r_old;
          nmax_r = n;
        }
          if ((fabs (t_e_old - plasmamain[n].t_e)) > fabs (dt_e))
        {
          dt_e = plasmamain[n].t_e - t_e_old;
          nmax_e = n;
        }
          t_r_ave += plasmamain[n].t_r;
          t_e_ave += plasmamain[n].t_e;
        }      
  }

Basically what we were doing at the bottom there was all wrong - perhaps copy and pastes had not been removed or done properly here:

  • We would end up with the _ave_old quantities being a weird sum of old and new quantities
  • we were packing iave into memory and unpacking so would be using the value from the first thread only to average over
  • To get the maximum changes in temperature we were comparing to the last value from the loop before.

I've fixed this so now, each thread does the following

  • first calculates the old average value for temps before we even start (loops over NPLASMA rather than nmin to nmax)
  • then calculates the sum of its cells temperatures and records where the maximums ocurred
  • it then packs and unpacks for the other threads- it adds their sums to its own, and also checks to see if their maximums are higher.
  • It increments iave correctly and doesn't exchange it with other threads.
  • I've also added an error condition for if iave != NPLASMA. could remove iave entirely but it's probably useful to keep in to check if anything else goes wrong.

This is done in commit agnwinds/python@74364c5 - @kslong perhaps you could just double check on your model that this is behaving as in serial mode?

@kslong
Copy link
Collaborator Author

kslong commented Sep 16, 2014

I confirm that this looks to be resolved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants