
Spending lots of time generating CDFs #470

Closed
saultyevil opened this issue Feb 11, 2019 · 14 comments

@saultyevil
Collaborator

From profiling a standard AGN model with macro atoms, I found that a significant amount of time was being spent in one_ff; almost 18% of the entire run! After a bit of investigation, it looks like Python is spending a lot of time generating CDFs.

CDF_gen_from_array is called from one_ff and generates a CDF for a given temperature. The generated CDF is stored in memory, and if the temperature doesn't change between calls of one_ff (i.e. between cells), the previous CDF is re-used. However, as one_ff is being called all over the simulation domain, the electron temperature obviously changes, so a new CDF is regenerated for what I'd guess is most calls to one_ff.
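
For illustration, here is a minimal sketch of the caching pattern described above. This is not the actual source: Cdf, build_ff_cdf, cdf_sample and one_ff_sketch are all stand-in names, and the spectrum filled in by build_ff_cdf is a placeholder for the real free-free calculation.

#include <stdlib.h>

#define NCDF 1000               /* number of points in the tabulated CDF */

/* minimal stand-in for the CDF structure filled by cdf_gen_from_array */
typedef struct
{
  double x[NCDF];               /* frequencies */
  double y[NCDF];               /* cumulative probabilities */
} Cdf;

/* placeholder for the expensive step: tabulating the ff spectrum at t_e
   and turning it into a CDF (the real code also evaluates gaunt_ff here) */
static void
build_ff_cdf (Cdf *cdf, double t_e, double fmin, double fmax)
{
  int i;
  for (i = 0; i < NCDF; i++)
    {
      cdf->x[i] = fmin + (fmax - fmin) * i / (NCDF - 1);
      cdf->y[i] = (double) i / (NCDF - 1);      /* flat spectrum placeholder */
    }
  (void) t_e;
}

/* cheap step: invert the tabulated CDF with a uniform deviate */
static double
cdf_sample (const Cdf *cdf)
{
  double r = rand () / (RAND_MAX + 1.0);
  return cdf->x[(int) (r * (NCDF - 1))];
}

/* rebuild the CDF only when the electron temperature has changed since the
   last call; when successive calls come from different cells this test
   almost always triggers a rebuild, which is why it dominates the profile */
double
one_ff_sketch (double t_e, double fmin, double fmax)
{
  static double t_e_stored = -1.0;
  static Cdf cdf_ff;

  if (t_e != t_e_stored)
    {
      build_ff_cdf (&cdf_ff, t_e, fmin, fmax);
      t_e_stored = t_e;
    }
  return cdf_sample (&cdf_ff);
}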

I think, if we aren't too worried about our current total memory usage, we could improve this situation by storing a CDF for each cell?
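
A sketch of what per-cell storage could look like, reusing the Cdf, build_ff_cdf and cdf_sample stand-ins from the sketch above (NDIM2 and one_ff_per_cell are also illustrative). With 1000 double pairs per CDF, the memory cost would be of order 16 kB per cell, i.e. roughly NDIM2 * 16 kB in total.

#define NDIM2 10000             /* number of wind cells -- illustrative */

static Cdf cdf_ff_cell[NDIM2];          /* one cached CDF per cell */
static double t_e_cell[NDIM2];          /* temperature each cache was built for;
                                           initialise to a sentinel such as -1 */

double
one_ff_per_cell (int ncell, double t_e, double fmin, double fmax)
{
  /* only rebuild this cell's CDF if its electron temperature has changed */
  if (t_e != t_e_cell[ncell])
    {
      build_ff_cdf (&cdf_ff_cell[ncell], t_e, fmin, fmax);
      t_e_cell[ncell] = t_e;
    }
  return cdf_sample (&cdf_ff_cell[ncell]);
}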

I've attached the pf in question, as well as the output from profiling.

fiducial_agn.pf.txt
prof.out.txt

@Higginbottom
Collaborator

Is this allied to bug #460?

@saultyevil
Collaborator Author

Hm, I'm not sure, but I don't think so. At least according to the caller and callee graphs in the doxygen documentation, neither one_ff nor ff makes reference to alpha_sp, which is the time-consuming part in issue #460.

@jhmatthews
Collaborator

Good spot Ed. Is this only in macro-atom runs? If so, I'm guessing it's to do with generating free-free photons every time there is a k->r transition and free-free cooling is chosen as the channel.

I think the alpha_sp time sink is more to do with doing integrals, so it's probably not related.

Note that I had stumbled upon issues with one_ff in k-packet land before (see e.g. issues #300 and #187), and I guess the general issues around CDFs are perhaps also relevant (e.g. #293 and #297) - but it seems I never did a great job of making the code efficient.

Your suggestion about storing CDFs sounds sensible, especially if the memory requirement is fairly small (which it should be?). We could also consider whether an analytic approximation could be used to generate the frequency instead.

@saultyevil
Collaborator Author

Yup, it seems only to be a problem in macro mode; I'll attach the simple atom profile as well.

prof_simple.txt

Looking closer at the profiler output for the macro run, it looks like kpkt is making a lot of calls to one_ff. So I guess one question we need to ask first is: what's causing all of these ff photons?

@jhmatthews
Collaborator

The choice of process via which the k-packet gets turned into an r-packet is calculated from the relative cooling rates. I'm guessing these are relatively dense, ionized cells where free-free cooling wins over line cooling, say, but not so hot that Compton cooling matters? It would be quite interesting to know where free-free cooling matters in the AGN model anyway.

Having said that, the number of calls to one_ff is not that high compared to the number of overall calls to kpkt and other routines, so perhaps you don't need free-free to be that important for this to matter.
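
For reference, a rough sketch of the kind of selection described above (hypothetical names; the real kpkt handles many more cooling channels and per-line rates than this): the destruction channel is picked in proportion to the per-channel cooling rates by walking a cumulative sum against a uniform deviate scaled by the total.

#include <stdlib.h>

enum channel { COOL_FF, COOL_LINES, COOL_COMPTON, COOL_BF, N_CHANNELS };

/* choose the process that destroys the k-packet, weighted by cooling rate */
int
choose_cooling_channel (const double cooling[N_CHANNELS])
{
  double total = 0.0, running = 0.0, threshold;
  int i;

  for (i = 0; i < N_CHANNELS; i++)
    total += cooling[i];

  threshold = total * (rand () / (RAND_MAX + 1.0));

  for (i = 0; i < N_CHANNELS; i++)
    {
      running += cooling[i];
      if (threshold < running)
        return i;               /* COOL_FF => generate an ff photon via one_ff */
    }
  return N_CHANNELS - 1;        /* guard against floating-point rounding */
}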

@kslong
Collaborator

kslong commented Feb 13, 2019

I have not looked at this yet. But why would you generate the CDF more than once each time one activates a macro atom? I can understand why every time you activate you might have to do this once, but not n times for one activation.

@kslong
Collaborator

kslong commented Mar 16, 2019

I believe this is the same as #385

@kslong
Collaborator

kslong commented May 8, 2019

@jhmatthews @saultyevil I would also like a solution to this problem if at all possible. I tried to run a large grid of SV models, and while some of them ran fine in less than an hour, others took up to a day (with 16 cores on our Linux servers). From looking at these, I am sure James is right that the problems are worse when the winds are dense and so one has lots of macro atom interactions. I am trying to profile a shorter version of one of the "bad" models.

@kslong
Collaborator

kslong commented May 10, 2019

Here is my version of this problem. Everything does appear to be associated with the generation of ff photons, although it's the calculation of the gaunt factor, and not the CDFs specifically, that seems to be the issue.

short.pf.txt

For this pf file, 40% of the total program time is being spent in gaunt_ff (according to gprof)

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  Ks/call  Ks/call  name    
 40.60   4251.55  4251.55 3184965952     0.00     0.00  gaunt_ff
 27.00   7078.49  2826.94 123621451     0.00     0.00  matom
  7.14   7826.32   747.84                             mt_get_double
  6.26   8481.73   655.41 5107315168     0.00     0.00  random_number

Note that gaunt_ff could be written somewhat more efficiently (and should be), but I doubt that will really solve the problem. There are only 45 data lines in the Sutherland dataset.

The routine is being called 1e9 times, whereas the random number generator is being called 5e9 times. This huge number of calls is really the issue.

@kslong
Collaborator

kslong commented May 10, 2019

A couple more facts about this. This is a run with H and He macro atoms, and this is one of the things that increases the number of calls to gaunt_ff.

If I run short.pf above on one processor on one of the machines at STScI, it takes 13,000 s; if I run the same model with a dataset that excludes the Sutherland gaunt factor calculation, it takes 8600 s, or about 2/3 of the time. In this instance, matom becomes the dominant place in the program where time is spent; roughly half of the run time is spent there.

When we construct a CDF for ff emission, which is where a lot of the work is being done, we are currently constructing PDFs with 1000 points. Clearly some gain could be made by lowering this number, and that is separate from the question of the gaunt factor.
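
To make that knob concrete, here is a rough sketch of where the point count enters (illustrative names only: NUM_FF, ff_emissivity and tabulate_ff_spectrum are not the actual identifiers, and the emissivity is a placeholder). Each tabulated point costs an emissivity evaluation, which is where gaunt_ff gets called, so halving NUM_FF roughly halves both the gaunt_ff calls and the CDF construction cost, at the price of a coarser sampling of the ff spectrum.

#define NUM_FF 1000             /* currently ~1000 points; the candidate knob */

/* placeholder for the ff emissivity at frequency nu and temperature t_e;
   the real calculation is where gaunt_ff is evaluated */
static double
ff_emissivity (double t_e, double nu)
{
  return 1.0 / (1.0 + nu / t_e);        /* arbitrary placeholder shape */
}

/* tabulate the ff spectrum on NUM_FF frequencies before building the CDF */
static void
tabulate_ff_spectrum (double t_e, double fmin, double fmax,
                      double freq[NUM_FF], double spec[NUM_FF])
{
  int i;
  for (i = 0; i < NUM_FF; i++)
    {
      freq[i] = fmin + (fmax - fmin) * i / (NUM_FF - 1);
      spec[i] = ff_emissivity (t_e, freq[i]);
    }
}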

kslong added a commit that referenced this issue May 12, 2019
…land and (#516)

Basically all that has been done here is to speed up the Dopita and Sutherland gaunt factor calculation by breaking out of the loop designed to find the correct range for a particular approximation.   This is associated with issue #470
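
A sketch of the kind of change described in that commit (the table layout and names below are illustrative, not the actual data structures): once the row whose validity range contains the input is found, the scan over the ~45 table rows stops instead of continuing to the end.

#define N_GAUNT 45              /* rows in the Sutherland gaunt-factor table */

struct gaunt_row
{
  double log_gsq_min, log_gsq_max;      /* validity range of this fit */
  double s1, s2, s3;                    /* fit coefficients for this range */
};

/* in the real code these values are read from the Sutherland data file */
static struct gaunt_row gaunt_table[N_GAUNT];

double
gaunt_ff_sketch (double log_gsq)
{
  int i;
  for (i = 0; i < N_GAUNT; i++)
    {
      if (log_gsq >= gaunt_table[i].log_gsq_min
          && log_gsq <= gaunt_table[i].log_gsq_max)
        {
          /* evaluate the fit for this range and stop scanning; returning
             here is the "breaking out of the loop" referred to above */
          return gaunt_table[i].s1
            + gaunt_table[i].s2 * log_gsq
            + gaunt_table[i].s3 * log_gsq * log_gsq;
        }
    }
  return 1.0;                   /* fall back to g_ff ~ 1 outside the table */
}
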
@kslong
Collaborator

kslong commented Aug 4, 2019

Note that at least some attempt to speed up the gaunt factor calculation has been incorporated into dev. So before proceeding with this further, one should retest the parameter files above with python in profiling mode to see where any problems lie.

@saultyevil
Collaborator Author

Just to update this issue, I am currently profiling the model posted in the original post for both macro and simple atom.

The simple atom version has finished, but the macro atom version is still going. Here is the profiler output from the simple atom version.

profiler_output.txt

The new profiler output looks very similar to the version posted Feb 13; that is, the same offending functions are still the main time sinks in both versions. We are still spending most of the time in radiation(). One interesting difference is that we now apparently spend less time in coord_fraction() - it doesn't look like this code, or the calls to it, have been modified recently, so this is curious.

Also worth noting is that den_config() is not a major offender as suggested in Issue #642. I don't think there has been any modification to the code to change the number of den_config calls since Knox profiled Python. Strange!

@saultyevil
Collaborator Author

I've now had a closer look at the standard AGN model w/ macro atoms and the short pf which Knox profiled earlier in this thread.

one_ff still seems to be quite a large time sink, even though it doesn't receive that many calls. Essentially all one_ff does is call cdf_gen_from_array (with ~75% of the calls to that function being made from one_ff) and cdf_get_rand, which appears to take up no time whatsoever in these runs.

So it looks like we are still at the original issue here. Is there anything we can do to optimise these functions? It doesn't look like there's much fat to trim in either function, so it's not clear whether it's worth the time trying to optimise them further.

Here's a bit of a breakdown of the new profiler output for those curious.

agn macro

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  Ks/call  Ks/call  name    
 11.00   9360.32  9360.32 1256619787     0.00     0.00  bf_estimators_increment
 10.39  18200.53  8840.21  4013206     0.00     0.00  one_ff
 10.24  26916.53  8716.00 2458179171     0.00     0.00  kappa_bf
  9.92  35356.44  8439.91 12525161863     0.00     0.00  sigma_phot

profile.out.txt

This doesn't look all that different from the previous run; one_ff is still a significant chunk. The only real difference is that bf_estimators_increment has increased from ~5% of the time to ~11%.

short

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  Ks/call  Ks/call  name    
 38.59   1464.17  1464.17 134080015     0.00     0.00  matom
 19.39   2199.88   735.71   480287     0.00     0.00  one_ff
 12.58   2677.13   477.25                             mt_get_double
  4.67   2854.31   177.18 4871269393     0.00     0.00  random_number

profile.out.txt

The changes made to gaunt_ff improved the run time, and it is no longer a significant part of the simulation; one_ff now seems to be the significant part of the macro-atom machinery. Curiously, though, we are spending a non-negligible amount of time in mt_get_double, which is part of gsl-2.6 (in the file rng/mt.c) and belongs to the RNG we use. There's not much we can do about optimising the RNG, I think.

@saultyevil
Collaborator Author

We have decided to close this as it appears that the original problem has been improved upon, and it's unlikely that we can further optimise this code.
