Spending lots of time generating CDFs #470
Is this allied to bug #460?
Hm, I'm not sure, but I don't think so. At least according to the caller and callee graphs in the doxygen documentation, neither …
Good spot Ed. Is this only in macro-atom runs? If so, I'm guessing it's to do with generating free-free photons every time there is a k->r transition and free-free cooling is chosen as the channel. I think the alpha_sp time sink is more to do with doing integrals, so probably not related. Note that I had stumbled upon issues with one_ff in k-packet land before, see e.g. issues #300 and #187, and I guess the general issues around CDFs are perhaps also relevant, e.g. #293 and #297 - but it seems I'd never done a great job of making the code efficient. Your suggestion about storing CDFs sounds sensible, especially if the memory requirement is fairly small (which it should be?). We could also consider whether an analytic approximation could be used to generate the frequency?
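For what an analytic approximation might look like: if the Gaunt factor is treated as frequency-independent, the free-free spectrum reduces to j_nu ~ exp(-h nu / k T) on [numin, numax], whose CDF inverts in closed form, so no tabulated CDF is needed at all. Here is a sketch in Python of just the sampling math (the function name and constant are illustrative, not the actual code's):

```python
import math

H_OVER_K = 4.799243e-11  # Planck constant over Boltzmann constant, cgs (K s)

def ff_freq_analytic(t_e, numin, numax, r):
    """Sample a frequency from j_nu ~ exp(-h nu / k T) on [numin, numax]
    by analytic inversion of the CDF. r is a uniform deviate in [0, 1).
    Sketch only: assumes a frequency-independent Gaunt factor."""
    a = math.exp(-H_OVER_K * numin / t_e)  # CDF endpoint at numin
    b = math.exp(-H_OVER_K * numax / t_e)  # CDF endpoint at numax
    # Invert: CDF(nu) = (a - exp(-h nu / k T)) / (a - b) = r
    return -t_e / H_OVER_K * math.log(a - r * (a - b))
```

By construction r = 0 returns numin and r -> 1 approaches numax, with the exponential cutoff meaning most samples land near h nu ~ k T.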
Yup, it seems only to be a problem in macro mode; I'll attach the simple atom profile as well. Looking closer at the profiler output for the macro run, it's looking like …
The choice of process via which the k-packet gets turned into an r-packet is calculated from the relative cooling rates. I'm guessing these are relatively dense, ionized cells where free-free cooling wins over line cooling, say, but not so hot that Compton cooling matters? It would be quite interesting to know where free-free cooling matters in the AGN model anyway. Having said that, the number of calls to one_ff is not that high compared to the number of overall calls to kpkt and other routines, so perhaps you don't need free-free to be that important for this to matter.
I have not looked at this yet. But why would you generate the CDF more than once each time one activates a macro-atom? I can understand why every time you activate you might have to do this once, but not n times for one activation.
I believe this is the same as #385
@jhmatthews @saultyevil I also would like a solution to this problem if at all possible. I tried to run a large grid of SV models, and while some of them ran fine in less than an hour, others took up to a day (with 16 cores on our linux servers). Looking at these, I am sure James is right that the problems are worse when the winds are dense and so one has lots of macro-atom interactions. I am trying to profile a shorter version of one of the "bad" models.
Here is my version of this problem. Everything does appear to be associated with the generation of ff photons, although it's the calculation of the gaunt factor, and not CDFs specifically, that seems to be the issue. For this pf file, 40% of the total program time is being spent in gaunt_ff (according to gprof).
Note that gaunt_ff could be written somewhat more efficiently (and should be), but I doubt that will really solve the problem. There are only 45 data lines in the Sutherland dataset. The routine is being called 1e9 times, whereas the random number generator is being called 5e9 times. This huge number of calls is really the issue.
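One standard way to make a routine like gaunt_ff cheap per call is to do the expensive evaluation once, on a fixed grid, at startup, and then answer each of the ~1e9 calls with a table lookup plus linear interpolation. A Python sketch of the idea (the class name and the stand-in function are illustrative, not the actual code's):

```python
import bisect

class InterpTable:
    """Sketch: precompute an expensive function (a stand-in for the
    Sutherland gaunt-factor evaluation) on a fixed grid once, then
    serve each subsequent call with cheap linear interpolation."""

    def __init__(self, func, xmin, xmax, n=1000):
        step = (xmax - xmin) / (n - 1)
        self.xs = [xmin + i * step for i in range(n)]
        self.ys = [func(x) for x in self.xs]  # expensive work, done once

    def __call__(self, x):
        # Locate the bracketing grid interval, clamping to the table edges.
        i = bisect.bisect_right(self.xs, x) - 1
        i = max(0, min(i, len(self.xs) - 2))
        x0, x1 = self.xs[i], self.xs[i + 1]
        frac = (x - x0) / (x1 - x0)
        return self.ys[i] + frac * (self.ys[i + 1] - self.ys[i])
```

The lookup is O(log n) with no function evaluation at call time; in C the same pattern is a static array filled on first use, which is roughly what the later comments say was incorporated into dev.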
A couple more facts about this. This is a run with H and He macro-atoms, which is one of the things that increases the number of calls to gaunt_ff. If I run short.pf above on one processor on one of the machines at STScI, it takes 13,000 s; if I run the same model with a dataset that excludes the Sutherland gaunt factor calculation, it takes 8600 s, or 2/3 the time. In that case, matom becomes the dominant place in the program where time is spent; roughly half the run time goes there. When we construct a CDF for ff emission, which is where a lot of the work is being done, we are currently constructing PDFs with 1000 points. Clearly some gain could be made by lowering this number, and that is separate from the question of the gaunt factor.
Note that at least some attempt to speed up the gaunt factor calculation has been incorporated into dev. So before proceeding with this further, one should retest the parameter files above with python in profiling mode to see where any problems lie. |
Just to update this issue, I am currently profiling the model posted in the original post for both macro and simple atoms. The simple atom version has finished, but the macro atom run is still going. Here is the profiler output from the simple atom version. The new profile looks very similar to the version posted on Feb 13; that is, the same offending functions are still the main time sinks in both versions. We are still spending most time in … Also worth noting is that …
I've now had a closer look at the standard AGN model w/ macro atoms and the short pf which Knox profiled earlier in this thread.
So it looks like we are still at the original issue here. Is there anything we can do to optimise these functions? It doesn't look like there's much fat to trim in either function, so it's not clear it's worth the time trying to optimise them further. Here's a bit of a breakdown of the new profiler output for those curious.

agn macro
This doesn't look all that different from the previous run. one_ff is still a significant chunk. The only real difference is that bf_estimators_increment has increased from ~5% of the time to 11%.

short
The changes made to gaunt_ff improved the run time, and it is no longer a significant part of the simulation. one_ff is now the significant part of the macro atom machinery, it seems. But curiously, we are spending a bit of time in mt_get_double, which is something in gsl-2.6 in the file …
We have decided to close this, as it appears that the original problem has been improved upon and it's unlikely that we can optimise this code further.
From profiling a standard AGN model with macro atoms, I found that a significant amount of time was being spent in one_ff; almost 18% of the entire run! After a bit of investigation, it looks like Python is spending a lot of time generating CDFs.

CDF_gen_from_array is called from one_ff and generates a CDF for a given temperature. The generated CDF is stored in memory, and if the temperature doesn't change between calls to one_ff (i.e. between cells), the previous CDF is re-used. However, as one_ff is called all over the entire simulation domain, the electron temperature obviously changes, so a new CDF is regenerated for, I'd guess, most calls to one_ff.

If we aren't too worried about our current total memory usage, we could maybe improve this situation by storing CDFs for each cell?
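The per-cell caching idea could be sketched like this: key the cache on the cell index, remember the temperature the CDF was built for, and rebuild only when that cell's temperature has actually changed. A Python sketch with illustrative names (not the actual code's):

```python
# Hypothetical sketch of per-cell CDF caching: one entry per cell,
# invalidated only when that cell's electron temperature changes.
_cdf_cache = {}

def get_ff_cdf(ncell, t_e, make_cdf):
    """Return the cached free-free CDF for cell `ncell`, rebuilding it
    via make_cdf(t_e) only if the stored temperature differs."""
    cached = _cdf_cache.get(ncell)
    if cached is not None and cached[0] == t_e:
        return cached[1]          # cache hit: skip the expensive rebuild
    cdf = make_cdf(t_e)           # cache miss: rebuild and store
    _cdf_cache[ncell] = (t_e, cdf)
    return cdf
```

Memory cost is one CDF per cell (with 1000 points per CDF, a few tens of kB per cell), so whether it pays off depends on how often temperatures actually change between cycles.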
I've attached the pf in question, as well as the output from profiling.
fiducial_agn.pf.txt
prof.out.txt