CTSM-FATES timing/cost #1076
Comments
@dlawrenncar if it helps, my historical transient runs took 20 minutes per decade at 4x5 resolution (so ~1000 gridcells, of which ~750 had living vegetation) on 288 cores on cheyenne. output from that run is here: /glade/scratch/charlie/archive/fates_clm50_global_4x5_historicaltransient_2e3f469f_2905a9ba if you're interested. |
sorry, 20 minutes per year. |
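For comparison with the pe-hrs/yr figures quoted further down the thread, the wallclock number above can be converted to processor-hours per simulated year. This is only a back-of-the-envelope sketch: it assumes all 288 cores are charged for the full 20 minutes, and the function name is made up for illustration.

```python
def pe_hours_per_year(cores, wallclock_minutes_per_year):
    """Processor-element hours consumed per simulated year."""
    return cores * wallclock_minutes_per_year / 60.0

# Numbers from the comment above: 288 cores, 20 minutes per simulated year.
cost = pe_hours_per_year(288, 20)
print(f"{cost:.0f} pe-hrs per simulated year")
```

Under that assumption the 4x5 transient run lands in the same range as the 4x5 FATES timing reported later in the thread.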
Thanks, I was going to try to establish costs for 1 and 2 deg simulations,
presuming that 4x5 resolution is (a) cheap enough as to mainly be in the
noise and (b) mainly for development/debugging purposes and not
production.
|
here's a recent 4x5 no crop with timing files for comparison
/glade/scratch/wwieder/clm5_4x5_woodCN_cont_spin/run/timing
|
@dlawrenncar is going to work on this. Note that you need to make sure that FATES is spun up well for your timing tests. FATES is more likely to have a change in timing cost as the model runs, unlike big-leaf CTSM. |
Hi @dlawrenncar, I ran two 40-year simulations: FATES fixed biogeography with competition and 6 PFTs, and CLM5 BGC no crop. Broadly, the CLM simulation took 3 hours, and the FATES one took 18. It might be slightly longer given the spinup issues Eric mentioned. I can take a look at the later years to check (lots to do today before they turn the computer off!!) One interesting thing would be to look at how the FATES aggregation parameters affect the speed (because they should, especially for patch dynamics)... |
OK. Looking at the timing files, I see:
FATES Fixed biogeog with competition with 6 PFTs
/glade/work/rfisher/git/ctsmjuly20/cime/scripts/fates_timin_6pft_fbg_comp/timing
82 pe-hrs/yr (74 in first submission)
53 yrs/day
CLM BGC no crop
/glade/work/rfisher/git/ctsmjuly20/cime/scripts/clm5_timing_bgc/timing
14 pe-hrs/yr
303 yrs/day
So, as @rosiealice showed, the cost is ~6x. Obviously, that's a lot. I
took a quick look through the timing files to see if I could see the source
of the cost increase and I guess it will not be a surprise that my quick
look suggests that the cost increase is mostly embedded within canflux. I
guess this is simply due to the larger number of calls to photosynthesis
due to the much larger number of patches compared to pfts. So, probably
this is going to be hard to reduce without reducing the cost of the canopy
flux calculations.
At some stage, I will check again at 2deg resolution for the purpose of
writing the next CSL allocation proposal. We should also have a
conversation at some point soon about costs since this large increase in
costs is going to significantly impact computational resource planning and
will have implications for implementation in CESM3.
…On Fri, Jul 10, 2020 at 2:31 AM Rosie Fisher ***@***.***> wrote:
Hi @dlawrenncar <https://github.com/dlawrenncar> ,
I set these 4x5 simulations off before I saw your comment on the 1/2 deg
simulations.
FATES Fixed biogeog with competition with 6 PFTs. 40y
/glade/scratch/rfisher/archive/fates_timin_6pft_fbg_comp/lnd/hist
CLM5 BGC no crop. 40y.
/glade/scratch/rfisher/archive/clm5_timing_bgc/lnd/hist
Broadly, the CLM simulation took 3 hours, and the FATES one took 18. It
might be slightly longer given the spinup issues Eric mentioned. I can take
a look at the latter years to check (lots to do today before they turn the
computer off!!)
One interesting thing would be to look at how the FATES aggregation
parameters affect the speed (because they should, especially for patch
dynamics...
|
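As a quick check on the "~6x" figure, the ratio can be computed from the two sets of timing numbers quoted above, both from cost and from throughput. This is just arithmetic on the reported values; the dictionary keys are illustrative names, not fields from the timing files.

```python
# Values as reported from the 4x5 timing files in this thread.
fates = {"pe_hrs_per_yr": 82, "yrs_per_day": 53}
clm_bgc = {"pe_hrs_per_yr": 14, "yrs_per_day": 303}

# Cost ratio: how many more processor-hours FATES needs per simulated year.
cost_ratio = fates["pe_hrs_per_yr"] / clm_bgc["pe_hrs_per_yr"]

# Throughput ratio: how many more simulated years per day CLM BGC delivers.
throughput_ratio = clm_bgc["yrs_per_day"] / fates["yrs_per_day"]

print(f"cost ratio:       {cost_ratio:.1f}x")
print(f"throughput ratio: {throughput_ratio:.1f}x")
```

The two ratios agree to within a few percent, which is what one would expect if both cases ran on a similar number of processors.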
I'm actually a bit surprised that the number of patches should be greater than the number of PFTs. FATES currently limits it to 10 primary patches per site, which seems like less than 6x the number of PFTs in big-leaf CLM? |
For any given column, we seem to be operating with our 1600 cohorts each requiring their own photosynthesis calculations, a scheme which is embedded in the canflux iterative solve. So only 6X, yay. |
Sorry, yeah, it is the much larger number of cohorts compared to CTSM PFTs
that is the likely source, not the potentially larger number of patches.
So, with much larger number of cohorts compared to PFTs, only 6x, yay! ...
but still 6x, boo!
|
For accuracy's sake, I just want to point out that we actually perform photosynthesis on each leaf-layer of each canopy-layer, and since we use PPA, each pft has its own leaf layer (sorry, I shouldn't have said cohort). So we end up doing photosynthesis on an array that is pft x leaf-layer x canopy-layer. In order to make sure we hit all the possible layers, we DO loop by cohort, but we also do a lot of masking to avoid redundancy. See here, we actually loop over leaf layers: https://github.com/NGEET/fates/blob/master/biogeophys/FatesPlantRespPhotosynthMod.F90#L375 That said, I think that reducing our number of leaf layers is something that could potentially reduce computation, and may be low-hanging fruit. |
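The loop-by-cohort-with-masking idea described above can be illustrated with a toy sketch. This is not the FATES implementation (which is Fortran); the `Cohort` tuple, the function name, and the example cohorts are all hypothetical, and the only point is that the number of photosynthesis calls follows the number of distinct (pft, canopy-layer, leaf-layer) bins rather than the number of cohorts.

```python
from collections import namedtuple

# Hypothetical cohort description: which PFT it is, which canopy layer it
# sits in, and how many leaf layers it spans.
Cohort = namedtuple("Cohort", ["pft", "canopy_layer", "n_leaf_layers"])

def count_photosynthesis_calls(cohorts, n_pft, n_canopy_layers, max_leaf_layers):
    """Loop over cohorts, but compute each (pft, canopy, leaf) bin only once."""
    done = [[[False] * max_leaf_layers for _ in range(n_canopy_layers)]
            for _ in range(n_pft)]
    calls = 0
    for c in cohorts:
        for leaf_layer in range(c.n_leaf_layers):
            if done[c.pft][c.canopy_layer][leaf_layer]:
                continue  # another cohort already covered this bin
            done[c.pft][c.canopy_layer][leaf_layer] = True
            calls += 1    # stand-in for one photosynthesis calculation
    return calls

cohorts = [Cohort(0, 0, 3), Cohort(0, 0, 5), Cohort(1, 0, 2), Cohort(0, 1, 4)]
calls = count_photosynthesis_calls(cohorts, n_pft=2, n_canopy_layers=2,
                                   max_leaf_layers=5)
naive = sum(c.n_leaf_layers for c in cohorts)
print(f"masked: {calls} calls, unmasked: {naive} calls")
```

In this toy setup, lowering `max_leaf_layers` (and the per-cohort leaf-layer counts with it) directly caps the number of bins, which is the kind of saving suggested above.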
There is also this old issue here: NGEET/fates#386 |
Also for accuracy's sake, while 10 is the maximum number of patches, if the fusion criteria were on the fussy end, and every site had the max number of patches, that would be inefficient. I meant to check the distribution of NPATCHES but all the computer resources are down today because of Wyoming-related things. The number of all calculations (photosynthesis, cohorts etc.) scales with NPATCHES, but as @rgknox mentioned, the correspondence between the number of cohorts and the number of photosynthesis calculations is more complicated... |
Anthony Walker (who isn't on this repo) also had mentioned in the past that he was working on an analytical method for speeding up the photosynthesis routines. If we could do that, it might be pretty useful. I'll ask him... |
I ran 2deg and 4x5 FATES fixed biogeography and got timing costs for 1850 Control.
2deg: ~600 pe-hrs/yr (compared to 75 pe-hrs/yr for CLM5BGC) = 8x
In both cases, cost comes into 'equilibrium' after about 4-6 years of simulation.
Cases:
Would be good to discuss at the forthcoming CTSM-FATES meeting. |
it seems like one place to start on this is to put timing calls around every instance where CTSM calls FATES code, so that we can better understand exactly where the costs in FATES are being incurred? |
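The idea of wrapping each call into FATES with a named timer can be sketched generically. In CTSM this would be done with the model's own Fortran timing routines; the Python below is only an illustration of the pattern, and the region names and stand-in functions are hypothetical.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

totals = defaultdict(float)  # accumulated seconds per named region
counts = defaultdict(int)    # number of times each region was entered

@contextmanager
def timed(region):
    """Accumulate wallclock time spent inside the `with` block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        totals[region] += time.perf_counter() - start
        counts[region] += 1

# Hypothetical stand-ins for the places where the host model calls FATES.
def fake_fates_photosynthesis():
    sum(i * i for i in range(20000))

def fake_fates_dynamics():
    sum(i for i in range(2000))

for step in range(5):
    with timed("fates_photosynthesis"):
        fake_fates_photosynthesis()
    with timed("fates_dynamics"):
        fake_fates_dynamics()

# Report regions from most to least expensive.
for region in sorted(totals, key=totals.get, reverse=True):
    print(f"{region:24s} calls={counts[region]:3d} total={totals[region]:.4f}s")
```

With one timer per call site, the per-region totals show directly which FATES entry points dominate the cost, rather than inferring it from an enclosing timer such as canflux.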
FWIW @dlawrenncar with my fire runs on a somewhat old tag (tag1331_api81) with fire disturbance for the tropics only I get |
@jkshuman Interesting. Just to be clear, is this a Tropics only run? I guess it must be. What is the domain more precisely? |
@dlawrenncar yes, tropics only offline land. here is a fig for reference and coords below. Do you want me to kick off another run with a more recent tag? |
I should add that this is a 3PFT run with fire disturbance, so perhaps not directly comparable? |
I don't think we need another run with a more modern tag. For CSL, I am
using global numbers and I have what is needed for the proposal. Next
steps on costs need to be as outlined above to get more accurate and full
understanding of where the costs are coming from.
|
Cross-reference: see the more recent issue NGEET/fates#859 |
Closing as it looks like the thing that was needed was figured out. |
Definition of done is when FATES timing files are added to the cesm3 website for SP and FixedBiogeography simulations |
Notes from Jim about how to store the timing results:
|
Procedure needed for some compsets, including fixed-biogeo—see #2745. |
@rosiealice @ckoven @wwieder @ekluzek Following up on conversations we had about CTSM-FATES costs at the LMWG meeting. The new CSL allocation request is going to need to be written this summer and this would be a good opportunity to try to establish at least general FATES costs, relative to big-leaf CTSM, for the purpose of writing the proposal. I'm aware that the costs will be more variable through time, but even a ballpark estimate would be helpful for perhaps the SP version and the full competition BGC version. I'd be happy to explore myself, which would give me an opportunity to run FATES for the first time. Perhaps @rosiealice could point me to two relevant cases (SP, BGC full competition) to start with.
Definition of done: