Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

parallel disturbance #875

Merged
merged 10 commits into from
Jul 19, 2022
Merged

parallel disturbance #875

merged 10 commits into from
Jul 19, 2022

Conversation

ckoven
Copy link
Contributor

@ckoven ckoven commented May 25, 2022

We've had an issue for ages that on a given day, FATES only resolves the larger of disturbance types between tree fall, fire, and logging. This code changes that so that all three are resolved every day. The solution is to just loop over disturbance types rather than evaluate which is largest.

fixes #490.

Description:

Collaborators:

Expectation of Answer Changes:

This should change answers. In particular, the sum of all disturbance rates should now equal the FATES_POTENTIAL_DISTURBANCE rate.

Checklist:

  • My change requires a change to the documentation.
  • I have updated the in-code documentation .AND. (the technical note .OR. the wiki) accordingly.
  • I have read the CONTRIBUTING document.
  • FATES PASS/FAIL regression tests were run
  • If answers were expected to change, evaluation was performed and provided

Test Results:

CTSM (or) E3SM (specify which) test hash-tag:

CTSM (or) E3SM (specify which) baseline hash-tag:

FATES baseline hash-tag:

Test Output:

@@ -515,6 +440,8 @@ subroutine spawn_patches( currentSite, bc_in)
! If nocomp is not enabled, then this is not much of a loop, it only passes through once.
nocomp_pft_loop: do i_nocomp_pft = min_nocomp_pft,max_nocomp_pft

disturbance_type_loop: do i_disturbance_type = 1,N_DIST_TYPES

! calculate area of disturbed land, in this timestep, by summing contributions from each existing patch.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets get an indent in here, like the loop label

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I did a big auto-indent for this subroutine.

currentPatch%disturbance_rates(i_dist2) = currentPatch%disturbance_rates(i_dist2) &
* oldarea / currentPatch%area
end do
end if
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following through the logic here with a generic example, and I think it checks out:

lets assume that the patch starts out with an area of 1., and all three types apply 0.333 disturbance rates.

After the first disturbance, we have an area of 0.666, and the remaining two rates are bumped up to 0.333*(1/0.666) = 0.5.

After the second disturbance, we have an area of 0.33 = 0.66 * 0.5, and the remaining disturbance rate is bumped up to 0.5*(0.66/0.33) = 1.0

The last disturbance will cover the entire remaining area fraction.

Copy link
Contributor

@rgknox rgknox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the changes look great. Only request is that we apply the indent to the big new loop. Also, this change is very central to how FATES operates. I think it would be good if we run a couple of multi-decade sanity check simulations on the f10 grid and compare to base (with fire turned on) just to see how things compare and if the results are sensible. Alternatively, or additionally, we could consider a regional simulation with a higher resolution grid, which may capture a few events with 100% disturbance in high fire areas. But I'm not sure if that is necessary.

@glemieux glemieux self-assigned this Jun 27, 2022
@glemieux
Copy link
Contributor

I started up a 50 year f10 run on Cheyenne and ran into a patch fusion error around the 12 year mark. I haven't looked into it deeply yet, but wanted to report it asap:

34: FATES fuse_patches(): currentPatch is not associated?
34: ERROR in EDPatchDynamicsMod.F90 at line 2335
...
34: ERROR: Unknown error submitted to shr_abort_abort.
34:Image              PC                Routine            Line        Source
34:cesm.exe           00000000013867D6  Unknown               Unknown  Unknown
34:cesm.exe           0000000000F07BE0  shr_abort_mod_mp_         114  shr_abort_mod.F90
34:cesm.exe           00000000008E0380  fatesglobals_mp_f          94  FatesGlobals.F90
34:cesm.exe           0000000000899E9E  edpatchdynamicsmo        2335  EDPatchDynamicsMod.F90
34:cesm.exe           0000000000887B87  edmainmod_mp_ed_e         276  EDMainMod.F90
34:cesm.exe           00000000005F3883  clmfatesinterface        1025  clmfates_interfaceMod.F90
34:cesm.exe           00000000005CC59D  clm_driver_mp_clm        1136  clm_driver.F90
34:cesm.exe           0000000000577A72  lnd_comp_nuopc_mp         886  lnd_comp_nuopc.F90

The case directory is here:

/glade/u/home/glemieux/scratch/ctsm-cases/pr875-longterm.fates-sci.1.57.4_api.24.0.0-8-gbd994a1c-ctsm5.1.dev099-C25a7cd3f2-Fbd994a1c.intel

@rgknox
Copy link
Contributor

rgknox commented Jun 30, 2022

At first glance, it does seem logical that this error would trigger. If we are looping through the tpp patch loop, while currentPatch is oldest, and we do trigger a fusion, then currentPatch will be set to the next older patch, here:

tmpptr => currentPatch%older
call fuse_2_patches(csite, currentPatch, tpp)
call fuse_cohorts(csite,tpp, bc_in)
call sort_cohorts(tpp)
currentPatch => tmpptr

However, the patch older than the oldest should be null.

@rgknox
Copy link
Contributor

rgknox commented Jun 30, 2022

The reason it is changing the pointer of currentPatch is because it will cease to exist during fuse_2_patches, see these lines:

if(associated(dp%older))then
olderp => dp%older
else
olderp => null()
end if
if(associated(dp%younger))then
youngerp => dp%younger
else
youngerp => null()
end if
! We have no need for the dp pointer anymore, we have passed on it's legacy
call dealloc_patch(dp)
deallocate(dp)

I think we just need to add some more complete logic where we set tmpptr prior to the fusion call.

if(associated(currentPatch%older))then
tmpptr => currentPatch%older
else
tmptr => currentPatch%younger
end if

If the second condition (ie younger) yields a null pointer, then you only have 1 patch. If you only have 1 patch, than you don't need to be fusing, so that shouldn't happen. Anyone else see this the same way?

@glemieux
Copy link
Contributor

At @ckoven suggestion I also ran this for master (using ctsm5.1.dev099) and saw the same issue, although it too a little while longer to crop up (21 years).

@glemieux
Copy link
Contributor

Per the fates software discussion today, it was agreed that I should implement Ryan's suggested fix and retest on this PR branch even though the issue is not directly caused by these changes.

@glemieux
Copy link
Contributor

glemieux commented Jul 11, 2022

Prior to running the original long-term case, I also ran regression tests against the baseline. All expected tests pass, with three non-b4b results:

  • ERS_D_Ld30.f45_f45_mg37.I2000Clm50FatesCruRsGs.cheyenne_intel.clm-FatesColdDefLandUse
  • ERS_Ld30.f45_f45_mg37.I2000Clm50FatesCruRsGs.cheyenne_intel.clm-FatesColdDefReducedComplexNoComp
  • ERS_Ld60.f45_f45_mg37.I2000Clm50FatesCruRsGs.cheyenne_intel.clm-FatesColdDefLogging

I believe these make sense as two of the three add in explicit disturbance modes to compete with fire and treefall and thus would be more likely to exhibit differences. Differences just due to fire and treefall disturbance are not likely to show up except in NoComp mode, where the isolation of pfts tends to exhibit certain dynamic updates earlier (compared to fully dynamic competition).

File location on Cheyenne: /glade/u/home/glemieux/scratch/ctsm-tests/tests_pr875-dev099

@rgknox
Copy link
Contributor

rgknox commented Jul 12, 2022

Another solution to this issue, which might be a little better, would be to just remove the endun:

if(.not.associated(currentPatch))then
write(fates_log(),*) 'FATES fuse_patches(): currentPatch is not associated?'
call endrun(msg=errMsg(sourcefile, __LINE__))
endif

If the currentPatch loop is not associated, it would be because the we just fused the oldest patch and are now pointing to null(), which is an expected outcome. Following this we check for association, so the code should behave.

I like this better than the previous idea because we would not be attempting fusion on the second oldest patch again. By attempting fusion on the second oldest, it gets two opportunities to fuse, which is more than the other patches and is not equitable. If more fusion is required, we should restart the loops. Which will be achieved by just removing the endrun listed above.

Another option would be to have the result of the fusion live in the currentPatch, instead of the tpp patch. That way, if the tpp patch was at fused at its oldest position, the pointer would be moved to the older (null) position, which will signal to just break out of the inner tpp loop because it reached its end. This would seem to be slightly more efficient, but would require some double checking and care that I'm not overlooking something here (I don't like this last option because it changes the logic, ie the currentPatch could experience multiple fusions in a row)

@glemieux
Copy link
Contributor

Note that 6e00ae0 fixes #882.

@glemieux glemieux linked an issue Jul 13, 2022 that may be closed by this pull request
@glemieux
Copy link
Contributor

Regression tests re-run with the same results as before.

Folder location on cheyenne: /glade/u/home/glemieux/scratch/ctsm-tests/tests_pr875-remove_endrun

@glemieux glemieux merged commit 192e568 into NGEET:master Jul 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Patch fusion error: patch not associated Make disturbance rates happen in parallel
3 participants