-
Notifications
You must be signed in to change notification settings - Fork 371
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes Redi Surface Taper double counting #5171
Conversation
Currently the SlopeTriads are tapered at the surface AND tapered in mpas_ocn_tracer_hmix_redi as well, which is a double counting. The tapering in the latter is removed.
The sfc tapering is first applied in mpas_ocn_gm.F here https://github.com/E3SM-Project/E3SM/blob/master/components/mpas-ocean/src/shared/mpas_ocn_gm.F#L353-L361 and also in tracer_hmix_redi so was applied twice. |
This PR is nonBFB and potentially climate changing, but the impact is likely small. |
Thanks for the catches @mark-petersen I think they are all fixed now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Compared visually to commit 0a90dc3 on branch vanroekel/vanroekel/ocean/add-submesoscale-eddies
, which is what @vanroekel used to run the long tests at https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.vanroekel/E3SMv2/20220715.submeso.piControl.ne30pg2_EC30to60E2r2.chrysalis/. This PR has the identical code for these lines. Also tested with nightly suite on stand-alone for gnu debug and intel debug. Using gnu optimized, this PR is not bfb for tests with Redi on, as expected.
Started two 10-year LR tests on anvil to check the impact of this non-BFB PR |
I completed the two 10-year runs, one as a baseline and the other including this branch. The results comparing the two are here: @vanroekel - if you get a chance, could you please make sure the differences are as expected? And @xylar and @mark-petersen, I would very much appreciate it if you could both check as well. The third run @xylar and I discussed, which includes PR #5172 as well, has completed 5-years and waiting in the anvil queue to get to 10... |
Thanks for posting this @jonbob. It appears that the mean and variability of all these plots is very close, and differences are what you would expect from two ensemble members. This can be seen most easily in the transports, but other statistics are similar: |
Thank @jonbob. I'll want to see the comparison with the 3rd run before making any assessment. |
@vanroekel, for context, I had requested the 3 runs just so we can be sure what the relative effects of this change are compared with turning on submesoscale. @jonbob agreed to do 2 of those runs last week and then we agreed it was easier to redo the 3rd rather than trying to compare with the existing submesoscale run. I presume we'll see much bigger changes in the latter but I just don't want us to be mistakenly attributing changes to the submesoscale parameterization that actually come from this bug fix (even if that might be pretty harmless). |
Thanks for the additional context @xylar this was my hunch as to why you requested the run. and I agree it makes good sense to do that run prior to signing off on this PR. |
I got a seaice error at 00090402, so I've flipped the BFBFLAG and resubmitted |
With the BFBFLAG flipped, I ran into an ocean state validation error at 00090406 on the anvil run. Because I had been waiting in the queue for so long, I had also started a test on chrysalis. That one hit an ocean state validation error at 00070223, so I've flipped the BFBFLAG there as well and that run is past that point now. @vanroekel -- I'll try to figure out where your long run is and compare the input files and configurations |
A diff of @vanroekel's mpaso_in file and the one generated by my codebase shows some potential issues:
Is there a reason our default value for config_submesoscale_ce is different? And the vertical tracer advection? |
The submesoscale parameter is something I changed that we should change by default, sorry for missing that earlier. I don’t know about the vertical advection parts. I never changed those. Perhaps the VLR capability went in after I did the big run? |
Thanks @vanroekel - we'll get that merged in, and I'll rerun my test with the value to match. DID you run into a bunch of state validation errors getting your run off the ground? On chrysalis my restart failed again at 00090326 so I've flipped the BFBFLAG and trying again. But after that I'll reset the submesoscale parameter and start over |
@vanroekel - we can change the submesoscale parameter in PR #5172, when we turn the model on |
@jonbob the options that changed above are the correct translation for running with flux-form vertical advection (i.e. no change). See here: |
Thanks @mark-petersen -- good to know. So the namelist comparison makes sense, once we change config_submesoscale_ce? I have a test running with the updated value and will post on its progress |
Yes, after you change |
Coordinated with @vanroekel on comparing code with branch used for successful 800 year simulation. Added the following commit. This code was in the long simulation but not in this PR. Tested with gnu debug using stand-alone nightly suite. |
sfcTaper = min(RediKappaSfcTaper(k, cell1), RediKappaSfcTaper(k, cell2)) | ||
if (k < min( indMLD(cell1), indMLD(cell2))) then | ||
sfcTaper = 0.0_RKIND | ||
else | ||
sfcTaper = 1.0_RKIND | ||
end if |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why the first line here isn't replaced by the last 5. It seems like the work done in the first line now does nothing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, could something be added to explain this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In particular, I'm wanting to make sure these still relate to the topic of this PR, the double counting of the Redi surface taper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your last point is a very good one. This is definitely beyond the title of the PR. This is now more along the lines of 'generic redi fixes'. I'll add a bit of explanation here about this change -- it is physically motivated and I believe fixes the stability issues @jonbob was seeing. Within the upper ocean the mesoscale eddies act horizontally and transition to along isopycnal in the interior (e.g. Ferrari et al 2005), this is a very rudimentary way to achieve that. The reason I think this will fix the issue with stability is that when I removed the surface taper double counting the redi cross terms became stronger, leading to numerical instability. This change is effectively a much stronger taper on those terms. I included this in my long 800 year spin up and didn't see stability issues, but Jon sees them quickly. I can add more explanation to this PR and/or in the code.
Finally, you are absolutely right that the first line of five is not necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! If you update the PR description, I am happy with this being included.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
more than happy to add this to the description, but I may hold off until @jonbob can test the impact to see if this stabilizes the run.
@@ -351,6 +351,11 @@ subroutine ocn_GM_compute_Bolus_velocity(statePool, & | |||
slopeTaperDown = 1.0_RKIND + slopeTaperFactor*(slopeTaperDown - 1.0_RKIND) | |||
|
|||
sfcTaper = min(RediKappaSfcTaper(k, cell1), RediKappaSfcTaper(k, cell2)) | |||
if (k < min( indMLD(cell1), indMLD(cell2))) then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (k < min( indMLD(cell1), indMLD(cell2))) then | |
if (k < min(indMLD(cell1), indMLD(cell2))) then |
@xylar @jonbob @mark-petersen -- I agree with Xylar that the changes to the surface taper are not consistent with the PR title. If this fix helps relieve Jon's instability issues, is it possible/sufficient to change the PR title / description? Or is a new PR needed? i think the branch name is still appropriate. |
@vanroekel - it's easy to change the PR title and description, with edit buttons. And I do not believe there's any point in changing the branch name, because that does require a new PR. So please let me know if you need any help editing this PR, and we'll otherwise assume to push on getting this PR in. |
@vanroekel, @mark-petersen - the test of this PR ran 10 years just fine, but with #5172 added to turn on the submesoscale, my test still crashes. This time it's at 00050130, and has left a pile of error and debug files if anyone wants to try to make sense of them. If so, they're at:
I'll flip the BFBFLAG and see if it holds together |
With a little coaxing (and BFBFLAG flipping) I did get the 10-year test with the submesoscale model on to complete as well. Here are links to the maps-analysis output for both: 20220920.PR5171.chrysalis - test with this PR merged but the mesoscale model OFF 20220920.PR5172.chrysalis - test with this PR merged and the mesoscale model ON @vanroekel, @xylar, @mark-petersen - please check the output and make sure it all makes sense to you please note that the analysis package was run comparing these runs to a baseline instead of observations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the results of @jonbob's 3 simulations and the analysis comparing them, I'm ready to approve this. From the analysis comparing this PR to the baseline, I don't see any changes that consistently point to the simulation results getting either worse or better, they are simply slightly different (like a different ensemble member).
On the other hand, significant changes are visible from #5172, as expected. Over a 10-year period, it is not clear if these changes are in a desirable direction so that's where @vanroekel's longer simulations are required. But for the purposes of approving this PR, it is clear that the submesoscale parameterization and not this bug fix are most likely responsible for the changes we're seeing, as expected.
Thanks you @vanroekel and @jonbob for the very hard work you have put into these PRs, simulations, and analyisis!
@jonbob, I'll try to find out why the hovmoller plots didn't work but I don't think that should hold up this merge. |
@jonbob, just so you know, I found the cause of the MPAS-Analysis issue you had and have a fix here: MPAS-Dev/MPAS-Analysis#900 It is unrelated to this PR (except that it was related to you having to restart because of the BFBFLAG flips you were having to do). |
@xylar @mark-petersen Just a note about @jonbob's comment about coaxing the runs to finish and the instability of the code with submesoscales on. Looking through I found an issue in the submesoscale implementation where I named the variable gradBuoyEddy, but to achieve BFB in #5099 the computation of gradBuoyEddy was changed to be the calculation of the density gradient. I have fixed this in #5172 and Jon tested this and it seems to be running smoothly through 13 years. |
@xylar one last note - I modified the PR description to explain the other change to the sfcTaper. Does that description look good to you? |
@vanroekel, yep, looks good! |
The test of this branch with PR #5172 now runs stably, with no failures through 30 years. Comparison of this run with a 10-year baseline is at: 20220920.PR5172.chrysalis - test with this PR merged and the mesoscale model ON I'll also run diagnostics comparing to observations for the full 30-year run and post as soon as they complete. Here's also output from a 10-year run with just PR #5172 and not this branch, to help tease out the impacts of the two PR's: 20220923.PR5172.chrysalis - test w/o this PR but submesoscale model on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the simulations, before and after this PR are the same, within variability, as expected, both with and without the submesoscale eddy parameterization flag on. Thanks @jonbob for your thorough testing.
MPAS-Analysis of 30-year runs with the submesoscale model on, both with and without this PR, comparing to obs: 20220920.PR5172.chrysalis_vs_obs1-30 - test with this PR merged and the submesoscale model ON 20220923.PR5172.chrysalis_vs_obs1-30 - test w/o this PR merged and the submesoscale model ON |
Simulations look quite similar as noted by @mark-petersen. Some differences I noticed:
|
…5171) Fix Redi Surface Taper double counting Currently the SlopeTriads are tapered at the surface AND tapered in mpas_ocn_tracer_hmix_redi as well, which is a double counting. The tapering in the latter is removed. In addition to removing the double count in the taper, Redi is reduced to horizontal mixing in the mixed layer, consistent with other modeling center implementations and literature (e.g. Ferrari et al 2008). This is accomplished by setting the Redi taper to zero in the mixed layer, which disables all terms except the horizontal mixing term. [NCC]
passes sanity testing -- merged to next |
merged to master and expected DIFFs blessed, except for PEM_Ln9.ne30pg2_EC30to60E2r2.WCYCL1850.chrysalis_intel which seemed to have randomly failed instead of making a DIFF |
Thanks, @jonbob and @vanroekel for the hard work on this. Exciting that we're so close to having #5172 in! |
Currently the SlopeTriads are tapered at the surface AND tapered in
mpas_ocn_tracer_hmix_redi as well, which is a double counting.
The tapering in the latter is removed.
In addition to removing the double count in the taper, Redi is reduced to horizontal mixing in the mixed layer, consistent with other modeling center implementations and literature (e.g. Ferrari et al 2008). This is accomplished by setting the Redi taper to zero in the mixed layer, which disables all terms except the horizontal mixing term.
[NCC]