Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix multi-pe advection=none bug #664

Merged
merged 5 commits into from
Nov 26, 2021
Merged

Conversation

apcraig
Copy link
Contributor

@apcraig apcraig commented Nov 18, 2021

PR checklist

  • Short (1 sentence) summary of your PR:
    Fix multi-pe advection=none bug
  • Developer(s):
    apcraig
  • Suggest PR reviewers from list in the column to the right.
  • Please copy the PR test results link or provide a summary of testing completed below.
    Full test suite run on cheyenne, gnu, everything is bit-for-bit. Also tested 4x1 box2001 test and that now runs. https://github.com/CICE-Consortium/Test-Results/wiki/cice_by_hash_forks#dc1269bcc1970bcc53bd1f95cd88cdac6e06fb68
  • How much do the PR code changes differ from the unmodified code?
    • bit for bit
    • different at roundoff level
    • more substantial
  • Does this PR create or have dependencies on Icepack or any other models?
    • Yes
    • No
  • Does this PR add any new test cases?
    • Yes
    • No
  • Is the documentation being updated? ("Documentation" includes information on the wiki or in the .rst files from doc/source/, which are used to create the online technical docs at https://readthedocs.org/projects/cice-consortium-cice/. A test build of the technical docs will be performed as part of the PR testing.)
    • Yes
    • No, does the documentation need to be updated at a later time?
      • Yes
      • No
  • Please provide any additional information or relevant details below:

Also updated the parser to check for invalid cice.settings env variables and removed an obsolete env setting in set_env.box2001.

Improved script performance in parse_namelist and parse_settings as well as ability to reuse ice_in file in test suites.

Updated test default status from FAIL to PEND.

minor refactoring of parse_namelist and parse_settings
to reduce cost and ability to use already setup ice_in file
from a prior case in the suite.

Added commented out timing ability in cice.setup.

Change test default to PEND from FAIL.
@apcraig apcraig requested a review from phil-blain November 23, 2021 16:06
Copy link
Member

@phil-blain phil-blain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HI Tony, I had a quick look. I did not look in details at all the script changes.

Regarding the fix in ice_init, I'm not sure I understand the bug (and how it relates to "multi-pe" in the title). From what I could understand, we were only setting advection='none' inside an if (kdyn >=1) block, so if dynamics were disabled, and ktransport=-1 in the namelist, advection would still be running since it was not deactivated. Is that the bug you are fixing ?

Comment on lines 1226 to +1227
set nonomatch && rm -f ciceexe.* && unset nonomatch
set nonomatch && rm -f ice_in_save* && unset nonomatch
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not simply

-set nonomatch && rm -f ciceexe.* && unset nonomatch
+set nonomatch && rm -f ciceexe.* ice_in_save* && unset nonomatch

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can go either way. This happens once at the end of the test suite, so cost is not an issue. By splitting the lines up, it's easier to comment out one or the other for testing/debugging of the ciceexe or ice_in_save feature. Two lines seems a little cleaner, but it doesn't make much difference. I guess I prefer to leave two lines.

Comment on lines 14 to 15
sed -i -e 's|ICE_SANDBOX|'"${ICE_SANDBOX}"'|g' $filename
sed -i -e 's|ICE_MACHINE_INPUTDATA|'"${ICE_MACHINE_INPUTDATA}"'|g' $filename
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this .sedbak extension to the -i argument is needed on macOS to avoid warnings, I'd prefer we keep them (that's why they were added).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I'll test on Mac. The sedbak slows things down a bit which is why I removed it after timing the scripts. I was trying to speed things up. If it breaks Mac, then we need to find a work around or use sedbak.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I played around with sed on cheyenne as well as three flavors of sed on two Macs. gnu-sed works fine on the Macs if it's installed, which it is on my machines. The default sed is different. I tried various things, again, and have decided to just revert the .sedbak implementation. There are other things we could do, but none are ideal.

grep -q "^[[:space:]]*set.* ${vname}[[:space:]]*" $filename
grepout=$?
if [ ${grepout} -eq 0 ]; then
sed -i -e 's|\(^[[:space:]]*set.* '"$vname"' \)[^#]*\(#*.*$\)|\1 '"$value"' \2|g' $filename
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same thing here

@apcraig
Copy link
Contributor Author

apcraig commented Nov 23, 2021

Regarding the fix in ice_init, I'm not sure I understand the bug (and how it relates to "multi-pe" in the title). From what I could understand, we were only setting advection='none' inside an if (kdyn >=1) block, so if dynamics were disabled, and ktransport=-1 in the namelist, advection would still be running since it was not deactivated. Is that the bug you are fixing ?

The advection='none' was set inside an "if (mastertask)" so the logic was only carried out on the root task. That's why we couldn't run the advection='none' tests on more than 1 task, it was not set consistently across tasks. This change moves it out of the "if (mastertask)" block and allows that option to work in parallel now. I've been curious, when the no advection test was added was it setup to run 1x1 because it didn't work in parallel or because there's some usefulness to testing it on 1x1. I don't have that answer :), but this fixes that issue. This fix came out of the implementation of more box tests for the CD development.

@apcraig
Copy link
Contributor Author

apcraig commented Nov 24, 2021

Retested everything again and I think it's working fine. Ready to merge again if others agree.

Copy link
Contributor

@eclare108213 eclare108213 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine to me but I'm not a scripting expert, so deferring to @phil-blain to approve.

@phil-blain
Copy link
Member

Regarding the fix in ice_init, I'm not sure I understand the bug (and how it relates to "multi-pe" in the title). From what I could understand, we were only setting advection='none' inside an if (kdyn >=1) block, so if dynamics were disabled, and ktransport=-1 in the namelist, advection would still be running since it was not deactivated. Is that the bug you are fixing ?

The advection='none' was set inside an "if (mastertask)" so the logic was only carried out on the root task.

Ah! Indeed; I did not scroll up enough in my review :P

@phil-blain
Copy link
Member

@eclare108213 I'm not a C-shell expert either :P but I'll take a second look.

Copy link
Member

@phil-blain phil-blain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I had another look and I guess if it works and is faster, then we're good.

I'm not exactly sure what the changes in parse_settings.sh are doing, though.. is that also an optimization ?

cice.setup Outdated
@@ -934,9 +936,21 @@ EOF
if (-e ${fimods}) rm ${fimods}
if (-e ${fsmods}) rm ${fsmods}

# Use an existing ice_in file from the suite if it exists
# to reduce time spend in parse_namelist
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/spend/spent/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixing.

# to reduce time spend in parse_namelist
set skip_parse_namelist = spval
if (${dosuite} == 1) then
set iceinfn = ../ice_in_save_${grid}${soptions}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd maybe prefer we put the saved namelists in a separate directory, so as to not clutter the suite's top level directory, but I don't have that strong of an opinion about it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These sit in the test suite directory and are deleted at the end of the test suite. You're right that it may be cleaner to put them in a separate directory, but I think we're OK for now.

@apcraig
Copy link
Contributor Author

apcraig commented Nov 24, 2021

OK, I had another look and I guess if it works and is faster, then we're good.

I'm not exactly sure what the changes in parse_settings.sh are doing, though.. is that also an optimization ?

The changes reduce the cost of the parsing. It turns out, these sed commands are taking most of the time setting up the cases. We are using sed to update cice.settings and ice_in for the case based on the grid and options specified. This refactoring improves the sed cost a bit and removes an extra sed that was happening in favor of grep to check whether the namelist/env variable exists. We continue to have some issues with our options setup (including dealing with conflicting or redundant options as well as performance) and I'd love to hear any other ideas how to do this better or different, even moving away from csh :)

@apcraig
Copy link
Contributor Author

apcraig commented Nov 24, 2021

Just to expand a bit, the ice_ic_saved files allow a suite to use a pre-parsed ice_in file if it's available. It still has to change a few things, but many things are reusable. By copying in an existing pre-parsed ice_in file, we save a lot of the sed calls. It only helps so much. For a full test suite on cheyenne, we have about 220 separate cases. Of those, I think there are about 160 unique ice_in files. The benefit really comes if you are running suites on multiple envs (compilers) at the same time because those cases are fully redundant, so after creating the 160 unique ice_in files, you can reuse them all when you run the 2nd compiler.

Overall though, the speed up will be useful. The time for generating a full suite of cases on cheyenne went from about 20 minutes to 11. The second compiler then only takes 4 minutes if it leverages existing ice_in_saved files. So running full suites on cheyenne with 3 compilers previously took about an hour (3 x 20) to setup all the cases, now it take more like 20 minutes (11 + 4 + 4). The compilation and run still takes the same amount of time. The compilation already had this feature where executables are saved and reused when they can be.

I plan to merge this to cgridDEV hopefully fairly soon.

@phil-blain
Copy link
Member

Thanks for the details, this is indeed a nice speedup!

@apcraig
Copy link
Contributor Author

apcraig commented Nov 26, 2021

@phil-blain, could you give a quick approve if you agree, then I'll merge today for weekend testing. thanks.

@apcraig apcraig merged commit 162aee9 into CICE-Consortium:main Nov 26, 2021
apcraig added a commit to apcraig/CICE that referenced this pull request Nov 30, 2021
* update icepack, rename snwITDrdg to snwitdrdg (CICE-Consortium#658)

* Change max_blocks for rake tests on izumi (nothread). (CICE-Consortium#665)

* Fix some raketests for izumi

* fix some rake tests

* Makefile: make Fortran object files depend on their dependency files (CICE-Consortium#667)

When 'make' is invoked on the CICE Makefile, the first thing it does is
to try to make the included dependency files (*.d) (which are in fact
Makefiles themselves) [1], in alphabetical order.

The rule to make the dep files have the dependency generator, 'makdep',
as a prerequisite, so when processing the first dep file, make notices
'makdep' does not exist and proceeds to build it. If for whatever reason
this compilation fails, make will then proceed to the second dep file,
notice that it recently tried and failed to build its dependency
'makdep', give up on the second dep file, proceed to the third, and so
on.

In the end, no dep file is produced. Make then restarts itself and
proceeds to build the code, which of course fails catastrophically
because the Fortran source files are not compiled in the right order
because the dependency files are missing.

To avoid that, add a dependency on the dep file to the rules that make
the object file out of the Fortran source files. Since old-fashioned
suffix rules cannot have their own prerequisites [2], migrate the rules
for the Fortran source files to use pattern rules [3] instead. While at
it, also migrate the rule for the C source files.

With this new dependency, the builds abort early, before trying to
compile the Fortran sources, making it easier to understand what
has gone wrong.

Since we do not use suffix rules anymore, remove the '.SUFFIXES' line
that indicates which extension to use suffix rules for (but keep the
line that eliminates all default suffix rules).

[1] https://www.gnu.org/software/make/manual/html_node/Remaking-Makefiles.html
[2] https://www.gnu.org/software/make/manual/html_node/Suffix-Rules.html
[3] https://www.gnu.org/software/make/manual/html_node/Pattern-Rules.html#Pattern-Rules

* Fix multi-pe advection=none bug (CICE-Consortium#664)

* update parsing scripts to improve robustness, fix multi-pe advection=none

* Update cice script to improve performance including
minor refactoring of parse_namelist and parse_settings
to reduce cost and ability to use already setup ice_in file
from a prior case in the suite.

Added commented out timing ability in cice.setup.

Change test default to PEND from FAIL.

* fix cice.setup for case

* add sedbak implementation to support Mac sed

* s/spend/spent

Co-authored-by: David A. Bailey <[email protected]>
Co-authored-by: Philippe Blain <[email protected]>
apcraig added a commit to apcraig/CICE that referenced this pull request Mar 8, 2022
* update icepack, rename snwITDrdg to snwitdrdg (CICE-Consortium#658)

* Change max_blocks for rake tests on izumi (nothread). (CICE-Consortium#665)

* Fix some raketests for izumi

* fix some rake tests

* Makefile: make Fortran object files depend on their dependency files (CICE-Consortium#667)

When 'make' is invoked on the CICE Makefile, the first thing it does is
to try to make the included dependency files (*.d) (which are in fact
Makefiles themselves) [1], in alphabetical order.

The rule to make the dep files have the dependency generator, 'makdep',
as a prerequisite, so when processing the first dep file, make notices
'makdep' does not exist and proceeds to build it. If for whatever reason
this compilation fails, make will then proceed to the second dep file,
notice that it recently tried and failed to build its dependency
'makdep', give up on the second dep file, proceed to the third, and so
on.

In the end, no dep file is produced. Make then restarts itself and
proceeds to build the code, which of course fails catastrophically
because the Fortran source files are not compiled in the right order
because the dependency files are missing.

To avoid that, add a dependency on the dep file to the rules that make
the object file out of the Fortran source files. Since old-fashioned
suffix rules cannot have their own prerequisites [2], migrate the rules
for the Fortran source files to use pattern rules [3] instead. While at
it, also migrate the rule for the C source files.

With this new dependency, the builds abort early, before trying to
compile the Fortran sources, making it easier to understand what
has gone wrong.

Since we do not use suffix rules anymore, remove the '.SUFFIXES' line
that indicates which extension to use suffix rules for (but keep the
line that eliminates all default suffix rules).

[1] https://www.gnu.org/software/make/manual/html_node/Remaking-Makefiles.html
[2] https://www.gnu.org/software/make/manual/html_node/Suffix-Rules.html
[3] https://www.gnu.org/software/make/manual/html_node/Pattern-Rules.html#Pattern-Rules

* Fix multi-pe advection=none bug (CICE-Consortium#664)

* update parsing scripts to improve robustness, fix multi-pe advection=none

* Update cice script to improve performance including
minor refactoring of parse_namelist and parse_settings
to reduce cost and ability to use already setup ice_in file
from a prior case in the suite.

Added commented out timing ability in cice.setup.

Change test default to PEND from FAIL.

* fix cice.setup for case

* add sedbak implementation to support Mac sed

* s/spend/spent

* nuopc/cmeps driver updates (CICE-Consortium#668)


* add debug_model feature
* add required variables and calls for tr_snow

* Main namelist debug (CICE-Consortium#671)

* Adding method to write erroneous namelist options

* Remove erroneous comma in abort_ice for namelist check

* Added check for zbgc_nml. I missed that namelist in this file.

* Added space and colons for namelist error output

* Added space and colons for namelist error output

Co-authored-by: David A. Hebert <[email protected]>

* NUOPC/CMEPS cap updates (CICE-Consortium#670)

* updated orbital calculations needed for cesm

* fixed problems in updated orbital calculations needed for cesm

* update CICE6 to support coupling with UFS

* put in changes so that both ufsatm and cesm requirements for potential temperature and density are satisfied

* update icepack submodule

* Revert "update icepack submodule"

This reverts commit e70d1ab.

* update comp_ice.backend with temporary ice_timers fix

* Fix threading problem in init_bgc

* Fix additional OMP problems

* changes for coldstart running

* Move the forapps directory

* remove cesmcoupled ifdefs

* Fix logging issues for NUOPC

* removal of many cpp-ifdefs

* fix compile errors

* fixes to get cesm working

* fixed white space issue

* Add restart_coszen namelist option

* Update NUOPC cap to work with latest CICE6 master

* nuopc,cmeps or s2s build updates

* fixes for dbug_flag

* Update nuopc2 to latest CICE master

* Fix some merge problems

* Fix dbug variable

* Manual merge of UFS changes

* fixes to get CESM B1850 compset working

* refactored ice_prescribed_mod.F90 to work with cdeps rather than the mct data models

* Fix use_restart_time

* changes for creating masks at runtime

* added ice_mesh_mod

* implemented area correction factors as option

* more cleanup

* Fix dragio

* Fix mushy bug

* updates to nuopc cap to resolve inconsistency between driver inputs and cice namelists

* changed error message

* added icepack_warnings_flush

* updates to get ice categories working

* updates to have F compset almost working with cice6 - still problems in polar regions - need to resolve 253K/cice6 versus 273K/cice5 differences

* changed tolerance of mesh/grid comparison

* added issues raised in PR

* Update CESM-CICE sync with new time manager

* Add back in latlongrid

* Add new advanced snow physics to driver

* Fix restart issue with land blocks

* Update mesh check in cap

* fix scam problems

* reintroduced imesh_eps check

* Put dragio in the namelist instead

* Remove redundant code

* Fix some indents

Co-authored-by: Mariana Vertenstein <[email protected]>
Co-authored-by: apcraig <[email protected]>
Co-authored-by: Denise Worthen <[email protected]>

* Add CESM1_PIO for fill value check (CICE-Consortium#675)

* Add CESM1_PIO for fill value check

* Revert PIO_FILL_DOUBLE change for now

* - Update the namelist read to make the group order flexible. (CICE-Consortium#677)

- Remove recent update to namelist read that traps bad lines, it conflicts
  with flexibility to read groups in random order picked up by NAG.
- change print* statements to write(nu_diag,*)

* Port to Narwhal and add perf_suite (CICE-Consortium#678)

* Add narwhal intel, gnu, cray, aocc
Add perf_suite.ts

* update narwhal_cray and perf_suite

* Update OMP (CICE-Consortium#680)

* Add narwhal intel, gnu, cray, aocc
Add perf_suite.ts

* update narwhal_cray and perf_suite

* Review and update OMP implementation

- Fix call to timers in block loops
- Fix some OMP Private variables
- Test OMP Scheduling, add SCHEDULE(runtime) to some OMP loops
- Review column and advection OMP implementation
- ADD OMP_TIMERS CPP option (temporary) to time threaded sections
- Add timer_tmp timers (temporary)
- Add omp_suite.ts test suite
- Add ability to set OMP_SCHEDULE via options (ompscheds, ompscheds1, ompschedd1)

* - Review diagnostics OMP implementation
- Add timer_stats namelist to turn on extra timer output information
- Add ICE_BFBTYPE and update bit-for-bit comparison logic in scripts
- Update qc and logbfb testing
- Remove logbfb and qchkf tests, add cmplog, cmplogrest, cmprest set_env files to set ICE_BFBTYPE
- Update documentation

* Update EVP OMP implementation

* - Refactor puny/pi scalars in eap dynamics to improve performance
- Update OMP in evp and eap

* Clean up

* Comment out temporary timers

* Update OMP env variables on Narwhal

* Update gaffney OMP_STACKSIZE

* update OMP_STACKSIZE on cori

* Update Onyx OMP_STACKSIZE
Update documentation

* Update OMP_STACKSIZE on mustang

* - Update Tsfc values on land in various places in the code, was affecting testing.  Specifically fix upwind advection.
- Comment out OMP in ice_dyn_evp_1d_kernel, was producing non bit-for-bit results with different thread counts

* updating LICENSE.pdf for 2022

* seabed stress - remove if statements (CICE-Consortium#673)

* refactor seabed_stress. Bit for bit

* Removed if statement from stepu. Results are binary identical, however taubx and tauby is updated on all iterations instead of just the last one. Not used within iteration

* changed capping from logical to numeric in order to remove if statement. Moved call to deformation out of loop

* clean dyn_finish, correct intent(inout) to intent(in) for Cw, resse Cb in stepu, remove if from seabed_stress_LKD

* Reolve conflicts after updating main

* modified environment for Freya to accomodate for additional OMP commands

* Requested changes after review. Only changed in seabed stress and not bit for bit if cor=0.0

added legacy comment in ice_dyn_finish

* move deformation to subcycling

* - Update version and copyright. (CICE-Consortium#691)

- Remove gordon and conrad machines.
- Add setenv OMP_STACKSIZE commented out in env files
- Update Icepack to fc4b809

* add OMP_STACKSIZE for koehr (CICE-Consortium#693)

* Update C/CD deformations calls to be consistent with main B changes
Update tauxbx, tauxby calculations on C/CD to be consistent with main B changes

* Update OpenMP in C/CD implementation
Extend omp_suite to include C/CD tests

* reconcile recent merge problem

* set default value of capping to 0. in vp cases for backwards compatibility

* Set capping to 1.0 in vp consistent with evp, changes answers for vp configurations

Co-authored-by: David A. Bailey <[email protected]>
Co-authored-by: Philippe Blain <[email protected]>
Co-authored-by: Denise Worthen <[email protected]>
Co-authored-by: daveh150 <[email protected]>
Co-authored-by: David A. Hebert <[email protected]>
Co-authored-by: Mariana Vertenstein <[email protected]>
Co-authored-by: Elizabeth Hunke <[email protected]>
Co-authored-by: TRasmussen <[email protected]>
@apcraig apcraig deleted the parsefix branch August 17, 2022 21:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants