
Mlevy/mapping on geyser #2589

Merged 7 commits, May 14, 2018
Conversation

mnlevy1981
Contributor

Updated the wrappers for the ESMF mapping tools to work on NCAR machines. Note that currently only the serial modules are working - I have an issue ticket in with CISL to get the ESMF tools running in parallel, at which point I'll submit another pull request.

I also removed all references to yellowstone and jaguar from the mapping tools.

Test suite: None
Test baseline: N/A
Test namelist changes: N/A
Test status: I verified that I could use my scripts to generate mapping files on cheyenne (both login node and compute node), geyser, caldera, and pronghorn.

Fixes #2469

User interface changes?: None

Update gh-pages html (Y/N)?: N

Code review: none yet

1. create_ESMF_map.sh works in serial mode on all NCAR machines
2. No more references to yellowstone or jaguar in create_ESMF_map.sh
3. No more regridbatch.yellowstone.sh scripts
4. Intel's optimization report files (*.optrpt) are ignored by git
5. gen_domain and runoff_to_ocn Makefiles clean up optimization report files
   when user runs "make clean"
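A minimal sketch of items 4 and 5 (the file names are illustrative stand-ins, not the actual gen_domain or runoff_to_ocn Makefile contents): the clean step just needs to delete `*.optrpt` alongside the usual build artifacts.

```shell
# Illustrative sketch only -- not the actual Makefile. The point is that
# "make clean" now also removes Intel's *.optrpt optimization reports.
touch gen_domain.o gen_domain.optrpt ipo_out.optrpt   # stand-ins for compiler output
rm -f *.o *.mod *.optrpt                              # what the clean target runs
ls *.optrpt 2>/dev/null || echo "no optrpt files left"
```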

STILL TO DO:
* create_ESMF_map.sh does not run in parallel on any NCAR machines
Remove references to yellowstone, also clean up map_field README file.

```shell
module load nco
if [ $MACH == "cheyenne" ]; then
module load intel
```
Contributor

If this tool were to invoke the cime/tools/configure script, maybe it wouldn't be restricted to running on only this limited set of machines? The configure script would provide an env_machine_specific file, which should load a proper ESMF environment if one is defined for that system.

Contributor Author

I like this idea, but ran into two issues with it:

  1. On cheyenne, `../../../configure --mpilib mpi-serial` worked fine, but specifying mpt did not:

```
$ ../../../configure --mpilib mpt
ERROR: Multiple matches
```

I get the same error with `--mpilib openmpi` and `--mpilib mpich2`. Is there another mandatory argument? Adding `--macros-format` and `--compiler` doesn't change the error message. (Just a note: `--clean` works, so this branch has your fix for that issue.)

  2. On pronghorn, I needed to remove trilinos to avoid this error:

```
$ ../../../configure --mpilib mpi-serial
ERROR: module command /glade/apps/opt/lmod/lmod/libexec/lmod python load ncarenv/1.0 ncarbinlibs/1.1 perlmods gmake/4.1 python all-python-libs git intel/15.0.3 mkl/11.1.2 trilinos/11.10.2 esmf esmf-7.0.0-ncdfio-uni-O netcdf/4.3.3.1 ncarcompilers/1.0 cmake/3.0.2 all-python-libs failed with message:
Lmod Error: Cannot find Trilinos package built for intel version: 15.0.3
```

But then it looks like changing --mpilib causes different problems:

```
$ ../../../configure --mpilib mpich2
ERROR: module command /glade/apps/opt/lmod/lmod/libexec/lmod python load ncarenv/1.0 ncarbinlibs/1.1 perlmods gmake/4.1 python all-python-libs git intel/15.0.3 mkl/11.1.2 esmf esmf-7.0.0-defio-mpi-O netcdf-mpi/4.3.3.1 pnetcdf/1.6.1 ncarcompilers/1.0 cmake/3.0.2 all-python-libs failed with message:
Lmod Warning: Did not find: netcdf-mpi/4.3.3.1 pnetcdf/1.6.1

Try: "module spider netcdf-mpi/4.3.3.1 pnetcdf/1.6.1"
```

netcdf-mpi/4.5.0 looks to be the only module built with intel, except it doesn't actually work:

```
$ module purge
$ module load intel/15.0.3
$ module avail netcdf-mpi

------------------------------------------------------- /glade/apps/opt/modulefiles/ca/cdep/intel -------------------------------------------------------
   netcdf-mpi/4.5.0

------------------------------------------------------- /glade/apps/opt/modulefiles/pr/cdep/intel -------------------------------------------------------
   netcdf-mpi/4.5.0

------------------------------------------------------- /glade/apps/opt/modulefiles/ys/cdep/intel -------------------------------------------------------
   netcdf-mpi/4.5.0
$ ../../../configure --mpilib mpich2
ERROR: module command /glade/apps/opt/lmod/lmod/libexec/lmod python load ncarenv/1.0 ncarbinlibs/1.1 perlmods gmake/4.1 python all-python-libs git intel/15.0.3 mkl/11.1.2 esmf esmf-7.0.0-defio-mpi-O netcdf-mpi/4.5.0 ncarcompilers/1.0 cmake/3.0.2 all-python-libs failed with message:
Lmod Error: Cannot find netcdf-mpi-4.5.0 built for intel version: 15.0.3
```

So if I can remove trilinos from the yellowstone entry in config_machines.xml, then I think I'm good to go with `--mpilib mpi-serial`... but there are more issues to work out before CIME's configure lets us run in parallel.
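To make the workaround concrete, here is a hypothetical sketch (the helper and variable names are mine, and this is not the actual config_machines.xml mechanism; the module names are taken from the error messages quoted above) of dropping trilinos from a module load list so the mpi-serial environment can load:

```shell
# Hypothetical sketch: filter trilinos out of a module load list before
# handing it to Lmod. Module names come from the errors quoted above.
modules="ncarenv/1.0 intel/15.0.3 mkl/11.1.2 trilinos/11.10.2 esmf netcdf/4.3.3.1"
filtered=""
for m in $modules; do
  case "$m" in
    trilinos/*) ;;                            # skip the module Lmod cannot resolve
    *) filtered="$filtered${filtered:+ }$m" ;;
  esac
done
echo "$filtered"   # prints: ncarenv/1.0 intel/15.0.3 mkl/11.1.2 esmf netcdf/4.3.3.1
```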

Also clean up a lot of scripts -- no longer need '--batch' option, gen_cesm_maps.sh handles ESMF errors better, and check_maps.sh handles errors better as well.

Needed to remove trilinos from yellowstone module load to avoid configure error.
Fixes call to configure --clean
@mnlevy1981
Contributor Author

I think this PR is as good as it's going to get without some input from CISL: I've gotten much closer to being able to run ESMF_RegridWeightGen on an interactive cheyenne node, but for CESM 2.0 I think we should force users to use --serial because I don't see this getting patched before the code freeze tomorrow night.

@mnlevy1981
Contributor Author

Actually, I still need to update the README files to include `--serial`... then this will be as good as it's going to get.

@mnlevy1981
Contributor Author

I know I've made a comment like this one about three times now, but after the last set of updates I'm really done with this PR. For real. Where things stand:

  1. Using the --serial flag to run the non-MPI versions of the ESMF tools works on all CISL machines
  2. Running without the --serial flag will use MPI-compiled ESMF tools on cheyenne compute nodes
  3. Running without the --serial flag on the cheyenne login node will produce an error telling you to either grab a compute node or run in serial.

At some point I may open up a new issue ticket to try to get parallel support on caldera and geyser.
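Item 3 in the list above can be sketched as a small hostname check. This is illustrative only: the `cheyenne[0-9]*` pattern is an assumed shape for login-node hostnames, not the wrapper's actual test.

```shell
# Hypothetical sketch of the login-node guard described in item 3.
# The hostname pattern is an assumption, not the real detection logic.
is_login_node() {
  case "$1" in
    cheyenne[0-9]*) return 0 ;;   # looks like a cheyenne login node
    *)              return 1 ;;
  esac
}

if is_login_node "$(hostname)"; then
  echo "ERROR: MPI-compiled ESMF tools need a compute node; use qsub or --serial" >&2
fi
```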

@jedwards4b jedwards4b merged commit 7cef855 into ESMCI:master May 14, 2018
@mnlevy1981 mnlevy1981 deleted the mlevy/mapping_on_geyser branch May 15, 2018 17:09