Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade library suite to spack-stack on non-production machines #859

Closed
DavidHuber-NOAA opened this issue Oct 17, 2023 · 26 comments
Closed
Assignees
Labels
maintenance Basic upkeep

Comments

@DavidHuber-NOAA
Copy link
Collaborator

The UFS_Utls repository should be upgraded to use spack-stack libraries where available (all tier-1 and tier-2 systems except WCOSS2). This is in support of the global-workflow (NOAA-EMC/global-workflow#1868). To start, a test suite of libraries have been installed on Hera including ip/4.0.0 and gsi-ncdiag/1.1.2.

@GeorgeGayno-NOAA
Copy link
Collaborator

@DavidHuber-NOAA how do you load these new libraries?

@DavidHuber-NOAA
Copy link
Collaborator Author

@GeorgeGayno-NOAA Two paths are needed:

prepend_path("MODULEPATH", "/scratch1/NCEPDEV/nems/Alexander.Richert/spack-stack-1.4.1-gw/envs/gw/install/modulefiles/Core")
prepend_path("MODULEPATH", "/scratch1/NCEPDEV/jcsda/jedipara/spack-stack/modulefiles")

Next, load stack-intel, stack-intel-oneapi-mpi, then the individual libraries.

Note also that some of the library names have changed from hpc-stack. Most notably, netCDF has been split into netcdf-c/4.9.2 and netcdf-fortran/4.6.0. I don't believe that any other ufs_utils libraries are affected by the name changes.

@aerorahul
Copy link
Contributor

@GeorgeGayno-NOAA
The ufs-weather-model now uses spack-stack.
The best example of how to use spack-stack can be seen in the ufs-weather-model modulefiles.

@GeorgeGayno-NOAA
Copy link
Collaborator

Is there a deadline for this upgrade?

@aerorahul
Copy link
Contributor

This is holding up the migration of global-workflow to spack-stack.

@DavidHuber-NOAA
Copy link
Collaborator Author

I ran all regression and unit tests on Hera after building with spack-stack. All tests completed successfully. The UFS is currently going through an upgrade to use spack-stack/1.5.0 (ufs-community/ufs-weather-model#1920). I am going to start running tests on Hera and Orion against the same spack-stack builds.

@DavidHuber-NOAA
Copy link
Collaborator Author

DavidHuber-NOAA commented Oct 25, 2023

@GeorgeGayno-NOAA I completed unit tests on both Hera and Orion successfully as well as regression tests on Hera. RTs on Orion timed out yesterday and the system is down for maintenance today, so I will spin those up again tomorrow.

I also updated the ci/spack.yaml file to use the updated versions of various libraries. However, I am encountering a GitHub CI build error in sorc/orog_mask_tools.fd/orog.fd/mtnlm7_oclsm.F:3826 when attempting to call ipolates. I am using the same version of ip (4.3.0) in CI that is used in the modules and the GitHub build log shows ipolates.f90 was compiled and placed in both the include_d and include_4 directory, so I am not sure why it is failing.

@AlexanderRichert-NOAA Is there a special variant I need to compile ip with here? Or am I missing something in ci/spack.yaml?

@GeorgeGayno-NOAA
Copy link
Collaborator

@GeorgeGayno-NOAA I completed unit tests on both Hera and Orion successfully as well as regression tests on Hera. RTs on Orion timed out yesterday and the system is down for maintenance today, so I will spin those up again tomorrow.

I also updated the ci/spack.yaml file to use the updated versions of various libraries. However, I am encountering a GitHub CI build error in sorc/orog_mask_tools.fd/orog.fd/mtnlm7_oclsm.F:3826 when attempting to call ipolates. I am using the same version of ip (4.3.0) in CI that is used in the modules and the GitHub build log shows ipolates.f90 was compiled and placed in both the include_d and include_4 directory, so I am not sure why it is failing.

Routine makeoa3 is where ipolates is called. And that routine is one of the many 'dead' routines in that module. If you have a lot of trouble getting makeoa3 to compile, we can always remove it.

@AlexanderRichert-NOAA Is there a special variant I need to compile ip with here? Or am I missing something in ci/spack.yaml?

@DavidHuber-NOAA
Copy link
Collaborator Author

@GeorgeGayno-NOAA It looks like the sorc/emcsfc_snow2mdl.fd/snow2mdl.F90:interp function also calls ipolates a few times and interp is called by the driver. Could that subroutine also be removed if need be?

@GeorgeGayno-NOAA
Copy link
Collaborator

@GeorgeGayno-NOAA It looks like the sorc/emcsfc_snow2mdl.fd/snow2mdl.F90:interp function also calls ipolates a few times and interp is called by the driver. Could that subroutine also be removed if need be?

Unfortunately, not.

@AlexanderRichert-NOAA
Copy link
Collaborator

I'm looking at this now. It's not obvious why it's failing given that "-DIP_V4" is set in the compiler command line and the code has "use ipolates_mod" in what appears to be the correct place. I don't think it's a build option issue for ip but I'm also not sure what's going on. I'll keep you posted.

@AlexanderRichert-NOAA
Copy link
Collaborator

I can reproduce the issue on my personal computer, and I seem to be able to resolve it by removing -I/dev/shm/spack-uu/var/spack/environments/test/.spack-env/view/include_4 from the compile line. Ostensibly, it's finding the ipolates_mod.mod in that directory and trying to use it, but can't because it's the wrong data type. Might be a compiler bug since ostensibly it should keep searching for matching routines. In any case, probably the easiest thing would be to set the 'precision=d' variant for ip in spack.yaml since that's the only version of ip used by ufs_utils.

@DavidHuber-NOAA
Copy link
Collaborator Author

Great, thanks for hunting that down, @AlexanderRichert-NOAA! It's interesting that it isn't an issue on Hera or Orion, but I agree that one way or the other UFS_utils should only point at include_d for ip.

@AlexanderRichert-NOAA
Copy link
Collaborator

For posterity: It has to do with the use of the spack view in the CI, where bacio uses the _4 version, so it puts the include_4 path in there, then everything else is _d, so you end up with include_4 and include_d, and since ip defaults to building both 4 and d, there's a copy of the mod files in each of those two directories. On other systems where there's no spack view, there separate include flags for each package, i.e., the compiler command contains -I/path/to/bacio -I/path/to/ip etc.

@DavidHuber-NOAA
Copy link
Collaborator Author

Ah, good to know. Thanks for the extra explanation!

DavidHuber-NOAA added a commit to DavidHuber-NOAA/UFS_UTILS that referenced this issue Oct 25, 2023
DavidHuber-NOAA added a commit to DavidHuber-NOAA/UFS_UTILS that referenced this issue Oct 25, 2023
@DavidHuber-NOAA
Copy link
Collaborator Author

DavidHuber-NOAA commented Oct 26, 2023

@AlexanderRichert-NOAA I tried building the spack environment with the variant precision=d. While the environment built successfully, the UFS build is now failing at the cmake step as cmake finds sp_4 which is reliant on ip_4, which wasn't built. Unfortunately, there isn't a variant to build only sp_d, so I think ip_4 has to also be built. I'm not really sure what to try from here. Seems a bit drastic, but does the module need to be renamed inside of ip from ipolates_mod to ipolates_mod_4 and ipolates_mod_d?

@AlexanderRichert-NOAA
Copy link
Collaborator

I was toying with the that idea, but was hoping to avoid it at least for now... How difficult would it be to disable the 'view' in the spack environment and load the packages so that there are separate -I flags for each package? Alternatively, I just recently added a fix for that issue where ip can't be found in cmake if libip_4 wasn't built, so I could do a release that includes that and update the Spack recipe. Since the CI is using JCSDA spack that would be pretty quick.

@DavidHuber-NOAA
Copy link
Collaborator Author

Well, I could change around the head CMakeLists.txt to call find_package to look for specific library/include directories. Is that what you mean? If I don't load the entire ufs_utils spack environment, then I assume I would be doing spack loads instead, but I don't see any documentation on how to specify an include path (-I) when running spack load. Do you know how to do that?

As for the spack fix to handle the case where libip_4 isn't built, that sounds like a solid path forward and probably less hack-ish.

@AlexanderRichert-NOAA
Copy link
Collaborator

AlexanderRichert-NOAA commented Oct 26, 2023

I think it will work if you set the the 'view' setting in spack.yaml to false, then do 'spack load ip sp ...rest of dependencies...' after the 'spack env activate' command in the ufs_utils build step.

Meanwhile, I'll work on an ip release+recipe update.

DavidHuber-NOAA added a commit to DavidHuber-NOAA/UFS_UTILS that referenced this issue Oct 26, 2023
DavidHuber-NOAA added a commit to DavidHuber-NOAA/UFS_UTILS that referenced this issue Oct 26, 2023
@DavidHuber-NOAA
Copy link
Collaborator Author

Unfortunately setting view: false did not work. I am still getting the same error during the build step cmake cannot find ip_4.

@AlexanderRichert-NOAA
Copy link
Collaborator

Sorry, I forgot to add you would then want to take away the preicison=d setting so it will go back to building ip_4, which is what the buggy cmake config is looking for in order to find the library.

DavidHuber-NOAA added a commit to DavidHuber-NOAA/UFS_UTILS that referenced this issue Oct 26, 2023
@DavidHuber-NOAA
Copy link
Collaborator Author

DavidHuber-NOAA commented Oct 26, 2023

Shucks, I thought we had it. The build was successful, but the ctests are failed due to missing libraries (e.g. libpioc.so). I may need to load additional spack modules. I'll try it out again tomorrow.

@DavidHuber-NOAA
Copy link
Collaborator Author

Hmm, libpioc is the parallelio library and I am loading that at ctest time, so it seems that something isn't right here. This only seems to be the issue for the intel tests.

For the gcc tests, it is failing due to illegal instructions, which makes me think that the gcc compilers are not being loaded correctly, though it does indicate that parallelio is loaded correctly, otherwise the executables would not have started. I will try adding gcc to the list of loads and see where that gets me.

DavidHuber-NOAA added a commit to DavidHuber-NOAA/UFS_UTILS that referenced this issue Oct 27, 2023
DavidHuber-NOAA added a commit to DavidHuber-NOAA/UFS_UTILS that referenced this issue Oct 27, 2023
@AlexanderRichert-NOAA
Copy link
Collaborator

@DavidHuber-NOAA I just released ip 4.4.0, which should let you use precision=d without then breaking the ufs_utils cmake... Further down the road hopefully we'll resolve this by rolling everything into one library with multiple interfaces.

@DavidHuber-NOAA
Copy link
Collaborator Author

Thanks @AlexanderRichert-NOAA I will give it a try!

@DavidHuber-NOAA
Copy link
Collaborator Author

It worked! Thanks @AlexanderRichert-NOAA.

DavidHuber-NOAA added a commit to DavidHuber-NOAA/UFS_UTILS that referenced this issue Oct 31, 2023
DavidHuber-NOAA added a commit to DavidHuber-NOAA/UFS_UTILS that referenced this issue Nov 1, 2023
Note that the chgres RTs violate the debug QOS policy, so I changed the
QOS to batch.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
maintenance Basic upkeep
Projects
None yet
Development

No branches or pull requests

4 participants