Fix modulefiles for Hera/Rocky8 OS. #2194
This should be tested on the Rocky8 login nodes (hfe09-hfe12)!
@RatkoVasic-NOAA With this module update, we can only run the UFS weather model code on hfe09-hfe12. Can we run the model on other Hera nodes once your PR is committed? Thanks
@junwang-noaa, no. You cannot use both. But you can use the old modulefiles (…)
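Because the Rocky8 and pre-Rocky8 modulefiles cannot be mixed, a job script might want to detect which OS it is on before choosing. The following is a minimal Python sketch under stated assumptions: it reads the standard `/etc/os-release` file, treats `ID="rocky"` with a major `VERSION_ID` of 8 as Rocky8, and returns one of two hypothetical labels (`"rocky8"` or `"legacy"`); it is not part of the ufs-weather-model tooling.

```python
import shlex
from typing import Dict


def parse_os_release(text: str) -> Dict[str, str]:
    """Parse /etc/os-release style KEY=value lines into a dict."""
    info: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # shlex.split removes the optional shell-style quotes around values.
        parts = shlex.split(value)
        info[key] = parts[0] if parts else ""
    return info


def modulefile_set(os_release_text: str) -> str:
    """Return a hypothetical modulefile-set label for the detected OS."""
    info = parse_os_release(os_release_text)
    major = info.get("VERSION_ID", "").split(".")[0]
    is_rocky8 = info.get("ID") == "rocky" and major == "8"
    return "rocky8" if is_rocky8 else "legacy"


def current_modulefile_set(path: str = "/etc/os-release") -> str:
    """Read the OS release file on the current node and classify it."""
    with open(path) as f:
        return modulefile_set(f.read())
```

On a Rocky8 login node such as hfe09, `current_modulefile_set()` would be expected to return `"rocky8"`; on an older node it returns `"legacy"`, and the script could then load the matching (old or new) modulefiles.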
@RatkoVasic-NOAA, can you keep syncing the branch? We may need to schedule this PR tomorrow.
Done.
The … It is here:
HERA: Big long backtrace
The cpld_control_p8_gnu and cpld_debug_p8_gnu tests both fail with this message:
@SamuelTrahanNOAA All tests pass on my side. @zach1221 @FernandoAndrade-NOAA, can you test the GNU cases on Hera/Rocky8?
Did your tests pass on the first try, or did you have to rerun them?
They passed on the first try. A few other people are running the GNU cases now, so we can confirm.
I am also receiving this error on Hera for cpld_control_p8_gnu and cpld_debug_p8_gnu.
@RatkoVasic-NOAA It sounds like the results differ case by case. Are some nodes still heterogeneous? Is this an OpenMPI or GCC version issue?
@zach1221 Can you reproduce the error I saw with control_wam_debug_gnu? It may have been caused by the job being sent to the wrong service (login) node.
I am not sure whether we are triggering -mcmodel=medium on Hera/GNU.
For ecFlow, if used: I suspect the ECF_HOST environment variable on Hera isn't set properly after this transition. I logged into a Rocky8 node, ran 'module load ecflow' and 'printenv | grep ECF', and only the _ROOT env var showed up. If needed, try manually setting the Hera ecFlow ECF_HOST variable to (I think) hfe12 and see if that helps.
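The check described above can be sketched in Python, assuming only what the comment states: ECF-related variables are collected roughly the way `printenv | grep ECF` would show them (here filtering on variable names only), and if `ECF_HOST` is missing a fallback is filled in. The fallback `hfe12` is the commenter's unconfirmed guess, and the function name `ecf_env` is hypothetical.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical fallback: the comment only guesses ("I think") that the
# Hera ecFlow server host is hfe12; confirm before relying on it.
FALLBACK_ECF_HOST = "hfe12"


def ecf_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect variables whose names contain "ECF" and fill in ECF_HOST if missing."""
    source = os.environ if env is None else env
    ecf = {k: v for k, v in source.items() if "ECF" in k}
    if "ECF_HOST" not in ecf:
        print(f"ECF_HOST is not set; using fallback {FALLBACK_ECF_HOST!r}")
        ecf["ECF_HOST"] = FALLBACK_ECF_HOST
    return ecf
```

The function only returns the settings; to make them visible to child processes such as the ecFlow client, a caller would still need to export them, for example with `os.environ.update(ecf_env())`.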
In my first test, where cpld_control_p8_gnu and cpld_debug_p8_gnu failed, control_wam_debug_gnu actually passed. I'm retesting now with some changes to cmake.gnu.
I used Rocoto and saw those bugs. That means the problem is not specific to ecFlow.
@climbfuji I am not sure about the OSC pt2pt issue. I vaguely remember a similar issue with OpenMPI on Hercules. Do you remember? @RatkoVasic-NOAA @ulmononian Any comment?
@jkbk2004 I'm looking into this right now. I haven't seen this error message before.
@jkbk2004 Forcing the jobs to run on nodes 5-12 didn't work; they failed with the same OSC pt2pt error. The GNU.cmake update test timed out, so I'm running it again with a manually extended time limit. Update: the cpld_control_p8_gnu test failed with the same error after adding -mcmodel=large and medium to gnu.cmake.
As @climbfuji explained, I'm not going to start working on GNU 13 until all packages work with this version. In the meantime, though, I will try it in my personal space.
Do you have to use OpenMPI for this? Can't you use an MPICH derivative instead?
If you want to use mapl@… then you can't use mpich@4; I don't remember when the MAPL bug fix that allows using mpich@4 was merged.
GNU 12.2 is fine with me. I wasn't sure how complicated it would make things on Hera. It's important to get this started ASAP.
mapl@… works with mpich@4: https://github.com/GEOS-ESM/MAPL/releases/tag/v2.42.0
We don't have GCC 12.2 on Hera/Rocky, only 9.2.0 and 13.2.0 (for now).
How long would it take the Hera SAs to install 12.2? Would it be easier to wait for that than to try to get 13.2 working with spack-stack?
That is a good question, meaning: I don't know the answer ;-) I will first try 13.2.0 and see what we need for the WM alone.
@RatkoVasic-NOAA It won't work, since MAPL doesn't work with 13.2 and you need MAPL for the UFSWM. The last gnu@13 change for MAPL was apparently merged last week, and there isn't even a release yet (GEOS-ESM/MAPL#2640). EDIT: this was for mapl@3. I don't know which mapl@2 tag, if any, works with gnu@13; @mathomp4 probably knows.
At the moment, no official release of MAPL 2 works with GCC 13. But MAPL … That said, if needed, we could release MAPL 2.45 with those fixes, but note that at the moment MAPL 2.44+ doesn't build in spack. That is due to the …
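The compiler/library constraints raised in this exchange can be summarized as a small checker. This is a minimal Python sketch that encodes only the thread's statements at the time (GCC 9.2.0 and 13.2.0 installed on Hera/Rocky; no MAPL 2 release works with GCC 13; MAPL 2.44+ does not build in spack) plus one labeled assumption: that MAPL 2.42.0 and later work with mpich@4, inferred from the v2.42.0 release link. The function name `combo_problems` and the example versions are hypothetical; this is not an authoritative compatibility matrix.

```python
from typing import List, Tuple

Version = Tuple[int, int, int]

# Snapshot of statements made in this thread; not an authoritative matrix.
HERA_ROCKY8_GCC: List[Version] = [(9, 2, 0), (13, 2, 0)]  # "Only 9.2.0 and 13.2.0 (for now)"
MAPL_MPICH4_MIN: Version = (2, 42, 0)  # assumption inferred from the v2.42.0 release link
MAPL_SPACK_BROKEN_FROM: Version = (2, 44, 0)  # "MAPL 2.44+ doesn't build in spack"


def fmt(v: Version) -> str:
    return ".".join(str(p) for p in v)


def combo_problems(gcc: Version, mapl: Version, mpi: str, mpi_major: int) -> List[str]:
    """Return reasons (per this thread) why a GCC/MAPL/MPI combination may not work."""
    problems: List[str] = []
    if gcc not in HERA_ROCKY8_GCC:
        problems.append(f"gcc {fmt(gcc)} is not installed on Hera/Rocky8")
    if mapl[0] == 2 and gcc[0] >= 13:
        problems.append(f"no official MAPL 2 release works with gcc {fmt(gcc)}")
    if mpi == "mpich" and mpi_major >= 4 and mapl < MAPL_MPICH4_MIN:
        problems.append(f"MAPL {fmt(mapl)} predates the assumed mpich@4 fix")
    if mapl >= MAPL_SPACK_BROKEN_FROM:
        problems.append(f"MAPL {fmt(mapl)} does not build in spack (at the time)")
    return problems
```

For example, `combo_problems((13, 2, 0), (2, 45, 0), "openmpi", 4)` flags both the GCC 13 and the spack build issue, which matches why the thread leaned toward waiting on GCC 12.2 or MAPL fixes rather than moving straight to 13.2.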
I didn't think we needed to run on all systems; no code here touches any other system. I thought Hercules was a special case because of the changes to the cpld tests.
OK, I was just being safe. We don't have to finish Jet and WCOSS2/Acorn if you don't think it's necessary. But yes, you're correct: only Hercules/Hera had changes.
@BrianCurtis-NOAA @DeniseWorthen @jkbk2004 Testing is complete. Feel free to provide a final review.
@aerorahul We moved to Rocky8. FYI: we will revisit the GNU/OpenMPI issue on Rocky8.
Commit Queue Requirements:
Description:
Hera is switching to a new OS. This update enables the ufs-weather-model to run on the Rocky8 OS.
The necessary changes were made to the spack-stack libraries.
NOTE! Since a different version of OpenMPI is used, results change when using the GNU compiler.
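One common reason a different MPI library can change answers (not confirmed as the cause in this PR) is that collective reductions may combine partial sums in a different order, and floating-point addition is not associative. The minimal Python sketch below compares a strict left-to-right reduction with a pairwise tree reduction; both functions are illustrative and do not model any specific OpenMPI algorithm.

```python
from typing import Sequence


def reduce_left(values: Sequence[float]) -> float:
    """Sum per-rank contributions strictly left to right: ((v0 + v1) + v2) + ..."""
    total = 0.0
    for v in values:
        total += v
    return total


def reduce_tree(values: Sequence[float]) -> float:
    """Sum per-rank contributions pairwise, as a tree-based reduction might."""
    if not values:
        return 0.0
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return reduce_tree(values[:mid]) + reduce_tree(values[mid:])


partials = [0.1, 0.2, 0.3]
print(reduce_left(partials), reduce_tree(partials))  # 0.6000000000000001 0.6
```

The two results differ only in the last bit, but bit-for-bit regression comparisons would still report a change.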
Commit Message:
Priority:
Git Tracking
UFSWM:
Sub component Pull Requests:
UFSWM Blocking Dependencies:
Changes
Regression Test Changes (Please commit test_changes.list):
Input data Changes:
Library Changes/Upgrades:
Library changes are included in this PR (spack-stack).
Testing Log: