S2SWA app crashes on xjet when initializing CICE #1262
Comments
@DavidHuber-NOAA We've been running low-resolution cpld tests on xjet and haven't seen any issues. What resolution is your run? Can you run the cpld_control_p8 test on xjet to confirm the issue? Thanks.
@junwang-noaa I'm running at C384 resolution. Sure, I will run the regression test, though I expect it to pass since the executable will also be built on xjet.
@junwang-noaa The regression test passed. As an additional test, I'm going to try running the C384 forecast out 6 hours after recompiling with the
@junwang-noaa The C384 forecast successfully ran without the
I reran the regression test, compiling on kjet and then running cpld_control_p8 on xjet. This caused the same crash I see when the executable is compiled on the head node and then run on xjet. The cpld_control_p8 test directory is located here: /lfs1/NESDIS/nesdis-rdo2/David.Huber/RT_RUNDIRS/David.Huber/FV3_RT/rt_200684/cpld_control_p8.
@DavidHuber-NOAA I am not sure what the difference is between kjet and xjet. Have you run any tests on xjet before with an executable compiled on kjet?
@junwang-noaa xjet's CPU architecture is Haswell, while kjet's is Skylake. The CPUs support different instruction sets (e.g. AVX2 on xjet and AVX-512 on kjet). Thus, when CICE is compiled on kjet with -xHOST, the compiler may emit newer instructions that xjet cannot execute. I have run tests on xjet before (not RTs, just forecasts), but only in ATM-only and ATMW modes. These ran successfully. If extra instructions are required for CICE, then I would suggest using the instructions in
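To illustrate the mismatch described above, here is a minimal Python sketch that checks whether a node advertises the instruction-set extensions a binary was built for. It assumes a Linux-style /proc/cpuinfo "flags" line, where AVX2 and AVX-512 Foundation appear as avx2 and avx512f. The function names and the two sample flag lines are hypothetical and heavily abbreviated; they are not output captured from xjet or kjet.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of
    /proc/cpuinfo-style text, or an empty set if there is none."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_features(required, cpuinfo_text):
    """Return the required flags that the CPU does not advertise, sorted."""
    return sorted(set(required) - parse_cpu_flags(cpuinfo_text))


# Illustrative, abbreviated flag lines (assumptions, not real node output).
HASWELL_LIKE = "flags\t\t: fpu sse sse2 avx avx2 fma"
SKYLAKE_LIKE = "flags\t\t: fpu sse sse2 avx avx2 fma avx512f avx512cd"

if __name__ == "__main__":
    # Extensions a build made on an AVX-512 host might rely on.
    required = ["avx2", "avx512f"]
    print("Haswell-like node is missing:", missing_features(required, HASWELL_LIKE))
    print("Skylake-like node is missing:", missing_features(required, SKYLAKE_LIKE))
```

On a real Linux node, the contents of /proc/cpuinfo can be passed as cpuinfo_text. A non-empty result for the run node would be consistent with an "illegal instruction" crash from a binary built with host-specific optimization on a newer CPU.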
* Remove -xHOST from CICE CMakeLists.txt. #1262 Co-authored-by: [David Huber] <[[email protected]]> Co-authored-by: JONG KIM <[email protected]> Co-authored-by: Brian Curtis <[email protected]>
Description
The model crashes on xjet at line 1146 of ice_init.F90 with an "illegal instruction" error when initializing the CICE component. This is verified for the S2SWA app, but is likely also true for any S2S* app.
I believe I have tracked it down to the CMakeLists.txt in CICE-interface, which appends -xHOST to CMAKE_Fortran_FLAGS on line 10.
To Reproduce:
What compilers/machines are you seeing this with? Intel 18.0.5.274 and IMPI 2018.4.274
Give explicit steps to reproduce the behavior.
Note, I have not tested this on any other partition.
Additional context
The regression tests also run on xjet, but they are also compiled there.
Output
output logs
The complete log is attached with a snippet of the crash itself shown below.
gfsfcst.log