Compile errors on orion and hera with develop #1450
Comments
@jkbk2004 I have retested on both hera and orion and am still having this same issue. |
Can confirm the same error on Orion when I call tests/compile.sh directly to attempt build for global-workflow. |
@JessicaMeixner-NOAA I reset the permissions on the whole hpc-stack directory on orion. Can you give it a try? |
Hi @jkbk2004, @JessicaMeixner-NOAA is away until Wednesday. I can test on orion and let you know the outcome. |
@jkbk2004 I just tested on orion and found I get the same error. |
@MatthewMasarik-NOAA @BrianCurtis-NOAA @ChunxiZhang-NOAA @zach1221 can you take a look at /work/noaa/epic-ps/jongkim/4debug? As err.log shows, I am able to load modules ok: crtm. Can you try running the job_card I put there, so that we can check whether module loading works for everyone? |
@jkbk2004 I copied that directory and submitted the job_card. Here is the output of err.log (out.log is empty): [matma@Orion-login-1 4debug]$ cat err.log
++ date +%s
+ echo -n ' 1665579328,'
+ set +x
Lmod has detected the following error: The following module(s) are unknown:
"ufs_common"
Please check the spelling or version number. Also try "module spider ..."
It is also possible your cache file is out-of-date; it may help to try:
$ module --ignore-cache load "ufs_common"
Also make sure that all modulefiles written in TCL start with the string
#%Module
Ps, I don't have |
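For anyone comparing err.log files across accounts, the following is a minimal sketch of pulling the unknown module names out of an Lmod error like the one above. The function name `unknown_modules` is hypothetical, and the parsing assumes the message layout shown in this log (a line containing "module(s) are unknown:" followed by quoted names); other Lmod versions may format it differently.

```python
import re
from typing import List

QUOTED = re.compile(r'"([^"]+)"')
ONLY_QUOTED = re.compile(r'(?:"[^"]+"\s*)+')


def unknown_modules(log_text: str) -> List[str]:
    """Return module names that Lmod reported as unknown in log_text.

    Assumes the layout seen in the err.log above; this is not an
    official Lmod interface.
    """
    names: List[str] = []
    in_block = False
    for line in log_text.splitlines():
        stripped = line.strip()
        if "module(s) are unknown:" in stripped:
            in_block = True
            # Names may follow the header on the same line.
            tail = stripped.split("unknown:", 1)[1]
            names.extend(QUOTED.findall(tail))
            continue
        if in_block:
            if not stripped:
                continue  # tolerate blank lines inside the block
            if ONLY_QUOTED.fullmatch(stripped):
                names.extend(QUOTED.findall(stripped))
            else:
                # First line that is not purely quoted names ends the block.
                in_block = False
    return names
```

Calling `unknown_modules(open("err.log").read())` on the log above should report `ufs_common`, which points at the module search path rather than at crtm itself.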
make sure you |
Is this message for me, @BrianCurtis-NOAA? If I try that, |
@BrianCurtis-NOAA @jkbk2004 @MatthewMasarik-NOAA I am back from leave and can try this again today. Brian, I have a question about the module load you said we should do, because I don't have to do this on other machines and I've never had to do it on orion before. |
@JessicaMeixner-NOAA @MatthewMasarik-NOAA @jkbk2004 Sorry for the confusion. I ran into an issue when testing a build of the develop branch on Orion: I had to load git/2.28.0 before git would pull everything cleanly without error. For more context on the module use and module load, here's how I set up my env for running RT.
on Orion, at least, a |
I've tried loading the git module last week and that did not solve my issue either. I've never had to load the ufs modules for any other machine... |
rt.sh should automatically do it, yes. |
I was able to run on orion this morning with the latest ufs version. I'll try hera now. |
Compile 11 on hera is still failing for me |
Same thing for me, it fails in compiling with DEBUG option on Hera.
|
I tried again today and the compile 011 is still failing for me on hera. I have been okay on orion and was going to test that again but there are /work issues. |
@JessicaMeixner-NOAA One option is to reduce the number of CCPP SDFs in the FV3/ccpp/suites directory to only those actually used by the regression tests. The failing test does not explicitly list suites, so it tries to build them all. There are currently more than 90 suite definitions in that directory, and the regression tests use only about a third of them. |
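A minimal sketch of that pruning idea, for illustration only: it lists the SDF files in a suites directory whose suite name is not in a caller-supplied set of suites. The function name `unused_sdfs` is hypothetical, the `suite_<name>.xml` naming is inferred from file names mentioned in this thread, and which suites the regression tests actually use is not derived here; the caller has to supply that list.

```python
from pathlib import Path
from typing import Iterable, List


def unused_sdfs(suites_dir: Path, used_suites: Iterable[str]) -> List[Path]:
    """List suite_*.xml files in suites_dir whose suite name is not in used_suites.

    Assumes SDFs are named suite_<name>.xml. Nothing is moved or deleted.
    """
    used = set(used_suites)
    unused: List[Path] = []
    for sdf in sorted(suites_dir.glob("suite_*.xml")):
        # suite_FV3_CPT_v0.xml -> FV3_CPT_v0
        suite_name = sdf.stem[len("suite_"):]
        if suite_name not in used:
            unused.append(sdf)
    return unused
```

For example, `unused_sdfs(Path("FV3/ccpp/suites"), my_used_suites)` returns the candidates to set aside before a build. The sketch only reports files, so moving them out of the directory remains a separate, deliberate step.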
I just ran with 90 (out of 91) SDFs and it worked. I excluded only the first one on the list (suite_FV3_CPT_v0.xml). I still have no idea why this is happening, and only to a few of us. |
suite_FV3_CPT_v0.xml is a deprecated SDF. Reducing the number of SDFs in the suites directory is a good option, and it could happen soon. |
@DusanJovic-NOAA - running with your script first, the regression tests succeeded. I'm with @RatkoVasic-NOAA in wondering why this is happening to only a few of us. |
The git/28 module requirement (on orion) is case-by-case. If there is a git clone issue, it is clearly resolved by the newer version. I am closing this issue. If the problem persists, we can re-open it. |
Description
When running the ufs-weather-model develop branch (hash e6da626), I get a failure for most of the compile jobs on orion (@pjpegion and others have gotten similar errors) and for compile 011 on hera (@MatthewMasarik-NOAA gets the same error).
To Reproduce:
Check out the develop branch, run ./rt.sh -e (from ecflow server on hera).
Additional context
I know that the orion develop worked for me last week. I have not tried to back-track versions yet as I'm curious if this is a larger issue.
Output
Orion:
Code on orion is here: /work2/noaa/marine/jmeixner/ufs-develop/tests
rt dir: /work2/noaa/marine/jmeixner/stmp/jmeixner/FV3_RT/rt_445868
Main error is not being able to find crtm:
Hera:
code: /scratch1/NCEPDEV/climate/Jessica.Meixner/ufs-weather-model/tests
rt dir: /scratch1/NCEPDEV/stmp2/Jessica.Meixner/FV3_RT/rt_5239
Main error: