failure in ocn or ice initialization on Mira and Cetus? #1106
jonbob added a commit that referenced this issue on Dec 23, 2016:
Keep MPAS SCRATCH files open during init, run and finalization. Previously, SCRATCH files were opened and closed in each init_mct, run_mct and final_mct call. With a large number of MPI ranks, that may overwhelm the file system. This will instead open per-process temporary files once in the init_mct call and close them once in the final_mct call, keeping them open across each run_mct call. Fixes #1106 [BFB]
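The open-once/close-once lifecycle described in this commit can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the actual Fortran in ocn_comp_mct.F: the class `ScratchFiles` and its `init`/`run`/`final` methods are hypothetical stand-ins for init_mct, run_mct and final_mct, and `tempfile.TemporaryFile` plays the role of a Fortran STATUS='SCRATCH' unit (an anonymous file removed when it is closed).

```python
import tempfile


class ScratchFiles:
    """Hypothetical sketch: per-process scratch files opened once in init(),
    reused by every run() call, and closed once in final()."""

    def __init__(self, nfiles):
        self.nfiles = nfiles
        self.files = []
        self.opens = 0   # open operations issued to the file system
        self.closes = 0  # close operations issued to the file system

    def init(self):
        # Analogous to init_mct: open each scratch file exactly once.
        for _ in range(self.nfiles):
            self.files.append(tempfile.TemporaryFile(mode="w+"))
            self.opens += 1

    def run(self, step):
        # Analogous to run_mct: reuse the open handles, rewinding instead of reopening.
        for f in self.files:
            f.seek(0)
            f.truncate()
            f.write(f"step {step}\n")
            f.flush()

    def final(self):
        # Analogous to final_mct: close once; TemporaryFile deletes the file on close.
        for f in self.files:
            f.close()
            self.closes += 1
        self.files = []


scratch = ScratchFiles(nfiles=3)
scratch.init()
for step in range(10):
    scratch.run(step)
scratch.final()
print(scratch.opens, scratch.closes)  # 3 3
```

Under the pre-fix pattern, each of the 10 run calls would also have opened and closed all three files, so open/close traffic would grow with the number of coupling calls times the number of MPI ranks; keeping the handles open leaves only the one-time cost in init and final.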
The same commit message was referenced again in later events:
jonbob added a commit that referenced this issue on Jan 12, 2017.
rljacob pushed a commit that referenced this issue on Feb 27, 2017.
rljacob pushed a commit that referenced this issue on Apr 16, 2021.
rljacob pushed a commit that referenced this issue on May 6, 2021.
glemieux added a commit to rgknox/E3SM that referenced this issue on May 29, 2024:
Setting fates_hist_dimlevel with a single line (i.e., = 2,2) fails during ELMBuildNamelist. Adjust the assignment to accommodate this. This also removes hist_ndens to avoid issue E3SM-Project#1106
A_WCYCL2000 ne120_oRRS15to5 jobs are failing on Mira and Cetus. @amametjanov has experienced similar issues, and indicates that these are new problems (i.e., that he was able to run successfully in the recent past).
The problem signature is a little fuzzy. For me, the code dies after cpl.log outputs
but the ice.log file is empty. The cesm.log file contains a few error messages of the form:
I added IOSTAT checks, with debug statements, to the open and close statements for all SCRATCH files in ocn_comp_mct.F and caught the same number of error warnings, all on the close at line 717 as indicated above (3 of the 8192 processes in my run). IOSTAT should prevent these errors from killing the job, so perhaps something else is going on?
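The reasoning about IOSTAT can be illustrated with a rough Python analogue. In Fortran, when a CLOSE with IOSTAT= hits an error, execution continues and the status variable is set to a positive value instead of the program error-terminating; here the caught `OSError` plays that role. The function `close_with_status` and the always-failing `FailingFile` object are hypothetical, introduced only for illustration; they are not part of the MPAS code.

```python
import io
import sys


def close_with_status(f, label):
    """Close f and return an IOSTAT-like status: 0 on success, nonzero on failure.

    Errors are reported as warnings instead of propagating, so a failed close
    does not by itself terminate the caller.
    """
    try:
        f.close()
        return 0
    except OSError as err:
        print(f"warning: close of {label} failed: {err}", file=sys.stderr)
        return err.errno if err.errno else -1


class FailingFile:
    """Hypothetical stand-in for a scratch unit whose close fails on the file system."""

    def close(self):
        raise OSError(5, "Input/output error")


ok = close_with_status(io.StringIO(), "scratch unit A")
bad = close_with_status(FailingFile(), "scratch unit B")
print(ok, bad)  # 0 5
```

If the close errors are caught this way and the job still dies, the failure is presumably coming from somewhere other than the close statement itself.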
Note that ocn.log does not indicate any problems, nor do any of the other non-empty log files.
I'll keep poking, but this appears to affect more than just me (perhaps everyone?). I'll look at low-res runs as well. It could be a memory issue, but since Az has been seeing it as well, it probably is not.
Labelling this as critical, on the assumption that no ACME fully coupled runs (high res?) currently work on Mira or Cetus. I'll change this if it turns out to be an edge case.