failure in ocn or ice initialization on Mira and Cetus? #1106

Closed
worleyph opened this issue Oct 19, 2016 · 0 comments · Fixed by #1110
worleyph commented Oct 19, 2016

A_WCYCL2000 ne120_oRRS15to5 jobs are failing on Mira and Cetus. @amametjanov has experienced similar issues, and indicates that these are new problems (i.e., that he was able to run successfully in the recent past).

The problem signature is a little fuzzy. For me, the code dies after cpl.log outputs

 (component_init_cc:mct) : Initialize component ice

but the ice.log file is empty. The cesm.log file contains a few error messages of the form:

 ".../ocn_comp_mct.f90", line 717: 1525-142 The CLOSE statement on unit 99 cannot be completed because an errno value of 2 (No such file or directory) was received while closing the file.  The program will stop.

I added IOSTAT to the OPEN and CLOSE statements for all SCRATCH files in ocn_comp_mct.F, along with debug statements, and caught the same number of errors, all on the CLOSE at line 717 as indicated above (3 processes out of 8192 in my run). IOSTAT should prevent these errors from killing the job, so perhaps something else is going on?
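
For readers unfamiliar with the technique, here is a minimal sketch of what adding IOSTAT to a SCRATCH-file OPEN and CLOSE can look like. It is an illustration only, not the code in ocn_comp_mct.F (which is not shown in this issue): the program name, variable names, and messages are hypothetical, and only the unit number 99 is taken from the error message above. With IOSTAT present, a failing statement sets the status variable and returns control to the program instead of stopping it.

```fortran
program scratch_iostat_sketch
  implicit none
  ! Unit 99 mirrors the unit named in the error message; every other
  ! name in this sketch is hypothetical.
  integer, parameter :: scratch_unit = 99
  integer :: ierr
  character(len=256) :: msg

  ! With IOSTAT present, a failed OPEN sets ierr to a nonzero value
  ! and execution continues instead of terminating.
  open(unit=scratch_unit, status='scratch', iostat=ierr, iomsg=msg)
  if (ierr /= 0) then
     write(*,'(a,i0,2a)') 'scratch open failed, iostat=', ierr, ': ', trim(msg)
  else
     ! The same applies to CLOSE: the error is reported, not fatal.
     close(unit=scratch_unit, iostat=ierr, iomsg=msg)
     if (ierr /= 0) then
        write(*,'(a,i0,2a)') 'scratch close failed, iostat=', ierr, ': ', trim(msg)
     end if
  end if
end program scratch_iostat_sketch
```

On a healthy file system this prints nothing. The IOMSG specifier (Fortran 2003) is optional and is included here only to make any debug output more informative.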

Note that the ocn.log does not indicate any problems, nor do any of the other non-empty log files.

I'll keep poking, but this appears to affect more than just me (perhaps everyone?). I'll look at low-res runs as well. It could be a memory issue, but since Az has been seeing it as well, it probably is not.

Labelling this as critical, based on the assumption that no ACME fully coupled runs (high res?) work on Mira or Cetus at the moment. I'll change this if it is shown to be some sort of edge case problem.

jonbob added a commit that referenced this issue Dec 23, 2016
Keep MPAS SCRATCH files open during init, run and finalization

Previously, SCRATCH files were opened and closed in each init_mct, run_mct and
final_mct call. With a large number of MPI ranks, that may
overwhelm a file system. This will instead open per-process temporary
files once in the init_mct call and close them once in the final_mct call,
keeping them open in each run_mct call.

Fixes #1106
[BFB]
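
The lifecycle described in this commit message can be sketched roughly as follows. This is an assumption-laden illustration, not the code merged for this issue: the module, the routine names (sketch_init, sketch_run, sketch_final), and the file contents are all hypothetical stand-ins for init_mct, run_mct and final_mct. The point is only that each process opens its scratch file once, reuses it on every run call, and closes it once at finalization.

```fortran
module scratch_lifecycle_sketch
  implicit none
  private
  public :: sketch_init, sketch_run, sketch_final

  ! Hypothetical module-level state: one scratch unit per process.
  integer :: scratch_unit = -1
  logical :: scratch_open = .false.

contains

  ! Stand-in for init_mct: open the scratch file exactly once.
  subroutine sketch_init(ierr)
    integer, intent(out) :: ierr
    open(newunit=scratch_unit, status='scratch', iostat=ierr)
    scratch_open = (ierr == 0)
  end subroutine sketch_init

  ! Stand-in for run_mct: reuse the already-open unit on every call.
  subroutine sketch_run(step, ierr)
    integer, intent(in)  :: step
    integer, intent(out) :: ierr
    ierr = 1                              ! nonzero: file not available
    if (.not. scratch_open) return
    rewind(scratch_unit, iostat=ierr)     ! start over instead of reopening
    if (ierr /= 0) return
    write(scratch_unit, '(a,i0)', iostat=ierr) 'step ', step
  end subroutine sketch_run

  ! Stand-in for final_mct: close the scratch file exactly once.
  subroutine sketch_final(ierr)
    integer, intent(out) :: ierr
    ierr = 0
    if (.not. scratch_open) return
    close(scratch_unit, iostat=ierr)
    scratch_open = .false.                ! do not retry, even if close failed
  end subroutine sketch_final

end module scratch_lifecycle_sketch

program scratch_lifecycle_demo
  use scratch_lifecycle_sketch
  implicit none
  integer :: step, ierr

  call sketch_init(ierr)
  if (ierr /= 0) error stop 'could not open scratch file'
  do step = 1, 3
     call sketch_run(step, ierr)
     if (ierr /= 0) error stop 'scratch write failed'
  end do
  call sketch_final(ierr)
  if (ierr /= 0) error stop 'could not close scratch file'
end program scratch_lifecycle_demo
```

Compared with opening and closing in every call, this issues one OPEN and one CLOSE per process for the whole run, which is the reduction in file-system traffic the commit message is after. NEWUNIT requires a Fortran 2008 compiler; a fixed unit number would work equally well in the sketch.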
jonbob added a commit that referenced this issue Jan 12, 2017
rljacob pushed a commit that referenced this issue Feb 27, 2017
rljacob pushed a commit that referenced this issue Apr 16, 2021
rljacob pushed a commit that referenced this issue May 6, 2021
glemieux added a commit to rgknox/E3SM that referenced this issue May 29, 2024
Setting fates_hist_dimlevel with a single line (i.e. = 2,2) fails during
ELMBuildNamelist. Adjust the assignment to accommodate this.

This also removes hist_ndens to avoid issue E3SM-Project#1106