st_archive: Failure to archive intermediate dart - and maybe other - restart files #1853

Closed
billsacks opened this issue Aug 31, 2017 · 5 comments · Fixed by #1855

Comments

@billsacks
Member

There is a subtle issue in the current short-term archiver (case_st_archive.py) that prevents archiving intermediate restart files for dart, and possibly other components.

The problem is in _archive_process. The relevant parts of the code look like this (some irrelevant pieces deleted for brevity):

    for archive_entry in archive.get_entries():
        # determine compname and compclass
        compname, compclass = archive.get_entry_info(archive_entry)

        datenames = _get_datenames(case, last_date)
        for datename in datenames:

            # archive restarts
            histfiles_savein_rundir = _archive_restarts(case, archive, archive_entry,
                                                        compclass, compname,
                                                        datename, datename_is_last,
                                                        archive_file_fn)

The problem with this is that _get_datenames determines the list of dates based on what cpl restart files are present. If cpl is processed before other components, then those other components will only see the final restart files when they call _get_datenames, because all other cpl restart sets will have been moved.
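
To make the ordering problem concrete, here is a minimal toy sketch of the mechanism described above. It is not the actual case_st_archive.py code: the function names, the in-memory "run directory" set, and the simplified file names (`cpl.r.<date>`) are all assumptions made for illustration. It also assumes that intermediate restart sets are moved out of the run directory while the final set is left in place.

```python
def get_datenames(rundir):
    """Toy stand-in for _get_datenames: infer dates from the cpl restart files present."""
    prefix = "cpl.r."
    return sorted(name[len(prefix):] for name in rundir if name.startswith(prefix))


def archive_components_first(rundir, components):
    """Component loop outside, date lookup inside (the problematic ordering)."""
    archived = []
    for comp in components:
        # The date list is recomputed per component, after earlier
        # components may already have moved their files.
        datenames = get_datenames(rundir)
        for date in datenames:
            fname = f"{comp}.r.{date}"
            if fname not in rundir:
                continue
            archived.append(fname)
            if date != datenames[-1]:
                # Assumption: intermediate sets are moved, the last set stays.
                rundir.remove(fname)
    return archived


dates = ["0001-01-02", "0001-01-03", "0001-01-04"]
rundir = {f"{comp}.r.{d}" for comp in ("cpl", "dart") for d in dates}

# cpl is processed first, so its intermediate files are gone by the time
# dart asks for the list of dates.
archived = archive_components_first(rundir, ["cpl", "dart"])
missed = sorted(name for name in rundir if name.startswith("dart.r.") and name not in archived)
print(archived)
print(missed)
```

In this toy run, only the final dart restart file is archived; the two intermediate dart files are left behind in the run directory because the cpl files that would have identified their dates were already moved.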

I have only noticed this problem affecting dart restart sets, but in principle it could affect any component, depending on the ordering of the results from archive.get_entries().

I have a fix for this in an upcoming PR, where I have refactored _archive_process to have the date loop outside the component loop.
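
As a rough illustration of that reordering (a sketch only, not the code from the PR), the date list can be computed once and iterated in the outer loop, with the component loop inside. The function names, the in-memory "run directory" set, and the simplified `<comp>.r.<date>` file names are hypothetical, and the sketch assumes intermediate sets are moved while the final set is left in place.

```python
def get_datenames(rundir):
    """Toy stand-in for _get_datenames: infer dates from the cpl restart files present."""
    prefix = "cpl.r."
    return sorted(name[len(prefix):] for name in rundir if name.startswith(prefix))


def archive_dates_first(rundir, components):
    """Date loop outside, component loop inside."""
    archived = []
    # Computed once, before any file has been moved.
    datenames = get_datenames(rundir)
    for date in datenames:
        for comp in components:
            fname = f"{comp}.r.{date}"
            if fname not in rundir:
                continue
            archived.append(fname)
            if date != datenames[-1]:
                # Assumption: intermediate sets are moved, the last set stays.
                rundir.remove(fname)
    return archived


dates = ["0001-01-02", "0001-01-03", "0001-01-04"]
rundir = {f"{comp}.r.{d}" for comp in ("cpl", "dart") for d in dates}

archived = archive_dates_first(rundir, ["cpl", "dart"])
print(archived)
print(sorted(rundir))
```

With this ordering, every component sees every date regardless of where cpl falls in the component list, so the result no longer depends on the order returned by archive.get_entries().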

@billsacks
Member Author

cc @mvertens @jedwards4b @bertinia

@gold2718

@billsacks, Would this also prevent archiving of intermediate initial files? I have a report from a user that intermediate CAM initial files are not being archived (i.e., only the last initial file written is archived). Since CAM uses initial files for data assimilation, this might be relevant to this issue.

jedwards4b added a commit that referenced this issue Aug 31, 2017
Rewrite ERP with SystemTestsCompareTwo
Test suite: scripts_regression_tests.py on cheyenne
Also tests listed above
Test baseline: n/a
Test namelist changes: none
Test status: bit for bit

Fixes #1851
Fixes #1853
Addresses #1647 (rewrites ERP, not ERS)

User interface changes?: none

Update gh-pages html (Y/N)?: N

Code review: @JEdwards, @jgfouca, @bertinia
@ghost ghost removed the in progress label Aug 31, 2017
@billsacks
Member Author

> @billsacks, Would this also prevent archiving of intermediate initial files? I have a report from a user that intermediate CAM initial files are not being archived (i.e., only the last initial file written is archived). Since CAM uses initial files for data assimilation, this might be relevant to this issue.

Yes, I think the same mechanism is used for CAM initial files, so this issue would affect them (depending on the order in which entries were returned from archive.get_entries()). This should now be fixed on master, or if you want to test with a minimal set of changes, you can apply the minor diffs given by billsacks@02c1616

@jedwards4b
Contributor

I am conducting this test now.

@jedwards4b
Contributor

I tested IRT_Ld5.f19_f19_mg17.F2000_DEV.cheyenne_intel and added inithist='DAILY' to user_nl_cam. The cam.i files were copied to the archive directory at all the correct interim times.

jgfouca added a commit that referenced this issue Nov 7, 2017
These files were deleted on the ESMCI side

... but somehow remained on the ACME side and continue to cause
problems in the merges. Delete them manually with this commit.

Fixes #1853

[BFB]

* jgfouca/cime/remove_zombie_files:
  These files were deleted on the ESMCI side
jgfouca added a commit that referenced this issue Feb 23, 2018
jgfouca added a commit that referenced this issue Mar 13, 2018