rpointer.cpl bug causes ctsm restart to fail #2920

Open
slevis-lmwg opened this issue Dec 31, 2024 · 4 comments · May be fixed by #2921
Labels: bfb (bit-for-bit), bug (something is working incorrectly)

Comments

@slevis-lmwg
Contributor

slevis-lmwg commented Dec 31, 2024

General bug information

CTSM version you are using:
branch_tags/tmp-241219.n01.ctsm5.3.016
Would have been ctsm master tag ctsm5.3.017 if ctsm's master branch were not "locked" while we wait for a new cesm tag.

Does this bug cause significantly incorrect results in the model's science? [Yes / No]
I'm guessing not.

Configurations affected: [Fill this in if known.]
Continuation runs (CONTINUE_RUN = TRUE)

Important details of your setup / configuration so we can reproduce the bug

Case /glade/u/home/slevis/cases_LMWG_dev/ctsm53017_f19_BNF_AD documented in NCAR/LMWG_dev#88.

I have worked around the problem, so the simulation is in progress again. Here's a complete sequence of events:

  1. I started this AD spinup as a cold start.
  2. STOP_N was 40 and RESUBMIT was 6.
  3. The model resubmitted successfully after year 40.
  4. The model failed to resubmit after year 80.
  5. I didn't find an error, so I tried to submit manually and immediately got
    ERROR: CONTINUE_RUN is true but this case does not appear to have restart files staged in /glade/derecho/scratch/slevis/ctsm53017_f19_BNF_AD/run rpointer.cpl
  6. I got past the error with
    mv rpointer.cpl.0081-01-01-00000 rpointer.cpl
    but the run then failed while looking for rpointer.cpl.0081-01-01-00000, which the mv had renamed away.
  7. I got past both errors with
    cp rpointer.cpl.0081-01-01-00000 rpointer.cpl
    which leaves both files in place, and the run is now in progress. I also changed STOP_N to 210, hoping to avoid this problem for the rest of this simulation.
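
The workaround in steps 6 and 7 can be sketched as a small shell helper. This is only an illustration, not part of CIME or CTSM: the function name `stage_rpointer` is hypothetical, and it assumes the timestamped pointer files follow the `rpointer.cpl.YYYY-MM-DD-SSSSS` pattern seen above, so that sorted glob order is also chronological order.

```shell
#!/bin/sh
# Hypothetical helper (not part of CIME or CTSM): make sure a plain
# rpointer.cpl exists in the run directory by copying the newest
# timestamped pointer file.
stage_rpointer() {
    rundir="$1"

    # Nothing to do if the plain pointer file is already staged.
    if [ -e "$rundir/rpointer.cpl" ]; then
        return 0
    fi

    # Assumption: names look like rpointer.cpl.YYYY-MM-DD-SSSSS, so the
    # sorted glob expansion is chronological and the last match is newest.
    latest=""
    for f in "$rundir"/rpointer.cpl.*; do
        [ -e "$f" ] && latest="$f"
    done

    if [ -z "$latest" ]; then
        echo "no timestamped rpointer.cpl.* found in $rundir" >&2
        return 1
    fi

    # Copy rather than move: the run also looked for the timestamped file.
    cp "$latest" "$rundir/rpointer.cpl"
}
```

Calling `stage_rpointer "$RUNDIR"` before resubmitting (with `RUNDIR` set to the case run directory) would reproduce the `cp` from step 7. Updating to cime6.1.56, as suggested in the comments below, is the actual fix.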
@slevis-lmwg added the bug (something is working incorrectly), next (this should get some attention in the next week or two; normally each Thursday SE meeting), and bfb (bit-for-bit) labels on Dec 31, 2024
@ekluzek
Collaborator

ekluzek commented Dec 31, 2024

@slevis-lmwg I saw Jim commit a change to cime that might cover this. Can you try with cime6.1.56 and see if that fixes it?

Our SSP tests might also have caught this, but we have been letting them go through while failing.

@jedwards4b
Contributor

Please try with cime6.1.56 and let me know if it is still a problem.

@slevis-lmwg
Contributor Author

cime6.1.56 took care of it, thanks.

@slevis-lmwg removed the next label (this should get some attention in the next week or two; normally each Thursday SE meeting) on Jan 3, 2025
@ekluzek
Collaborator

ekluzek commented Jan 3, 2025

@slevis-lmwg let's leave this open until we have a tag on master that has the cime update in place.

@ekluzek reopened this on Jan 3, 2025
@ekluzek added this to the cesm3_0_beta06 milestone on Jan 3, 2025
@ekluzek moved this from Todo to In Progress in LMWG: Near Term Priorities on Jan 7, 2025
@samsrabin added and then removed the blocked: dependency label (wait to work on this until dependency is resolved) on Jan 9, 2025
Projects
Status: In Progress