Overzealous LockedFiles scheme #3225

Closed
jedwards4b opened this issue Sep 3, 2019 · 4 comments

@jedwards4b
Contributor

jedwards4b commented Sep 3, 2019

From cesm user Mike Mills:

The case below failed to submit the short-term archiver because I had changed the RESUBMIT value in the middle of a run. This seems unnecessary to me.

CASEROOT: /glade/work/mmills/case/f.e21.FWsc2000climo.ne30pg3_ne30pg3_mg17.c20190831

CIMEROOT: /glade/work/mmills/cesm/opt-se-cslam_c190831/cime

run command is mpiexec_mpt -p "%g:" -np 5400 omplace -tm open64 /glade/scratch/mmills/f.e21.FWsc2000climo.ne30pg3_ne30pg3_mg17.c20190831/bld/cesm.exe >> cesm.log.$LID 2>&1
check for resubmit
dout_s True
mach cheyenne
resubmit_num 4
ERROR: File /glade/work/mmills/case/f.e21.FWsc2000climo.ne30pg3_ne30pg3_mg17.c20190831/env_run.xml appears to have changed without a corresponding invalidation, modtimes 1567494930.62 != 1567526551.70

The build of a branch case also failed initially, I think because I changed other parameters in env_run.xml while building:

CASEROOT: /glade/work/mmills/case/f.e21.FWsc2000climo.ne30pg3_ne30pg3_mg17.c20190903

xmlchange RUN_TYPE=branch,RUN_REFCASE=f.e21.FWsc2000climo.ne30pg3_ne30pg3_mg17.c20190831,RUN_STARTDATE=0007-01-01,RUN_REFDATE=0007-01-01

ERROR: File /glade/work/mmills/case/f.e21.FWsc2000climo.ne30pg3_ne30pg3_mg17.c20190903/env_run.xml appears to have changed without a corresponding invalidation, modtimes 1567527975.62 != 1567528188.07

I then did a case.setup --reset and a clean build, and that worked.

Is there a reason for the scripts to be so picky?

@jgfouca
Contributor

jgfouca commented Sep 3, 2019

@jedwards4b I will refer you to the discussion in #2161, particularly the comments near the bottom. It's not safe to xmlchange a value while CIME is running; any CIME calls to get_value will not see that change if a Case object was already opened. Also, any flushes could inadvertently override the xmlchange.

@jedwards4b
Contributor Author

Would it be possible to relax a little? In both these cases the changes would not affect the current cime invocation and are only intended for the next one. When flushing, if the file on disk has changed can we merge and only have an error if a direct conflict is created?
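
To make the merge-on-flush idea concrete, here is a minimal sketch of a three-way merge over flat name/value settings. It is not CIME code: the function `merge_on_flush`, its arguments, and the use of plain dictionaries in place of XML files are all assumptions made for illustration, and the variable names in the usage note are examples only.

```python
def merge_on_flush(base, ours, theirs):
    """Three-way merge of flat name -> value settings (hypothetical helper).

    base:   values read when the file was opened
    ours:   in-memory values about to be flushed
    theirs: values currently on disk (possibly edited externally)
    """
    merged = dict(theirs)
    conflicts = []
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(key), ours.get(key), theirs.get(key)
        if o == b:
            # We did not change this key, so the on-disk value wins.
            continue
        if t == b or t == o:
            # Only we changed it, or both sides made the same change.
            if o is None:
                merged.pop(key, None)
            else:
                merged[key] = o
        else:
            # Both sides changed the same key to different values.
            conflicts.append(key)
    if conflicts:
        raise ValueError("conflicting edits to: " + ", ".join(conflicts))
    return merged
```

With this sketch, an external edit to one setting (say `RESUBMIT`) and an in-memory change to a different setting would both survive the flush, while two different new values for the same setting would raise an error.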

@ekluzek
Contributor

ekluzek commented Sep 3, 2019

Is it possible to have a class of XML variables that can change at any time? Run variables should be locked during a run, but these are really post-run variables, so they can change at any time. And doesn't this also mean that if you try to run xmlchange while CIME is running, it should die with an error? That would at least let you know you have to wait.

@jgfouca
Contributor

jgfouca commented Sep 3, 2019

> Would it be possible to relax a little?

Anything is possible; it just depends on how much complexity we want to add to CIME. We need to keep in mind that the more complex we make CIME's database model, the greater the potential for subtle errors.

The current system is simple: the mod timestamp of a file is recorded when a GenericXML file is opened. When the file is closed/flushed, the mod timestamp is checked; if it's different than the initial one, an error is raised.
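
As a rough illustration of that check, the sketch below records a file's modification time on open and refuses to flush if it has changed. `TrackedFile` is a hypothetical stand-in, not CIME's actual GenericXML class, and the error text only loosely mirrors the message quoted earlier in this issue.

```python
import os


class TrackedFile:
    """Hypothetical stand-in for a GenericXML-like wrapper (not CIME's class)."""

    def __init__(self, path):
        self.path = path
        # Record the modification time seen when the file is opened.
        self.mtime_at_open = os.path.getmtime(path)

    def flush(self, new_text):
        # Refuse to write if the file was modified since we opened it.
        current = os.path.getmtime(self.path)
        if current != self.mtime_at_open:
            raise RuntimeError(
                "File {} appears to have changed, modtimes {:.2f} != {:.2f}".format(
                    self.path, self.mtime_at_open, current
                )
            )
        with open(self.path, "w") as fd:
            fd.write(new_text)
        # Our own write changes the mtime, so refresh the recorded value.
        self.mtime_at_open = os.path.getmtime(self.path)
```

Any edit made by another process between open and flush trips the check, whether or not it touches a value the running process cares about, which is the behavior this issue is asking about.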

When it comes to allowing certain variables to be changed during certain windows of execution, things get much, much more complicated.

One thing that might be easier is to further break down the env XML files in such a way that we know it is safe to change values in file X during phase Y. For example, if we had env_submit.xml, we would know it's safe to change env_submit.xml values during case_run.
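
A file-per-phase rule like this could reduce to a simple lookup. The sketch below is purely illustrative: `env_submit.xml` does not exist at the time of this comment, and the table `SAFE_DURING` and the function `is_safe_to_change` are hypothetical names, not part of CIME.

```python
# Hypothetical table: which env files may be edited during which phase.
# Only the one pairing suggested in the comment above is listed.
SAFE_DURING = {
    "case_run": {"env_submit.xml"},
}


def is_safe_to_change(filename, phase):
    """Return True if `filename` may be edited while `phase` is executing."""
    # Unknown phases default to "nothing is safe to edit".
    return filename in SAFE_DURING.get(phase, set())
```

Under this assumption, a check for `env_submit.xml` during `case_run` would pass, while one for `env_run.xml` in the same phase would fail.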

> run xmlchange while cime is running -- that it should die with an error

@billsacks proposed something similar in #2161

@jgfouca jgfouca closed this as completed in b0f8a99 Apr 7, 2020