automatic restart after "high wind speed" warning does not work #12

Open
christian-stepanek opened this issue Jun 19, 2020 · 1 comment

Comments

@christian-stepanek

Expected behavior: When the esm-tools detect a model instability, kill the current simulation, and resubmit it with the ENSTDIF workaround (see the excerpt from the simulation log file below), the simulation should continue under a new job ID with the ENSTDIF workaround active for one year.

ERROR: high wind speed was found during your run, applying wind speed fix and resubmitting...
Will kill the run now...
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
srun: got SIGCONT
slurmstepd: error: *** JOB 6235540 ON prod-0040 CANCELLED AT 2020-06-18T19:53:29 ***
srun: forcing job termination
  0: slurmstepd: error: *** STEP 6235540.0 ON prod-0040 CANCELLED AT 2020-06-18T19:53:29 ***
srun: error: prod-0045: tasks 180-215: Exited with exit code 1
srun: Terminating job step 6235540.0

Observed behavior: The simulation run in which high wind speed is detected in the atmout file is indeed killed, but no resubmission occurs (or, if the tools do attempt a resubmission, the attempt fails).

Suggested solution: Fix or implement the esm-tools code so that, after killing the current simulation, it actually resubmits the simulation (with the ENSTDIF workaround active for exactly one year).
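
To make the requested behavior concrete, here is a minimal sketch of the decision logic, not the actual esm-tools implementation. The names `plan_after_run`, `ResubmitPlan`, and `workaround_active` are hypothetical, and the marker string is taken from the log message quoted above; the text written to the atmout file may differ.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: marker copied from the log message quoted in this issue.
# The string that actually appears in the atmout file may be different.
WIND_SPEED_MARKER = "high wind speed"


@dataclass
class ResubmitPlan:
    """Hypothetical container for the outcome of a finished or killed run."""
    resubmit: bool                  # should a new job be submitted?
    workaround_year: Optional[int]  # model year that runs with the workaround


def plan_after_run(log_text: str, model_year: int) -> ResubmitPlan:
    """Decide what to do after a run, based on the contents of its log."""
    if WIND_SPEED_MARKER in log_text.lower():
        # Instability found: resubmit, with the workaround limited to the
        # model year that failed.
        return ResubmitPlan(resubmit=True, workaround_year=model_year)
    return ResubmitPlan(resubmit=False, workaround_year=None)


def workaround_active(plan: ResubmitPlan, model_year: int) -> bool:
    """True only for the single model year covered by the workaround."""
    return plan.workaround_year is not None and model_year == plan.workaround_year
```

With this shape, the run for the failed year would be resubmitted with the workaround, and the following year would return to the normal configuration because `workaround_active` is false for any other year.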

employed run script (on Ollie):
/home/ollie/stepanek/esm-tools_v4/esm_tools/myrunscripts/myinitialtest_yearly_old.yaml

employed esm_tools versions:
esm_archiving : unknown version!
esm_autotests : unknown version!
esm_calendar : 4.0.1
esm_database : 4.0.0
esm_environment : 4.0.1
esm_master : 4.0.2
esm_parser : 4.0.2
esm_profile : 4.0.0
esm_rcfile : 4.0.0
esm_runscripts : 4.0.3
esm_tools : 4.0.9
esm_plugin_manager : 4.0.1
esm_version_checker : 4.0.2

@pgierz
Member

pgierz commented Jun 19, 2020

Hi,

an important question to clarify first: do we want to resubmit the job automatically? I'm happy to program that, maybe with an option to turn the automatic resubmit off.

I'd ask one of the other developers (or @christian-stepanek) to have another look at the code to ensure it does the right thing and is understandable. Given that it requires a change to esm-runscripts, I'm moving the issue there.
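
As a rough illustration of the opt-out mentioned above, the switch could be a single configuration flag that defaults to resubmitting. This is only a sketch: the key `auto_resubmit` and the function `should_resubmit` are hypothetical and are not existing esm-tools options.

```python
def should_resubmit(config: dict, instability_detected: bool) -> bool:
    """Resubmit only if an instability was found and the user has not opted out."""
    # "auto_resubmit" is a hypothetical key; a missing key means "resubmit".
    auto_resubmit = bool(config.get("auto_resubmit", True))
    return instability_detected and auto_resubmit
```

Defaulting to `True` keeps the behavior described in the issue for users who set nothing, while `auto_resubmit: false` in a run script would leave the killed job for manual inspection.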

@pgierz pgierz transferred this issue from esm-tools/esm_tools Jun 19, 2020
@pgierz pgierz mentioned this issue Jun 19, 2020