Tests that are running or have hung are reported as PEND #610

billsacks · 2016-09-29T20:26:30Z

It appears that the status for the run phase is listed as PEND until the run completes. This is a departure from cime4, in which a run was given the status of RUN once it started running. I prefer the cime4 behavior: It's helpful to see which tests are truly pending in the queue and which are running. This is particularly helpful for tests that have exited due to hanging and running out of wallclock time: such tests currently have a final status of PEND (rather than RUN in cime4).

I have a nagging feeling that this was discussed at some point, but I can't remember the details.... There may have been an argument about not introducing more status codes, but I'd personally prefer to have one more status code that prevents this misleading PEND status.

An example test that currently hangs in CESM is SMS_D_Ld1_P24x1.f10_f10.ICRUCLM45.hobart_nag.clm-af_bias_v5

(See also #383 for some initial discussion of this issue.)

billsacks · 2016-09-29T20:29:02Z

cc @ekluzek

jgfouca · 2016-11-02T19:32:38Z

@billsacks A test should never be left in the PEND state if it's not running. A test that gets killed due to a hang should ideally be left in the FAIL state. I'm not an expert in batch systems... when a job exceeds its allocated time, what does the batch system do? Does it hit the submitted script with a SIGKILL?

jedwards4b · 2016-11-02T19:34:05Z

I think so - easy to test, just add --walltime 00:01

billsacks · 2016-11-02T19:38:15Z

I guess there are two somewhat-related issues here:

1. What state does a test have if it is currently running? Currently it seems this state is PEND; I'd prefer something different, like RUN.
2. What state does a test have if it dies due to hanging and running out of wallclock time? Ideally this would be labeled as FAIL, but I realize that may not be easy. I'm okay with this being kept at whatever status code is used for (1)... mostly this issue is about renaming that status to something like RUN rather than PEND, for greater clarity and distinction from tests that are still pending.

jgfouca · 2016-11-02T19:57:21Z

According to http://slurm.schedmd.com/scancel.html, (2) is very doable, at least for slurm: we just need to handle SIGTERM.

I will also try to address (1) if it looks like it won't add too much complexity.
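
For illustration only, here is a minimal sketch of what "handle SIGTERM" could look like in Python. It is not the code that went into CIME: `StatusRecord`, `JobTerminated` and `run_with_timeout_handling` are hypothetical names, the PASS value on success is an assumption (the thread only names PEND, RUN and FAIL), and it assumes a Unix batch system that sends SIGTERM before any SIGKILL. SIGKILL itself cannot be caught, so the status has to be updated when SIGTERM arrives.

```python
import signal

# Status strings. PEND, RUN and FAIL come from the discussion above;
# PASS is an assumed value for a run that finishes normally.
PEND, RUN, FAIL, PASS = "PEND", "RUN", "FAIL", "PASS"


class StatusRecord:
    """Hypothetical in-memory stand-in for a test's run-phase status."""

    def __init__(self):
        self.run_phase = PEND

    def set_run_phase(self, status):
        self.run_phase = status


class JobTerminated(Exception):
    """Raised by the signal handler so ordinary except/finally code can run."""


def _on_sigterm(signum, frame):
    # Turn the signal into an exception in the main thread.
    raise JobTerminated(f"received signal {signum}")


def run_with_timeout_handling(status, run_model):
    """Mark the phase RUN while run_model executes, FAIL if SIGTERM arrives."""
    previous = signal.signal(signal.SIGTERM, _on_sigterm)
    status.set_run_phase(RUN)
    try:
        run_model()
        status.set_run_phase(PASS)
    except JobTerminated:
        status.set_run_phase(FAIL)
    finally:
        # Restore whatever handler was installed before.
        signal.signal(signal.SIGTERM, previous)
    return status.run_phase
```

Note that `signal.signal` can only be called from the main thread, and the handler only helps if the batch system leaves enough time between SIGTERM and SIGKILL to record the new status.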

billsacks · 2016-11-02T20:03:57Z

Sounds good, thanks. If it turns out to be easier to address (2) than (1), then I'm fine with that. Or, to say it another way: Given these three possibilities:

a. Pending in the queue

b. Currently running

c. Job killed due to a hang, running out of wallclock time

I'd at least like (a) and (c) to be reported differently from each other. If all three can be reported differently from each other, then great - but if not, then I don't care much whether (b) is reported the same as (a) or (c).
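
The requirement above can be sketched as a small mapping from job state to reported status. This is only an illustration of the request, not CIME code: `JobState` and `reported_status` are hypothetical names, and the `distinguish_running` flag is an assumption standing in for "if all three can be reported differently".

```python
from enum import Enum


class JobState(Enum):
    """The three possibilities (a), (b) and (c) listed above."""
    QUEUED = "a"             # pending in the queue
    RUNNING = "b"            # currently running
    KILLED_ON_TIMEOUT = "c"  # killed after a hang used up the wallclock time


def reported_status(state, distinguish_running=True):
    """Return the status string to report for a job state.

    (a) and (c) always differ. (b) gets its own RUN status when
    distinguish_running is True; otherwise it falls back to PEND.
    """
    if state is JobState.KILLED_ON_TIMEOUT:
        return "FAIL"
    if state is JobState.RUNNING and distinguish_running:
        return "RUN"
    return "PEND"
```

With `distinguish_running=False`, a running test is still reported as PEND, but a test killed on timeout is no longer confused with one that is waiting in the queue.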

billsacks mentioned this issue Sep 29, 2016

Some yellowstone tests that abort in model run do not exit #383

Closed

jgfouca self-assigned this Nov 2, 2016

jgfouca mentioned this issue Nov 2, 2016

Better handling of timeouts for case.test #755

Merged

ghost added the in progress label Nov 2, 2016

jedwards4b closed this as completed in #755 Nov 10, 2016

ghost removed the in progress label Nov 10, 2016
