Jobs hang when retry is enabled and step fails on at least one node #2442

al-heisner · 2017-04-19T16:02:51Z

Issue type: Bug report

Rundeck version: 2.8.1-1
install type: RPM install
OS Name/version: RHEL 6.9
DB Type/version: MySQL

Expected Behavior
On step failure, the job is retried up to {option.retry} times.

Actual Behavior
On step failure, the job hangs in "running" status and is never retried. The log files show something like:
2017-04-18 15:36:08,890 [quartzScheduler_Worker-7] ERROR grails.app.services.rundeck.services.ExecutionUtilService - Execution failed: 176985 in project Test_project: [Workflow result: , step failures: {1=Dispatch failed on 1 nodes: [failnode: NonZeroResultCode: Remote command failed with exit status 1]}, Node failures: {failnode=[NonZeroResultCode: Remote command failed with exit status 1]}, status: failed]
2017-04-18 15:36:08,923 [quartzScheduler_Worker-7] INFO grails.app.services.rundeck.services.ScheduledExecutionService - scheduling immediate job run: 42:test_hang_job
2017-04-18 15:36:08,927 [quartzScheduler_Worker-7] ERROR grails.app.jobs.rundeck.quartzjobs.ExecutionJob - Execution 176985 save result status: caught exception: caught exception while adding job: Unable to store Job : 'Test_project:test_hang_job:scheduled/test.42:test_hang_job', because one already exists with this identification.

How to reproduce Behavior
Set up a job with:
Workflow: if a step fails - 'stop at failed step'
Strategy: Node First
Workflow step: Execute inline script
Inline script: intended to fail on one of the nodes, e.g.:
if [ "$(uname -n)" = "failnode" ]; then exit 1; fi
Nodes: Dispatch to nodes
Node filter: includes the node that will fail plus some that won't
Thread count: 1
If a node fails: Continue running on any remaining nodes before failing the step.
Schedule to run repeatedly: Yes, crontab, every minute
Enable Scheduling: yes
Enable Execution: yes
Multiple Executions: no
Retry: 2

When the job fails, Rundeck attempts to schedule an immediate run (presumably the retry) and then fails to save the result status, leaving the execution in an unfinished state in the DB.
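The log suggests the immediate retry was registered with the scheduler under the same identity as the job's existing cron schedule, and the fix commit's title says the retry execution's Quartz ident should differ from the primary schedule. The bash sketch below (bash 4+, for associative arrays) models that collision with a hypothetical in-memory store. `store_job`, `JOB_STORE` and the `:retry:<execution id>` suffix are illustrative names only, not Rundeck or Quartz APIs; only the primary identity string is copied from the log above.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: a job store keyed by identity that rejects duplicates,
# mimicking "because one already exists with this identification".
declare -A JOB_STORE=()

# store_job IDENT: register a job under IDENT; fail if IDENT is already taken.
store_job() {
  local ident="$1"
  if [[ -n "${JOB_STORE[$ident]+set}" ]]; then
    echo "Unable to store Job : '$ident', because one already exists with this identification." >&2
    return 1
  fi
  JOB_STORE[$ident]=1
  return 0
}

# The primary cron schedule already occupies this identity (taken from the log).
primary_ident='Test_project:test_hang_job:scheduled/test.42:test_hang_job'
store_job "$primary_ident"

# Buggy retry: reuses the primary identity, so the store rejects it.
if ! store_job "$primary_ident" 2>/dev/null; then
  echo "retry with primary identity: rejected"
fi

# Fixed retry: derives a distinct identity (hypothetical suffix scheme).
execution_id=176985
retry_ident="${primary_ident}:retry:${execution_id}"
if store_job "$retry_ident"; then
  echo "retry with distinct identity: stored"
fi
```

Running it prints the rejection for the reused identity and success for the distinct one; in the reported bug, that rejection surfaced as the exception thrown while saving the execution's result status.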


fix #2442 retry execution quartz ident should differ from primary sch…

gschueler added the bug label Apr 19, 2017

gschueler self-assigned this Apr 19, 2017

gschueler added the in progress label Apr 19, 2017

gschueler closed this as completed in fa68a7e Apr 19, 2017

gschueler added a commit that referenced this issue Apr 19, 2017

Merge pull request #2446 from rundeck/issue/2442

9a22377

fix #2442 retry execution quartz ident should differ from primary sch…

gschueler removed the in progress label Apr 19, 2017

gschueler added this to the 2.8.2 milestone Apr 19, 2017
