Jobs hang when retry is enabled and step fails on at least one node #2442

Closed
al-heisner opened this issue Apr 19, 2017 · 0 comments

Issue type: Bug report

  • Rundeck version: 2.8.1-1
  • install type: RPM install
  • OS Name/version: RHEL 6.9
  • DB Type/version: MySQL

Expected Behavior
On step failure, job is retried up to {option.retry} times

Actual Behavior
On step failure, the job hangs in running status and is never retried. The log files show something like:
2017-04-18 15:36:08,890 [quartzScheduler_Worker-7] ERROR grails.app.services.rundeck.services.ExecutionUtilService - Execution failed: 176985 in project Test_project: [Workflow result: , step failures: {1=Dispatch failed on 1 nodes: [failnode: NonZeroResultCode: Remote command failed with exit status 1]}, Node failures: {failnode=[NonZeroResultCode: Remote command failed with exit status 1]}, status: failed]
2017-04-18 15:36:08,923 [quartzScheduler_Worker-7] INFO grails.app.services.rundeck.services.ScheduledExecutionService - scheduling immediate job run: 42:test_hang_job
2017-04-18 15:36:08,927 [quartzScheduler_Worker-7] ERROR grails.app.jobs.rundeck.quartzjobs.ExecutionJob - Execution 176985 save result status: caught exception: caught exception while adding job: Unable to store Job : 'Test_project:test_hang_job:scheduled/test.42:test_hang_job', because one already exists with this identification.
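For anyone checking whether they are hitting the same problem, here is a rough sketch that scans log lines for the last ERROR shown above and collects the affected execution IDs. The regular expression is derived only from the sample line in this report, and `find_stuck_executions` is a hypothetical helper name, so the pattern may need adjusting for other Rundeck versions or log layouts.

```python
import re

# Assumption: the pattern is taken from the single ERROR line quoted in this
# report; the wording may differ in other Rundeck versions.
STUCK_RE = re.compile(
    r"ExecutionJob - Execution (?P<exec_id>\d+) save result status: "
    r"caught exception: .*because one already exists with this identification"
)


def find_stuck_executions(lines):
    """Return the execution IDs whose result status could not be saved."""
    ids = []
    for line in lines:
        match = STUCK_RE.search(line)
        if match:
            ids.append(int(match.group("exec_id")))
    return ids


# Two of the log lines from this report, used as sample input.
sample = [
    "2017-04-18 15:36:08,923 [quartzScheduler_Worker-7] INFO "
    "grails.app.services.rundeck.services.ScheduledExecutionService - "
    "scheduling immediate job run: 42:test_hang_job",
    "2017-04-18 15:36:08,927 [quartzScheduler_Worker-7] ERROR "
    "grails.app.jobs.rundeck.quartzjobs.ExecutionJob - Execution 176985 "
    "save result status: caught exception: caught exception while adding job: "
    "Unable to store Job : "
    "'Test_project:test_hang_job:scheduled/test.42:test_hang_job', "
    "because one already exists with this identification.",
]
stuck = find_stuck_executions(sample)
print(stuck)
```

In practice the lines would come from the Rundeck service log rather than an inline list; the log file location depends on the install and is not assumed here.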

How to reproduce Behavior
Set up a job with:
Workflow: if a step fails - 'stop at failed step'
Strategy: Node First
Workflow step: Execute inline script
The inline script is intended to fail on one of the nodes, e.g.
if [ "$(uname -n)" == "failnode" ]; then exit 1; fi
Nodes: Dispatch to nodes
Node filter: matches the node that will fail plus some that won't
Thread count: 1
If a node fails: Continue running on any remaining nodes before failing the step.
Schedule to run repeatedly: Yes, crontab, every minute
Enable Scheduling: yes
Enable Execution: yes
Multiple Executions: no
Retry: 2

When the job fails, it attempts to schedule an immediate run (the retry?) and then fails to save the result status, leaving the job in an unfinished state in the DB.
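To illustrate what the log seems to show, here is a toy model of the collision. It is not Rundeck or Quartz code: `ToyScheduler`, `JobAlreadyExists`, and `finish_execution` are hypothetical names, and the assumption is simply that the scheduler rejects a second job registered under an identity that already exists, and that this exception interrupts saving the execution result. The `:retry:176985` suffix is likewise invented just to show a distinct identity.

```python
class JobAlreadyExists(Exception):
    """Raised when a job with the same identity is already registered."""


class ToyScheduler:
    """Toy stand-in for a scheduler that keys jobs by a unique identity."""

    def __init__(self):
        self._jobs = {}

    def add_job(self, ident, payload):
        if ident in self._jobs:
            raise JobAlreadyExists(
                f"Unable to store Job : '{ident}', "
                "because one already exists with this identification."
            )
        self._jobs[ident] = payload


# Identity string copied from the error message in this report.
SCHEDULED_IDENT = "Test_project:test_hang_job:scheduled/test.42:test_hang_job"


def finish_execution(scheduler, retry_ident):
    """Register the retry run, then report the status that ends up saved."""
    try:
        scheduler.add_job(retry_ident, {"retry": True})
    except JobAlreadyExists:
        # Assumed behavior: the result status is never saved, so the
        # execution still looks like it is running.
        return "running"
    return "failed-with-retry"


scheduler = ToyScheduler()
# The cron schedule already holds this identity.
scheduler.add_job(SCHEDULED_IDENT, {"cron": "every minute"})

same_ident_status = finish_execution(scheduler, SCHEDULED_IDENT)
distinct_ident_status = finish_execution(
    scheduler, SCHEDULED_IDENT + ":retry:176985"
)
print(same_ident_status, distinct_ident_status)
```

Under these assumptions, reusing the scheduled job's identity for the retry leaves the execution stuck, while any identity that differs from the scheduled one is stored without error.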

gschueler added the bug label Apr 19, 2017
gschueler self-assigned this Apr 19, 2017
gschueler added a commit that referenced this issue Apr 19, 2017
fix #2442 retry execution quartz ident should differ from primary sch…
gschueler added this to the 2.8.2 milestone Apr 19, 2017