I have a task that depends on a template with "ChangeMode": "restart". In roughly 25% of the cases where the template is updated, the restart triggers a "close of nil channel" panic when the task_runner attempts to close the stopCollection channel.
This configuration worked before the 0.5.3 -> 0.5.5 upgrade. Handling of the stopCollection channel within the task_runner run function was modified in commit 4826d84, but I'm not familiar enough with the code to isolate the bug.
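For readers unfamiliar with the panic message: in Go, calling close on a channel variable that is still nil always panics with "close of nil channel". The sketch below reproduces that behaviour in isolation and shows one common way to guard against it. It is not Nomad's code; the statsCollector type and the start, stopSafe, and closePanics helpers are hypothetical names used only for illustration, and the sketch makes no claim about how the stopCollection channel ends up nil in task_runner.

```go
package main

import (
	"fmt"
	"sync"
)

// statsCollector is a hypothetical stand-in for a component that owns a
// stop channel. It is NOT taken from Nomad's task_runner.
type statsCollector struct {
	mu   sync.Mutex
	stop chan struct{} // nil until start is called
}

// start creates the stop channel if it does not exist yet.
func (c *statsCollector) start() {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.stop == nil {
		c.stop = make(chan struct{})
	}
}

// stopSafe closes the stop channel only if it was created, then resets it
// to nil so a later call cannot close it twice. It reports whether a close
// actually happened.
func (c *statsCollector) stopSafe() bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.stop == nil {
		return false
	}
	close(c.stop)
	c.stop = nil
	return true
}

// closePanics reports whether close(ch) panics.
func closePanics(ch chan struct{}) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	close(ch)
	return false
}

func main() {
	var unset chan struct{} // zero value of a channel type is nil
	fmt.Println("close of nil channel panics:", closePanics(unset))

	c := &statsCollector{}
	fmt.Println("stopSafe before start closed a channel:", c.stopSafe())
	c.start()
	fmt.Println("stopSafe after start closed a channel:", c.stopSafe())
	fmt.Println("second stopSafe closed a channel:", c.stopSafe())
}
```

The nil check under a mutex is only one possible guard. If a restart path can reach the close before the channel has been created, or after it has been reset, an unguarded close would fail in the way shown by closePanics.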
Nomad Client logs
Mar 24 10:30:42 tauros nomad[9859]: 2017/03/24 10:30:42 [DEBUG] (runner) receiving dependency health.service(disque-worker@aws|passing)
Mar 24 10:30:42 tauros nomad[9859]: 2017/03/24 10:30:42 [INFO] (runner) initiating run
Mar 24 10:30:42 tauros nomad[9859]: 2017/03/24 10:30:42 [DEBUG] (runner) checking template 5be7920e5f2ab5d81942594895349c82
Mar 24 10:30:42 tauros nomad[9859]: 2017/03/24 10:30:42 [DEBUG] (runner) rendering "(dynamic)" => "/var/lib/nomad/alloc/72ab2d7a-25f3-1a39-2d01-17680f51dac5/disque-web/local/env"
Mar 24 10:30:42 tauros nomad[9859]: 2017/03/24 10:30:42 [INFO] (runner) rendered "(dynamic)" => "/var/lib/nomad/alloc/72ab2d7a-25f3-1a39-2d01-17680f51dac5/disque-web/local/env"
Mar 24 10:30:42 tauros nomad[9859]: 2017/03/24 10:30:42 [DEBUG] (runner) diffing and updating dependencies
Mar 24 10:30:42 tauros nomad[9859]: 2017/03/24 10:30:42 [DEBUG] (runner) health.service(disque@aws|passing) is still needed
Mar 24 10:30:42 tauros nomad[9859]: 2017/03/24 10:30:42 [DEBUG] (runner) health.service(disque-worker@aws|passing) is still needed
Mar 24 10:30:42 tauros nomad[9859]: 2017/03/24 10:30:42 [DEBUG] (runner) watching 2 dependencies
Mar 24 10:30:43 tauros nomad[9859]: 2017/03/24 10:30:43.378485 [DEBUG] client: restarting task disque-web for alloc "72ab2d7a-25f3-1a39-2d01-17680f51dac5": consul-template: template with change_mode restart re-rendered
Mar 24 10:30:43 tauros nomad[9859]: 2017/03/24 10:30:43.378526 [DEBUG] client: task being restarted: consul-template: template with change_mode restart re-rendered
Mar 24 10:30:43 tauros nomad[9859]: 2017/03/24 10:30:43.394881 [DEBUG] http: Request /v1/client/stats?region=us-east-1&wait=60000ms (219.138µs)
Mar 24 10:30:43 tauros nomad[9859]: 2017/03/24 10:30:43.686622 [DEBUG] client: updated allocations at index 371634 (total 13) (pulled 0) (filtered 13)
Mar 24 10:30:43 tauros nomad[9859]: 2017/03/24 10:30:43.687030 [DEBUG] client: allocs: (added 0) (removed 0) (updated 0) (ignore 13)
Mar 24 10:30:47 tauros nomad[9859]: 2017/03/24 10:30:47 [DEBUG] (runner) receiving dependency health.service(disque@aws|passing)
Mar 24 10:30:47 tauros nomad[9859]: 2017/03/24 10:30:47 [INFO] (runner) initiating run
Mar 24 10:30:47 tauros nomad[9859]: 2017/03/24 10:30:47 [DEBUG] (runner) checking template 5be7920e5f2ab5d81942594895349c82
Mar 24 10:30:47 tauros nomad[9859]: 2017/03/24 10:30:47 [DEBUG] (runner) rendering "(dynamic)" => "/var/lib/nomad/alloc/72ab2d7a-25f3-1a39-2d01-17680f51dac5/disque-web/local/env"
Mar 24 10:30:47 tauros nomad[9859]: 2017/03/24 10:30:47 [DEBUG] (runner) diffing and updating dependencies
Mar 24 10:30:47 tauros nomad[9859]: 2017/03/24 10:30:47 [DEBUG] (runner) health.service(disque@aws|passing) is still needed
Mar 24 10:30:47 tauros nomad[9859]: 2017/03/24 10:30:47 [DEBUG] (runner) health.service(disque-worker@aws|passing) is still needed
Mar 24 10:30:47 tauros nomad[9859]: 2017/03/24 10:30:47 [DEBUG] (runner) watching 2 dependencies
Mar 24 10:30:48 tauros nomad[9859]: 2017/03/24 10:30:48.011739 [DEBUG] client: restarting task disque-web for alloc "72ab2d7a-25f3-1a39-2d01-17680f51dac5": consul-template: template with change_mode restart re-rendered
Mar 24 10:30:48 tauros nomad[9859]: 2017/03/24 10:30:48.113289 [DEBUG] http: Request /v1/agent/servers (168.134µs)
Mar 24 10:30:48 tauros nomad[9859]: 2017/03/24 10:30:48.391545 [DEBUG] http: Request /v1/client/stats?region=us-east-1&wait=60000ms (98.091µs)
Mar 24 10:30:48 tauros nomad[9859]: 2017/03/24 10:30:48.560422 [DEBUG] driver.docker: error collecting stats from container 296e2ad2e3204b1512ef7ce4edde0849d2b29b14b49fac1d092ed4c067603e61: io: read/write on closed pipe
Mar 24 10:30:48 tauros nomad[9859]: 2017/03/24 10:30:48.560526 [INFO] driver.docker: stopped container 296e2ad2e3204b1512ef7ce4edde0849d2b29b14b49fac1d092ed4c067603e61
Mar 24 10:30:48 tauros nomad[9859]: 2017/03/24 10:30:48 [DEBUG] plugin: /usr/local/nomad-0.5.5/nomad: plugin process exited
Mar 24 10:30:48 tauros nomad[9859]: 2017/03/24 10:30:48.608270 [INFO] client: Restarting task "disque-web" for alloc "72ab2d7a-25f3-1a39-2d01-17680f51dac5" in 0s
Mar 24 10:30:48 tauros nomad[9859]: 2017/03/24 10:30:48.608550 [DEBUG] client: task being restarted: consul-template: template with change_mode restart re-rendered
Mar 24 10:30:48 tauros nomad[9859]: panic: close of nil channel
Mar 24 10:30:48 tauros nomad[9859]: goroutine 153 [running]:
Mar 24 10:30:48 tauros nomad[9859]: github.com/hashicorp/nomad/client.(*TaskRunner).run(0xc4205fa580)
Mar 24 10:30:48 tauros nomad[9859]: /opt/gopath/src/github.com/hashicorp/nomad/client/task_runner.go:972 +0xa9e
Mar 24 10:30:48 tauros nomad[9859]: github.com/hashicorp/nomad/client.(*TaskRunner).Run(0xc4205fa580)
Mar 24 10:30:48 tauros nomad[9859]: /opt/gopath/src/github.com/hashicorp/nomad/client/task_runner.go:442 +0x556
Mar 24 10:30:48 tauros nomad[9859]: created by github.com/hashicorp/nomad/client.(*AllocRunner).RestoreState
Mar 24 10:30:48 tauros nomad[9859]: /opt/gopath/src/github.com/hashicorp/nomad/client/alloc_runner.go:190 +0x82f
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
Nomad version
Nomad v0.5.5
Operating system and Environment details
NAME="Ubuntu"
VERSION="16.04.1 LTS (Xenial Xerus)"
Excerpts from Job file