I have two machines:
nomad server (bootstrap = 1)
nomad client which runs docker tasks.
I was investigating why my tasks ended up dead after running for a while.
It boiled down to a connectivity issue.
When the network connection between the client and the server fails, the task group enters the "lost" state.
When the client machine later rediscovers the Nomad server, Nomad presumably tries to restart the task group and fails with "Failed to create container: no such image".
Docker log (registry.lingualeo-funk.com/config-service:dev-301 is the image used by the task):
Handler for POST /containers/dc1f44d9da8f37291b74c505467392668c9921d6fa73730fd88eaa7d2becb427/stop returned error: Container dc1f44d9da8f37291b74c505467392668c9921d6fa73730fd88eaa7d2becb427 is already stopped
Handler for GET /images/registry.lingualeo-funk.com/config-service:dev-301/json returned error: No such image: registry.lingualeo-funk.com/config-service:dev-301
Handler for POST /containers/cf40b4cc63b9bb3717ec9e0e31cb175e239bb3152bbe973fe2238c3f8d470239/stop returned error: Container cf40b4cc63b9bb3717ec9e0e31cb175e239bb3152bbe973fe2238c3f8d470239 is already stopped
Handler for POST /containers/create returned error: No such image: registry.lingualeo-funk.com/config-service:dev-301
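Log lines like the ones above can be scanned mechanically to list the images the Docker daemon could not find. This is only a minimal sketch that assumes the "No such image: <reference>" wording shown in this log; the function name missing_images is hypothetical and not part of Docker or Nomad.

```python
import re

# Matches the daemon's "No such image: <reference>" error text seen in the log above.
# Assumption: the image reference contains no whitespace.
NO_SUCH_IMAGE = re.compile(r"No such image: (\S+)")


def missing_images(log_lines):
    """Return the distinct image references reported missing, in first-seen order."""
    seen = []
    for line in log_lines:
        match = NO_SUCH_IMAGE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen
```

Feeding it the four log lines from this report would yield a single entry, the config-service:dev-301 reference, which is the image to check for on the client.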
Nomad status during connection failure:
Summary
Task Group      Queued  Starting  Running  Failed  Complete  Lost
config-service  1       0         0        0       0         1

Evaluations
ID        Priority  Triggered By  Status    Placement Failures
d2297b4c  50        node-update   blocked   N/A - In Progress
0a14de18  50        node-update   complete  true
5a916b1b  50        job-register  complete  false

Placement Failure
Task Group "config-service":
  * No nodes were eligible for evaluation
  * No nodes are available in datacenter "production"

Allocations
ID        Eval ID   Node ID   Task Group      Desired  Status  Created At
38189d21  5a916b1b  d203b13d  config-service  stop     lost    01/11/17 22:43:15 UTC
Nomad status after connection is back again:
Summary
Task Group      Queued  Starting  Running  Failed  Complete  Lost
config-service  0       0         0        1       1         0

Evaluations
ID        Priority  Triggered By  Status    Placement Failures
af5ace80  50        node-update   complete  false
5e83a333  50        node-update   complete  false
d2297b4c  50        node-update   complete  false
0a14de18  50        node-update   complete  true
5a916b1b  50        job-register  complete  false

Allocations
ID        Eval ID   Node ID   Task Group      Desired  Status    Created At
6649b0c8  d2297b4c  d203b13d  config-service  run      failed    01/11/17 22:48:18 UTC
38189d21  5a916b1b  d203b13d  config-service  stop     complete  01/11/17 22:43:15 UTC
P.S. I don't quite get why the Nomad client tries to restart the task. Why not just leave it running?
Hey, I am going to close this since a lot of the issues around Docker images were fixed in 0.5.2+. As for why it gets restarted: the server detects that the client is gone and tries to place the task on a new machine. Your cluster just happens to be small enough that the replacement landed on the same node.
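To make that explanation concrete, here is a toy model of the sequence described in this thread. It is not Nomad's scheduler code; the Alloc class and the functions mark_lost and place_replacements are hypothetical names invented for illustration, and the only behavior modeled is "lost allocations get a replacement on any ready node".

```python
from dataclasses import dataclass


@dataclass
class Alloc:
    alloc_id: str
    node: str
    status: str = "running"


def mark_lost(allocs, down_nodes):
    """Mark running allocations on unreachable nodes as lost and return them."""
    lost = []
    for alloc in allocs:
        if alloc.node in down_nodes and alloc.status == "running":
            alloc.status = "lost"
            lost.append(alloc)
    return lost


def place_replacements(lost, ready_nodes):
    """Place one replacement per lost allocation on the first ready node.

    Returns (placed, blocked): the new allocations, and how many could not
    be placed because no node was ready.
    """
    placed, blocked = [], 0
    for alloc in lost:
        if not ready_nodes:
            blocked += 1  # nothing eligible: the evaluation stays blocked
            continue
        placed.append(Alloc(alloc.alloc_id + "-replacement", ready_nodes[0]))
    return placed, blocked


# One client node, as in this report (IDs taken from the status output).
allocs = [Alloc("38189d21", "d203b13d")]
lost = mark_lost(allocs, down_nodes={"d203b13d"})

# While the client is disconnected there is no ready node.
placed_during_outage, blocked_during_outage = place_replacements(lost, [])

# After reconnect, the only ready node is the original one.
placed_after, blocked_after = place_replacements(lost, ["d203b13d"])
```

In this sketch the replacement can only go back to d203b13d because that is the only node in the cluster, which mirrors the blocked evaluation during the outage and the new allocation on the same node afterwards.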
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
Nomad v0.5.1