0.6.0-rc1: allocations stuck in pending with unresponsive Docker engine #2847

rlenk · 2017-07-17T21:42:21Z

Nomad version

Nomad v0.6.0-rc1 (5187fcf+CHANGES)

Operating system and Environment details

CentOS 7.3.1611
Docker 17.03.2-ce
Consul v0.8.5

Issue

Allocations become stuck in pending status when the Docker engine becomes unresponsive. Obviously, there is a serious issue with Docker here that needs to be resolved, and there is only so much the client can do if the engine is unresponsive. Still, I would expect the allocations to eventually be moved to another client rather than being stuck in pending forever. This issue may be related to #2747, but seems different enough (Docker vs. raw-exec) to warrant a separate issue.

Reproduction steps

Make the Docker engine unresponsive (for example, by stopping it with SIGSTOP), then place new allocations on that client.

Nomad Client logs

The client logs show that the client eventually stops dequeuing allocations in Client.watchAllocations once the updates channel is full. The full stack traces captured after SIGQUIT are in nomad_docker_pending.txt.

[centos@worker-4 ~]$ sudo journalctl -u nomad-client | egrep 'client: (updated|alloc)'
Jul 17 15:18:02 worker-4 nomad[10708]: 2017/07/17 15:18:02.139774 [DEBUG] client: updated allocations at index 1050995 (total 35) (pulled 1) (filtered 34)
Jul 17 15:18:02 worker-4 nomad[10708]: 2017/07/17 15:18:02.140333 [DEBUG] client: allocs: (added 1) (removed 0) (updated 0) (ignore 34)
Jul 17 15:18:02 worker-4 nomad[10708]: 2017/07/17 15:18:02.366602 [DEBUG] client: updated allocations at index 1051006 (total 35) (pulled 0) (filtered 35)
Jul 17 15:18:02 worker-4 nomad[10708]: 2017/07/17 15:18:02.366831 [DEBUG] client: allocs: (added 0) (removed 0) (updated 0) (ignore 35)
Jul 17 15:18:02 worker-4 nomad[10708]: 2017/07/17 15:18:02.515661 [DEBUG] client: updated allocations at index 1051013 (total 36) (pulled 1) (filtered 35)
Jul 17 15:18:02 worker-4 nomad[10708]: 2017/07/17 15:18:02.515866 [DEBUG] client: allocs: (added 1) (removed 0) (updated 0) (ignore 35)
Jul 17 15:18:02 worker-4 nomad[10708]: 2017/07/17 15:18:02.635824 [DEBUG] client: updated allocations at index 1051018 (total 36) (pulled 0) (filtered 36)
Jul 17 15:18:02 worker-4 nomad[10708]: 2017/07/17 15:18:02.639829 [DEBUG] client: allocs: (added 0) (removed 0) (updated 0) (ignore 36)
Jul 17 15:18:02 worker-4 nomad[10708]: 2017/07/17 15:18:02.824673 [DEBUG] client: updated allocations at index 1051029 (total 36) (pulled 0) (filtered 36)
Jul 17 15:18:02 worker-4 nomad[10708]: 2017/07/17 15:18:02.824914 [DEBUG] client: allocs: (added 0) (removed 0) (updated 0) (ignore 36)
Jul 17 15:18:03 worker-4 nomad[10708]: 2017/07/17 15:18:03.182523 [DEBUG] client: updated allocations at index 1051045 (total 37) (pulled 1) (filtered 36)
Jul 17 15:18:03 worker-4 nomad[10708]: 2017/07/17 15:18:03.182759 [DEBUG] client: allocs: (added 1) (removed 0) (updated 0) (ignore 36)
Jul 17 15:18:03 worker-4 nomad[10708]: 2017/07/17 15:18:03.339127 [DEBUG] client: updated allocations at index 1051053 (total 38) (pulled 1) (filtered 37)
Jul 17 15:18:03 worker-4 nomad[10708]: 2017/07/17 15:18:03.382788 [DEBUG] client: updated allocations at index 1051055 (total 38) (pulled 1) (filtered 37)
Jul 17 15:18:03 worker-4 nomad[10708]: 2017/07/17 15:18:03.672752 [DEBUG] client: updated allocations at index 1051060 (total 38) (pulled 1) (filtered 37)
Jul 17 15:18:03 worker-4 nomad[10708]: 2017/07/17 15:18:03.805094 [DEBUG] client: updated allocations at index 1051061 (total 38) (pulled 1) (filtered 37)
Jul 17 15:19:00 worker-4 nomad[10708]: 2017/07/17 15:19:00.662197 [DEBUG] client: updated allocations at index 1051302 (total 39) (pulled 2) (filtered 37)
Jul 17 15:19:01 worker-4 nomad[10708]: 2017/07/17 15:19:01.029428 [DEBUG] client: updated allocations at index 1051321 (total 40) (pulled 3) (filtered 37)
Jul 17 15:19:01 worker-4 nomad[10708]: 2017/07/17 15:19:01.146288 [DEBUG] client: updated allocations at index 1051327 (total 41) (pulled 4) (filtered 37)
Jul 17 15:19:01 worker-4 nomad[10708]: 2017/07/17 15:19:01.231980 [DEBUG] client: updated allocations at index 1051332 (total 42) (pulled 5) (filtered 37)
Jul 17 15:19:01 worker-4 nomad[10708]: 2017/07/17 15:19:01.907574 [DEBUG] client: updated allocations at index 1051370 (total 43) (pulled 6) (filtered 37)


dadgar · 2017-07-18T01:42:49Z

@rlenk I think they might be intimately related. We are pretty sure we have a fix for it so we will see if this comes back during rc2.

rlenk · 2017-07-19T00:05:46Z

@dadgar Thanks for the update. I've now seen "pending with no allocation" happen a few times where there wasn't an issue with the Docker engine. I believe we've solved the issue with the Docker timeouts, so we'll see if this continues and wait for RC2.

schmichael · 2017-07-19T17:02:35Z

Should be fixed by #2852 which will be released in 0.6rc2 soon. Thanks for the report!

github-actions · 2022-12-11T02:18:30Z

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

dadgar added type/bug theme/client labels Jul 18, 2017

schmichael closed this as completed Jul 19, 2017

github-actions bot locked as resolved and limited conversation to collaborators Dec 11, 2022
