0.6.0-rc1: allocations stuck in pending with unresponsive Docker engine #2847

Closed
rlenk opened this issue Jul 17, 2017 · 4 comments

Comments

@rlenk

rlenk commented Jul 17, 2017

Nomad version

Nomad v0.6.0-rc1 (5187fcf+CHANGES)

Operating system and Environment details

CentOS 7.3.1611
Docker 17.03.2-ce
Consul v0.8.5

Issue

Allocations become stuck in pending status when the Docker engine becomes unresponsive. Obviously, there is a serious issue with Docker here that needs to be resolved, and there is only so much that the client can do if the engine is unresponsive, but I would expect that the allocations would eventually be moved to another client rather than being stuck in pending forever. This issue may be related to #2747, but seems different enough (Docker vs. raw-exec) to warrant a separate issue.

Reproduction steps

Make the Docker engine unresponsive (for example, by stopping it with SIGSTOP), then schedule new allocations onto that client.

Nomad Client logs

The client logs show that the client eventually stops dequeuing allocations in Client.watchAllocations once the updates channel is full. The full stack traces captured after SIGQUIT are in nomad_docker_pending.txt.

[centos@worker-4 ~]$ sudo journalctl -u nomad-client | egrep 'client: (updated|alloc)'
Jul 17 15:18:02 worker-4 nomad[10708]: 2017/07/17 15:18:02.139774 [DEBUG] client: updated allocations at index 1050995 (total 35) (pulled 1) (filtered 34)
Jul 17 15:18:02 worker-4 nomad[10708]: 2017/07/17 15:18:02.140333 [DEBUG] client: allocs: (added 1) (removed 0) (updated 0) (ignore 34)
Jul 17 15:18:02 worker-4 nomad[10708]: 2017/07/17 15:18:02.366602 [DEBUG] client: updated allocations at index 1051006 (total 35) (pulled 0) (filtered 35)
Jul 17 15:18:02 worker-4 nomad[10708]: 2017/07/17 15:18:02.366831 [DEBUG] client: allocs: (added 0) (removed 0) (updated 0) (ignore 35)
Jul 17 15:18:02 worker-4 nomad[10708]: 2017/07/17 15:18:02.515661 [DEBUG] client: updated allocations at index 1051013 (total 36) (pulled 1) (filtered 35)
Jul 17 15:18:02 worker-4 nomad[10708]: 2017/07/17 15:18:02.515866 [DEBUG] client: allocs: (added 1) (removed 0) (updated 0) (ignore 35)
Jul 17 15:18:02 worker-4 nomad[10708]: 2017/07/17 15:18:02.635824 [DEBUG] client: updated allocations at index 1051018 (total 36) (pulled 0) (filtered 36)
Jul 17 15:18:02 worker-4 nomad[10708]: 2017/07/17 15:18:02.639829 [DEBUG] client: allocs: (added 0) (removed 0) (updated 0) (ignore 36)
Jul 17 15:18:02 worker-4 nomad[10708]: 2017/07/17 15:18:02.824673 [DEBUG] client: updated allocations at index 1051029 (total 36) (pulled 0) (filtered 36)
Jul 17 15:18:02 worker-4 nomad[10708]: 2017/07/17 15:18:02.824914 [DEBUG] client: allocs: (added 0) (removed 0) (updated 0) (ignore 36)
Jul 17 15:18:03 worker-4 nomad[10708]: 2017/07/17 15:18:03.182523 [DEBUG] client: updated allocations at index 1051045 (total 37) (pulled 1) (filtered 36)
Jul 17 15:18:03 worker-4 nomad[10708]: 2017/07/17 15:18:03.182759 [DEBUG] client: allocs: (added 1) (removed 0) (updated 0) (ignore 36)
Jul 17 15:18:03 worker-4 nomad[10708]: 2017/07/17 15:18:03.339127 [DEBUG] client: updated allocations at index 1051053 (total 38) (pulled 1) (filtered 37)
Jul 17 15:18:03 worker-4 nomad[10708]: 2017/07/17 15:18:03.382788 [DEBUG] client: updated allocations at index 1051055 (total 38) (pulled 1) (filtered 37)
Jul 17 15:18:03 worker-4 nomad[10708]: 2017/07/17 15:18:03.672752 [DEBUG] client: updated allocations at index 1051060 (total 38) (pulled 1) (filtered 37)
Jul 17 15:18:03 worker-4 nomad[10708]: 2017/07/17 15:18:03.805094 [DEBUG] client: updated allocations at index 1051061 (total 38) (pulled 1) (filtered 37)
Jul 17 15:19:00 worker-4 nomad[10708]: 2017/07/17 15:19:00.662197 [DEBUG] client: updated allocations at index 1051302 (total 39) (pulled 2) (filtered 37)
Jul 17 15:19:01 worker-4 nomad[10708]: 2017/07/17 15:19:01.029428 [DEBUG] client: updated allocations at index 1051321 (total 40) (pulled 3) (filtered 37)
Jul 17 15:19:01 worker-4 nomad[10708]: 2017/07/17 15:19:01.146288 [DEBUG] client: updated allocations at index 1051327 (total 41) (pulled 4) (filtered 37)
Jul 17 15:19:01 worker-4 nomad[10708]: 2017/07/17 15:19:01.231980 [DEBUG] client: updated allocations at index 1051332 (total 42) (pulled 5) (filtered 37)
Jul 17 15:19:01 worker-4 nomad[10708]: 2017/07/17 15:19:01.907574 [DEBUG] client: updated allocations at index 1051370 (total 43) (pulled 6) (filtered 37)

@dadgar
Contributor

dadgar commented Jul 18, 2017

@rlenk I think this issue and #2747 might be closely related. We are fairly sure we have a fix, so we will see whether the problem comes back during rc2.

@rlenk
Author

rlenk commented Jul 19, 2017

@dadgar Thanks for the update. I've now seen "pending with no allocation" happen a few times where there wasn't an issue with the Docker engine. I believe we've solved the issue with the Docker timeouts, so we'll see if this continues and wait for RC2.

@schmichael
Member

Should be fixed by #2852, which will be released in 0.6rc2 soon. Thanks for the report!

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 11, 2022