[0.6.0dev c96995b] allocation does not exist, stuck in pending #2747

Closed
DapperTayra opened this issue Jun 28, 2017 · 9 comments

Comments

@DapperTayra

DapperTayra commented Jun 28, 2017

Nomad version

Nomad v0.6.0-dev (c96995b+CHANGES)
It was the latest stable build on the CI server for Linux at the time (https://travis-ci.org/hashicorp/nomad/builds/245519967).

Operating system and Environment details

Ubuntu 16.04.2 LTS

Issue

The allocation folder is not created.
After the job has been stopped and started successfully for a few hours, its tasks stay in "pending" and the allocation does not exist. From that point on, the problem reproduces every time.

Reproduction steps

Continuously stop and start a system-type job (zookeeper) that uses the raw_exec driver, for a few hours.
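The reproduction loop can be sketched as follows. This is a hypothetical Python sketch, not the reporter's actual test harness: `stress_loop` and the injected `start_job`, `stop_job`, and `job_alloc_statuses` callables are illustrative names. In a real run they might wrap `nomad run`/`nomad stop` or the HTTP API; here they are passed in so the loop logic can be exercised without a cluster.

```python
import time
from typing import Callable, Dict, Optional


def stress_loop(
    start_job: Callable[[], None],
    stop_job: Callable[[], None],
    job_alloc_statuses: Callable[[], Dict[str, str]],
    iterations: int,
    pending_timeout: float = 30.0,
    poll_interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Repeatedly stop and start a job, watching for stuck allocations.

    Hypothetical helper. job_alloc_statuses() is assumed to map
    alloc ID -> ClientStatus for allocations from the latest start only.

    Returns the 1-based iteration at which the allocations did not all
    leave "pending" (or never appeared) within pending_timeout seconds,
    or None if every iteration succeeded.
    """
    for iteration in range(1, iterations + 1):
        stop_job()
        start_job()
        waited = 0.0
        while True:
            statuses = job_alloc_statuses()
            # This iteration is done once allocations exist and none is pending.
            if statuses and "pending" not in statuses.values():
                break
            if waited >= pending_timeout:
                return iteration  # stuck: allocations never left "pending"
            sleep(poll_interval)
            waited += poll_interval
    return None
```

Injecting `sleep` keeps the polling loop testable without real delays; a real run would keep the default `time.sleep`.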

Starting the job

==> Monitoring evaluation "c25a1f6f"
Evaluation triggered by job "zookeeper"
Allocation "85cd2cdb" created: node "7dcfd48d", group "zk"
Evaluation status changed: "pending" -> "complete"
==> Evaluation "c25a1f6f" finished with status "complete"

Checking the job

Allocations
ID                                    Node ID                               Task Group  Desired  Status   Created At
85cd2cdb-e46d-63d4-28c7-75c32ebb019b  7dcfd48d-b306-1771-2b91-b334f650e669  zk          run      pending  06/28/17 14:28:48 CEST

ID = 85cd2cdb-e46d-63d4-28c7-75c32ebb019b
Eval ID = c25a1f6f-c0a7-3d41-2806-1ffc610060ef
Name = zookeeper.zk[0]
Node ID = 7dcfd48d-b306-1771-2b91-b334f650e669
Job ID = zookeeper
Client Status = pending
Client Description =
Desired Status = run
Desired Description =
Created At = 06/28/17 14:28:48 CEST
Evaluated Nodes = 1
Filtered Nodes = 0
Exhausted Nodes = 0
Allocation Time = 32.143µs
Failures = 0

Couldn't retrieve stats (HINT: ensure Client.Advertise.HTTP is set): Unexpected response code: 500 (unknown allocation ID "85cd2cdb-e46d-63d4-28c7-75c32ebb019b")
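The symptom here is a mismatch: the server lists the allocation (desired "run", client status "pending"), while the client answers with "unknown allocation ID". The following hypothetical Python sketch cross-checks the two views. `client_knows_alloc` and `orphaned_pending_allocs` are illustrative names. The sketch assumes the client's allocation stats endpoint `/v1/client/allocation/<alloc-id>/stats` returns a 500 whose body contains "unknown allocation ID" for allocations the client does not know, as the CLI error above suggests.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List, Mapping


def client_knows_alloc(client_addr: str, alloc_id: str) -> bool:
    """Hypothetical helper: False if the client agent reports alloc_id as unknown.

    Queries the client's allocation stats endpoint; any other HTTP error is re-raised.
    """
    url = f"{client_addr}/v1/client/allocation/{alloc_id}/stats"
    try:
        with urllib.request.urlopen(url, timeout=5):
            return True
    except urllib.error.HTTPError as err:
        body = err.read().decode("utf-8", "replace")
        if err.code == 500 and "unknown allocation ID" in body:
            return False
        raise


def orphaned_pending_allocs(
    server_allocs: Iterable[Mapping[str, str]],
    client_knows: Callable[[str], bool],
) -> List[str]:
    """IDs the server has as desired "run" / client "pending" but the client does not know."""
    return [
        alloc["ID"]
        for alloc in server_allocs
        if alloc.get("DesiredStatus") == "run"
        and alloc.get("ClientStatus") == "pending"
        and not client_knows(alloc["ID"])
    ]
```

For example, `server_allocs` could be the decoded JSON from `GET /v1/node/<node-id>/allocations` (the endpoint that appears in the server log), and `client_knows` could be `lambda i: client_knows_alloc("http://127.0.0.1:4646", i)`, assuming the client's HTTP API listens on the default port 4646.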

Nomad Server logs (if appropriate)

2017/06/28 14:28:48.191792 [DEBUG] http: Request /v1/node/7dcfd48d-b306-1771-2b91-b334f650e669/allocations (5.285766ms)
2017/06/28 14:28:48.218902 [DEBUG] http: Request /v1/jobs (184.575µs)
2017/06/28 14:28:48.269848 [DEBUG] http: Request /v1/evaluation/63dd241d-2292-0d91-7bc0-8c144da0b2d3 (97.452µs)
2017/06/28 14:28:48.270547 [DEBUG] http: Request /v1/evaluation/63dd241d-2292-0d91-7bc0-8c144da0b2d3/allocations (74.02µs)
2017/06/28 14:28:48.289463 [DEBUG] http: Request /v1/jobs (4.823423ms)
2017/06/28 14:28:48.289480 [DEBUG] worker: dequeued evaluation c25a1f6f-c0a7-3d41-2806-1ffc610060ef
2017/06/28 14:28:48.289558 [DEBUG] sched: <Eval 'c25a1f6f-c0a7-3d41-2806-1ffc610060ef' JobID: 'zookeeper'>: allocs: (place 1) (update 0) (migrate 0) (stop 0) (ignore 0) (lost 0)
2017/06/28 14:28:48.291087 [DEBUG] worker: submitted plan for evaluation c25a1f6f-c0a7-3d41-2806-1ffc610060ef
2017/06/28 14:28:48.291106 [DEBUG] sched: <Eval 'c25a1f6f-c0a7-3d41-2806-1ffc610060ef' JobID: 'zookeeper'>: setting status to complete
2017/06/28 14:28:48.292026 [DEBUG] http: Request /v1/evaluation/c25a1f6f-c0a7-3d41-2806-1ffc610060ef (147.784µs)
2017/06/28 14:28:48.292967 [DEBUG] worker: updated evaluation <Eval 'c25a1f6f-c0a7-3d41-2806-1ffc610060ef' JobID: 'zookeeper'>
2017/06/28 14:28:48.292994 [DEBUG] worker: ack for evaluation c25a1f6f-c0a7-3d41-2806-1ffc610060ef
2017/06/28 14:28:48.293086 [DEBUG] http: Request /v1/evaluation/c25a1f6f-c0a7-3d41-2806-1ffc610060ef/allocations (59.616µs)
2017/06/28 14:28:49.295300 [DEBUG] http: Request /v1/evaluation/c25a1f6f-c0a7-3d41-2806-1ffc610060ef (93.644µs)
2017/06/28 14:28:49.296089 [DEBUG] http: Request /v1/evaluation/c25a1f6f-c0a7-3d41-2806-1ffc610060ef/allocations (76.657µs)
2017/06/28 14:28:51.120297 [DEBUG] http: Request /v1/agent/members (161.74µs)
2017/06/28 14:28:51.120904 [DEBUG] http: Request /v1/status/leader?region=global (26.519µs)
2017/06/28 14:28:51.134218 [DEBUG] http: Request /v1/nodes (88.538µs)
2017/06/28 14:28:51.140131 [DEBUG] http: Request /v1/node/7dcfd48d-b306-1771-2b91-b334f650e669/allocations (5.196479ms)
2017/06/28 14:28:51.167762 [DEBUG] http: Request /v1/jobs (288.523µs)

Nomad Client logs (if appropriate)

Job file (if appropriate)

@DapperTayra
Author

We tested with build 5811, but among the pull requests there is a merge about persistent allocations and new evaluations that, at least to us, sounds related: 5d9ca1b

We are currently re-running the tests with build 5839 (revision e9a55d9) to check.

@DapperTayra
Author

Nope, the problem persists with e9a55d9.

@DapperTayra
Copy link
Author

So far, the problem does not reproduce when stopping and starting jobs with version 0.5.6.

@dadgar
Contributor

dadgar commented Jun 28, 2017

@DapperTayra Can you share the client logs?

@dadgar
Contributor

dadgar commented Jun 28, 2017

Can you also share the output of nomad alloc-status -json <id> for one of the pending allocs?
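To triage the JSON requested here, a short Python sketch can pull out the fields most relevant to a stuck allocation. `summarize_alloc` is a hypothetical name; the sketch assumes the allocation JSON uses the field names `ID`, `DesiredStatus`, `ClientStatus`, `CreateIndex`, `ModifyIndex`, and `TaskStates`. Treating "pending with no task state" as "the client never picked it up" is a heuristic, not a confirmed diagnosis.

```python
import json
from typing import Any, Dict


def summarize_alloc(raw_json: str) -> Dict[str, Any]:
    """Hypothetical helper: extract fields relevant to an allocation stuck in "pending"."""
    alloc = json.loads(raw_json)
    # TaskStates may be null (None) or absent before the client runs any task.
    task_states = alloc.get("TaskStates") or {}
    return {
        "id": alloc.get("ID"),
        "desired_status": alloc.get("DesiredStatus"),
        "client_status": alloc.get("ClientStatus"),
        "create_index": alloc.get("CreateIndex"),
        "modify_index": alloc.get("ModifyIndex"),
        "tasks_with_state": sorted(task_states),
        # Heuristic: pending with no task state suggests the client never started it.
        "no_task_state": alloc.get("ClientStatus") == "pending" and not task_states,
    }
```

Usage: save the output of nomad alloc-status -json <id> to a file and pass its contents to `summarize_alloc`.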

@DapperTayra
Author

Here is the JSON data:
json.txt

We run the server and the client together.

We stopped the previous task. Its allocation still exists, and its executor log contains:
2017/06/29 17:40:14.847440 [DEBUG] executor: launching command /opt/jacktime/scripts/start_zk.sh

The newly started task, however, doesn't have an allocation folder.

@jshuping

We are seeing the same issue.

  • using 0.6.0-rc1 on Ubuntu 16.04
  • a service-type job using raw_exec, with 20 task groups of 1 task each
  • hourly restarts/stops/starts work fine with 0.5.6
  • every time, about 4 of the 20 task groups end up in this half-pending, no-allocation state

@schmichael
Member

This should be fixed by #2852, which will be released soon in 0.6.0-rc2.

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

The github-actions bot locked this issue as resolved and limited conversation to collaborators on Dec 11, 2022.