nomad server panic: runtime error: invalid memory address or nil pointer dereference #4463

dcparker88 opened this issue Jul 2, 2018 · 14 comments · Fixed by #4474


@dcparker88

dcparker88 commented Jul 2, 2018

If you have a question, prepend your issue with [question] or preferably use the nomad mailing list.

If filing a bug please include the following:

Nomad version

Nomad v0.8.3 (c85483d)

Operating system and Environment details

Linux nomad-97d52edaa6767264 2.6.32-696.30.1.el6.centos.plus.x86_64 #1 SMP Wed May 23 20:32:06 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

cat /etc/redhat-release
CentOS release 6.9 (Final)

Issue

Our Nomad cluster went into a weird state over the weekend: all 3 servers started crashing on startup with the following:

Desired Changes for "curator": (place 1) (inplace 0) (destructive 0) (stop 0) (migrate 0) (ignore 0) (canary 0)
    2018/07/02 15:14:38 [DEBUG] sched: <Eval "d875e98a-8db0-64f2-5dc9-c12157823669" JobID: "curator" Namespace: "default">: setting status to complete
    2018/07/02 15:14:38 [DEBUG] sched: <Eval "2cc7039e-7f45-1d56-ca4e-23bc7f4a9045" JobID: "curator/periodic-1530334800" Namespace: "default">: Total changes: (place 0) (destructive 0) (inplace 0) (stop 0)
Desired Changes for "curator": (place 0) (inplace 0) (destructive 0) (stop 0) (migrate 0) (ignore 1) (canary 0)
    2018/07/02 15:14:38 [DEBUG] sched: <Eval "2cc7039e-7f45-1d56-ca4e-23bc7f4a9045" JobID: "curator/periodic-1530334800" Namespace: "default">: setting status to complete
    2018/07/02 15:14:38 [DEBUG] sched: <Eval "a90bfea6-9e6d-6714-6f99-4a249e32e00a" JobID: "elk" Namespace: "default">: Total changes: (place 1) (destructive 0) (inplace 0) (stop 0)
Desired Changes for "es-cluster-master": (place 1) (inplace 0) (destructive 0) (stop 0) (migrate 0) (ignore 13) (canary 0)
Desired Changes for "logstash": (place 0) (inplace 0) (destructive 0) (stop 0) (migrate 0) (ignore 4) (canary 0)
Desired Changes for "kibana": (place 0) (inplace 0) (destructive 0) (stop 0) (migrate 0) (ignore 2) (canary 0)

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0xc8 pc=0xf8d94a]

goroutine 32 [running]:
github.com/hashicorp/nomad/nomad/structs.(*Node).Ready(...)
	/opt/gopath/src/github.com/hashicorp/nomad/nomad/structs/structs.go:1431
github.com/hashicorp/nomad/scheduler.(*GenericScheduler).findPreferredNode(0xc42034de00, 0x2016ee0, 0xc4207d2450, 0x11, 0x20d3d40, 0xc420585f00)
	/opt/gopath/src/github.com/hashicorp/nomad/scheduler/generic_sched.go:596 +0xfa
github.com/hashicorp/nomad/scheduler.(*GenericScheduler).computePlacements(0xc42034de00, 0x20d2740, 0x0, 0x0, 0xc42003f230, 0x1, 0x1, 0xc4201bc200, 0x14)
	/opt/gopath/src/github.com/hashicorp/nomad/scheduler/generic_sched.go:448 +0x2eb
github.com/hashicorp/nomad/scheduler.(*GenericScheduler).computeJobAllocs(0xc42034de00, 0xc420614680, 0xc420910080)
	/opt/gopath/src/github.com/hashicorp/nomad/scheduler/generic_sched.go:410 +0x178d
github.com/hashicorp/nomad/scheduler.(*GenericScheduler).process(0xc42034de00, 0xc4207b3980, 0xc4206bd710, 0xc420098c60)
	/opt/gopath/src/github.com/hashicorp/nomad/scheduler/generic_sched.go:245 +0x535
github.com/hashicorp/nomad/scheduler.(*GenericScheduler).(github.com/hashicorp/nomad/scheduler.process)-fm(0x7f07d45b4d90, 0xc420715220, 0x3)
	/opt/gopath/src/github.com/hashicorp/nomad/scheduler/generic_sched.go:144 +0x2a
github.com/hashicorp/nomad/scheduler.retryMax(0x5, 0xc4206bd8a0, 0xc4206bd8b0, 0xc, 0xffffffffffffffff)
	/opt/gopath/src/github.com/hashicorp/nomad/scheduler/util.go:271 +0x46
github.com/hashicorp/nomad/scheduler.(*GenericScheduler).Process(0xc42034de00, 0xc420098c60, 0xc4201d24b0, 0x2017d20)
	/opt/gopath/src/github.com/hashicorp/nomad/scheduler/generic_sched.go:144 +0x123
github.com/hashicorp/nomad/nomad.(*nomadFSM).reconcileQueuedAllocations(0xc4201e3560, 0x7d70, 0x0, 0x0)
	/opt/gopath/src/github.com/hashicorp/nomad/nomad/fsm.go:1321 +0x947
github.com/hashicorp/nomad/nomad.(*nomadFSM).applyReconcileSummaries(0xc4201e3560, 0xc420257c51, 0x8, 0x8, 0x7d70, 0x4abcc748, 0xc4206bdcd8)
	/opt/gopath/src/github.com/hashicorp/nomad/nomad/fsm.go:746 +0x7e
github.com/hashicorp/nomad/nomad.(*nomadFSM).Apply(0xc4201e3560, 0xc4202f6030, 0x20af600, 0xbec6bc4793cdeffb)
	/opt/gopath/src/github.com/hashicorp/nomad/nomad/fsm.go:210 +0x6f1
github.com/hashicorp/nomad/vendor/github.com/hashicorp/raft.(*Raft).runFSM.func1(0xc4207893a0)
	/opt/gopath/src/github.com/hashicorp/nomad/vendor/github.com/hashicorp/raft/fsm.go:57 +0x17b
github.com/hashicorp/nomad/vendor/github.com/hashicorp/raft.(*Raft).runFSM(0xc42025c000)
	/opt/gopath/src/github.com/hashicorp/nomad/vendor/github.com/hashicorp/raft/fsm.go:120 +0x31e
github.com/hashicorp/nomad/vendor/github.com/hashicorp/raft.(*Raft).(github.com/hashicorp/nomad/vendor/github.com/hashicorp/raft.runFSM)-fm()
	/opt/gopath/src/github.com/hashicorp/nomad/vendor/github.com/hashicorp/raft/api.go:506 +0x2a
github.com/hashicorp/nomad/vendor/github.com/hashicorp/raft.(*raftState).goFunc.func1(0xc42025c000, 0xc420337c20)
	/opt/gopath/src/github.com/hashicorp/nomad/vendor/github.com/hashicorp/raft/state.go:146 +0x53
created by github.com/hashicorp/nomad/vendor/github.com/hashicorp/raft.(*raftState).goFunc
	/opt/gopath/src/github.com/hashicorp/nomad/vendor/github.com/hashicorp/raft/state.go:144 +0x66

The servers join together in a cluster, and a leader is elected, but the Nomad boxes crash instantly afterward.

peers.json recovery doesn't seem to work either; it crashes with the same error.

I am assuming I can fix this by fully cleaning my data-dir and restarting, but ideally we wouldn't need to do that.

Reproduction steps

This is the only time this has happened to us - so I'm not sure what the reproduction would be.

Nomad Server logs (if appropriate)

posted above - can post more if needed.

Nomad Client logs (if appropriate)

Job file (if appropriate)

@dcparker88
Author

I think I might have it narrowed down to a job in our cluster causing it - but I'm not sure how to delete/kill this job since the servers aren't up long enough for me to stop it.

@nickethier
Member

@dcparker88 I'm looking into this now. What about the job makes you think it's causing it?

@dcparker88
Author

I might be way off - but I turned on Debug logs, and it lists out jobs, but never lists out a batch job that we have. The batch job also seems to be "flapping" - appearing and disappearing in the job status list, etc.

This could just be a symptom of the nomad servers constantly restarting, however.

@dcparker88
Author

here is a full log coming from a clean start (deleted everything in nomad data dir and started it fresh)


                Client: false
             Log Level: DEBUG
                Region: global (DC: datacenter)
                Server: true
               Version: 0.8.3

==> Nomad agent started! Log data will stream in below:

    2018/07/02 15:55:56.503321 [WARN] consul.sync: Consul does NOT support TLSSkipVerify; please upgrade to Consul 0.7.2 or newer
    2018/07/02 15:55:56.505131 [DEBUG] consul.sync: registered 0 services, 0 checks; deregistered 0 services, 0 checks
    2018/07/02 15:55:56 [INFO] raft: Initial configuration (index=0): []
    2018/07/02 15:55:56 [INFO] raft: Node at 10.60.151.181:4647 [Follower] entering Follower state (Leader: "")
    2018/07/02 15:55:56 [INFO] serf: EventMemberJoin: nomad-d2f68611ddf4d4de 10.60.151.181
    2018/07/02 15:55:56.513826 [INFO] nomad: starting 4 scheduling worker(s) for [service batch system _core]
    2018/07/02 15:55:56.514156 [INFO] nomad: adding server nomad-d2f68611ddf4d4de (Addr: 10.60.151.181:4647) (DC: datacenter)
    2018/07/02 15:55:56.514301 [DEBUG] server.nomad: lost contact with Nomad quorum, falling back to Consul for server list
    2018/07/02 15:55:56.514880 [DEBUG] consul.sync: registered 0 services, 0 checks; deregistered 0 services, 0 checks
    2018/07/02 15:55:56 [DEBUG] memberlist: Initiating push/pull sync with: 10.60.153.102:4648
    2018/07/02 15:55:56 [INFO] serf: EventMemberJoin: nomad-97d52edaa6767264 10.60.152.87
    2018/07/02 15:55:56 [INFO] serf: EventMemberJoin: nomad-0047a04d84848676 10.60.153.102
    2018/07/02 15:55:56.520885 [INFO] nomad: adding server nomad-97d52edaa6767264 (Addr: 10.60.152.87:4647) (DC: datacenter)
    2018/07/02 15:55:56 [DEBUG] memberlist: Initiating push/pull sync with: 10.60.152.87:4648
    2018/07/02 15:55:56.523247 [INFO] server.nomad: successfully contacted 2 Nomad Servers
    2018/07/02 15:55:56.523282 [INFO] nomad: Existing Raft peers reported by nomad-97d52edaa6767264 (10.60.152.87:4647), disabling bootstrap mode
    2018/07/02 15:55:56.523305 [INFO] nomad: adding server nomad-0047a04d84848676 (Addr: 10.60.153.102:4647) (DC: datacenter)
    2018/07/02 15:55:56 [DEBUG] raft-net: 10.60.151.181:4647 accepted connection from: 10.60.153.102:46554
    2018/07/02 15:55:56 [DEBUG] raft-net: 10.60.151.181:4647 accepted connection from: 10.60.153.102:46556
    2018/07/02 15:55:56 [WARN] raft: Failed to get previous log: 34628 log not found (last: 0)
    2018/07/02 15:55:56 [INFO] snapshot: Creating new snapshot at /nomad/server/raft/snapshots/20-24581-1530561356812.tmp
    2018/07/02 15:55:56 [INFO] raft: Copied 335172 bytes to local snapshot
    2018/07/02 15:55:56 [INFO] raft: Installed remote snapshot
    2018/07/02 15:55:57 [ERR] raft-net: Failed to decode incoming command: read tcp 192.168.208.98:4647->10.60.153.102:46554: read: connection reset by peer
    2018/07/02 15:55:57 [DEBUG] memberlist: TCP connection from=10.60.153.102:50526
    2018/07/02 15:55:57 [INFO] serf: EventMemberUpdate: nomad-0047a04d84848676
    2018/07/02 15:55:57 [INFO] serf: EventMemberUpdate: nomad-97d52edaa6767264
    2018/07/02 15:55:58 [WARN] raft: Heartbeat timeout from "10.60.153.102:4647" reached, starting election
    2018/07/02 15:55:58 [INFO] raft: Node at 10.60.151.181:4647 [Candidate] entering Candidate state in term 75876
    2018/07/02 15:55:58 [DEBUG] raft: Votes needed: 2
    2018/07/02 15:55:58 [DEBUG] raft: Vote granted from 10.60.151.181:4647 in term 75876. Tally: 1
    2018/07/02 15:55:58 [DEBUG] raft-net: 10.60.151.181:4647 accepted connection from: 10.60.152.87:59094
    2018/07/02 15:55:58 [INFO] raft: Node at 10.60.151.181:4647 [Follower] entering Follower state (Leader: "")
    2018/07/02 15:55:58 [WARN] raft: Failed to get previous log: 34630 log not found (last: 27525)
    2018/07/02 15:55:59 [ERR] raft-net: Failed to decode incoming command: read tcp 192.168.208.98:4647->10.60.152.87:59094: read: connection reset by peer
    2018/07/02 15:55:59 [DEBUG] memberlist: TCP connection from=10.60.152.87:52420
    2018/07/02 15:55:59 [INFO] serf: EventMemberUpdate: nomad-97d52edaa6767264
    2018/07/02 15:55:59 [INFO] serf: EventMemberUpdate: nomad-0047a04d84848676
    2018/07/02 15:56:00 [WARN] raft: Heartbeat timeout from "10.60.152.87:4647" reached, starting election
    2018/07/02 15:56:00 [INFO] raft: Node at 10.60.151.181:4647 [Candidate] entering Candidate state in term 75878
    2018/07/02 15:56:00 [ERR] raft: Failed to make RequestVote RPC to {Voter 10.60.152.87:4647 10.60.152.87:4647}: EOF
    2018/07/02 15:56:00 [ERR] raft: Failed to make RequestVote RPC to {Voter 10.60.153.102:4647 10.60.153.102:4647}: EOF
    2018/07/02 15:56:00 [DEBUG] raft: Votes needed: 2
    2018/07/02 15:56:00 [DEBUG] raft: Vote granted from 10.60.151.181:4647 in term 75878. Tally: 1
    2018/07/02 15:56:00 [DEBUG] raft-net: 10.60.151.181:4647 accepted connection from: 10.60.153.102:46588
    2018/07/02 15:56:00 [INFO] raft: Duplicate RequestVote for same term: 75878
    2018/07/02 15:56:00 [WARN] raft: Failed to get previous log: 34632 log not found (last: 30149)
    2018/07/02 15:56:00 [INFO] raft: Node at 10.60.151.181:4647 [Follower] entering Follower state (Leader: "10.60.153.102:4647")
    2018/07/02 15:56:00.464539 [DEBUG] http: Request GET /v1/agent/health?type=server (3.568436ms)
    2018/07/02 15:56:00 [DEBUG] raft-net: 10.60.151.181:4647 accepted connection from: 10.60.153.102:46590
    2018/07/02 15:56:00 [DEBUG] sched: <Eval "f819b920-f9c3-0643-4949-ca25eb373926" JobID: "curator" Namespace: "default">: Total changes: (place 1) (destructive 0) (inplace 0) (stop 0)
Desired Changes for "curator": (place 1) (inplace 0) (destructive 0) (stop 0) (migrate 0) (ignore 0) (canary 0)
    2018/07/02 15:56:00 [DEBUG] sched: <Eval "f819b920-f9c3-0643-4949-ca25eb373926" JobID: "curator" Namespace: "default">: setting status to complete
    2018/07/02 15:56:00 [DEBUG] sched: <Eval "2b52eb38-341b-f00c-a3f0-ae2c6138a058" JobID: "curator/periodic-1530334800" Namespace: "default">: Total changes: (place 0) (destructive 0) (inplace 0) (stop 0)
Desired Changes for "curator": (place 0) (inplace 0) (destructive 0) (stop 0) (migrate 0) (ignore 1) (canary 0)
    2018/07/02 15:56:00 [DEBUG] sched: <Eval "2b52eb38-341b-f00c-a3f0-ae2c6138a058" JobID: "curator/periodic-1530334800" Namespace: "default">: setting status to complete
    2018/07/02 15:56:00 [DEBUG] sched: <Eval "4f0962a1-a590-05f8-20c3-1cf406793768" JobID: "elk" Namespace: "default">: Total changes: (place 1) (destructive 0) (inplace 0) (stop 0)
Desired Changes for "es-cluster-master": (place 1) (inplace 0) (destructive 0) (stop 0) (migrate 0) (ignore 13) (canary 0)
Desired Changes for "logstash": (place 0) (inplace 0) (destructive 0) (stop 0) (migrate 0) (ignore 4) (canary 0)
Desired Changes for "kibana": (place 0) (inplace 0) (destructive 0) (stop 0) (migrate 0) (ignore 2) (canary 0)
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0xc8 pc=0xf8d94a]

goroutine 22 [running]:
github.com/hashicorp/nomad/nomad/structs.(*Node).Ready(...)
	/opt/gopath/src/github.com/hashicorp/nomad/nomad/structs/structs.go:1431
github.com/hashicorp/nomad/scheduler.(*GenericScheduler).findPreferredNode(0xc420be7cc0, 0x2016ee0, 0xc420c19c50, 0x11, 0x20d3d40, 0xc42055ef00)
	/opt/gopath/src/github.com/hashicorp/nomad/scheduler/generic_sched.go:596 +0xfa
github.com/hashicorp/nomad/scheduler.(*GenericScheduler).computePlacements(0xc420be7cc0, 0x20d2740, 0x0, 0x0, 0xc420451bb0, 0x1, 0x1, 0xc4201aa100, 0x14)
	/opt/gopath/src/github.com/hashicorp/nomad/scheduler/generic_sched.go:448 +0x2eb
github.com/hashicorp/nomad/scheduler.(*GenericScheduler).computeJobAllocs(0xc420be7cc0, 0xc420bd24e0, 0xc4205c8e40)
	/opt/gopath/src/github.com/hashicorp/nomad/scheduler/generic_sched.go:410 +0x178d
github.com/hashicorp/nomad/scheduler.(*GenericScheduler).process(0xc420be7cc0, 0xc4205c85c0, 0xc420a85710, 0xc4205f09a0)
	/opt/gopath/src/github.com/hashicorp/nomad/scheduler/generic_sched.go:245 +0x535
github.com/hashicorp/nomad/scheduler.(*GenericScheduler).(github.com/hashicorp/nomad/scheduler.process)-fm(0x7f5db91fb000, 0xc4206e95d0, 0x3)
	/opt/gopath/src/github.com/hashicorp/nomad/scheduler/generic_sched.go:144 +0x2a
github.com/hashicorp/nomad/scheduler.retryMax(0x5, 0xc420a858a0, 0xc420a858b0, 0xc, 0xffffffffffffffff)
	/opt/gopath/src/github.com/hashicorp/nomad/scheduler/util.go:271 +0x46
github.com/hashicorp/nomad/scheduler.(*GenericScheduler).Process(0xc420be7cc0, 0xc4205f09a0, 0xc420082eb0, 0x2017d20)
	/opt/gopath/src/github.com/hashicorp/nomad/scheduler/generic_sched.go:144 +0x123
github.com/hashicorp/nomad/nomad.(*nomadFSM).reconcileQueuedAllocations(0xc42034faa0, 0x7d70, 0x0, 0x0)
	/opt/gopath/src/github.com/hashicorp/nomad/nomad/fsm.go:1321 +0x947
github.com/hashicorp/nomad/nomad.(*nomadFSM).applyReconcileSummaries(0xc42034faa0, 0xc420304461, 0x8, 0x8, 0x7d70, 0xf02c2a8b, 0xc420239cd8)
	/opt/gopath/src/github.com/hashicorp/nomad/nomad/fsm.go:746 +0x7e
github.com/hashicorp/nomad/nomad.(*nomadFSM).Apply(0xc42034faa0, 0xc420a8a8a0, 0x20af600, 0xbec6beb41eb197ce)
	/opt/gopath/src/github.com/hashicorp/nomad/nomad/fsm.go:210 +0x6f1
github.com/hashicorp/nomad/vendor/github.com/hashicorp/raft.(*Raft).runFSM.func1(0xc420508c70)
	/opt/gopath/src/github.com/hashicorp/nomad/vendor/github.com/hashicorp/raft/fsm.go:57 +0x17b
github.com/hashicorp/nomad/vendor/github.com/hashicorp/raft.(*Raft).runFSM(0xc420246000)
	/opt/gopath/src/github.com/hashicorp/nomad/vendor/github.com/hashicorp/raft/fsm.go:120 +0x31e
github.com/hashicorp/nomad/vendor/github.com/hashicorp/raft.(*Raft).(github.com/hashicorp/nomad/vendor/github.com/hashicorp/raft.runFSM)-fm()
	/opt/gopath/src/github.com/hashicorp/nomad/vendor/github.com/hashicorp/raft/api.go:506 +0x2a
github.com/hashicorp/nomad/vendor/github.com/hashicorp/raft.(*raftState).goFunc.func1(0xc420246000, 0xc4201bc340)
	/opt/gopath/src/github.com/hashicorp/nomad/vendor/github.com/hashicorp/raft/state.go:146 +0x53
created by github.com/hashicorp/nomad/vendor/github.com/hashicorp/raft.(*raftState).goFunc
	/opt/gopath/src/github.com/hashicorp/nomad/vendor/github.com/hashicorp/raft/state.go:144 +0x66

@42wim
Contributor

42wim commented Jul 2, 2018

Looking at the code, it seems to look up a node based on the id of the allocation (in findPreferredNode) but gets an empty node (nil) back.

There is some inconsistency somewhere. To get your servers (probably) running again, you could add an if n == nil { return false } before

return n.Status == NodeStatusReady && !n.Drain && n.SchedulingEligibility == NodeSchedulingEligible

But maybe hashicorp wants you to try other stuff first. :)
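
For illustration, the nil guard suggested above could look roughly like the sketch below. The `Node` struct and the two constants are simplified stand-ins declared here so the example compiles on its own; they are assumptions, not a copy of Nomad's `structs` package, and this is not necessarily how the upstream fix was written.

```go
package main

import "fmt"

// Simplified stand-ins for the values referenced in the quoted return
// statement. The string values are assumptions made for this sketch.
const (
	NodeStatusReady        = "ready"
	NodeSchedulingEligible = "eligible"
)

// Node is a trimmed-down, hypothetical version of the scheduler's node
// record, holding only the fields the readiness check reads.
type Node struct {
	Status                string
	Drain                 bool
	SchedulingEligibility string
}

// Ready reports whether the node can accept placements. Go allows calling
// a pointer-receiver method on a nil pointer, so the guard runs before any
// field is read and a missing node is treated as "not ready" instead of
// triggering a nil pointer dereference.
func (n *Node) Ready() bool {
	if n == nil {
		return false
	}
	return n.Status == NodeStatusReady && !n.Drain && n.SchedulingEligibility == NodeSchedulingEligible
}

func main() {
	var missing *Node // what a failed lookup would hand back
	fmt.Println("missing node ready:", missing.Ready())

	healthy := &Node{Status: NodeStatusReady, SchedulingEligibility: NodeSchedulingEligible}
	fmt.Println("healthy node ready:", healthy.Ready())
}
```

Guarding inside the method protects every caller at once, at the cost of hiding the fact that a lookup came back empty.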

@nickethier
Member

@dcparker88 I think I found the bug, but I don't have a workaround for fixing your state yet (I'm not sure I will, but I'm trying!).

I think a job (maybe the problematic one you mentioned) is trying to get scheduled to a node that for some reason doesn't exist in the state store. You could ultimately fix this by stopping all Nomad server nodes, wiping the data dir, and starting them back up.

I'll let you know as I get more info.
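
As a rough picture of the failure described above, here is a minimal sketch of a caller that looks up the node a previous allocation ran on and copes with that node no longer existing. `StateStore`, `NodeByID`, and `preferredNode` are hypothetical names with simplified signatures chosen for this example; they are not Nomad's actual API, and the check shown is an assumption about one reasonable way to guard the call site.

```go
package main

import "fmt"

// Node is a hypothetical, trimmed-down node record.
type Node struct {
	ID     string
	Status string
}

// StateStore is a hypothetical in-memory stand-in for the server's state store.
type StateStore struct {
	nodes map[string]*Node
}

// NodeByID returns a nil node and a nil error when the ID is unknown.
// A missing key in a map of pointers yields nil, which mirrors the
// situation in the stack trace: the lookup succeeds but finds nothing.
func (s *StateStore) NodeByID(id string) (*Node, error) {
	return s.nodes[id], nil
}

// preferredNode returns the node a previous allocation ran on, or nil when
// that node is gone or not ready, so the caller can fall back to normal
// placement instead of dereferencing a nil pointer.
func preferredNode(s *StateStore, prevNodeID string) (*Node, error) {
	n, err := s.NodeByID(prevNodeID)
	if err != nil {
		return nil, err
	}
	if n == nil || n.Status != "ready" {
		return nil, nil
	}
	return n, nil
}

func main() {
	store := &StateStore{nodes: map[string]*Node{
		"node-a": {ID: "node-a", Status: "ready"},
	}}

	gone, _ := preferredNode(store, "node-gone")
	fmt.Println("preferred node found for node-gone:", gone != nil)

	kept, _ := preferredNode(store, "node-a")
	fmt.Println("preferred node found for node-a:", kept != nil)
}
```

Checking at the call site keeps the "node is missing" case explicit, which makes it easier to log or count than a guard buried inside a helper.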

@nickethier
Member

@42wim correct, the fix proper is here: 1acbf1d

@dcparker88
Author

thanks - resetting the data dir on all my servers did work. I lost all my jobs - but that's ok for now since we can recreate them quickly in terraform.

@nickethier
Member

@dcparker88 glad you’re working again and it wasn’t too much of an impact, never a route you should have to take though.

The current hypothesis is that it’s related to sticky volumes. Did the job you mentioned have a sticky enabled volume by chance?

@dcparker88
Author

one of our jobs does, yes. the one I thought was the cause did not, but again I might be wrong about which job it actually was. the sticky job also has a distinct_host constraint turned on.

@chelseakomlo
Contributor

chelseakomlo commented Jul 3, 2018

@dcparker88 Can you please include the job files for the one which requires sticky volumes, and the other job that you thought was suspect mentioned above?

@dcparker88
Author

dcparker88 commented Jul 5, 2018

yeah - here is the relevant group (with the sticky volumes): https://gist.github.com/dcparker88/2f450f8976a43490db0654e738b4e5ba

@dcparker88
Author

the one I thought was potentially causing it is here: https://gist.github.com/dcparker88/705effd1b374bfc51399e3c54f25e571

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 29, 2022