Null Pointer Exception on startup #1334

Closed
jshaw86 opened this issue Jun 21, 2016 · 12 comments

Comments

jshaw86 commented Jun 21, 2016

Nomad version

Nomad v0.3.2

Operating system and Environment details

Ubuntu 14.04.4 LTS (GNU/Linux 3.13.0-87-generic x86_64)

Issue

Nil pointer dereference panic when starting the Nomad client; upstart/init.d can't bring the agent back up.

Reproduction steps

Start the Nomad client: sudo nomad agent -client -servers=nomad-sched1-priv.bronze.aws-pdx-3.ps.pn -config=/etc/nomad.d

Nomad Client logs (if appropriate)

 Loaded configuration from /etc/nomad.d/nomad_agent.json
==> Starting Nomad agent...
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x78 pc=0x5643f8]

goroutine 1 [running]:
panic(0x1124d20, 0xc8200140e0)
        /opt/go/src/runtime/panic.go:464 +0x3e6
github.com/hashicorp/nomad/client/driver.GetTaskEnv(0xc8203437d0, 0xc82014a900, 0x0, 0xc8203521e0, 0x0, 0x0, 0x0)
        /opt/gopath/src/github.com/hashicorp/nomad/client/driver/driver.go:145 +0xf8
github.com/hashicorp/nomad/client.(*TaskRunner).setTaskEnv(0xc820301300, 0x0, 0x0)
        /opt/gopath/src/github.com/hashicorp/nomad/client/task_runner.go:203 +0x5a
github.com/hashicorp/nomad/client.(*TaskRunner).RestoreState(0xc820301300, 0x0, 0x0)
        /opt/gopath/src/github.com/hashicorp/nomad/client/task_runner.go:140 +0x122
github.com/hashicorp/nomad/client.(*AllocRunner).RestoreState(0xc8203505b0, 0x0, 0x0)
        /opt/gopath/src/github.com/hashicorp/nomad/client/alloc_runner.go:129 +0x593
github.com/hashicorp/nomad/client.(*Client).restoreState(0xc820215200, 0x0, 0x0)
        /opt/gopath/src/github.com/hashicorp/nomad/client/client.go:429 +0x53b
github.com/hashicorp/nomad/client.NewClient(0xc82022c000, 0xc82022c000, 0x0, 0x0)
        /opt/gopath/src/github.com/hashicorp/nomad/client/client.go:171 +0x8b9
github.com/hashicorp/nomad/command/agent.(*Agent).setupClient(0xc82000b860, 0x0, 0x0)
        /opt/gopath/src/github.com/hashicorp/nomad/command/agent/agent.go:290 +0x173
github.com/hashicorp/nomad/command/agent.NewAgent(0xc820086b40, 0x7f8a6fb90da8, 0xc820216c60, 0xc82000b7c0, 0x0, 0x0)
        /opt/gopath/src/github.com/hashicorp/nomad/command/agent/agent.go:55 +0x247
github.com/hashicorp/nomad/command/agent.(*Command).setupAgent(0xc820084fa0, 0xc820086b40, 0x7f8a6fb90da8, 0xc820216c60, 0x0, 0x0)
        /opt/gopath/src/github.com/hashicorp/nomad/command/agent/command.go:286 +0xc3
github.com/hashicorp/nomad/command/agent.(*Command).Run(0xc820084fa0, 0xc82000a0c0, 0x3, 0x3, 0x0)
        /opt/gopath/src/github.com/hashicorp/nomad/command/agent/command.go:389 +0x597
github.com/hashicorp/nomad/vendor/github.com/mitchellh/cli.(*CLI).Run(0xc8200c20c0, 0xc8200c20c0, 0x0, 0x0)
        /opt/gopath/src/github.com/hashicorp/nomad/vendor/github.com/mitchellh/cli/cli.go:153 +0x5ee
main.RunCustom(0xc82000a0b0, 0x4, 0x4, 0xc8201e8210, 0xc82005a058)
        /opt/gopath/src/github.com/hashicorp/nomad/main.go:49 +0x4d5
main.Run(0xc82000a0b0, 0x4, 0x4, 0x7f8a6fb8b028)
        /opt/gopath/src/github.com/hashicorp/nomad/main.go:15 +0x4c
main.main()
        /opt/gopath/src/github.com/hashicorp/nomad/main.go:11 +0x60

nomad.d config

region = "aws-pdx-3"
datacenter = "bronze-aws-pdx-3"
data_dir = "/opt/nomad/data"
log_level = "DEBUG"
enable_syslog = true

bind_addr = "0.0.0.0"

client {
  enabled = true
  alloc_dir = "/opt/nomad/mnt"
  reserved {
    cpu = 500
    memory = 1024
    disk = 1024
  }
  network_speed = 1000
  options {
    consul.address = "172.16.103.60:8500"
  }
}

advertise {
  http = "172.16.103.60:4646"
}

telemetry {
  statsd_address = "0.0.0.0:8125"
}

Our setup is a single Nomad server with 3 Nomad clients.

Author

jshaw86 commented Jun 22, 2016

Could be related to #1277. I missed it because I wasn't searching for the European spelling of null.

Contributor

dadgar commented Jun 22, 2016

@jshaw86 Was this on first start up or during a restore?

Author

jshaw86 commented Jun 22, 2016

@dadgar We are not sure how it got into this state (syslog doesn't show anything), but the node was listed as down one morning, and when we try to bring it back up we get that traceback on startup.

Contributor

dadgar commented Jun 22, 2016

Okay so I think it is the same problem as #1277 then. Going to close this. If it occurs on 0.4 please report again.

@dadgar dadgar closed this as completed Jun 22, 2016
Author

jshaw86 commented Jun 22, 2016

@dadgar After building off master and dropping the binary in place, I get this behavior:

ubuntu@nomad-agent2:~$ nomad version
Nomad v0.4.0-rc2 ('dc4bea26b59f5fbbdcb4a22d5763be2c4043fbb7+CHANGES')
ubuntu@nomad-agent2:~$ sudo nomad agent -client -servers=nomad-sched1-priv.bronze.aws-pdx-3.ps.pn -config=/etc/nomad.d
    Loaded configuration from /etc/nomad.d/nomad_agent.json
==> Starting Nomad agent...
==> Error starting agent: client setup failed: failed to restore state: 1 error(s) occurred:

* 1 error(s) occurred:

* task runner snapshot include nil Task

Is the recommended recovery to rm -rf /var/nomad, per the other ticket? Ideally Nomad could recover from this gracefully. If there is any other info I can give you to help, let me know.
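
For readers comparing the two startup failures above: both point at restored task state whose task is nil. v0.3.2 dereferences it and panics, while v0.4.0-rc2 reports an error. A minimal Go sketch of that difference follows. The `Task` and `snapshot` types and the `validateSnapshot` helper are hypothetical stand-ins for illustration, not Nomad's actual definitions.

```go
package main

import (
	"errors"
	"fmt"
)

// Task and snapshot are hypothetical stand-ins for illustration only;
// they are not Nomad's real types.
type Task struct {
	Name string
}

type snapshot struct {
	Task *Task
}

// validateSnapshot returns an error for a snapshot with no task, instead of
// letting a later field access dereference a nil pointer and panic.
func validateSnapshot(s *snapshot) error {
	if s == nil || s.Task == nil {
		return errors.New("snapshot has nil Task")
	}
	return nil
}

func main() {
	bad := &snapshot{} // Task is nil, modeling the corrupted on-disk state
	if err := validateSnapshot(bad); err != nil {
		fmt.Println("restore failed:", err)
		return
	}
	// Only reached when the snapshot is valid, so this access is safe.
	fmt.Println("task:", bad.Task.Name)
}
```

Without the guard, reading `bad.Task.Name` would trigger the same class of runtime panic shown in the original traceback.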

Contributor

dadgar commented Jun 22, 2016

If you could actually post the contents of <data_dir>/client/alloc/*, that would be great. I want to see what it is trying to restore from.

Contributor

dadgar commented Jun 22, 2016

But yes, you can just delete the data_dir folder to continue.

Author

jshaw86 commented Jun 22, 2016

@dadgar I've attached the alloc dir:
nomad-alloc-state.tar.gz

Author

jshaw86 commented Jun 24, 2016

@dadgar Could we reopen this one so that Nomad handles this case gracefully rather than exiting early?
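
As a rough illustration of what handling this gracefully might mean, the sketch below restores every allocation it can and records the ones it has to skip, so one corrupted snapshot does not abort agent startup. All names here (`allocState`, `restoreOne`, `restoreAll`) are hypothetical and do not reflect Nomad's actual restore code.

```go
package main

import (
	"errors"
	"fmt"
)

// allocState is a hypothetical stand-in for one persisted allocation.
type allocState struct {
	ID   string
	Task *string // nil models the corrupted snapshot
}

// restoreOne fails on a corrupted snapshot rather than panicking.
func restoreOne(a allocState) error {
	if a.Task == nil {
		return errors.New("snapshot has nil task")
	}
	return nil
}

// restoreAll restores what it can and reports what it skipped,
// so a single bad snapshot does not stop the remaining restores.
func restoreAll(allocs []allocState) (restored []string, skipped map[string]error) {
	skipped = make(map[string]error)
	for _, a := range allocs {
		if err := restoreOne(a); err != nil {
			skipped[a.ID] = err
			continue
		}
		restored = append(restored, a.ID)
	}
	return restored, skipped
}

func main() {
	name := "web"
	restored, skipped := restoreAll([]allocState{
		{ID: "alloc-1", Task: &name},
		{ID: "alloc-2"}, // corrupted: no task
	})
	fmt.Println("restored:", restored)
	for id, err := range skipped {
		fmt.Printf("skipped %s: %v\n", id, err)
	}
}
```

Whether skipped allocations should be discarded, retried, or reported back to the server is a design decision this sketch leaves open.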

@dadgar dadgar reopened this Jun 24, 2016
@dadgar dadgar closed this as completed Jun 24, 2016
Contributor

dadgar commented Jun 24, 2016

@jshaw86 Yes, but I need to look into the cause of this a bit more so I can appropriately scope the work/issue. I haven't had time yet to deep dive into the data you provided.

Author

jshaw86 commented Jun 24, 2016

@dadgar OK, no problem. I just wanted to make sure it didn't get lost if it was closed.

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 16, 2022