Nomad panics in the background when a job with invalid volume_mount block is registered #10834

Closed
lgfa29 opened this issue Jun 30, 2021 · 4 comments · Fixed by #10855
Assignees: tgross
Labels: stage/accepted, theme/api, type/bug
Milestone: 1.1.3


lgfa29 (Contributor) commented Jun 30, 2021

Nomad version

Nomad v1.1.2 (60638a086ef9630e2a9ba1e237e8426192a44244)

Operating system and Environment details

macOS

Issue

An invalid volume_mount block in a job file causes a panic in Nomad. The panic doesn't crash the entire agent: Go's net/http server recovers it, logs it, and closes the connection for that one request, which is why the CLI only reports EOF.
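The behaviour described above (one failed request, agent still running) comes from Go's net/http server, which recovers a panic inside a handler, logs "http: panic serving ...", and closes that connection. The sketch below reproduces this in isolation; the mount type is a hypothetical stand-in for a decoded volume_mount with pointer fields, not Nomad code.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// mount is a hypothetical stand-in for a decoded volume_mount with pointer fields.
type mount struct{ Volume *string }

// demo starts a test server with one handler that panics on a nil pointer
// dereference and one that works, then calls both.
func demo() (panicErr error, okBody string) {
	mux := http.NewServeMux()
	mux.HandleFunc("/panic", func(w http.ResponseWriter, r *http.Request) {
		var m mount
		fmt.Fprint(w, *m.Volume) // nil pointer dereference: the handler panics
	})
	mux.HandleFunc("/ok", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	})
	srv := httptest.NewServer(mux)
	defer srv.Close()

	// net/http recovers the panic, logs it, and closes the connection,
	// so the client gets an error (typically EOF) instead of a response.
	if resp, err := http.Get(srv.URL + "/panic"); err != nil {
		panicErr = err
	} else {
		resp.Body.Close()
	}

	// The server keeps running and serves the next request normally.
	resp, err := http.Get(srv.URL + "/ok")
	if err != nil {
		return panicErr, ""
	}
	defer resp.Body.Close()
	b, _ := io.ReadAll(resp.Body)
	return panicErr, string(b)
}

func main() {
	panicErr, okBody := demo()
	fmt.Println("panicking request failed:", panicErr)
	fmt.Println("next request body:", okBody)
}
```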

Reproduction steps

  1. Start a Nomad agent (nomad agent -dev is fine).
  2. Run the invalid job below.

Expected Result

Nomad returns a validation error message.

Actual Result

Nomad panics, and the CLI shows a generic error message:

$ nomad run example.nomad
Error submitting job: Put "http://127.0.0.1:4646/v1/jobs": EOF

Job file (if appropriate)

job "example" {
  datacenters = ["dc1"]

  group "cache" {
    network {
      port "db" {
        to = 6379
      }
    }

    task "redis" {
      driver = "docker"

      # Invalid block
      volume_mount {}

      config {
        image = "redis:3.2"

        ports = ["db"]
      }

      resources {
        cpu    = 500
        memory = 256
      }
    }
  }
}

Nomad Server logs (if appropriate)

    2021-06-30T15:15:46.565-0400 [DEBUG] http: request complete: method=PUT path=/v1/jobs duration=133.211µs
    2021-06-30T15:15:46.565-0400 [ERROR] http: http: panic serving 127.0.0.1:55410: runtime error: invalid memory address or nil pointer dereference
goroutine 8990 [running]:
net/http.(*conn).serve.func1(0xc0008aa1e0)
        net/http/server.go:1824 +0x153
panic(0x607fe40, 0x7b0de10)
        runtime/panic.go:971 +0x499
github.com/hashicorp/nomad/command/agent.ApiTaskToStructsTask(0xc0006a4240, 0xc001152900, 0xc000494b00, 0xc000494c60)
        github.com/hashicorp/nomad/command/agent/job_endpoint.go:1032 +0x10fc
github.com/hashicorp/nomad/command/agent.ApiTgToStructsTG(0xc0006a4240, 0xc001152800, 0xc001152900)
        github.com/hashicorp/nomad/command/agent/job_endpoint.go:987 +0x4d6
github.com/hashicorp/nomad/command/agent.ApiJobToStructJob(0xc001fddb00, 0x0)
        github.com/hashicorp/nomad/command/agent/job_endpoint.go:862 +0x4dd
github.com/hashicorp/nomad/command/agent.(*HTTPServer).apiJobAndRequestToStructs(0xc0001d42d0, 0xc001fddb00, 0xc001152700, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x405979e, ...)
        github.com/hashicorp/nomad/command/agent/job_endpoint.go:717 +0x1e5
github.com/hashicorp/nomad/command/agent.(*HTTPServer).jobUpdate(0xc0001d42d0, 0x6a1d6f0, 0xc000a405a0, 0xc001152700, 0x0, 0x0, 0x2d5a89cc, 0x75f72d5a89cc, 0x100000001, 0xc001f088b0)
        github.com/hashicorp/nomad/command/agent/job_endpoint.go:393 +0x231
github.com/hashicorp/nomad/command/agent.(*HTTPServer).JobsRequest(0xc0001d42d0, 0x6a1d6f0, 0xc000a405a0, 0xc001152700, 0x40df206, 0x60dcc2e2, 0x21b326c8, 0x75f72d5a89cc)
        github.com/hashicorp/nomad/command/agent/job_endpoint.go:22 +0x85
github.com/hashicorp/nomad/command/agent.(*HTTPServer).wrap.func1(0x6a1d6f0, 0xc000a405a0, 0xc001152700)
        github.com/hashicorp/nomad/command/agent/http.go:461 +0x178
net/http.HandlerFunc.ServeHTTP(0xc000692840, 0x6a1d6f0, 0xc000a405a0, 0xc001152700)
        net/http/server.go:2069 +0x44
net/http.(*ServeMux).ServeHTTP(0xc000030480, 0x6a1d6f0, 0xc000a405a0, 0xc001152700)
        net/http/server.go:2448 +0x1ad
github.com/NYTimes/gziphandler.GzipHandlerWithOpts.func1.1(0x6a239c0, 0xc000aca000, 0xc001152700)
        github.com/NYTimes/gziphandler@…/gzip.go:277 +0x1e6
net/http.HandlerFunc.ServeHTTP(0xc000bbd020, 0x6a239c0, 0xc000aca000, 0xc001152700)
        net/http/server.go:2069 +0x44
net/http.serverHandler.ServeHTTP(0xc0005761c0, 0x6a239c0, 0xc000aca000, 0xc001152700)
        net/http/server.go:2887 +0xa3
net/http.(*conn).serve(0xc0008aa1e0, 0x6a322e0, 0xc0014b9180)
        net/http/server.go:1952 +0x8cd
created by net/http.(*Server).Serve
        net/http/server.go:3013 +0x39b

tgross (Member) commented Jun 30, 2021

Looks like this is the cursed api-to-structs bit of code in the HTTP server. In job_endpoint.go#L1028-L1038 we incorrectly assume that if there are any volume_mount blocks, they aren't empty, so we need a nil check there.
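A minimal sketch of the kind of nil check described above. It uses hypothetical, simplified types (APIVolumeMount, StructsVolumeMount) rather than Nomad's real api and structs packages, and it is not the actual fix that landed in #10855. Nil entries are skipped and nil fields become zero values, on the assumption that later validation rejects a mount with an empty Volume.

```go
package main

import "fmt"

// Hypothetical stand-ins for the API-side and internal representations
// of a volume_mount block (not Nomad's actual types).
type APIVolumeMount struct {
	Volume      *string
	Destination *string
	ReadOnly    *bool
}

type StructsVolumeMount struct {
	Volume      string
	Destination string
	ReadOnly    bool
}

// derefString returns *p, or "" when p is nil.
func derefString(p *string) string {
	if p == nil {
		return ""
	}
	return *p
}

// derefBool returns *p, or false when p is nil.
func derefBool(p *bool) bool {
	if p == nil {
		return false
	}
	return *p
}

// convertVolumeMounts never dereferences a nil pointer: nil entries are
// skipped and nil fields become zero values for validation to reject later.
func convertVolumeMounts(in []*APIVolumeMount) []*StructsVolumeMount {
	var out []*StructsVolumeMount
	for _, m := range in {
		if m == nil {
			continue
		}
		out = append(out, &StructsVolumeMount{
			Volume:      derefString(m.Volume),
			Destination: derefString(m.Destination),
			ReadOnly:    derefBool(m.ReadOnly),
		})
	}
	return out
}

func main() {
	vol := "data"
	mounts := []*APIVolumeMount{
		nil,            // e.g. a null entry sent directly to the HTTP API
		{},             // e.g. an empty volume_mount {} block
		{Volume: &vol}, // a well-formed mount
	}
	for _, m := range convertVolumeMounts(mounts) {
		fmt.Printf("%+v\n", *m)
	}
}
```

Skipping nil entries and zero-filling fields keeps the HTTP layer from panicking while leaving the decision about what counts as invalid to the validation step.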

shoenig added the stage/accepted and theme/api labels Jul 1, 2021

danishprakash (Contributor) commented:

@tgross I went through the snippet you linked in your comment, and it looks like a simple fix. I'm curious, though, and I'm probably missing something here: can we not prevent an empty block from being propagated in the first place?

tgross (Member) commented Jul 6, 2021

Ideally, yes. But because we do all our RPC handling, auth, and validation on the leader, the Nomad HTTP request gets converted to an RPC request before it is validated. The API inputs may also be controlled by a developer end user (e.g., someone working on the Nomad Terraform provider, on Levant, or on a custom piece of control software), so we want to make sure we're handling them safely anyway.

Should be fixed in #10855

tgross self-assigned this Jul 6, 2021
tgross added this to the 1.1.3 milestone Jul 6, 2021
github-actions commented:

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators Oct 17, 2022
4 participants