nil pointer dereference in deployment monitor #15235

Closed
siennathesane opened this issue Nov 14, 2022 · 3 comments · Fixed by #16011
Labels
stage/accepted Confirmed, and intend to work on. No timeline commitment though. theme/cli type/bug

Comments

@siennathesane

Nomad version

Nomad v1.4.2

Operating system and Environment details

image

Issue

Running nomad job run panics with a nil pointer dereference in the CLI's deployment monitor.

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x2 addr=0x8 pc=0x1042cf598]

goroutine 1 [running]:
github.com/hashicorp/nomad/command.(*DeploymentStatusCommand).ttyMonitor(0x14000a9f928, 0x1400083ec40, {0x140009fdad0, 0x24}, 0x0, 0x0)
	github.com/hashicorp/nomad/command/deployment_status.go:329 +0x1e48
github.com/hashicorp/nomad/command.(*DeploymentStatusCommand).monitor(0x1400083ee00?, 0x14000c26040?, {0x140009fdad0, 0x24}, 0x2?, 0x0?)
	github.com/hashicorp/nomad/command/deployment_status.go:185 +0x84
github.com/hashicorp/nomad/command.(*monitor).monitor(0x14000bc2690, {0x140009fda10, 0x24})
	github.com/hashicorp/nomad/command/monitor.go:302 +0xbc8
github.com/hashicorp/nomad/command.(*JobRunCommand).Run(0x1400052aa00, {0x1400004e070, 0x1, 0x1})
	github.com/hashicorp/nomad/command/job_run.go:376 +0x1074
github.com/mitchellh/cli.(*CLI).Run(0x140000e3540)
	github.com/mitchellh/cli@[version obscured]/cli.go:262 +0x4a8
main.RunCustom({0x1400004e050?, 0x3, 0x3})
	github.com/hashicorp/nomad/main.go:117 +0x350
main.Run(...)
	github.com/hashicorp/nomad/main.go:87
main.main()
	github.com/hashicorp/nomad/main.go:83 +0x50
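
The trace places the fault in ttyMonitor at deployment_status.go:329, and addr=0x8 is consistent with reading a field at a small offset through a nil struct pointer. The following is a minimal sketch of that general failure mode, not Nomad's code: the deployment type, latestDeployment, and statusUnchecked are hypothetical names invented for illustration, and the sketch assumes a lookup that can return a nil result without an error.

```go
package main

import "fmt"

// deployment is a hypothetical stand-in for an API result type.
type deployment struct {
	ID     string
	Status string
}

// latestDeployment is a hypothetical lookup. Like many client calls, it
// may return a nil result together with a nil error when nothing is found.
func latestDeployment(found bool) (*deployment, error) {
	if !found {
		return nil, nil
	}
	return &deployment{ID: "d1", Status: "running"}, nil
}

// statusUnchecked reads a field without checking the pointer first.
// The deferred recover captures the panic message so the demo can print
// it instead of crashing the process.
func statusUnchecked(found bool) (status string, panicMsg string) {
	defer func() {
		if r := recover(); r != nil {
			panicMsg = fmt.Sprint(r)
		}
	}()
	d, _ := latestDeployment(found)
	return d.Status, "" // panics here when d is nil
}

func main() {
	_, msg := statusUnchecked(false)
	fmt.Println(msg)
}
```

Without the recover, the same dereference would terminate the process with a goroutine dump like the one above.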

Reproduction steps

Here is my waypoint.nomad job; it uses a GCP CSI volume on the backend.

job "waypoint" {
  region      = "americas"
  datacenters = ["us-central1"]
  type        = "service"

  group "server" {
    count = 1

    update {
      max_parallel     = 1
      canary           = 1
      min_healthy_time = "10s"
      healthy_deadline = "3m"
      auto_revert      = true
      auto_promote     = true
    }

    network {
      mode = "bridge"
      port "http" {}
      port "grpc" {}
    }

    service {
      name     = "waypoint"
      provider = "consul"
      tags = [
        "traefik.http.services.waypoint.loadBalancer.server.port=${NOMAD_PORT_http}",
        "traefik.http.routers.waypoint.rule=Host(\"waypoint.domain.io\")",
        "traefik.http.routers.waypoint.entrypoints=websecure",
        "traefik.http.routers.waypoint.service=waypoint@consulcatalog",
        "traefik.http.routers.waypoint.tls=true"
      ]
    }

    service {
      name     = "waypoint-api"
      provider = "consul"

      tags = [
        "traefik.http.services.waypoint-api.loadBalancer.server.port=${NOMAD_PORT_grpc}",
        "traefik.http.routers.waypoint-api.rule=Host(\"waypoint.domain.io\")",
        "traefik.http.routers.waypoint-api.entrypoints=websecure",
        "traefik.http.routers.waypoint-api.service=waypoint-api@consulcatalog",
        "traefik.http.routers.waypoint-api.tls=true"
      ]
    }

    volume "waypoint" {
      type            = "csi"
      source          = "waypoint"
      access_mode     = "single-node-writer"
      attachment_mode = "file-system"
    }

    task "disk-check" {
      driver = "docker"
      config {
        image   = "busybox:latest"
        command = "sh"
        args = [
          "-c",
          "chown -R 100:1000 /data/"
        ]
      }
      resources {
        cpu        = 100
        memory     = 100
        memory_max = 150
      }
      restart {
        attempts = 2
        interval = "3s"
        mode     = "fail"
      }
      lifecycle {
        hook    = "prestart"
        sidecar = false
      }
      volume_mount {
        volume      = "waypoint"
        destination = "/data"
      }
    }

    task "server" {
      driver = "docker"

      config {
        image = "hashicorp/waypoint:0.10.3"

        args = [
          "server",
          "run",
          "-accept-tos",
          "-db=/data/data.db",
          "-tls-cert-file=/home/waypoint/tls.crt",
          "-tls-key-file=/home/waypoint/tls.key",
          "-advertise-addr=https://waypoint.domain.io",
          "-listen-grpc=0.0.0.0:${NOMAD_PORT_grpc}",
          "-listen-http-insecure=0.0.0.0:${NOMAD_PORT_http}"
        ]

        ports = [
          "http",
          "grpc"
        ]

        volumes = [
          "local/tls.crt:/home/waypoint/tls.crt",
          "local/tls.key:/home/waypoint/tls.key"
        ]
      }

      template {
        data        = <<EOF
-----BEGIN CERTIFICATE-----
...
-----END CERTIFICATE-----
EOF
        destination = "local/tls.crt"
      }

      template {
        data        = <<EOF
-----BEGIN RSA PRIVATE KEY-----
...
-----END RSA PRIVATE KEY-----
EOF
        destination = "local/tls.key"
      }

      resources {
        cpu        = 100
        memory_max = 512
      }

      restart {
        attempts = 2
        mode     = "fail"
      }

      volume_mount {
        volume      = "waypoint"
        destination = "/data"
      }
    }
  }
}

And the waypoint.volume.nomad definition:

# volume registration
type      = "csi"
id        = "waypoint"
name      = "waypoint"
plugin_id = "gcepd"

mount_options {
  fs_type     = "ext4"
  mount_flags = ["noatime"]
}

capability {
  access_mode     = "single-node-writer"
  attachment_mode = "file-system"
}

topology_request {
  required {
    topology {
      segments { "topology.gke.io/zone" = "us-central1-a" }
    }
  }
}

Expected Result

The CLI should not panic.
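
One way to get that behaviour is to treat a missing result as an ordinary error rather than dereferencing it. This is a minimal sketch under assumptions: deployment, describe, and errNoDeployment are hypothetical names, and this is not the change made in #16011.

```go
package main

import (
	"errors"
	"fmt"
)

// deployment is a hypothetical stand-in for an API result type.
type deployment struct {
	ID     string
	Status string
}

// errNoDeployment is a hypothetical sentinel error for a missing result.
var errNoDeployment = errors.New("no deployment found")

// describe guards against a nil pointer before reading any field, so the
// caller gets an error it can print instead of a panic.
func describe(d *deployment) (string, error) {
	if d == nil {
		return "", errNoDeployment
	}
	return fmt.Sprintf("%s: %s", d.ID, d.Status), nil
}

func main() {
	if _, err := describe(nil); err != nil {
		fmt.Println("error:", err)
	}
	out, _ := describe(&deployment{ID: "d1", Status: "failed"})
	fmt.Println(out)
}
```

Returning an error keeps the decision with the caller, which can report the underlying scheduling failure and exit with a non-zero status.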

Actual Result

Nomad is throwing the same errors as in #13450, specifically volume max claim reached.

Job file (if appropriate)

Nomad Server logs (if appropriate)

Nomad Client logs (if appropriate)

@siennathesane
Author

Waypoint was failing to deploy due to a CSI issue, but the Nomad CLI was throwing this error.

@jrasell
Member

jrasell commented Nov 14, 2022

Hi @mxplusb and thanks for raising this issue. We will take a look into reproducing this and raising a fix.


I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 15, 2025