Placement failed due to CPU resource exhaustion, 2 servers going unused. #4347

Closed
glimberg opened this issue May 29, 2018 · 10 comments

glimberg commented May 29, 2018

Nomad version

Nomad v0.8.3 (c85483d)

Operating system and Environment details

CentOS 7

Issue

This morning I updated a job, adding 2 more groups to the job specification for 2 new instances of the service I manage via Nomad. After the update I received the following error:

- WARNING: Failed to place all allocations.
  Task Group "e4da7455b2" (failed to place 1 allocation):
    * Resources exhausted on 5 nodes
    * Dimension "cpu" exhausted on 5 nodes

I have 7 servers acting as Nomad clients in this cluster, not 5. Two of the clients have 0 allocations and plenty of CPU to spare, and Nomad lists them as 'ready', but it won't place any allocations on them.

There are no errors in the logs, and I'm not sure how to reproduce this issue. The job file is included below.
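
For anyone trying to reason about the numbers, here is a minimal sketch of the per-node arithmetic behind a `Dimension "cpu" exhausted` message, as I understand it: a group fits on a node only if the node's free CPU covers the sum of the group's task asks. The node names and capacities are hypothetical (they are not stated in this report), and the 100 MHz figure for the `log-collector` task, which has no `resources` block, is an assumed default.

```python
# Sketch of a per-node CPU fit check. Node capacities below are hypothetical;
# the report does not state them.

SERVER_CPU_MHZ = 6000         # from the job file: resources { cpu = 6000 }
LOG_COLLECTOR_CPU_MHZ = 100   # assumption: default ask for a task with no resources block


def group_cpu_ask(task_cpus):
    """A group's CPU ask is taken to be the sum of the asks of its tasks."""
    return sum(task_cpus)


def exhausted_nodes(nodes, ask_mhz):
    """Return the sorted names of nodes whose free CPU is smaller than ask_mhz.

    nodes maps a node name to a (total_mhz, allocated_mhz) pair.
    """
    return sorted(
        name
        for name, (total_mhz, allocated_mhz) in nodes.items()
        if total_mhz - allocated_mhz < ask_mhz
    )


ask = group_cpu_ask([SERVER_CPU_MHZ, LOG_COLLECTOR_CPU_MHZ])

# Hypothetical cluster: five busy clients and two idle ones.
nodes = {f"busy-{i}": (20000, 18300) for i in range(1, 6)}
nodes.update({"idle-1": (20000, 0), "idle-2": (20000, 0)})

print(ask)
print(exhausted_nodes(nodes, ask))
```

With these made-up numbers only the five busy nodes fail the check, so the two idle nodes would be expected to accept the group. Comparing the CPU each idle client actually advertises against the group's ask may help narrow down why they are not counted.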

Job file:

job "controller" {
    region = "europe"
    datacenters = ["consul-ams"]
    type = "service"

    update {
        max_parallel = 1
        health_check = "checks"
        min_healthy_time = "1m"
        healthy_deadline = "10m"
        auto_revert = true
        stagger = "30s"
    }

    group "12ac4a1e71" {
        count = 1

        restart {
            attempts = 10
            delay = "10s"
            interval = "5m"
        }

        task "server" {
            driver = "raw_exec"

            artifact {
                source = "https://zt-controller.s3.amazonaws.com/1.2.5-15/zerotier-one"
                destination = "/controller/bin"
                options {
                    checksum = "sha256:44d8a42e77ca32086408924f42b25c86fdfa0b5e9b0f128d88bb4d527e979c9f"
                }
            }

            resources {
                cpu = 6000

                network {
                    mbits = 1
                    port "ztc" {}
                }
            }

            env {
                GLIBCXX_FORCE_NEW = 1
                GLIBCPP_FORCE_NEW = 1
                LD_PRELOAD = "/usr/lib64/libjemalloc.so"
            }

            service {
                port = "ztc"
                check {
                    type = "script"
                    command = "/controller/bin/check-controller"
                    args = ["${NOMAD_PORT_ztc}"]
                    interval = "1m"
                    timeout = "59s"
                }
            }

            config {
                command = "/controller/bin/zerotier-one"
                args = [
                    "-p${NOMAD_PORT_ztc}",
                    "/controller/data/12ac4a1e71"
                ]
            }
        }

        task "log-collector" {
            driver = "raw_exec"

            template {
                data = <<EOH
filebeat:
  prospectors:
  - document_type: log
    encoding: plain
    fields:
      application_name: controller_{{ env "NOMAD_GROUP_NAME" }}
      gl2_source_collector: 4a7d1f54-acb7-413f-a06d-02748fd7af1f
    ignore_older: 0
    input_type: log
    paths:
    - {{ env "NOMAD_ALLOC_DIR" }}/logs/server.stderr.*
    scan_frequency: 10s
    tail_files: true
output:
  logstash:
    hosts:
    - graylog.int.zerotier.com:5044
path:
  data: /var/cache/graylog/controller_{{ env "NOMAD_GROUP_NAME" }}/filebeat/data
  logs: /var/log/graylog/controller_{{ env "NOMAD_GROUP_NAME" }}
tags:
- chef-client
EOH
                destination = "filebeat/filebeat.yml"
            }

            config {
                command = "/usr/bin/filebeat"
                args = [
                    "-c", "filebeat/filebeat.yml"                        
                ]
            }
        }
    }

    group "17d709436c" {
        count = 1

        restart {
            attempts = 10
            delay = "10s"
            interval = "5m"
        }

        task "server" {
            driver = "raw_exec"

            artifact {
                source = "https://zt-controller.s3.amazonaws.com/1.2.5-15/zerotier-one"
                destination = "/controller/bin"
                options {
                    checksum = "sha256:44d8a42e77ca32086408924f42b25c86fdfa0b5e9b0f128d88bb4d527e979c9f"
                }
            }

            resources {
                cpu = 6000

                network {
                    mbits = 1
                    port "ztc" {}
                }
            }

            env {
                GLIBCXX_FORCE_NEW = 1
                GLIBCPP_FORCE_NEW = 1
                LD_PRELOAD = "/usr/lib64/libjemalloc.so"
            }

            service {
                port = "ztc"
                check {
                    type = "script"
                    command = "/controller/bin/check-controller"
                    args = ["${NOMAD_PORT_ztc}"]
                    interval = "1m"
                    timeout = "59s"
                }
            }

            config {
                command = "/controller/bin/zerotier-one"
                args = [
                    "-p${NOMAD_PORT_ztc}",
                    "/controller/data/17d709436c"
                ]
            }
        }

        task "log-collector" {
            driver = "raw_exec"

            template {
                data = <<EOH
filebeat:
  prospectors:
  - document_type: log
    encoding: plain
    fields:
      application_name: controller_{{ env "NOMAD_GROUP_NAME" }}
      gl2_source_collector: 4a7d1f54-acb7-413f-a06d-02748fd7af1f
    ignore_older: 0
    input_type: log
    paths:
    - {{ env "NOMAD_ALLOC_DIR" }}/logs/server.stderr.*
    scan_frequency: 10s
    tail_files: true
output:
  logstash:
    hosts:
    - graylog.int.zerotier.com:5044
path:
  data: /var/cache/graylog/controller_{{ env "NOMAD_GROUP_NAME" }}/filebeat/data
  logs: /var/log/graylog/controller_{{ env "NOMAD_GROUP_NAME" }}
tags:
- chef-client
EOH
                destination = "filebeat/filebeat.yml"
            }

            config {
                command = "/usr/bin/filebeat"
                args = [
                    "-c", "filebeat/filebeat.yml"                        
                ]
            }
        }
    }

    group "565799d8f6" {
        count = 1

        restart {
            attempts = 10
            delay = "10s"
            interval = "5m"
        }

        task "server" {
            driver = "raw_exec"

            artifact {
                source = "https://zt-controller.s3.amazonaws.com/1.2.5-15/zerotier-one"
                destination = "/controller/bin"
                options {
                    checksum = "sha256:44d8a42e77ca32086408924f42b25c86fdfa0b5e9b0f128d88bb4d527e979c9f"
                }
            }

            resources {
                cpu = 6000

                network {
                    mbits = 1
                    port "ztc" {}
                }
            }

            env {
                GLIBCXX_FORCE_NEW = 1
                GLIBCPP_FORCE_NEW = 1
                LD_PRELOAD = "/usr/lib64/libjemalloc.so"
            }

            service {
                port = "ztc"
                check {
                    type = "script"
                    command = "/controller/bin/check-controller"
                    args = ["${NOMAD_PORT_ztc}"]
                    interval = "1m"
                    timeout = "59s"
                }
            }

            config {
                command = "/controller/bin/zerotier-one"
                args = [
                    "-p${NOMAD_PORT_ztc}",
                    "/controller/data/565799d8f6"
                ]
            }
        }
        task "log-collector" {
            driver = "raw_exec"

            template {
                data = <<EOH
filebeat:
  prospectors:
  - document_type: log
    encoding: plain
    fields:
      application_name: controller_{{ env "NOMAD_GROUP_NAME" }}
      gl2_source_collector: 4a7d1f54-acb7-413f-a06d-02748fd7af1f
    ignore_older: 0
    input_type: log
    paths:
    - {{ env "NOMAD_ALLOC_DIR" }}/logs/server.stderr.*
    scan_frequency: 10s
    tail_files: true
output:
  logstash:
    hosts:
    - graylog.int.zerotier.com:5044
path:
  data: /var/cache/graylog/controller_{{ env "NOMAD_GROUP_NAME" }}/filebeat/data
  logs: /var/log/graylog/controller_{{ env "NOMAD_GROUP_NAME" }}
tags:
- chef-client
EOH
                destination = "filebeat/filebeat.yml"
            }

            config {
                command = "/usr/bin/filebeat"
                args = [
                    "-c", "filebeat/filebeat.yml"                        
                ]
            }
        }
    }

    group "8056c2e21c" {
        count = 1

        restart {
            attempts = 10
            delay = "10s"
            interval = "5m"
        }

        task "server" {
            driver = "raw_exec"

            artifact {
                source = "https://zt-controller.s3.amazonaws.com/1.2.5-15/zerotier-one"
                destination = "/controller/bin"
                options {
                    checksum = "sha256:44d8a42e77ca32086408924f42b25c86fdfa0b5e9b0f128d88bb4d527e979c9f"
                }
            }

            resources {
                cpu = 6000

                network {
                    mbits = 1
                    port "ztc" {}
                }
            }

            env {
                GLIBCXX_FORCE_NEW = 1
                GLIBCPP_FORCE_NEW = 1
                LD_PRELOAD = "/usr/lib64/libjemalloc.so"
            }

            service {
                port = "ztc"
                check {
                    type = "script"
                    command = "/controller/bin/check-controller"
                    args = ["${NOMAD_PORT_ztc}"]
                    interval = "1m"
                    timeout = "59s"
                }
            }

            config {
                command = "/controller/bin/zerotier-one"
                args = [
                    "-p${NOMAD_PORT_ztc}",
                    "/controller/data/8056c2e21c"
                ]
            }
        }

        task "log-collector" {
            driver = "raw_exec"

            template {
                data = <<EOH
filebeat:
  prospectors:
  - document_type: log
    encoding: plain
    fields:
      application_name: controller_{{ env "NOMAD_GROUP_NAME" }}
      gl2_source_collector: 4a7d1f54-acb7-413f-a06d-02748fd7af1f
    ignore_older: 0
    input_type: log
    paths:
    - {{ env "NOMAD_ALLOC_DIR" }}/logs/server.stderr.*
    scan_frequency: 10s
    tail_files: true
output:
  logstash:
    hosts:
    - graylog.int.zerotier.com:5044
path:
  data: /var/cache/graylog/controller_{{ env "NOMAD_GROUP_NAME" }}/filebeat/data
  logs: /var/log/graylog/controller_{{ env "NOMAD_GROUP_NAME" }}
tags:
- chef-client
EOH
                destination = "filebeat/filebeat.yml"
            }

            config {
                command = "/usr/bin/filebeat"
                args = [
                    "-c", "filebeat/filebeat.yml"                        
                ]
            }
        }
    }

    group "8bd5124fd6" {
        count = 1

        restart {
            attempts = 10
            delay = "10s"
            interval = "5m"
        }

        task "server" {
            driver = "raw_exec"

            artifact {
                source = "https://zt-controller.s3.amazonaws.com/1.2.5-15/zerotier-one"
                destination = "/controller/bin"
                options {
                    checksum = "sha256:44d8a42e77ca32086408924f42b25c86fdfa0b5e9b0f128d88bb4d527e979c9f"
                }
            }

            resources {
                cpu = 6000

                network {
                    mbits = 1
                    port "ztc" {}
                }
            }

            env {
                GLIBCXX_FORCE_NEW = 1
                GLIBCPP_FORCE_NEW = 1
                LD_PRELOAD = "/usr/lib64/libjemalloc.so"
            }

            service {
                port = "ztc"
                check {
                    type = "script"
                    command = "/controller/bin/check-controller"
                    args = ["${NOMAD_PORT_ztc}"]
                    interval = "1m"
                    timeout = "59s"
                }
            }

            config {
                command = "/controller/bin/zerotier-one"
                args = [
                    "-p${NOMAD_PORT_ztc}",
                    "/controller/data/8bd5124fd6"
                ]
            }
        }

        task "log-collector" {
            driver = "raw_exec"

            template {
                data = <<EOH
filebeat:
  prospectors:
  - document_type: log
    encoding: plain
    fields:
      application_name: controller_{{ env "NOMAD_GROUP_NAME" }}
      gl2_source_collector: 4a7d1f54-acb7-413f-a06d-02748fd7af1f
    ignore_older: 0
    input_type: log
    paths:
    - {{ env "NOMAD_ALLOC_DIR" }}/logs/server.stderr.*
    scan_frequency: 10s
    tail_files: true
output:
  logstash:
    hosts:
    - graylog.int.zerotier.com:5044
path:
  data: /var/cache/graylog/controller_{{ env "NOMAD_GROUP_NAME" }}/filebeat/data
  logs: /var/log/graylog/controller_{{ env "NOMAD_GROUP_NAME" }}
tags:
- chef-client
EOH
                destination = "filebeat/filebeat.yml"
            }

            config {
                command = "/usr/bin/filebeat"
                args = [
                    "-c", "filebeat/filebeat.yml"                        
                ]
            }
        }
    }

    group "93afae5963" {
        count = 1

        restart {
            attempts = 10
            delay = "10s"
            interval = "5m"
        }

        task "server" {
            driver = "raw_exec"

            artifact {
                source = "https://zt-controller.s3.amazonaws.com/1.2.5-15/zerotier-one"
                destination = "/controller/bin"
                options {
                    checksum = "sha256:44d8a42e77ca32086408924f42b25c86fdfa0b5e9b0f128d88bb4d527e979c9f"
                }
            }

            resources {
                cpu = 6000

                network {
                    mbits = 1
                    port "ztc" {}
                }
            }

            env {
                GLIBCXX_FORCE_NEW = 1
                GLIBCPP_FORCE_NEW = 1
                LD_PRELOAD = "/usr/lib64/libjemalloc.so"
            }

            service {
                port = "ztc"
                check {
                    type = "script"
                    command = "/controller/bin/check-controller"
                    args = ["${NOMAD_PORT_ztc}"]
                    interval = "1m"
                    timeout = "59s"
                }
            }

            config {
                command = "/controller/bin/zerotier-one"
                args = [
                    "-p${NOMAD_PORT_ztc}",
                    "/controller/data/93afae5963"
                ]
            }
        }
        task "log-collector" {
            driver = "raw_exec"

            template {
                data = <<EOH
filebeat:
  prospectors:
  - document_type: log
    encoding: plain
    fields:
      application_name: controller_{{ env "NOMAD_GROUP_NAME" }}
      gl2_source_collector: 4a7d1f54-acb7-413f-a06d-02748fd7af1f
    ignore_older: 0
    input_type: log
    paths:
    - {{ env "NOMAD_ALLOC_DIR" }}/logs/server.stderr.*
    scan_frequency: 10s
    tail_files: true
output:
  logstash:
    hosts:
    - graylog.int.zerotier.com:5044
path:
  data: /var/cache/graylog/controller_{{ env "NOMAD_GROUP_NAME" }}/filebeat/data
  logs: /var/log/graylog/controller_{{ env "NOMAD_GROUP_NAME" }}
tags:
- chef-client
EOH
                destination = "filebeat/filebeat.yml"
            }

            config {
                command = "/usr/bin/filebeat"
                args = [
                    "-c", "filebeat/filebeat.yml"                        
                ]
            }
        }
    }

    group "a09acf0233" {
        count = 1

        restart {
            attempts = 10
            delay = "10s"
            interval = "5m"
        }

        task "server" {
            driver = "raw_exec"

            artifact {
                source = "https://zt-controller.s3.amazonaws.com/1.2.5-15/zerotier-one"
                destination = "/controller/bin"
                options {
                    checksum = "sha256:44d8a42e77ca32086408924f42b25c86fdfa0b5e9b0f128d88bb4d527e979c9f"
                }
            }

            resources {
                cpu = 6000

                network {
                    mbits = 1
                    port "ztc" {}
                }
            }

            env {
                GLIBCXX_FORCE_NEW = 1
                GLIBCPP_FORCE_NEW = 1
                LD_PRELOAD = "/usr/lib64/libjemalloc.so"
            }

            service {
                port = "ztc"
                check {
                    type = "script"
                    command = "/controller/bin/check-controller"
                    args = ["${NOMAD_PORT_ztc}"]
                    interval = "1m"
                    timeout = "59s"
                }
            }

            config {
                command = "/controller/bin/zerotier-one"
                args = [
                    "-p${NOMAD_PORT_ztc}",
                    "/controller/data/a09acf0233"
                ]
            }
        }

        task "log-collector" {
            driver = "raw_exec"

            template {
                data = <<EOH
filebeat:
  prospectors:
  - document_type: log
    encoding: plain
    fields:
      application_name: controller_{{ env "NOMAD_GROUP_NAME" }}
      gl2_source_collector: 4a7d1f54-acb7-413f-a06d-02748fd7af1f
    ignore_older: 0
    input_type: log
    paths:
    - {{ env "NOMAD_ALLOC_DIR" }}/logs/server.stderr.*
    scan_frequency: 10s
    tail_files: true
output:
  logstash:
    hosts:
    - graylog.int.zerotier.com:5044
path:
  data: /var/cache/graylog/controller_{{ env "NOMAD_GROUP_NAME" }}/filebeat/data
  logs: /var/log/graylog/controller_{{ env "NOMAD_GROUP_NAME" }}
tags:
- chef-client
EOH
                destination = "filebeat/filebeat.yml"
            }

            config {
                command = "/usr/bin/filebeat"
                args = [
                    "-c", "filebeat/filebeat.yml"                        
                ]
            }
        }
    }

    group "e5cd7a9e1c" {
        count = 1

        restart {
            attempts = 10
            delay = "10s"
            interval = "5m"
        }

        task "server" {
            driver = "raw_exec"

            artifact {
                source = "https://zt-controller.s3.amazonaws.com/1.2.5-15/zerotier-one"
                destination = "/controller/bin"
                options {
                    checksum = "sha256:44d8a42e77ca32086408924f42b25c86fdfa0b5e9b0f128d88bb4d527e979c9f"
                }
            }

            resources {
                cpu = 6000

                network {
                    mbits = 1
                    port "ztc" {}
                }
            }

            env {
                GLIBCXX_FORCE_NEW = 1
                GLIBCPP_FORCE_NEW = 1
                LD_PRELOAD = "/usr/lib64/libjemalloc.so"
            }

            service {
                port = "ztc"
                check {
                    type = "script"
                    command = "/controller/bin/check-controller"
                    args = ["${NOMAD_PORT_ztc}"]
                    interval = "1m"
                    timeout = "59s"
                }
            }

            config {
                command = "/controller/bin/zerotier-one"
                args = [
                    "-p${NOMAD_PORT_ztc}",
                    "/controller/data/e5cd7a9e1c"
                ]
            }
        }

        task "log-collector" {
            driver = "raw_exec"

            template {
                data = <<EOH
filebeat:
  prospectors:
  - document_type: log
    encoding: plain
    fields:
      application_name: controller_{{ env "NOMAD_GROUP_NAME" }}
      gl2_source_collector: 4a7d1f54-acb7-413f-a06d-02748fd7af1f
    ignore_older: 0
    input_type: log
    paths:
    - {{ env "NOMAD_ALLOC_DIR" }}/logs/server.stderr.*
    scan_frequency: 10s
    tail_files: true
output:
  logstash:
    hosts:
    - graylog.int.zerotier.com:5044
path:
  data: /var/cache/graylog/controller_{{ env "NOMAD_GROUP_NAME" }}/filebeat/data
  logs: /var/log/graylog/controller_{{ env "NOMAD_GROUP_NAME" }}
tags:
- chef-client
EOH
                destination = "filebeat/filebeat.yml"
            }

            config {
                command = "/usr/bin/filebeat"
                args = [
                    "-c", "filebeat/filebeat.yml"                        
                ]
            }
        }
    }

    group "ea9349aa9c" {
        # Private controller for Ixia
        count = 1

        restart {
            attempts = 10
            delay = "10s"
            interval = "5m"
        }

        task "server" {
            driver = "raw_exec"

            artifact {
                source = "https://zt-controller.s3.amazonaws.com/1.2.5-15/zerotier-one"
                destination = "/controller/bin"
                options {
                    checksum = "sha256:44d8a42e77ca32086408924f42b25c86fdfa0b5e9b0f128d88bb4d527e979c9f"
                }
            }

            resources {
                cpu = 800

                network {
                    mbits = 1
                    port "ztc" {}
                }
            }

            env {
                GLIBCXX_FORCE_NEW = 1
                GLIBCPP_FORCE_NEW = 1
                LD_PRELOAD = "/usr/lib64/libjemalloc.so"
            }

            service {
                port = "ztc"
                check {
                    type = "script"
                    command = "/controller/bin/check-controller"
                    args = ["${NOMAD_PORT_ztc}"]
                    interval = "1m"
                    timeout = "59s"
                }
            }

            config {
                command = "/controller/bin/zerotier-one"
                args = [
                    "-p${NOMAD_PORT_ztc}",
                    "/controller/data/ea9349aa9c"
                ]
            }
        }

        task "log-collector" {
            driver = "raw_exec"

            template {
                data = <<EOH
filebeat:
  prospectors:
  - document_type: log
    encoding: plain
    fields:
      application_name: controller_{{ env "NOMAD_GROUP_NAME" }}
      gl2_source_collector: 4a7d1f54-acb7-413f-a06d-02748fd7af1f
    ignore_older: 0
    input_type: log
    paths:
    - {{ env "NOMAD_ALLOC_DIR" }}/logs/server.stderr.*
    scan_frequency: 10s
    tail_files: true
output:
  logstash:
    hosts:
    - graylog.int.zerotier.com:5044
path:
  data: /var/cache/graylog/controller_{{ env "NOMAD_GROUP_NAME" }}/filebeat/data
  logs: /var/log/graylog/controller_{{ env "NOMAD_GROUP_NAME" }}
tags:
- chef-client
EOH
                destination = "filebeat/filebeat.yml"
            }

            config {
                command = "/usr/bin/filebeat"
                args = [
                    "-c", "filebeat/filebeat.yml"                        
                ]
            }
        }
    }

    group "6ab565387a" {
        count = 1

        restart {
            attempts = 10
            delay = "10s"
            interval = "5m"
        }

        task "server" {
            driver = "raw_exec"

            artifact {
                source = "https://zt-controller.s3.amazonaws.com/1.2.5-15/zerotier-one"
                destination = "/controller/bin"
                options {
                    checksum = "sha256:44d8a42e77ca32086408924f42b25c86fdfa0b5e9b0f128d88bb4d527e979c9f"
                }
            }

            resources {
                cpu = 6000

                network {
                    mbits = 1
                    port "ztc" {}
                }
            }

            env {
                GLIBCXX_FORCE_NEW = 1
                GLIBCPP_FORCE_NEW = 1
                LD_PRELOAD = "/usr/lib64/libjemalloc.so"
            }

            service {
                port = "ztc"
                check {
                    type = "script"
                    command = "/controller/bin/check-controller"
                    args = ["${NOMAD_PORT_ztc}"]
                    interval = "1m"
                    timeout = "59s"
                }
            }

            config {
                command = "/controller/bin/zerotier-one"
                args = [
                    "-p${NOMAD_PORT_ztc}",
                    "/controller/data/6ab565387a"
                ]
            }
        }

        task "log-collector" {
            driver = "raw_exec"

            template {
                data = <<EOH
filebeat:
  prospectors:
  - document_type: log
    encoding: plain
    fields:
      application_name: controller_{{ env "NOMAD_GROUP_NAME" }}
      gl2_source_collector: 4a7d1f54-acb7-413f-a06d-02748fd7af1f
    ignore_older: 0
    input_type: log
    paths:
    - {{ env "NOMAD_ALLOC_DIR" }}/logs/server.stderr.*
    scan_frequency: 10s
    tail_files: true
output:
  logstash:
    hosts:
    - graylog.int.zerotier.com:5044
path:
  data: /var/cache/graylog/controller_{{ env "NOMAD_GROUP_NAME" }}/filebeat/data
  logs: /var/log/graylog/controller_{{ env "NOMAD_GROUP_NAME" }}
tags:
- chef-client
EOH
                destination = "filebeat/filebeat.yml"
            }

            config {
                command = "/usr/bin/filebeat"
                args = [
                    "-c", "filebeat/filebeat.yml"                        
                ]
            }
        }
    }

    group "a84ac5c10a" {
        count = 1

        restart {
            attempts = 10
            delay = "10s"
            interval = "5m"
        }

        task "server" {
            driver = "raw_exec"

            artifact {
                source = "https://zt-controller.s3.amazonaws.com/1.2.5-15/zerotier-one"
                destination = "/controller/bin"
                options {
                    checksum = "sha256:44d8a42e77ca32086408924f42b25c86fdfa0b5e9b0f128d88bb4d527e979c9f"
                }
            }

            resources {
                cpu = 6000

                network {
                    mbits = 1
                    port "ztc" {}
                }
            }

            env {
                GLIBCXX_FORCE_NEW = 1
                GLIBCPP_FORCE_NEW = 1
                LD_PRELOAD = "/usr/lib64/libjemalloc.so"
            }

            service {
                port = "ztc"
                check {
                    type = "script"
                    command = "/controller/bin/check-controller"
                    args = ["${NOMAD_PORT_ztc}"]
                    interval = "1m"
                    timeout = "59s"
                }
            }

            config {
                command = "/controller/bin/zerotier-one"
                args = [
                    "-p${NOMAD_PORT_ztc}",
                    "/controller/data/a84ac5c10a"
                ]
            }
        }

        task "log-collector" {
            driver = "raw_exec"

            template {
                data = <<EOH
filebeat:
  prospectors:
  - document_type: log
    encoding: plain
    fields:
      application_name: controller_{{ env "NOMAD_GROUP_NAME" }}
      gl2_source_collector: 4a7d1f54-acb7-413f-a06d-02748fd7af1f
    ignore_older: 0
    input_type: log
    paths:
    - {{ env "NOMAD_ALLOC_DIR" }}/logs/server.stderr.*
    scan_frequency: 10s
    tail_files: true
output:
  logstash:
    hosts:
    - graylog.int.zerotier.com:5044
path:
  data: /var/cache/graylog/controller_{{ env "NOMAD_GROUP_NAME" }}/filebeat/data
  logs: /var/log/graylog/controller_{{ env "NOMAD_GROUP_NAME" }}
tags:
- chef-client
EOH
                destination = "filebeat/filebeat.yml"
            }

            config {
                command = "/usr/bin/filebeat"
                args = [
                    "-c", "filebeat/filebeat.yml"                        
                ]
            }
        }
    }

    group "c7c8172af1" {
        count = 1

        restart {
            attempts = 10
            delay = "10s"
            interval = "5m"
        }

        task "server" {
            driver = "raw_exec"

            artifact {
                source = "https://zt-controller.s3.amazonaws.com/1.2.5-15/zerotier-one"
                destination = "/controller/bin"
                options {
                    checksum = "sha256:44d8a42e77ca32086408924f42b25c86fdfa0b5e9b0f128d88bb4d527e979c9f"
                }
            }

            resources {
                cpu = 6000

                network {
                    mbits = 1
                    port "ztc" {}
                }
            }

            env {
                GLIBCXX_FORCE_NEW = 1
                GLIBCPP_FORCE_NEW = 1
                LD_PRELOAD = "/usr/lib64/libjemalloc.so"
            }

            service {
                port = "ztc"
                check {
                    type = "script"
                    command = "/controller/bin/check-controller"
                    args = ["${NOMAD_PORT_ztc}"]
                    interval = "1m"
                    timeout = "59s"
                }
            }

            config {
                command = "/controller/bin/zerotier-one"
                args = [
                    "-p${NOMAD_PORT_ztc}",
                    "/controller/data/c7c8172af1"
                ]
            }
        }

        task "log-collector" {
            driver = "raw_exec"

            template {
                data = <<EOH
filebeat:
  prospectors:
  - document_type: log
    encoding: plain
    fields:
      application_name: controller_{{ env "NOMAD_GROUP_NAME" }}
      gl2_source_collector: 4a7d1f54-acb7-413f-a06d-02748fd7af1f
    ignore_older: 0
    input_type: log
    paths:
    - {{ env "NOMAD_ALLOC_DIR" }}/logs/server.stderr.*
    scan_frequency: 10s
    tail_files: true
output:
  logstash:
    hosts:
    - graylog.int.zerotier.com:5044
path:
  data: /var/cache/graylog/controller_{{ env "NOMAD_GROUP_NAME" }}/filebeat/data
  logs: /var/log/graylog/controller_{{ env "NOMAD_GROUP_NAME" }}
tags:
- chef-client
EOH
                destination = "filebeat/filebeat.yml"
            }

            config {
                command = "/usr/bin/filebeat"
                args = [
                    "-c", "filebeat/filebeat.yml"                        
                ]
            }
        }
    }

    group "9f77fc393e" {
        count = 1

        restart {
            attempts = 10
            delay = "10s"
            interval = "5m"
        }

        task "server" {
            driver = "raw_exec"

            artifact {
                source = "https://zt-controller.s3.amazonaws.com/1.2.5-15/zerotier-one"
                destination = "/controller/bin"
                options {
                    checksum = "sha256:44d8a42e77ca32086408924f42b25c86fdfa0b5e9b0f128d88bb4d527e979c9f"
                }
            }

            resources {
                cpu = 6000

                network {
                    mbits = 1
                    port "ztc" {}
                }
            }

            env {
                GLIBCXX_FORCE_NEW = 1
                GLIBCPP_FORCE_NEW = 1
                LD_PRELOAD = "/usr/lib64/libjemalloc.so"
            }

            service {
                port = "ztc"
                check {
                    type = "script"
                    command = "/controller/bin/check-controller"
                    args = ["${NOMAD_PORT_ztc}"]
                    interval = "1m"
                    timeout = "59s"
                }
            }

            config {
                command = "/controller/bin/zerotier-one"
                args = [
                    "-p${NOMAD_PORT_ztc}",
                    "/controller/data/9f77fc393e"
                ]
            }
        }

        task "log-collector" {
            driver = "raw_exec"

            template {
                data = <<EOH
filebeat:
  prospectors:
  - document_type: log
    encoding: plain
    fields:
      application_name: controller_{{ env "NOMAD_GROUP_NAME" }}
      gl2_source_collector: 4a7d1f54-acb7-413f-a06d-02748fd7af1f
    ignore_older: 0
    input_type: log
    paths:
    - {{ env "NOMAD_ALLOC_DIR" }}/logs/server.stderr.*
    scan_frequency: 10s
    tail_files: true
output:
  logstash:
    hosts:
    - graylog.int.zerotier.com:5044
path:
  data: /var/cache/graylog/controller_{{ env "NOMAD_GROUP_NAME" }}/filebeat/data
  logs: /var/log/graylog/controller_{{ env "NOMAD_GROUP_NAME" }}
tags:
- chef-client
EOH
                destination = "filebeat/filebeat.yml"
            }

            config {
                command = "/usr/bin/filebeat"
                args = [
                    "-c", "filebeat/filebeat.yml"
                ]
            }
        }
    }

    group "8850338390" {
        count = 1

        restart {
            attempts = 10
            delay = "10s"
            interval = "5m"
        }

        task "server" {
            driver = "raw_exec"

            artifact {
                source = "https://zt-controller.s3.amazonaws.com/1.2.5-15/zerotier-one"
                destination = "/controller/bin"
                options {
                    checksum = "sha256:44d8a42e77ca32086408924f42b25c86fdfa0b5e9b0f128d88bb4d527e979c9f"
                }
            }

            resources {
                cpu = 6000

                network {
                    mbits = 1
                    port "ztc" {}
                }
            }

            env {
                GLIBCXX_FORCE_NEW = 1
                GLIBCPP_FORCE_NEW = 1
                LD_PRELOAD = "/usr/lib64/libjemalloc.so"
            }

            service {
                port = "ztc"
                check {
                    type = "script"
                    command = "/controller/bin/check-controller"
                    args = ["${NOMAD_PORT_ztc}"]
                    interval = "1m"
                    timeout = "59s"
                }
            }

            config {
                command = "/controller/bin/zerotier-one"
                args = [
                    "-p${NOMAD_PORT_ztc}",
                    "/controller/data/8850338390"
                ]
            }
        }

        task "log-collector" {
            driver = "raw_exec"

            template {
                data = <<EOH
filebeat:
  prospectors:
  - document_type: log
    encoding: plain
    fields:
      application_name: controller_{{ env "NOMAD_GROUP_NAME" }}
      gl2_source_collector: 4a7d1f54-acb7-413f-a06d-02748fd7af1f
    ignore_older: 0
    input_type: log
    paths:
    - {{ env "NOMAD_ALLOC_DIR" }}/logs/server.stderr.*
    scan_frequency: 10s
    tail_files: true
output:
  logstash:
    hosts:
    - graylog.int.zerotier.com:5044
path:
  data: /var/cache/graylog/controller_{{ env "NOMAD_GROUP_NAME" }}/filebeat/data
  logs: /var/log/graylog/controller_{{ env "NOMAD_GROUP_NAME" }}
tags:
- chef-client
EOH
                destination = "filebeat/filebeat.yml"
            }

            config {
                command = "/usr/bin/filebeat"
                args = [
                    "-c", "filebeat/filebeat.yml"
                ]
            }
        }
    }

    group "e4da7455b2" {
        count = 1

        restart {
            attempts = 10
            delay = "10s"
            interval = "5m"
        }

        task "server" {
            driver = "raw_exec"

            artifact {
                source = "https://zt-controller.s3.amazonaws.com/1.2.5-15/zerotier-one"
                destination = "/controller/bin"
                options {
                    checksum = "sha256:44d8a42e77ca32086408924f42b25c86fdfa0b5e9b0f128d88bb4d527e979c9f"
                }
            }

            resources {
                cpu = 6000

                network {
                    mbits = 1
                    port "ztc" {}
                }
            }

            env {
                GLIBCXX_FORCE_NEW = 1
                GLIBCPP_FORCE_NEW = 1
                LD_PRELOAD = "/usr/lib64/libjemalloc.so"
            }

            service {
                port = "ztc"
                check {
                    type = "script"
                    command = "/controller/bin/check-controller"
                    args = ["${NOMAD_PORT_ztc}"]
                    interval = "1m"
                    timeout = "59s"
                }
            }

            config {
                command = "/controller/bin/zerotier-one"
                args = [
                    "-p${NOMAD_PORT_ztc}",
                    "/controller/data/e4da7455b2"
                ]
            }
        }

        task "log-collector" {
            driver = "raw_exec"

            template {
                data = <<EOH
filebeat:
  prospectors:
  - document_type: log
    encoding: plain
    fields:
      application_name: controller_{{ env "NOMAD_GROUP_NAME" }}
      gl2_source_collector: 4a7d1f54-acb7-413f-a06d-02748fd7af1f
    ignore_older: 0
    input_type: log
    paths:
    - {{ env "NOMAD_ALLOC_DIR" }}/logs/server.stderr.*
    scan_frequency: 10s
    tail_files: true
output:
  logstash:
    hosts:
    - graylog.int.zerotier.com:5044
path:
  data: /var/cache/graylog/controller_{{ env "NOMAD_GROUP_NAME" }}/filebeat/data
  logs: /var/log/graylog/controller_{{ env "NOMAD_GROUP_NAME" }}
tags:
- chef-client
EOH
                destination = "filebeat/filebeat.yml"
            }

            config {
                command = "/usr/bin/filebeat"
                args = [
                    "-c", "filebeat/filebeat.yml"
                ]
            }
        }
    }
}
@glimberg
Author

glimberg commented May 29, 2018

I did just think of one thing that may be involved in this. On the 2 nodes that are currently not receiving allocations, I performed a node-drain on each of them over this past weekend so I could do some maintenance on them. Once maintenance was finished, I turned off node-drain.

Current output of nomad node-status:

> nomad node-status
ID        DC          Name                        Class   Drain  Status
e84f8660  consul-ams  ctl-eu-nbg-01.zerotier.com  <none>  false  ready
e654c48c  consul-ams  ctl-eu-rbx-04.zerotier.com  <none>  false  ready
57044993  consul-ams  ctl-eu-rbx-03.zerotier.com  <none>  false  ready
33568dcc  consul-ams  ctl-eu-rbx-02.zerotier.com  <none>  false  ready
84575843  consul-ams  ctl-eu-fsn-02.zerotier.com  <none>  false  ready
49f9d7bd  consul-ams  ctl-eu-rbx-01.zerotier.com  <none>  false  ready
e6dc8f1b  consul-ams  ctl-eu-fsn-01.zerotier.com  <none>  false  ready

Nodes e84f8660 and 84575843 are the two that are sitting empty while Nomad says resources are exhausted.

> nomad node-status e84f8660
ID      = e84f8660
Name    = ctl-eu-nbg-01.zerotier.com
Class   = <none>
DC      = consul-ams
Drain   = false
Status  = ready
Drivers = exec,raw_exec
Uptime  = 1h2m17s

Allocated Resources
CPU          Memory      Disk         IOPS
0/31200 MHz  0 B/31 GiB  0 B/1.3 TiB  0/0

Allocation Resource Utilization
CPU          Memory
0/31200 MHz  0 B/31 GiB

Host Resource Utilization
CPU            Memory          Disk
388/31200 MHz  940 MiB/31 GiB  3.3 GiB/1.3 TiB

Allocations
No allocations placed

> nomad node-status 84575843
ID      = 84575843
Name    = ctl-eu-fsn-02.zerotier.com
Class   = <none>
DC      = consul-ams
Drain   = false
Status  = ready
Drivers = exec,raw_exec
Uptime  = 3452h18m4s

Allocated Resources
CPU          Memory      Disk         IOPS
0/33600 MHz  0 B/31 GiB  0 B/1.8 TiB  0/0

Allocation Resource Utilization
CPU          Memory
0/33600 MHz  0 B/31 GiB

Host Resource Utilization
CPU            Memory          Disk
124/33600 MHz  2.7 GiB/31 GiB  3.6 GiB/1.9 TiB

Allocations
No allocations placed

@vincenthuynh

Can you try running: nomad node eligibility <node>
https://www.nomadproject.io/docs/commands/node/eligibility.html

@glimberg
Author

Ahh interesting. Must have missed the release note about that flag! Enabling eligibility on those 2 nodes fixed the issue. Thanks!
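
For anyone hitting the same symptom, the fix above can be sketched roughly as follows. This is a minimal sketch, assuming a Nomad 0.8+ CLI where `nomad node eligibility -enable` is available; the `enable_eligibility` function and the `NOMAD_BIN` override are hypothetical helpers introduced only for illustration, and the node IDs are the two idle nodes from this thread.

```shell
#!/bin/sh
# Sketch: re-enable scheduling eligibility on nodes that stay empty after a drain.
# NOMAD_BIN is a hypothetical override (not a Nomad setting) so the commands
# can be dry-run, e.g. with NOMAD_BIN=echo.
NOMAD_BIN="${NOMAD_BIN:-nomad}"

# enable_eligibility NODE_ID...
# Marks each node as eligible for new placements, then prints its status
# so the result can be checked by eye.
enable_eligibility() {
    for node_id in "$@"; do
        "$NOMAD_BIN" node eligibility -enable "$node_id" || return 1
        "$NOMAD_BIN" node status "$node_id" || return 1
    done
}

# Example call with the two idle nodes from this thread:
# enable_eligibility e84f8660 84575843
```

Setting `NOMAD_BIN=echo` before calling the function prints the commands instead of running them, which is a cheap way to review what would be sent to the cluster.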

@glimberg
Author

Some sort of feedback in the Web UI about eligibility would be really nice. There's nothing in there at all as of 0.8.3.

@qkate
Contributor

qkate commented May 29, 2018

@glimberg Indeed, thanks for the feedback! We'll add it to our internal backlog.

@glimberg
Author

Also there may be a bug in disabling drain. According to the docs page for the eligibility command, disabling drain should restore eligibility. It didn't in this case.

@DingoEatingFuzz
Contributor

> Some sort of feedback in the Web UI about eligibility would be really nice. There's nothing in there at all as of 0.8.3.

Good news, @glimberg! I've been working on just that. PR now open (with screenshots): #4353

@qkate
Contributor

qkate commented May 30, 2018

@glimberg for the suspected disable drain bug, would you mind opening a separate issue for that? Thank you!

@glimberg
Author

@qkate #4356

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 30, 2022