
Processes that exit > 0 behave differently in multi region env #1694

Closed
jshaw86 opened this issue Sep 7, 2016 · 3 comments

jshaw86 commented Sep 7, 2016

Nomad version

nomad 0.4.1

Operating system and Environment details

Ubuntu 14.04

Issue

I have 2 regions and send the same jobspec to both. In region 1, when the process exits with a code > 0, it is not restarted; in region 2, it is restarted.

Region 2's behavior is probably the correct one, but I'm not sure what the design intent is.

Reproduction steps

  1. Set up 2 regions (1 server and 1 client in each).
  2. Create a job (Node.js in my case) that calls process.exit(1).
  3. Schedule the job in both regions, using federation to submit it to the second region.
  4. Tail the Nomad server logs: the process in region 1 is not restarted, while in region 2 it restarts forever (see the logs below).
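Step 3 can be scripted. The following is a minimal Python sketch, assuming the Nomad HTTP API's job registration endpoint (/v1/jobs) accepts a region query parameter that makes the receiving agent forward the request to that region, and that the agent listens on the default address http://127.0.0.1:4646. The helper names and the blu region/datacenter names are hypothetical (only the grn names appear in the job file below). It only builds the requests; it does not send them.

```python
import copy
import json
import urllib.request

# Assumption: default local agent address; adjust for your cluster.
NOMAD_ADDR = "http://127.0.0.1:4646"


def job_for_region(base_job: dict, region: str, datacenter: str) -> dict:
    """Return a copy of a {"Job": {...}} payload retargeted at one region."""
    job = copy.deepcopy(base_job)
    job["Job"]["Region"] = region
    job["Job"]["Datacenters"] = [datacenter]
    return job


def register_request(job: dict, addr: str = NOMAD_ADDR) -> urllib.request.Request:
    """Build (but do not send) a job registration request for the job's region."""
    region = job["Job"]["Region"]
    return urllib.request.Request(
        f"{addr}/v1/jobs?region={region}",
        data=json.dumps(job).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    base = {"Job": {"ID": "block-1", "Region": "", "Datacenters": []}}
    # Hypothetical blu names; the grn names match the job file below.
    targets = [("vagrant-blu-1", "dev-vagrant-blu-1"),
               ("vagrant-grn-2", "dev-vagrant-grn-2")]
    for region, dc in targets:
        req = register_request(job_for_region(base, region, dc))
        print(req.get_method(), req.full_url)
        # To actually submit: urllib.request.urlopen(req)
```

Sending the same base payload through job_for_region keeps everything except Region and Datacenters identical, which is the point of the comparison in this issue.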

Nomad Server logs (if appropriate)

region 1

Sep  7 22:04:26 vagrant-ubuntu-trusty-64 nomad[1960]: client: task "event_handler_1_9_blu" for alloc "78449cb7-dd89-625a-dadd-3449516c2ea6" failed: Wait returned exit code 1, signal 0, and error <nil>
Sep  7 22:04:26 vagrant-ubuntu-trusty-64 nomad[1960]: client: Restarting task "event_handler_1_9_blu" for alloc "78449cb7-dd89-625a-dadd-3449516c2ea6" in 16.780269194s
Sep  7 22:04:26 vagrant-ubuntu-trusty-64 consul[1935]: agent: Deregistered service '_nomad-executor-78449cb7-dd89-625a-dadd-3449516c2ea6-event_handler_1_9_blu-blocks-blocks-subkey-demo-36-blockId-1-urlprefix-demo-36_js-before-publish/run/before/bar'
Sep  7 22:04:26 vagrant-ubuntu-trusty-64 consul[1935]: agent: Deregistered service '_nomad-executor-78449cb7-dd89-625a-dadd-3449516c2ea6-event_handler_1_9_blu-blocks-blocks-subkey-demo-36-blockId-1-urlprefix-demo-36_js-before-publish/run/before/bar'
Sep  7 22:04:43 vagrant-ubuntu-trusty-64 nomad[1960]: plugin: starting plugin: /opt/nomad/nomad []string{"/opt/nomad/nomad", "executor", "/opt/nomad/data/alloc/78449cb7-dd89-625a-dadd-3449516c2ea6/event_handler_1_9_blu/event_handler_1_9_blu-executor.out"}
Sep  7 22:04:43 vagrant-ubuntu-trusty-64 nomad[1960]: client: Not restarting task: event_handler_1_9_blu for alloc: 78449cb7-dd89-625a-dadd-3449516c2ea6

region 2

Sep  7 22:04:26 vagrant-ubuntu-trusty-64 nomad[1959]: client: task "event_handler_1_9_grn" for alloc "7abf5216-6a07-60ff-026e-d57011cd7bbf" failed: Wait returned exit code 1, signal 0, and error <nil>
Sep  7 22:04:26 vagrant-ubuntu-trusty-64 nomad[1959]: client: Restarting task "event_handler_1_9_grn" for alloc "7abf5216-6a07-60ff-026e-d57011cd7bbf" in 16.780269194s
Sep  7 22:04:26 vagrant-ubuntu-trusty-64 consul[1934]: agent: Deregistered service '_nomad-executor-7abf5216-6a07-60ff-026e-d57011cd7bbf-event_handler_1_9_grn-blocks-blocks-subkey-demo-36-blockId-1-urlprefix-demo-36_js-before-publish/run/before/bar'
Sep  7 22:04:26 vagrant-ubuntu-trusty-64 consul[1934]: agent: Deregistered service '_nomad-executor-7abf5216-6a07-60ff-026e-d57011cd7bbf-event_handler_1_9_grn-blocks-blocks-subkey-demo-36-blockId-1-urlprefix-demo-36_js-before-publish/run/before/bar'
Sep  7 22:04:43 vagrant-ubuntu-trusty-64 nomad[1959]: plugin: starting plugin: /opt/nomad/nomad []string{"/opt/nomad/nomad", "executor", "/opt/nomad/data/alloc/7abf5216-6a07-60ff-026e-d57011cd7bbf/event_handler_1_9_grn/event_handler_1_9_grn-executor.out"}
Sep  7 22:04:43 vagrant-ubuntu-trusty-64 consul[1934]: agent: Synced service '_nomad-executor-7abf5216-6a07-60ff-026e-d57011cd7bbf-event_handler_1_9_grn-blocks-blocks-subkey-demo-36-blockId-1-urlprefix-demo-36_js-before-publish/run/before/bar'
Sep  7 22:04:44 vagrant-ubuntu-trusty-64 nomad[1959]: client: task "event_handler_1_9_grn" for alloc "7abf5216-6a07-60ff-026e-d57011cd7bbf" failed: Wait returned exit code 1, signal 0, and error <nil>
Sep  7 22:04:44 vagrant-ubuntu-trusty-64 nomad[1959]: client: Restarting task "event_handler_1_9_grn" for alloc "7abf5216-6a07-60ff-026e-d57011cd7bbf" in 17.586248953s
Sep  7 22:04:44 vagrant-ubuntu-trusty-64 consul[1934]: agent: Deregistered service '_nomad-executor-7abf5216-6a07-60ff-026e-d57011cd7bbf-event_handler_1_9_grn-blocks-blocks-subkey-demo-36-blockId-1-urlprefix-demo-36_js-before-publish/run/before/bar'
Sep  7 22:04:44 vagrant-ubuntu-trusty-64 consul[1934]: agent: Deregistered service '_nomad-executor-7abf5216-6a07-60ff-026e-d57011cd7bbf-event_handler_1_9_grn-blocks-blocks-subkey-demo-36-blockId-1-urlprefix-demo-36_js-before-publish/run/before/bar'
Sep  7 22:05:02 vagrant-ubuntu-trusty-64 nomad[1959]: plugin: starting plugin: /opt/nomad/nomad []string{"/opt/nomad/nomad", "executor", "/opt/nomad/data/alloc/7abf5216-6a07-60ff-026e-d57011cd7bbf/event_handler_1_9_grn/event_handler_1_9_grn-executor.out"}
Sep  7 22:05:02 vagrant-ubuntu-trusty-64 consul[1934]: agent: Synced service '_nomad-executor-7abf5216-6a07-60ff-026e-d57011cd7bbf-event_handler_1_9_grn-blocks-blocks-subkey-demo-36-blockId-1-urlprefix-demo-36_js-before-publish/run/before/bar'
Sep  7 22:05:02 vagrant-ubuntu-trusty-64 nomad[1959]: client: task "event_handler_1_9_grn" for alloc "7abf5216-6a07-60ff-026e-d57011cd7bbf" failed: Wait returned exit code 1, signal 0, and error <nil>
Sep  7 22:05:02 vagrant-ubuntu-trusty-64 nomad[1959]: client: Restarting task "event_handler_1_9_grn" for alloc "7abf5216-6a07-60ff-026e-d57011cd7bbf" in 23.120035258s
Sep  7 22:05:02 vagrant-ubuntu-trusty-64 consul[1934]: agent: Deregistered service '_nomad-executor-7abf5216-6a07-60ff-026e-d57011cd7bbf-event_handler_1_9_grn-blocks-blocks-subkey-demo-36-blockId-1-urlprefix-demo-36_js-before-publish/run/before/bar'
Sep  7 22:05:02 vagrant-ubuntu-trusty-64 consul[1934]: agent: Deregistered service '_nomad-executor-7abf5216-6a07-60ff-026e-d57011cd7bbf-event_handler_1_9_grn-blocks-blocks-subkey-demo-36-blockId-1-urlprefix-demo-36_js-before-publish/run/before/bar'
Sep  7 22:05:25 vagrant-ubuntu-trusty-64 nomad[1959]: plugin: starting plugin: /opt/nomad/nomad []string{"/opt/nomad/nomad", "executor", "/opt/nomad/data/alloc/7abf5216-6a07-60ff-026e-d57011cd7bbf/event_handler_1_9_grn/event_handler_1_9_grn-executor.out"}
Sep  7 22:05:26 vagrant-ubuntu-trusty-64 consul[1934]: agent: Synced service '_nomad-executor-7abf5216-6a07-60ff-026e-d57011cd7bbf-event_handler_1_9_grn-blocks-blocks-subkey-demo-36-blockId-1-urlprefix-demo-36_js-before-publish/run/before/bar'
Sep  7 22:05:26 vagrant-ubuntu-trusty-64 nomad[1959]: client: task "event_handler_1_9_grn" for alloc "7abf5216-6a07-60ff-026e-d57011cd7bbf" failed: Wait returned exit code 1, signal 0, and error <nil>
Sep  7 22:05:26 vagrant-ubuntu-trusty-64 nomad[1959]: client: Restarting task "event_handler_1_9_grn" for alloc "7abf5216-6a07-60ff-026e-d57011cd7bbf" in 18.351979213s
Sep  7 22:05:26 vagrant-ubuntu-trusty-64 consul[1934]: agent: Deregistered service '_nomad-executor-7abf5216-6a07-60ff-026e-d57011cd7bbf-event_handler_1_9_grn-blocks-blocks-subkey-demo-36-blockId-1-urlprefix-demo-36_js-before-publish/run/before/bar'
Sep  7 22:05:26 vagrant-ubuntu-trusty-64 consul[1934]: agent: Deregistered service '_nomad-executor-7abf5216-6a07-60ff-026e-d57011cd7bbf-event_handler_1_9_grn-blocks-blocks-subkey-demo-36-blockId-1-urlprefix-demo-36_js-before-publish/run/before/bar'
Sep  7 22:05:45 vagrant-ubuntu-trusty-64 nomad[1959]: plugin: starting plugin: /opt/nomad/nomad []string{"/opt/nomad/nomad", "executor", "/opt/nomad/data/alloc/7abf5216-6a07-60ff-026e-d57011cd7bbf/event_handler_1_9_grn/event_handler_1_9_grn-executor.out"}
Sep  7 22:05:45 vagrant-ubuntu-trusty-64 consul[1934]: agent: Synced service '_nomad-executor-7abf5216-6a07-60ff-026e-d57011cd7bbf-event_handler_1_9_grn-blocks-blocks-subkey-demo-36-blockId-1-urlprefix-demo-36_js-before-publish/run/before/bar'
Sep  7 22:05:45 vagrant-ubuntu-trusty-64 nomad[1959]: client: task "event_handler_1_9_grn" for alloc "7abf5216-6a07-60ff-026e-d57011cd7bbf" failed: Wait returned exit code 1, signal 0, and error <nil>
Sep  7 22:05:45 vagrant-ubuntu-trusty-64 nomad[1959]: client: Restarting task "event_handler_1_9_grn" for alloc "7abf5216-6a07-60ff-026e-d57011cd7bbf" in 18.106650204s
Sep  7 22:05:45 vagrant-ubuntu-trusty-64 consul[1934]: agent: Deregistered service '_nomad-executor-7abf5216-6a07-60ff-026e-d57011cd7bbf-event_handler_1_9_grn-blocks-blocks-subkey-demo-36-blockId-1-urlprefix-demo-36_js-before-publish/run/before/bar'
Sep  7 22:05:45 vagrant-ubuntu-trusty-64 consul[1934]: agent: Deregistered service '_nomad-executor-7abf5216-6a07-60ff-026e-d57011cd7bbf-event_handler_1_9_grn-blocks-blocks-subkey-demo-36-blockId-1-urlprefix-demo-36_js-before-publish/run/before/bar'
Sep  7 22:06:04 vagrant-ubuntu-trusty-64 nomad[1959]: plugin: starting plugin: /opt/nomad/nomad []string{"/opt/nomad/nomad", "executor", "/opt/nomad/data/alloc/7abf5216-6a07-60ff-026e-d57011cd7bbf/event_handler_1_9_grn/event_handler_1_9_grn-executor.out"}
Sep  7 22:06:04 vagrant-ubuntu-trusty-64 consul[1934]: agent: Synced service '_nomad-executor-7abf5216-6a07-60ff-026e-d57011cd7bbf-event_handler_1_9_grn-blocks-blocks-subkey-demo-36-blockId-1-urlprefix-demo-36_js-before-publish/run/before/bar'
Sep  7 22:06:04 vagrant-ubuntu-trusty-64 nomad[1959]: client: task "event_handler_1_9_grn" for alloc "7abf5216-6a07-60ff-026e-d57011cd7bbf" failed: Wait returned exit code 1, signal 0, and error <nil>
Sep  7 22:06:04 vagrant-ubuntu-trusty-64 nomad[1959]: client: Restarting task "event_handler_1_9_grn" for alloc "7abf5216-6a07-60ff-026e-d57011cd7bbf" in 18.67173012s
Sep  7 22:06:04 vagrant-ubuntu-trusty-64 consul[1934]: agent: Deregistered service '_nomad-executor-7abf5216-6a07-60ff-026e-d57011cd7bbf-event_handler_1_9_grn-blocks-blocks-subkey-demo-36-blockId-1-urlprefix-demo-36_js-before-publish/run/before/bar'
Sep  7 22:06:04 vagrant-ubuntu-trusty-64 consul[1934]: agent: Deregistered service '_nomad-executor-7abf5216-6a07-60ff-026e-d57011cd7bbf-event_handler_1_9_grn-blocks-blocks-subkey-demo-36-blockId-1-urlprefix-demo-36_js-before-publish/run/before/bar'
Sep  7 22:06:23 vagrant-ubuntu-trusty-64 nomad[1959]: plugin: starting plugin: /opt/nomad/nomad []string{"/opt/nomad/nomad", "executor", "/opt/nomad/data/alloc/7abf5216-6a07-60ff-026e-d57011cd7bbf/event_handler_1_9_grn/event_handler_1_9_grn-executor.out"}
Sep  7 22:06:23 vagrant-ubuntu-trusty-64 consul[1934]: agent: Synced service '_nomad-executor-7abf5216-6a07-60ff-026e-d57011cd7bbf-event_handler_1_9_grn-blocks-blocks-subkey-demo-36-blockId-1-urlprefix-demo-36_js-before-publish/run/before/bar'
Sep  7 22:06:24 vagrant-ubuntu-trusty-64 nomad[1959]: client: task "event_handler_1_9_grn" for alloc "7abf5216-6a07-60ff-026e-d57011cd7bbf" failed: Wait returned exit code 1, signal 0, and error <nil>
Sep  7 22:06:24 vagrant-ubuntu-trusty-64 nomad[1959]: client: Restarting task "event_handler_1_9_grn" for alloc "7abf5216-6a07-60ff-026e-d57011cd7bbf" in 2.413293568s
Sep  7 22:06:24 vagrant-ubuntu-trusty-64 consul[1934]: agent: Deregistered service '_nomad-executor-7abf5216-6a07-60ff-026e-d57011cd7bbf-event_handler_1_9_grn-blocks-blocks-subkey-demo-36-blockId-1-urlprefix-demo-36_js-before-publish/run/before/bar'
Sep  7 22:06:24 vagrant-ubuntu-trusty-64 consul[1934]: agent: Deregistered service '_nomad-executor-7abf5216-6a07-60ff-026e-d57011cd7bbf-event_handler_1_9_grn-blocks-blocks-subkey-demo-36-blockId-1-urlprefix-demo-36_js-before-publish/run/before/bar'
Sep  7 22:06:26 vagrant-ubuntu-trusty-64 nomad[1959]: plugin: starting plugin: /opt/nomad/nomad []string{"/opt/nomad/nomad", "executor", "/opt/nomad/data/alloc/7abf5216-6a07-60ff-026e-d57011cd7bbf/event_handler_1_9_grn/event_handler_1_9_grn-executor.out"}
Sep  7 22:06:26 vagrant-ubuntu-trusty-64 consul[1934]: agent: Synced service '_nomad-executor-7abf5216-6a07-60ff-026e-d57011cd7bbf-event_handler_1_9_grn-blocks-blocks-subkey-demo-36-blockId-1-urlprefix-demo-36_js-before-publish/run/before/bar'
Sep  7 22:06:27 vagrant-ubuntu-trusty-64 nomad[1959]: client: task "event_handler_1_9_grn" for alloc "7abf5216-6a07-60ff-026e-d57011cd7bbf" failed: Wait returned exit code 1, signal 0, and error <nil>
Sep  7 22:06:27 vagrant-ubuntu-trusty-64 nomad[1959]: client: Restarting task "event_handler_1_9_grn" for alloc "7abf5216-6a07-60ff-026e-d57011cd7bbf" in 17.764364159s
Sep  7 22:06:27 vagrant-ubuntu-trusty-64 consul[1934]: agent: Deregistered service '_nomad-executor-7abf5216-6a07-60ff-026e-d57011cd7bbf-event_handler_1_9_grn-blocks-blocks-subkey-demo-36-blockId-1-urlprefix-demo-36_js-before-publish/run/before/bar'
Sep  7 22:06:27 vagrant-ubuntu-trusty-64 consul[1934]: agent: Deregistered service '_nomad-executor-7abf5216-6a07-60ff-026e-d57011cd7bbf-event_handler_1_9_grn-blocks-blocks-subkey-demo-36-blockId-1-urlprefix-demo-36_js-before-publish/run/before/bar'
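The restart delays in the region 2 log line up with the task group's RestartPolicy in the job file (Attempts 2, Interval 60 s, Delay 15 s, Mode "delay"): most waits are 15 s plus some jitter, while the third failure in each window waits only for the remainder of the 60 s interval (23.12 s at 22:05:02, 2.41 s at 22:06:24). The Python model below is a simplified sketch of that bookkeeping, not Nomad's client code; the 25% jitter bound, the interval-reset rule, and the task start time used in the replay are assumptions chosen to match these logs.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class RestartPolicy:
    attempts: int
    interval: float  # seconds
    delay: float     # seconds
    mode: str        # "delay" or "fail"


class RestartTracker:
    """Simplified model of per-task restart bookkeeping (times in seconds)."""

    JITTER = 0.25  # assumption: up to 25% extra is added to each delay

    def __init__(self, policy: RestartPolicy, start: float, rng=None):
        self.policy = policy
        self.start = start  # beginning of the current interval (task start)
        self.count = 0      # failures counted in the current interval
        self.rng = rng or random.Random()

    def _jittered_delay(self) -> float:
        return self.policy.delay * (1 + self.rng.random() * self.JITTER)

    def on_failure(self, now: float) -> Optional[float]:
        """Return seconds to wait before restarting, or None to give up."""
        end = self.start + self.policy.interval
        if now >= end:
            # The interval has elapsed: open a new one and reset the counter.
            self.start = now
            self.count = 0
            return self._jittered_delay()
        self.count += 1
        if self.count <= self.policy.attempts:
            return self._jittered_delay()
        if self.policy.mode == "fail":
            return None  # "fail" mode: stop restarting
        # "delay" mode: wait out the rest of the current interval, then retry.
        return end - now


if __name__ == "__main__":
    policy = RestartPolicy(attempts=2, interval=60, delay=15, mode="delay")
    # Assumed task start: 25.2 s after 22:04:00 (not shown in the logs).
    tracker = RestartTracker(policy, start=25.2, rng=random.Random(0))
    # Failure times from the region 2 log, in seconds after 22:04:00 (rounded).
    for t in [26, 44, 62, 86, 105, 124, 144, 147]:
        print(t, round(tracker.on_failure(t), 2))
```

Under this model a "delay" policy never gives up, which matches region 2 restarting forever; a task only stops being restarted if the mode is "fail" or something outside the exit-code path ends it.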

Job file (if appropriate)

{
    "Job": {
        "Region": "vagrant-grn-2",
        "ID": "block-1",
        "Name": "block-1",
        "Type": "service",
        "Priority": 50,
        "AllAtOnce": false,
        "Datacenters": [
            "dev-vagrant-grn-2"
        ],
        "Constraints": null,
        "TaskGroups": [
            {
                "Name": "event_handler_1_9_grn",
                "Count": 1,
                "Constraints": null,
                "Tasks": [
                    {
                        "Name": "event_handler_1_9_grn",
                        "Driver": "exec",
                        "User": "",
                        "Config": {
                            "args": [
                                "--max_old_space_size=50",
                                "/usr/lib/pn-blocks/src/main.js"
                            ],
                            "command": "/usr/bin/node"
                        },
                        "Constraints": null,
                        "Env": {
                            "HOME": "/usr/lib/pn-blocks"
                        },
                        "Services": [
                            {
                                "Id": "",
                                "Name": "blocks",
                                "Tags": [
                                    "blocks",
                                    "subkey-demo-36",
                                    "blockId-1",
                                    "urlprefix-demo-36_js-before-publish/run/before/bar"
                                ],
                                "PortLabel": "http",
                                "Checks": [
                                    {
                                        "Id": "",
                                        "Name": "ping_blocks_check-event_handler_1_9_grn",
                                        "Type": "http",
                                        "Command": "",
                                        "Args": null,
                                        "Path": "/ping",
                                        "Protocol": "",
                                        "PortLabel": "",
                                        "Interval": 2000000000,
                                        "Timeout": 2000000000
                                    }
                                ]
                            }
                        ],
                        "Resources": {
                            "CPU": 100,
                            "MemoryMB": 100,
                            "DiskMB": 500,
                            "IOPS": 0,
                            "Networks": [
                                {
                                    "Public": false,
                                    "CIDR": "",
                                    "ReservedPorts": null,
                                    "DynamicPorts": [
                                        {
                                            "Label": "http",
                                            "Value": 0
                                        }
                                    ],
                                    "IP": "",
                                    "MBits": 10
                                }
                            ]
                        },
                        "Meta": {
                            "origin": "192.168.33.45",
                            "scripts": "[{\"code\": \"function (request) {\\n    console.log(request); // Log the request envelope passed\\n    return request.ok(); // Return a promise when you're done\\n}\", \"channels\": \"bar\", \"rate\": 1, \"id\": 1, \"name\": \"test/foo\", \"blockid\": 1, \"log-level\": null, \"seckey\": \"sec-c-ZTQ1YTkzMGQtMGFhZS00NmI5LTk4OGMtOTdjZmZlNjgyMmM2\", \"location\": \"js-before-publish\", \"channel-groups\": null, \"output\": \"output-0.7695113105335039\", \"pubkey\": \"demo-36\", \"subkey\": \"demo-36\"}]",
                            "subkey": "demo-36"
                        },
                        "KillTimeout": 5000000000,
                        "LogConfig": {
                            "MaxFiles": 5,
                            "MaxFileSizeMB": 10
                        },
                        "Artifacts": null
                    }
                ],
                "RestartPolicy": {
                    "Interval": 60000000000,
                    "Attempts": 2,
                    "Delay": 15000000000,
                    "Mode": "delay"
                },
                "Meta": null
            }
        ],
        "Update": {
            "Stagger": 0,
            "MaxParallel": 0
        },
        "Periodic": null,
        "Meta": null,
        "Status": "dead",
        "StatusDescription": "",
        "CreateIndex": 609,
        "ModifyIndex": 670,
        "JobModifyIndex": 609
    }
}
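The duration fields in this JSON jobspec (RestartPolicy Interval and Delay, the check Interval and Timeout, KillTimeout) are Go time.Duration values in nanoseconds, so 60000000000 is 60 s and 15000000000 is 15 s. A small Python helper for reading the restart policy back in seconds, assuming the {"Job": {...}} layout shown above; the function name is illustrative only.

```python
import json

NS_PER_SECOND = 1_000_000_000  # time.Duration values are integer nanoseconds


def restart_policies(job_json: str) -> dict:
    """Map each task group name to its RestartPolicy with durations in seconds."""
    job = json.loads(job_json)["Job"]
    policies = {}
    for group in job["TaskGroups"]:
        rp = group["RestartPolicy"]
        policies[group["Name"]] = {
            "Attempts": rp["Attempts"],
            "Interval_s": rp["Interval"] / NS_PER_SECOND,
            "Delay_s": rp["Delay"] / NS_PER_SECOND,
            "Mode": rp["Mode"],
        }
    return policies


if __name__ == "__main__":
    sample = """{"Job": {"TaskGroups": [{"Name": "event_handler_1_9_grn",
      "RestartPolicy": {"Interval": 60000000000, "Attempts": 2,
                        "Delay": 15000000000, "Mode": "delay"}}]}}"""
    print(restart_policies(sample))
```

For the job above this reports 2 attempts per 60 s interval with a 15 s delay in "delay" mode, the same policy in both regions.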
dadgar (Contributor) commented Sep 12, 2016

Hmm, something else must have happened. There is no logic that should make the behavior differ between regions. Can you check what nomad alloc-status shows for the various allocations?

jshaw86 (Author) commented Sep 12, 2016

@dadgar

vagrant@blocks1:~$ nomad alloc-status db19f60a
ID            = db19f60a
Eval ID       = b13b6c2c
Name          = block-1.event_handler_1_4_blu[0]
Node ID       = 46a20b00
Job ID        = block-1
Client Status = failed

Task "event_handler_1_4_blu" is "dead"
Task Resources
CPU      Memory   Disk     IOPS  Addresses
100 MHz  100 MiB  500 MiB  0     http: 127.0.0.1:21010

Recent Events:
Time                   Type            Description
09/12/16 13:37:33 PDT  Not Restarting  Error was unrecoverable
09/12/16 13:37:33 PDT  Driver Failure  failed to start task 'event_handler_1_4_blu' for alloc 'db19f60a-be23-896f-fab4-1cdae9d05239': Couldn't create destination file /opt/nomad/data/alloc/db19f60a-be23-896f-fab4-1cdae9d05239/event_handler_1_4_blu/usr/bin/blocks-usage: open /opt/nomad/data/alloc/db19f60a-be23-896f-fab4-1cdae9d05239/event_handler_1_4_blu/usr/bin/blocks-usage: text file busy
09/12/16 13:37:18 PDT  Restarting      Task restarting in 15.30202629s
09/12/16 13:37:18 PDT  Terminated      Exit Code: 1
09/12/16 13:37:17 PDT  Started         Task started by client
09/12/16 13:37:17 PDT  Received        Task received by client
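Reading these events bottom-up (they are listed newest first), the exit code alone did not stop this allocation: the task exited with code 1, a restart was scheduled, and the restart then hit a driver failure (text file busy while creating a file under the task's directory), which the client treated as unrecoverable. A minimal Python sketch for pulling that cause out of alloc-status text, assuming the column layout shown above (columns separated by two or more spaces); parse_events and stop_cause are hypothetical helper names.

```python
import re
from typing import List, Optional, Tuple

# Time, Type and Description columns; Type may itself contain a single space.
EVENT_RE = re.compile(
    r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} \w+)\s{2,}(.+?)\s{2,}(.*)$"
)

Event = Tuple[str, str, str]


def parse_events(text: str) -> List[Event]:
    """Extract (time, type, description) rows from alloc-status output."""
    events = []
    for line in text.splitlines():
        m = EVENT_RE.match(line.strip())
        if m:
            events.append((m.group(1), m.group(2), m.group(3)))
    return events


def stop_cause(events: List[Event]) -> Optional[Event]:
    """Return the event just before the newest "Not Restarting" event.

    Events are listed newest first, so the cause is the next row down.
    """
    for i, (_, kind, _) in enumerate(events):
        if kind == "Not Restarting" and i + 1 < len(events):
            return events[i + 1]
    return None
```

Applied to the output above, stop_cause returns the "Driver Failure" row, which points at the failed restart rather than the exit code as the reason this allocation stayed dead.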

Looks like a duplicate of #1697; closing.

jshaw86 closed this as completed Sep 12, 2016
github-actions commented

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators Dec 19, 2022