Service is flapping if output from script check changes #2754

Closed
joshuaspence opened this issue Feb 17, 2017 · 6 comments

@joshuaspence

I'm not sure if this is intentional, but under Consul 0.7.1 it seems that a service event is generated if the standard output from a script check changes. I have a varnish service defined using the following configuration:

{
  "service": {
    "checks": [
      {
        "interval": "5s",
        "tcp": "0.0.0.0:80"
      },
      {
        "http": "http://127.0.0.1:80/ping",
        "interval": "10s"
      },
      {
        "interval": "10s",
        "script": "sudo varnishadm ping"
      }
    ],
    "enableTagOverride": false,
    "id": "varnish",
    "name": "varnish",
    "port": 80,
    "tags": []
  }
}

Running consul watch -service=varnish -type=service "date; curl --silent --show-error http://127.0.0.1:8500/v1/health/service/varnish?pretty; echo ----------", I see the child process executed every time that the output from varnishadm ping changes:

Thu Feb 16 22:15:21 EST 2017
[
    {
        "Node": {
            "Node": "i-0690e9e21b268a23b",
            "Address": "10.16.132.23",
            "TaggedAddresses": {
                "lan": "10.16.132.23",
                "wan": "10.16.132.23"
            },
            "CreateIndex": 5286423,
            "ModifyIndex": 5332227
        },
        "Service": {
            "ID": "varnish",
            "Service": "varnish",
            "Tags": [],
            "Address": "",
            "Port": 80,
            "EnableTagOverride": false,
            "CreateIndex": 5286425,
            "ModifyIndex": 5332227
        },
        "Checks": [
            {
                "Node": "i-0690e9e21b268a23b",
                "CheckID": "serfHealth",
                "Name": "Serf Health Status",
                "Status": "passing",
                "Notes": "",
                "Output": "Agent alive and reachable",
                "ServiceID": "",
                "ServiceName": "",
                "CreateIndex": 5286427,
                "ModifyIndex": 5286427
            },
            {
                "Node": "i-0690e9e21b268a23b",
                "CheckID": "service:varnish:1",
                "Name": "Service 'varnish' check",
                "Status": "passing",
                "Notes": "",
                "Output": "TCP connect 0.0.0.0:80: Success",
                "ServiceID": "varnish",
                "ServiceName": "varnish",
                "CreateIndex": 5286425,
                "ModifyIndex": 5332225
            },
            {
                "Node": "i-0690e9e21b268a23b",
                "CheckID": "service:varnish:2",
                "Name": "Service 'varnish' check",
                "Status": "passing",
                "Notes": "",
                "Output": "HTTP GET http://127.0.0.1:80/ping: 200 OK Output: ",
                "ServiceID": "varnish",
                "ServiceName": "varnish",
                "CreateIndex": 5286425,
                "ModifyIndex": 5332225
            },
            {
                "Node": "i-0690e9e21b268a23b",
                "CheckID": "service:varnish:3",
                "Name": "Service 'varnish' check",
                "Status": "passing",
                "Notes": "",
                "Output": "PONG 1487301271 1.0\n",
                "ServiceID": "varnish",
                "ServiceName": "varnish",
                "CreateIndex": 5332225,
                "ModifyIndex": 5332227
            }
        ]
    }
]
----------

Thu Feb 16 22:21:24 EST 2017
[
    {
        "Node": {
            "Node": "i-0690e9e21b268a23b",
            "Address": "10.16.132.23",
            "TaggedAddresses": {
                "lan": "10.16.132.23",
                "wan": "10.16.132.23"
            },
            "CreateIndex": 5286423,
            "ModifyIndex": 5332361
        },
        "Service": {
            "ID": "varnish",
            "Service": "varnish",
            "Tags": [],
            "Address": "",
            "Port": 80,
            "EnableTagOverride": false,
            "CreateIndex": 5286425,
            "ModifyIndex": 5332361
        },
        "Checks": [
            {
                "Node": "i-0690e9e21b268a23b",
                "CheckID": "serfHealth",
                "Name": "Serf Health Status",
                "Status": "passing",
                "Notes": "",
                "Output": "Agent alive and reachable",
                "ServiceID": "",
                "ServiceName": "",
                "CreateIndex": 5286427,
                "ModifyIndex": 5286427
            },
            {
                "Node": "i-0690e9e21b268a23b",
                "CheckID": "service:varnish:1",
                "Name": "Service 'varnish' check",
                "Status": "passing",
                "Notes": "",
                "Output": "TCP connect 0.0.0.0:80: Success",
                "ServiceID": "varnish",
                "ServiceName": "varnish",
                "CreateIndex": 5286425,
                "ModifyIndex": 5332225
            },
            {
                "Node": "i-0690e9e21b268a23b",
                "CheckID": "service:varnish:2",
                "Name": "Service 'varnish' check",
                "Status": "passing",
                "Notes": "",
                "Output": "HTTP GET http://127.0.0.1:80/ping: 200 OK Output: ",
                "ServiceID": "varnish",
                "ServiceName": "varnish",
                "CreateIndex": 5286425,
                "ModifyIndex": 5332225
            },
            {
                "Node": "i-0690e9e21b268a23b",
                "CheckID": "service:varnish:3",
                "Name": "Service 'varnish' check",
                "Status": "passing",
                "Notes": "",
                "Output": "PONG 1487301683 1.0\n",
                "ServiceID": "varnish",
                "ServiceName": "varnish",
                "CreateIndex": 5332225,
                "ModifyIndex": 5332361
            }
        ]
    }
]
----------
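
For readers comparing the two responses above: the only fields that differ are the ModifyIndex values on the node, the service, and the service:varnish:3 check, plus that check's Output. Every Status stays "passing". The following is a minimal sketch of a path-level diff that makes this visible. The helper names (diff_paths, snapshot) are hypothetical, and the sample data is trimmed from the responses above, so the check sits at index 0 here rather than index 3.

```python
def diff_paths(old, new, path=""):
    """Return the paths at which two parsed JSON documents differ."""
    if isinstance(old, dict) and isinstance(new, dict):
        changed = []
        for key in sorted(set(old) | set(new)):
            child = f"{path}.{key}" if path else key
            if key not in old or key not in new:
                changed.append(child)
            else:
                changed.extend(diff_paths(old[key], new[key], child))
        return changed
    if isinstance(old, list) and isinstance(new, list) and len(old) == len(new):
        changed = []
        for i, (a, b) in enumerate(zip(old, new)):
            changed.extend(diff_paths(a, b, f"{path}[{i}]"))
        return changed
    return [] if old == new else [path]


def snapshot(modify_index, output):
    """Build a trimmed copy of one health response (illustrative only)."""
    return [{
        "Node": {"Node": "i-0690e9e21b268a23b", "ModifyIndex": modify_index},
        "Service": {"ID": "varnish", "Port": 80, "ModifyIndex": modify_index},
        "Checks": [{
            "CheckID": "service:varnish:3",
            "Status": "passing",
            "Output": output,
            "ModifyIndex": modify_index,
        }],
    }]


before = snapshot(5332227, "PONG 1487301271 1.0\n")
after = snapshot(5332361, "PONG 1487301683 1.0\n")

for changed_path in diff_paths(before, after):
    print(changed_path)
# [0].Checks[0].ModifyIndex
# [0].Checks[0].Output
# [0].Node.ModifyIndex
# [0].Service.ModifyIndex
```
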
@aprice

aprice commented Feb 22, 2017

If it's passing each time, it's not flapping. Flapping would be cycling between healthy and unhealthy states.

If the output changes, then yes that's treated as a change, but the status is still healthy, right?

@joshuaspence
Author

Right, but is it expected that the watch handler is fired off when there has been no state change?

@joshuaspence
Author

Basically, I have a Consul watcher configured to restart a service when its backends change, and the service is currently being restarted more often than expected.

@lord2800

Yes, it's expected, even if it's not desired. Can you make your check output deterministic, instead?

@joshuaspence
Author

joshuaspence commented Feb 28, 2017

I can change my check to sudo varnishadm ping >/dev/null, but that doesn't feel like the right solution. Is there a use case in which it is desirable that a change in a health check's standard output produces this result?
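
As an alternative to redirecting to /dev/null, a small wrapper can keep the command's exit code while printing a constant message, so the check record only changes when the state changes. This is a minimal sketch, assuming Python is available on the node; run_quiet_check and the message strings are hypothetical names chosen for illustration. It relies on the script check convention that exit code 0 is passing, 1 is warning, and anything else is failing.

```python
import subprocess
import sys


def run_quiet_check(command):
    """Run `command`, discard its output, and return (exit_code, fixed_message)."""
    result = subprocess.run(
        command,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    code = result.returncode
    if code == 0:
        message = "check passed"
    elif code == 1:
        message = "check warning"
    else:
        message = "check failed"
    return code, message


# Only act as a script when a command was actually supplied.
if __name__ == "__main__" and len(sys.argv) > 1:
    exit_code, fixed_message = run_quiet_check(sys.argv[1:])
    print(fixed_message)  # identical on every passing run
    sys.exit(exit_code)
```

Saved under a hypothetical path such as /usr/local/bin/quiet_check.py, the check's "script" value would become "/usr/local/bin/quiet_check.py sudo varnishadm ping". The trade-off is that the original output is no longer visible in the check record when the check fails.
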

@slackpad
Contributor

Hi @joshuaspence, this is working as designed. The check output is part of the check record, so every time it changes it bumps the Raft index, and that is what triggers the watcher, which is looking for any change to the results. You definitely want to avoid check output with values that change all the time. You can also increase check_update_interval (https://www.consul.io/docs/agent/options.html#check_update_interval) to throttle writes when the output changes but the state of the check doesn't.
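
If the check output cannot be made constant, another option is to have the watch handler itself ignore output-only changes. The sketch below is not part of Consul; it assumes the handler receives the same JSON as /v1/health/service/varnish on stdin (as a service watch does) and that a writable state file is available. The names state_fingerprint, backends_changed, and the state file argument are hypothetical.

```python
import hashlib
import json
import sys


def state_fingerprint(health_entries):
    """Hash the fields that describe backend state, ignoring check output."""
    summary = []
    for entry in health_entries:
        node = entry["Node"]
        service = entry["Service"]
        checks = sorted((c["CheckID"], c["Status"]) for c in entry["Checks"])
        summary.append([
            node["Node"],
            node["Address"],
            service["ID"],
            service["Address"],
            service["Port"],
            sorted(service.get("Tags") or []),
            checks,
        ])
    summary.sort(key=json.dumps)  # make the hash independent of entry order
    return hashlib.sha256(json.dumps(summary).encode("utf-8")).hexdigest()


def backends_changed(payload_text, state_path):
    """Return True only when the fingerprint differs from the stored one."""
    current = state_fingerprint(json.loads(payload_text))
    try:
        with open(state_path) as state_file:
            previous = state_file.read().strip()
    except FileNotFoundError:
        previous = None
    if current == previous:
        return False
    with open(state_path, "w") as state_file:
        state_file.write(current)
    return True


# Invoked as: handler.py <state-file>, with the watch payload on stdin.
if __name__ == "__main__" and len(sys.argv) > 1:
    if backends_changed(sys.stdin.read(), sys.argv[1]):
        print("backends changed")  # restart the dependent service here
    else:
        print("output-only change, nothing to do")
```

With this approach the handler still runs whenever the Raft index moves, but the expensive action is taken only when a node, address, port, tag, or check status actually changes.
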
