Healthcheck Plugin #112

Closed
subnetmarco opened this issue Apr 4, 2015 · 11 comments
Assignees
Labels
idea/new plugin [legacy] those issues belong to Kong Nation, since GitHub issues are reserved for bug reports.

Comments

@subnetmarco
Member

Provide a healthcheck plugin that can run TCP or HTTP healthchecks against one or more APIs.

Here are a few problems:

  • Healthchecks need to run on one machine per cluster; otherwise a 200-node Kong cluster will DDoS the final API.
  • This means that Kong somehow needs to be aware of the cluster and elect a master healthcheck node that will run the healthchecks.
  • If this node gets shut down for some reason, the cluster needs to be aware of that and elect a new master.
  • This also means that a change to the healthcheck configuration made on one node needs to be communicated to the master node, which is otherwise unaware of the change.
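
To make the load concern concrete, here is a minimal sketch of an active TCP probe and the probe rate an upstream would see if every node ran its own timer. It is an illustration only, not the plugin's implementation; the names `tcp_probe` and `probes_per_second` are hypothetical.

```python
import socket


def tcp_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def probes_per_second(node_count: int, interval_seconds: float) -> float:
    """Probe rate seen by the upstream when every node probes once per interval."""
    return node_count / interval_seconds
```

With 200 nodes and a 1-second interval, `probes_per_second(200, 1.0)` gives 200 probes per second against the same upstream, which is the situation the first bullet warns about.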
@subnetmarco subnetmarco added the idea/new plugin [legacy] those issues belong to Kong Nation, since GitHub issues are reserved for bug reports. label Apr 4, 2015
@subnetmarco subnetmarco self-assigned this Apr 4, 2015
@subnetmarco subnetmarco added this to the 0.2.0 milestone Apr 4, 2015
@subnetmarco
Member Author

The same problem will happen with a billing plugin: only one node needs to execute the billing functionality.

@thibaultcha
Member

What about pinging from a random node every time?

@subnetmarco
Member Author

Since the nodes are not aware of each other, there is no coordination mechanism between them. This is because we decided to keep cluster awareness out of these early releases, but to have a better system we will need to implement some sort of cluster awareness at some point. Usually this can be done with Apache ZooKeeper, Serfdom, or consul.io. Once a plugin is installed and an async timer is started to do the health checks, it will run by default on every node.

The first implementation of the health check can run on every node, but we need to address cluster awareness sooner rather than later to fix this and other problems.

For example, using features like https://serfdom.io/intro/getting-started/user-events.html will also help us implement proper invalidations #15 (as opposed to having intermediate Cassandra tables for coordination).

@subnetmarco subnetmarco modified the milestones: 0.3.0, 0.2.0 Apr 22, 2015
@subnetmarco subnetmarco modified the milestones: 0.3.1, 0.3.0 May 28, 2015
@thibaultcha thibaultcha removed this from the 0.3.1 milestone Jun 8, 2015
@ahmadnassri ahmadnassri assigned Tieske and unassigned subnetmarco Jan 21, 2016
@ahmadnassri ahmadnassri added the BC label May 13, 2016
@hutchic
Contributor

hutchic commented May 30, 2016

Personally, I'd close this and let the service discovery tool of the user's choosing do the health checks, unless there's a use case for health checks, one I'm unaware of, that isn't closely related to service discovery.

https://www.consul.io/docs/agent/http/health.html
https://github.com/airbnb/nerve
https://github.com/Netflix/eureka/
http://www.cloudera.com/documentation/enterprise/5-5-x/topics/cm_ht_zookeeper.html

@Tieske
Member

Tieske commented Dec 29, 2016

I think all of this is too complex. Architecturally we need to treat each node as a single system whenever possible. No intra-node communications if we can prevent it.

With the implementation of the balancer_by_lua directives we already have health info available based on the request results. See this code

This uses the get_last_failure method.

This obviously uses 'passive' monitoring (based on failed actual requests) as opposed to the initial post above suggesting 'active' monitoring by firing explicit probe-requests.
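
As a rough illustration of what 'passive' monitoring means, the sketch below derives health purely from the outcome of real requests. It is not Kong's balancer code; the class name `PassiveHealth` and the `max_failures` threshold are assumptions made for the example.

```python
class PassiveHealth:
    """Tracks upstream health from the results of real proxied requests."""

    def __init__(self, max_failures: int = 3) -> None:
        # Hypothetical threshold: consecutive failures before marking unhealthy.
        self.max_failures = max_failures
        self.consecutive_failures = 0

    def report(self, success: bool) -> None:
        """Record the outcome of one real request; a success resets the counter."""
        if success:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1

    @property
    def healthy(self) -> bool:
        """Healthy until max_failures consecutive failures have been seen."""
        return self.consecutive_failures < self.max_failures
```

No probe traffic is generated, so the upstream sees no extra load; the trade-off is that a failure is only noticed once real requests have already failed.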

@shaowin16

Where is the plugin's URL?
Can you attach it?

@Tieske
Member

Tieske commented Feb 10, 2017

@shaowin16 it hasn't been added yet. Probably for 0.11

@r-alekseev
Contributor

r-alekseev commented Oct 20, 2017

> Healthchecks need to run on one machine per cluster; otherwise a 200-node Kong cluster will DDoS the final API.
> This means that Kong somehow needs to be aware of the cluster and elect a master healthcheck node that will run the healthchecks.

> I think all of this is too complex. Architecturally we need to treat each node as a single system whenever possible. No intra-node communications if we can prevent it.

To avoid both the DDoS and intra-node communications, we can introduce a setting that represents the chance that a node sends a healthcheck at the end of each time interval.

For example:
interval = 1 second
chance = 1/200

Every second, each node generates a random number from 1 to 200.
The node(s) that draw the winning number send a healthcheck.
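
The proposal above can be sketched as follows. This is a simulation under stated assumptions, not plugin code: the function names are hypothetical, and each node is modeled as an independent draw with chance 1/node_count.

```python
import random


def should_probe(node_count: int, rng: random.Random) -> bool:
    """One node's decision for one interval: probe with chance 1/node_count."""
    return rng.randrange(node_count) == 0


def probability_no_probe(node_count: int) -> float:
    """Chance that no node at all probes in a given interval."""
    return (1 - 1 / node_count) ** node_count


def average_probes_per_interval(node_count: int, intervals: int, seed: int = 0) -> float:
    """Simulate the whole cluster and return the mean number of probes per interval."""
    rng = random.Random(seed)
    total = sum(should_probe(node_count, rng) for _ in range(node_count * intervals))
    return total / intervals
```

On average the cluster sends one probe per interval, but the draws are uncoordinated: with 200 nodes, roughly 37% of intervals see no probe at all, and some intervals see two or more.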

@Tieske
Member

Tieske commented Oct 21, 2017

Even in that case, if the healthcheck results in a state change, we're back to cross-cluster communications.

As an update, healthchecks are actively being worked on at the moment.

@coopr
Contributor

coopr commented Dec 6, 2017

@Tieske
Member

Tieske commented Jan 18, 2018

Since 0.12 has been released, closing this.

@Tieske Tieske closed this as completed Jan 18, 2018
gszr pushed a commit that referenced this issue Jun 10, 2021
bungle added a commit that referenced this issue Jun 21, 2023
### Summary

#### bug fixes
- **\*:** fix typos and add error check for new_of/dup_of ([#2](fffonion/lua-resty-openssl#2)) [aa6ad47](fffonion/lua-resty-openssl@aa6ad47)

#### features
- **tests:** add performance test ([#112](fffonion/lua-resty-openssl#112)) [100b4e4](fffonion/lua-resty-openssl@100b4e4)
- **x509.store:** add store:check_revocation and add flag to skip check CRL for store:add ([#1](fffonion/lua-resty-openssl#1)) [1a5a4c8](fffonion/lua-resty-openssl@1a5a4c8)

Signed-off-by: Aapo Talvensaari <[email protected]>
kikito pushed a commit that referenced this issue Apr 23, 2024
### Summary

#### [0.13.0] - 2024-03-28
##### bug fixes
- **autossl:** log the errors on the list certificates request ([#110](fffonion/lua-resty-acme#110)) [6c9760f](fffonion/lua-resty-acme@6c9760f)

#### features
- **autossl:** add option to delete none whitelisted domains in certificate renewal ([#112](fffonion/lua-resty-acme#112)) [1bbf39c](fffonion/lua-resty-acme@1bbf39c)

Signed-off-by: Aapo Talvensaari <[email protected]>
locao pushed a commit that referenced this issue Apr 24, 2024
- **autossl:** log the errors on the list certificates request ([#110](fffonion/lua-resty-acme#110)) [6c9760f](fffonion/lua-resty-acme@6c9760f)

- **autossl:** add option to delete none whitelisted domains in certificate renewal ([#112](fffonion/lua-resty-acme#112)) [1bbf39c](fffonion/lua-resty-acme@1bbf39c)

Cherry-picked from #12909

KAG-4330

Signed-off-by: Aapo Talvensaari <[email protected]>

8 participants