Envoy does not remove endpoint after it's been removed from the Kubernetes Endpoint object when health checking is enabled #603

Closed
alexbrand opened this issue Aug 3, 2018 · 4 comments

@alexbrand
Contributor

When active health checking is enabled, Envoy does not seem to remove endpoints from the ClusterLoadAssignment when endpoints are removed from the Kubernetes Endpoints object.

There is an Envoy Cluster option that I suspect is related to this:

drain_connections_on_host_removal
(bool) If this cluster uses EDS or STRICT_DNS to configure its hosts, immediately drain connections from any hosts that are removed from service discovery.

This only affects behavior for hosts that are being actively health checked. If this flag is not set to true, Envoy will wait until the hosts fail active health checking before removing it from the cluster.

(https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/cds.proto)
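To make the documented semantics concrete, here is a toy model of how I read the text quoted above. It is only a sketch and not Envoy's implementation: `ToyCluster`, `Host`, and the method names are hypothetical, and the pod addresses are made up. It shows that a host dropped from service discovery lingers for as long as it keeps passing active health checks, unless the flag is set.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    address: str
    healthy: bool = True           # outcome of the most recent active health check
    pending_removal: bool = False  # dropped from discovery but still in the cluster


@dataclass
class ToyCluster:
    """Toy model of the documented removal semantics; not Envoy's code."""
    drain_connections_on_host_removal: bool = False
    hosts: dict = field(default_factory=dict)

    def on_discovery_update(self, addresses):
        """Apply a new host list from service discovery (for example EDS)."""
        wanted = set(addresses)
        for addr in wanted:
            host = self.hosts.setdefault(addr, Host(addr))
            host.pending_removal = False  # a re-added host is no longer pending
        for addr in set(self.hosts) - wanted:
            host = self.hosts[addr]
            if self.drain_connections_on_host_removal or not host.healthy:
                del self.hosts[addr]         # removed right away
            else:
                host.pending_removal = True  # lingers while health checks pass

    def on_health_check(self, address, healthy):
        """Record a health check result and drop pending hosts that fail."""
        host = self.hosts.get(address)
        if host is None:
            return
        host.healthy = healthy
        if not healthy and host.pending_removal:
            del self.hosts[address]

    def addresses(self):
        return sorted(self.hosts)


# Hypothetical pod addresses, for illustration only.
old_pod, new_pod = "10.0.0.1:8080", "10.0.0.2:8080"

cluster = ToyCluster()  # flag left at its default of False
cluster.on_discovery_update([old_pod])
cluster.on_discovery_update([new_pod])  # old pod dropped from discovery
print(cluster.addresses())              # old pod is still present
```

If this reading is right, it would explain the behavior in the reproduction below: removing the label takes the pod out of the Endpoints object, but the pod keeps running and keeps passing health checks, so it is never removed.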

xref: projectcontour/gimbal#208

Steps to reproduce:

  1. Create a deployment:
kubectl run kuard --image=gcr.io/kuar-demo/kuard-amd64:1 --expose --port 8080
  2. Create an ingressroute:
apiVersion: contour.heptio.com/v1beta1
kind: IngressRoute
metadata:
  name: kuard
spec:
  routes:
  - match: /
    services:
    - healthCheck:
        healthyThresholdCount: 5
        intervalSeconds: 5
        path: /
        timeoutSeconds: 2
        unhealthyThresholdCount: 3
      name: kuard
      port: 8080
  virtualhost:
    fqdn: example.com
  3. Verify that you can reach kuard via the ingressroute.

  4. Edit the kuard pod and remove the run: kuard label, which removes this pod from the Endpoints object. This triggers the creation of a new pod, and the Kubernetes Endpoints object is updated to contain the address of the new pod.

Expected result: Envoy removes the old endpoint from the ClusterLoadAssignment.

Actual result: Envoy maintains the old endpoint in the ClusterLoadAssignment, and sends traffic to both the old pod and the new one.
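One way to state the discrepancy precisely is as a set difference between the addresses Kubernetes lists and the addresses Envoy still routes to. The helper below is a minimal sketch; `stale_endpoints` is a hypothetical name, the addresses are made up, and how the two lists are collected is left as an assumption.

```python
def stale_endpoints(kubernetes_addresses, envoy_addresses):
    """Return addresses Envoy still routes to that Kubernetes no longer lists."""
    return sorted(set(envoy_addresses) - set(kubernetes_addresses))


# Hypothetical addresses: the Endpoints object lists only the new pod,
# while Envoy's ClusterLoadAssignment still contains the old one as well.
in_kubernetes = ["10.0.0.2:8080"]
in_envoy = ["10.0.0.1:8080", "10.0.0.2:8080"]

print(stale_endpoints(in_kubernetes, in_envoy))
```

An empty result would match the expected behavior; a non-empty result matches what is reported here.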

@alexbrand alexbrand added kind/bug Categorizes issue or PR as related to a bug. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Aug 3, 2018
@alexbrand alexbrand added this to the 0.6.0 milestone Aug 6, 2018
@alexbrand
Contributor Author

alexbrand commented Aug 6, 2018

I spent time looking into this issue and can reproduce it consistently. I looked into setting drain_connections_on_host_removal to true, but it doesn't seem to have an effect. I'm starting to wonder whether I am thinking about this wrong or missing something.

@rosskukulinski
Contributor

Ok, so drain_connections_on_host_removal was added in Envoy 1.7. I think we should enable this flag once we update to Envoy 1.7 (which is currently scheduled for Contour 0.7).

@rosskukulinski rosskukulinski modified the milestones: 0.6.0, 0.7.0 Aug 13, 2018
@davecheney
Contributor

davecheney commented Aug 13, 2018 via email

@rosskukulinski
Contributor

@alexbrand would you have some time to dig into adding the drain_connections_on_host_removal flag?
