Configuration updates stop on Azure Kubernetes Service (AKS) #1039

Closed
HoveringHalibut opened this issue Dec 17, 2018 · 12 comments

@HoveringHalibut

Describe the bug
After deploying Ambassador on an AKS cluster, service configuration changes stop updating after 5-10 minutes for the Ambassador service.

To Reproduce

  1. Deploy new AKS cluster (Tested with RBAC enabled and disabled)
  2. Deploy Ambassador per https://www.getambassador.io/user-guide/getting-started
  3. Wait 10 minutes
  4. Deploy httpbin per the getting-started doc
  5. Check if routes have updated via Ambassador diagnostics

Expected behavior
Service configurations continue to update.

Versions (please complete the following information):

  • Ambassador: 0.40.2
  • Kubernetes environment: AKS
  • Kubernetes version: 1.11.5
@cbenien

cbenien commented Dec 18, 2018

+1 I was just about to write a bug report myself. I have the exact same issue in the exact same configuration (v0.40.2 and AKS with Kubernetes 1.11.5)

I even built the Ambassador Docker image myself and added a few extra log messages in kubewatch.py. It appears that the events from the Kubernetes API server don't reach kubewatch. It can't be a permission issue because it works for a few minutes after redeploying Ambassador.

@richarddli
Contributor

We've seen this on v1.11.13, v1.11.14, and 1.9.11, in both RBAC and non-RBAC mode. This appears to be an issue with clusters deployed more recently, i.e., clusters deployed in September do not have this issue.

We're pinging AKS engineering on this. If others on this thread can open AKS support tickets on this issue, that would be helpful. This issue is easily reproducible on AKS, and does not seem to exist on other hosted Kubernetes providers.

@iNoahNothing
Contributor

iNoahNothing commented Dec 18, 2018

This slack bot that watches the kube-apiserver does not appear to have any issue receiving events. https://github.com/bitnami-labs/kubewatch

As someone above reported, Ambassador's kubewatch stops receiving events on Azure after a couple of minutes, with no errors logged.

@HoveringHalibut
Author

@richarddli Did you check if all of those clusters are running on the Moby engine? Azure/acs-engine#3896

The following will output the docker engine version:
kubectl describe nodes | grep 'Container Runtime Version'
3.0.1 indicates the Moby engine.
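
To check many nodes at once, the same information can be read from `kubectl get nodes -o json`. The following is a minimal sketch and not part of the original report: the helper names `runtime_versions` and `moby_nodes` and the sample node data are hypothetical, and it assumes the runtime string is reported in the Node object's `status.nodeInfo.containerRuntimeVersion` field with a `docker://` prefix.

```python
import json


def runtime_versions(nodes_json: str) -> dict:
    """Map node name -> container runtime version.

    Expects the text produced by `kubectl get nodes -o json`.
    """
    items = json.loads(nodes_json).get("items", [])
    return {
        node["metadata"]["name"]: node["status"]["nodeInfo"]["containerRuntimeVersion"]
        for node in items
    }


def moby_nodes(versions: dict, moby_version: str = "3.0.1") -> list:
    """Return the names of nodes whose runtime version equals moby_version.

    The default of 3.0.1 is the value this thread associates with Moby.
    """
    return sorted(
        name
        for name, version in versions.items()
        if version.split("://")[-1] == moby_version
    )


# Hypothetical sample, trimmed to the fields used above. Node names are made up.
SAMPLE = json.dumps(
    {
        "items": [
            {
                "metadata": {"name": "aks-node-0"},
                "status": {"nodeInfo": {"containerRuntimeVersion": "docker://3.0.1"}},
            },
            {
                "metadata": {"name": "other-node-1"},
                "status": {"nodeInfo": {"containerRuntimeVersion": "docker://18.6.1"}},
            },
        ]
    }
)

versions = runtime_versions(SAMPLE)
print(moby_nodes(versions))
```

Against a real cluster, the input would be the captured output of `kubectl get nodes -o json` instead of `SAMPLE`.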

Per here, Moby went GA on all new node deployments on December 5th.

I opened a support case with Microsoft on the issue.

@iNoahNothing
Contributor

iNoahNothing commented Dec 18, 2018

@HoveringHalibut Interesting find.

I just checked the other environments I've run Ambassador on to see the Docker version they're running.
AKS: 3.0.1 (Fork of 18.06)
GKE: 17.3.2
EKS: 18.6.1
Docker: 18.9.0
Minikube: 17.12.1-ce

@iNoahNothing
Contributor

Steps to reproduce:

  1. Deploy a cluster
    a. RBAC or non-RBAC
    b. I have reproduced it on v1.9.11, 1.11.13 and 1.11.14
  2. Extract the YAML files from the attached AKS_deployment.zip
  3. Apply the yaml
    a. kubectl apply -f ambassador-deploy.yaml
    b. kubectl apply -f ambassador-service.yaml
    c. kubectl apply -f qotm/qotm-deploy.yaml
    d. kubectl apply -f qotm/qotm1.yaml
    This will deploy ambassador and create a route to a service running in the cluster
  4. Get the ip of the load balancer ambassador service (ambassador-external-ip)
  5. Test the mapping with curl -v http://<ambassador-external-ip>/qotm/
  6. Wait 5-10 minutes
  7. Apply a mapping for the url http://<ambassador-external-ip>/qotm2/
    a. kubectl apply -f qotm/qotm2.yaml
  8. Test the new mapping with curl and observe that Ambassador does not pick it up.
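
The final check in the steps above can be automated by polling the new URL instead of running curl by hand. This is a rough sketch under stated assumptions, not something attached to the issue: the helper names `http_status` and `wait_for_route` are hypothetical, and it assumes that a route Ambassador has picked up answers with HTTP 200 while one it has not picked up answers with some other status.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code of a GET on url, or 0 if the connection fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx responses (for example 404 for an unknown route) land here.
        return err.code
    except OSError:
        # Covers URLError, refused connections, and socket timeouts.
        return 0


def wait_for_route(
    url: str,
    attempts: int = 30,
    interval_seconds: float = 10.0,
    fetch: Callable[[str], int] = http_status,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll url until it returns 200 or the attempts run out.

    Returns True if the route answered with 200, False otherwise.
    """
    for attempt in range(attempts):
        if fetch(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(interval_seconds)
    return False
```

A call such as `wait_for_route("http://<ambassador-external-ip>/qotm2/")` returning False after several minutes would correspond to the failure described in step 8. The `fetch` and `sleep` parameters exist only so the loop can be exercised without a cluster.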

@richardbolt

We are experiencing the same thing. We had previously noticed #928, so we built a custom Ambassador by applying the updated kube-client suggested in e5dcd66 to the 0.40.2 code, but the issue persisted.

@xydinesh

I've run into this issue too. Deleting all Ambassador pods refreshes Ambassador's routing table when the pods are recreated, but that only helps briefly: updates stop again after a few minutes.

kubectl delete pods -l service=ambassador

@richarddli
Contributor

Just a quick update. It's not Moby, but working with the Azure engineering team we believe we are zeroing in on the root cause. We hope to provide a more detailed update soon.

@iNoahNothing
Contributor

iNoahNothing commented Dec 19, 2018

The underlying reason for this issue is that Ambassador talks to the kube-apiserver via a series of proxies. It seems that at some point one of these proxies drops the connection to the python-client that Ambassador uses.

The fix we are evaluating is taking advantage of the mutating webhook admissions controller feature AKS recently implemented to bypass this series of proxies with the go-client.

We will provide more details as progress is made.
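
For readers hitting the same dropped-watch failure mode in their own tooling, a common client-side mitigation is to treat every watch as short-lived and reopen it when it ends or fails. The sketch below is a generic illustration only: it is not the fix being evaluated here and not what kubewatch.py does, and the name `watch_with_restarts` and its parameters are hypothetical.

```python
import time
from typing import Any, Callable, Iterable, Iterator


def watch_with_restarts(
    open_stream: Callable[[], Iterable[Any]],
    max_restarts: int = 3,
    backoff_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Any]:
    """Yield events from open_stream(), reopening it when it ends or fails.

    open_stream is called once at the start and once more per restart,
    so at most 1 + max_restarts streams are opened.
    """
    restarts = 0
    while True:
        try:
            for event in open_stream():
                yield event
        except Exception:
            # Sketch only: any error is treated as a dropped watch.
            # Real code should narrow this and log the failure.
            pass
        if restarts >= max_restarts:
            return
        restarts += 1
        sleep(backoff_seconds)


# Stand-in for a real watch: the first stream is cut off after one event.
_opened = []


def _fake_stream() -> Iterator[str]:
    _opened.append(True)
    n = len(_opened)
    yield f"event-{n}a"
    if n == 1:
        raise ConnectionError("connection dropped by an intermediate proxy")
    yield f"event-{n}b"


for ev in watch_with_restarts(_fake_stream, max_restarts=1, sleep=lambda s: None):
    print(ev)
```

Restarting only helps if each stream is guaranteed to end. A connection that is dropped silently never raises on its own, so the stream passed in as `open_stream` would need its own bound, for example a server-side watch timeout combined with a client-side read timeout. How that is configured depends on the client library in use and is left as an assumption here.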

@HoveringHalibut
Author

@nbkrause and @richarddli Thanks for the update and the work on this issue. I'm continuing to push on my support case for it. A problem with a proxy timeout makes me nervous about its effect on other services that use hooks similar to Ambassador's.

@kflynn kflynn added this to the 0.50.0 GA milestone Dec 20, 2018
@kflynn kflynn self-assigned this Dec 31, 2018
@richarddli
Contributor

Testing seems to have shown that #1087 fixes this issue. Targeting for rc4.
