Private AKS with NGINX ingress controller using Azure Internal Load Balancer. Can't reach service via Load Balancer outside subnet/pod. #2907
Comments
Hi KIRY4, AKS bot here 👋 I might be just a bot, but I'm told my suggestions are normally quite good, as such:
@KIRY4 Hey, I'm facing an almost identical issue, and after 2 days of troubleshooting I managed to establish that the internal load balancer is unhealthy. It would seem that the health probes are misconfigured, but I'm still unclear on how to fix that part. Any and all input from the AKS team on this would be appreciated.
Hi @PAKalucki!! How did you get this diagram? I want to generate the same diagram on my side...
@KIRY4 In the Azure Portal, go to Load Balancing and find the load balancer created by AKS. In my case it was called "kubernetes-internal". Then go to Insights (you might need to enable Insights in AKS).
@PAKalucki thank you! I have the same issue.
@KIRY4 I have found a workaround in the meantime. Go to Load balancer > Health probes. If you only have probes for the TCP protocol, make sure they point to the proper ports (you can find them in AKS, on the nginx ingress service). If you have HTTP/HTTPS protocol probes, make sure they use the proper ports from AKS and that the path is /healthz. The rules created by AKS that I had in there were invalid. In case anyone from the AKS team stumbles onto this: it's only a workaround. Please propose a solution for fixing this misconfiguration at the AKS or Helm level so we don't have to change this manually every time.
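If it helps anyone, here is a rough CLI sketch of that manual check (a workaround only; the AKS name, resource group, probe name and the ingress service/namespace names are assumptions based on this thread and will differ per cluster):

```bash
# Locate the node resource group that holds the AKS-managed load balancer
NODE_RG=$(az aks show -n <aks-name> -g <resource-group> --query nodeResourceGroup -o tsv)

# Inspect the probes on the internal load balancer ("kubernetes-internal" in this thread)
az network lb probe list -g "$NODE_RG" --lb-name kubernetes-internal -o table

# Find the node ports the ingress controller service actually exposes
# (service/namespace names assumed from the pod name mentioned later in this issue)
kubectl get svc nginx-ingress-ingress-nginx-controller -n ingress-basic \
  -o jsonpath='{range .spec.ports[*]}{.name}{"\t"}{.nodePort}{"\n"}{end}'

# Point a probe at the matching node port, and at /healthz if it is an HTTP/HTTPS probe
az network lb probe update -g "$NODE_RG" --lb-name kubernetes-internal \
  --name <probe-name> --protocol Http --port <node-port> --path /healthz
```

Note that manual probe edits can be overwritten whenever the cloud provider reconciles the service, which is why a fix at the AKS/Helm level is still needed.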
It seems the same as or similar to this solved issue on the usage of the flag. I'm not 100% sure if this is just a nicer workaround or the fix, but I'll leave that discussion up to the AKS team.
@aristosvo Thank you
In my case
We hit this same issue today; something seems to have changed in the configuration of the health probes for new AKS installations between February and now. We have several AKS instances (most recent install in February) and found they all had TCP health probes by default; only a recently added one had HTTP probes. Service health was showing "degraded" when the new AKS cluster was first installed. When we changed the probes to TCP to match our other installations, traffic flowed to our ingresses as it had with previous installs, and the service health event was resolved.
I was able to reproduce the behavior by using the following yaml file for testing and two AKS clusters, one with Kubernetes version 1.21, the other with 1.22.
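(A minimal sketch of such a test service, assuming an internal LoadBalancer with appProtocol: https; the names and selector are placeholders, not the original manifest:)

```bash
# Minimal test service sketch (placeholder names): an internal LoadBalancer
# with appProtocol set, which is what drives the HTTPS health probe on 1.22
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: probe-test
  annotations:
    service.beta.kubernetes.io/azure-load-balancer-internal: "true"
spec:
  type: LoadBalancer
  selector:
    app: probe-test          # placeholder selector
  ports:
  - name: https
    port: 443
    targetPort: 443
    appProtocol: https       # removing this line makes the probe fall back to TCP
EOF
```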
Then I removed appProtocol: https from the yaml and created the same service under another name, and I concluded that it is created with protocol TCP. About appProtocol and the health probe: I think this is the expected behavior (when you have appProtocol: https, the health probe uses the HTTPS protocol) and 1.22 just adopts it. The reason this happens is that 1.22 has Cloud Controller Manager by default (https://docs.microsoft.com/en-us/azure/aks/out-of-tree). To confirm that, I enabled CCM on my 1.21 AKS cluster (there is an issue in the documentation; the correct command is "az aks update -n aks -g myResourceGroup --aks-custom-headers EnableCloudControllerManager=True"), applied the service yaml that includes appProtocol: https (just with a different name), and confirmed the above. So what happens is the expected behavior: in Kubernetes version 1.22, "appProtocol" is reflected in the health probe of the Load Balancer.
A documentation PR has been merged to add |
Thanks for reaching out. I'm closing this issue as it was marked with "Answer Provided" and it hasn't had activity for 2 days. |
Sorry to see that the appProtocol support inside the cloud provider has broken ingress-nginx for AKS clusters >=1.22. The issue was caused by two things:
AKS is rolling out a fix to keep backward compatibility (ETA is 48 hours from now); see cloud-provider-azure#1626. After this fix, the default probe protocol will only be changed to appProtocol for clusters >=1.24 (refer to the docs for more details). Before the rollout of this fix, please upgrade the existing ingress-nginx with the appropriate health probe value (see the sketch below), and for newly created helm releases you could also set it at install time.
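(The exact helm values were cut off above; as a hedged sketch, assuming the intended setting is the Azure health-probe request-path annotation on the controller service, with release name, namespace and chart as placeholders:)

```bash
# Sketch only: pin the Azure LB health probe to /healthz via the service annotation
# (the annotation key belongs to the Azure cloud provider; release/namespace/chart are assumptions)
helm upgrade nginx-ingress ingress-nginx/ingress-nginx \
  --namespace ingress-basic --reuse-values \
  --set controller.service.annotations."service\.beta\.kubernetes\.io/azure-load-balancer-health-probe-request-path"=/healthz
```

The same --set can be passed to helm install for newly created releases.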
What happened:
Hello everyone!! I have the following issue (probably a misconfiguration), but I can't figure out what exactly I'm doing wrong.
From the following VMs: vm-ubuntu-aks-mgmt-01 (10.3.0.4) and vm-ilb-01 (10.3.1.6), I'm getting the following:
curl 10.3.1.100 -k -v
Trying 10.3.1.100:80...
TCP_NODELAY set
curl http://helloworld.spg.internal -k -v
Trying 10.3.1.100:80...
TCP_NODELAY set
nslookup helloworld.spg.internal
Server: 127.0.0.53
Address: 127.0.0.53#53
Non-authoritative answer:
Name: helloworld.spg.internal
Address: 10.3.1.100
But if I jump INSIDE a pod, for example: kubectl exec -it nginx-ingress-ingress-nginx-controller-7475cb6b9f-x7nf2 -n ingress-basic sh, and then curl from inside the pod,
I'm getting SUCCESS: status code 200 OK and the html page.
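(For reference, a sketch of that in-pod check; the pod name is the one from this issue and will differ per cluster:)

```bash
# Exec into the ingress controller pod and curl the ingress hostname from inside it
kubectl exec -it nginx-ingress-ingress-nginx-controller-7475cb6b9f-x7nf2 -n ingress-basic -- \
  sh -c 'curl -sk -o /dev/null -w "%{http_code}\n" http://helloworld.spg.internal'
```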
What you expected to happen:
curl http://helloworld.spg.internal - success (status code: 200 OK) from everywhere in the VNet, i.e. from all the VMs mentioned (vm-ubuntu-aks-mgmt-kmazurov-01 - 10.3.0.4, vm-ilb-01 - 10.3.1.6).
How to reproduce it (as minimally and precisely as possible):
For testing/learning purposes I deployed a private AKS cluster in a pre-created VNet.
Here is the VNet configuration (no NSGs are configured, since this is a playground for learning purposes):
vnet-cloud-01: 10.3.0.0/22
Subnets:
snet-cloud-mgmt-01: 10.3.0.0/24 - here I deployed the mgmt VM (vm-ubuntu-aks-mgmt-01 - 10.3.0.4) with kubectl, helm, and az cli to access the cluster;
snet-cloud-priv-aks-01: 10.3.1.0/24 - here I deployed the private AKS cluster plus one VM just for testing purposes (vm-ilb-01 - 10.3.1.6);
GatewaySubnet: 10.3.2.0/24 - I'm using it for a P2S VPN connection to the VNet.
Following this article: https://docs.microsoft.com/en-us/azure/aks/ingress-internal-ip I deployed the NGINX ingress controller with internal Load Balancer IP 10.3.1.100. The service was created successfully and the IP was assigned correctly.
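(Roughly what that install looks like following the linked article; the release name, namespace and exact flags are assumptions and may differ from what was actually run:)

```bash
# Sketch of the internal ingress-nginx install per the linked AKS article
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update

helm install nginx-ingress ingress-nginx/ingress-nginx \
  --namespace ingress-basic --create-namespace \
  --set controller.service.loadBalancerIP=10.3.1.100 \
  --set-string controller.service.annotations."service\.beta\.kubernetes\.io/azure-load-balancer-internal"=true
```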
I also created a private DNS zone (spg.internal), attached it to vnet-cloud-01, and added the following A record:
| Name | Type | TTL | Value |
| --- | --- | --- | --- |
| helloworld | A | 10 | 10.3.1.100 |
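(For reference, a rough CLI equivalent of that zone/record setup; the resource group and link name are placeholders:)

```bash
# Create the private DNS zone, link it to the vnet, and add the A record
az network private-dns zone create -g <resource-group> -n spg.internal
az network private-dns link vnet create -g <resource-group> -z spg.internal \
  -n link-vnet-cloud-01 -v vnet-cloud-01 --registration-enabled false
az network private-dns record-set a add-record -g <resource-group> -z spg.internal \
  -n helloworld -a 10.3.1.100
```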
Anything else we need to know?:
Environment:
Kubernetes version (use kubectl version): v1.22.6 (cluster), v1.23.6 (kubectl).
Workload: https://docs.microsoft.com/en-us/azure/aks/ingress-internal-ip - the workload from this article.