helm upgrade controller from v0.51 to v1.4.0 caused 10.0.0.2:0: invalid port while connecting to upstream error #9141
Comments
@angelsk: This issue is currently awaiting triage. If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/retitle helm upgrade controller from v0.51 to v1.4.0 caused 10.0.0.2:0: invalid port while connecting to upstream error |
We also see this after upgrading from 1.3.0 to 1.4.0 (helm chart 4.2.1 to 4.3.0). It seems to only affect one of our ingresses (out of several dozen), and we think it also causes periodic 504 errors after a 5-second timeout. Rolling back to 4.2.1 resolved both the error and the timeouts. |
They are one and the same. GCP just wraps the controller pod logs and tags them for easy reference. I'll do some more tests today to see if I can track the IP addresses - it's not always 10.0.0.2:0, we had 10.0.0.14:0 etc last time too. Because we have 3 endpoints it maps over several pods. Visiting the URLs in the browser gives a 504 all the time. |
@angelsk "They are one and the same" has different implications for different roles here. For people wanting to help solve your problem voluntarily, in their free unpaid time, it is practical to have data posted here to be analyzed. That is why the issue template asks questions that populate the issue with data about the state of the cluster and its events. Kindly help and post the data that the issue template asks for. |
Sorry, I didn't understand some of the questions in the issue template - I tried to answer what I could. I do appreciate the assistance, and I'm trying to be reciprocally helpful. I'll do what I can and update the issue itself with the information. Sorry. |
@longwuyuan I've updated the ticket with the proper output. Just grabbing the ingress controller pod logs - the IPs should match up between the ticket and the logs, as these are the most recent 504 gateway timeouts on our URLs. These are logs from one of the pods (sanitised), covering our uptime checker and an API request I added as a check:
|
If it helps, this is the output of And I did some mapping of the IPs to pods from the above: So that's weird.... |
Ok, so running the backend config dump on the old pod produces a clear difference - I hope this investigation helps. I attach both configurations (IPs of a couple of the pods changed, but you can see where there's a huge chunk of config added for our workers, which do not have a domain mapped to them). |
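A minimal sketch of one way to pull that backend configuration, assuming the standard /dbg tool shipped in the controller image (the namespace and pod name are placeholders):

# Dump the dynamically configured backends (endpoints and their ports)
# from inside a running ingress-nginx controller pod.
kubectl -n ingress-nginx exec -it ingress-nginx-controller-XXXXX -- /dbg backends all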
Hi @angelsk, could you please verify whether it works when you change the named targetPort to a port number in the service? |
Hi @tombokombo - we tried that to no avail: changing the named targetPort to a port number in the service. I am happy to try other suggestions - I did wonder if there was a config update to the values that we would need to make to align with the new controller, but didn't find anything obvious in the documentation. |
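For illustration, a minimal sketch of that change, assuming a service shaped like the our-service manifest shared further down the thread (the numeric port 8080 is a placeholder for whatever the container actually listens on):

apiVersion: v1
kind: Service
metadata:
  name: our-service          # placeholder
  namespace: our-namespace   # placeholder
spec:
  selector:
    app.kubernetes.io/instance: our-service
    app.kubernetes.io/name: our-service
  ports:
    - name: http
      port: 80
      targetPort: 8080       # numeric container port instead of the named port "http"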
@angelsk could you please provide output of |
It's on my list for tomorrow! Feels like making progress - thank you!! |
I can carry the torch :) - I think we may be on the right track, as it did feel like #8890 was the only remotely relevant change in our 1.3.1 -> 1.4.0 upgrade. For us, we see two EndpointSlices for the affected service, which look like:
addressType: IPv4
apiVersion: discovery.k8s.io/v1
endpoints:
- addresses:
- 100.127.2.33
conditions:
ready: true
serving: true
terminating: false
nodeName: i-abcdefgh556806c44
targetRef:
kind: Pod
name: our-service-SOMETHINGELSE-9cfd97fcf-9f45m
namespace: our-namespace
uid: 018b401f-6c25-429d-a38d-72326aa8dc9e
zone: us-west-2a
kind: EndpointSlice
metadata:
labels:
app.kubernetes.io/instance: our-service
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: our-service
app.kubernetes.io/version: 1.0.0
endpointslice.kubernetes.io/managed-by: endpointslice-controller.k8s.io
helm.sh/chart: our-service-1.0.0
kubernetes.io/service-name: our-service
name: our-service-jln8p
namespace: our-namespace
ports: null
and
addressType: IPv4
apiVersion: discovery.k8s.io/v1
endpoints:
- addresses:
- 100.119.204.56
conditions:
ready: true
serving: true
terminating: false
nodeName: i-123def03095041cba
targetRef:
kind: Pod
name: our-service-669cf7cf8d-97lbg
namespace: our-namespace
uid: 9a952cbf-d0fe-4914-b3dd-5edde06e44ba
zone: us-west-2a
- addresses:
- 100.108.188.33
conditions:
ready: true
serving: true
terminating: false
nodeName: i-1234596b55c8448ba
targetRef:
kind: Pod
name: our-service-669cf7cf8d-8njs7
namespace: our-namespace
uid: 25c035e9-9fae-4386-830d-1954e2172b8a
zone: us-west-2a
- addresses:
- 100.113.169.38
conditions:
ready: true
serving: true
terminating: false
nodeName: i-2346fa6afd2e1a6ba
targetRef:
kind: Pod
name: our-service-669cf7cf8d-rwsn5
namespace: our-namespace
uid: fba72a45-1159-4666-9e0c-de290874e52d
zone: us-west-2a
- addresses:
- 100.119.219.138
conditions:
ready: true
serving: true
terminating: false
nodeName: i-12324522ccdd93f3
targetRef:
kind: Pod
name: our-service-669cf7cf8d-cmbcb
namespace: our-namespace
uid: ffd4e4be-42f0-4312-a48d-c3a8d0189883
zone: us-west-2a
- addresses:
- 100.127.2.20
conditions:
ready: true
serving: true
terminating: false
nodeName: i-0317771f556806c44
targetRef:
kind: Pod
name: our-service-669cf7cf8d-q26mp
namespace: our-namespace
uid: 6bc546ea-2965-430a-b598-129df5bb7821
zone: us-west-2a
kind: EndpointSlice
metadata:
labels:
app.kubernetes.io/instance: our-service
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: our-service
app.kubernetes.io/version: 1.0.0
endpointslice.kubernetes.io/managed-by: endpointslice-controller.k8s.io
helm.sh/chart: our-service-1.0.0
kubernetes.io/service-name: our-service
name: our-service-np8jr
namespace: our-namespace
ports:
- name: http
port: 80
  protocol: TCP
I'm now trying to figure out why we have two of them (this seems to only happen for one of our services), since the first one, "jln8p", seems to be the one breaking things. |
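A quick way to pull only the slices that belong to a single service, for anyone comparing the two (namespace and service name are placeholders):

# EndpointSlices carry a well-known label pointing back at their Service
kubectl -n our-namespace get endpointslices \
  -l kubernetes.io/service-name=our-service -o yaml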
Can the -o yaml output of the svc be shared? |
apiVersion: v1
kind: Service
metadata:
labels:
app.kubernetes.io/instance: our-service
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: our-service
app.kubernetes.io/version: 1.0.0
argocd.argoproj.io/instance: our-service-prod
helm.sh/chart: our-service-1.0.0
name: our-service
namespace: our-namespace
spec:
clusterIP: 100.69.202.77
clusterIPs:
- 100.69.202.77
internalTrafficPolicy: Cluster
ipFamilies:
- IPv4
ipFamilyPolicy: SingleStack
ports:
- name: http
port: 80
targetPort: http
selector:
app.kubernetes.io/instance: our-service
    app.kubernetes.io/name: our-service
and... I think the issue might be that we have two deployments (one of which doesn't listen on any ports) whose pods match the "selector" listed here. |
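For illustration, a rough sketch of that shape with hypothetical names: two Deployments whose pod labels both satisfy the Service selector above, but only one of which exposes the named http port:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: our-service                 # the REST API
spec:
  selector:
    matchLabels:
      app.kubernetes.io/instance: our-service
      app.kubernetes.io/name: our-service
  template:
    metadata:
      labels:
        app.kubernetes.io/instance: our-service
        app.kubernetes.io/name: our-service
    spec:
      containers:
        - name: api
          image: example/api:1.0.0  # placeholder
          ports:
            - name: http            # the named port the Service's targetPort points at
              containerPort: 8080
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: our-service-worker          # hypothetical background worker
spec:
  selector:
    matchLabels:
      app.kubernetes.io/instance: our-service
      app.kubernetes.io/name: our-service
  template:
    metadata:
      labels:
        app.kubernetes.io/instance: our-service   # same labels, so the Service
        app.kubernetes.io/name: our-service       # selector matches these pods too
    spec:
      containers:
        - name: worker
          image: example/worker:1.0.0              # placeholder - declares no ports

With targetPort set to the named port http, the worker pods have nothing to resolve it against, which appears to be how the second EndpointSlice ends up with ports: null - the entries the 1.4.0 controller then turns into :0 upstreams.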
@vitaliyf the problem is probably coming from |
Also, please share the kubectl get ep -o yaml output, to see the exact difference. |
Right, I think it's our fault that we have one Helm chart with two Deployments, only one of which exposes any ports. That causes two EndpointSlices to exist (one with a null port). Here's our Endpoints object, which does seem to have only the correct 5 pods that expose ports. The "wrong" Deployment does not list any ports (it's a background process that we want to run alongside the REST API service).
apiVersion: v1
kind: Endpoints
metadata:
annotations:
endpoints.kubernetes.io/last-change-trigger-time: "2022-10-11T15:07:23Z"
creationTimestamp: "2022-10-11T15:07:18Z"
labels:
app.kubernetes.io/instance: our-service
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: our-service
app.kubernetes.io/version: 1.0.0
helm.sh/chart: our-service-1.0.0
name: our-service
namespace: our-namespace
resourceVersion: "129768518"
uid: 16149479-c5f2-4b23-a99c-5d8cbae7e2e5
subsets:
- addresses:
- ip: 100.108.188.33
nodeName: i-07c9896b55c8448b1
targetRef:
kind: Pod
name: our-service-669cf7cf8d-8njs7
namespace: our-namespace
uid: 25c035e9-9fae-4386-830d-1954e2172b8a
- ip: 100.113.169.38
nodeName: i-04a0fa6afd2e1a665
targetRef:
kind: Pod
name: our-service-669cf7cf8d-rwsn5
namespace: our-namespace
uid: fba72a45-1159-4666-9e0c-de290874e52d
- ip: 100.119.204.56
nodeName: i-03c33f03095041f48
targetRef:
kind: Pod
name: our-service-669cf7cf8d-97lbg
namespace: our-namespace
uid: 9a952cbf-d0fe-4914-b3dd-5edde06e44ba
- ip: 100.119.219.138
nodeName: i-0d826aa22ccdd93f3
targetRef:
kind: Pod
name: our-service-669cf7cf8d-cmbcb
namespace: our-namespace
uid: ffd4e4be-42f0-4312-a48d-c3a8d0189883
- ip: 100.127.2.20
nodeName: i-0317771f556806c44
targetRef:
kind: Pod
name: our-service-669cf7cf8d-q26mp
namespace: our-namespace
uid: 6bc546ea-2965-430a-b598-129df5bb7821
ports:
- name: http
port: 80
protocol: TCP |
I'm able to reproduce. One deployment exposes a named port and the other deployment just a port number, while the service targets the named port.
^^ unset is the problem |
I feel this is an issue with the ingress-nginx code and needs to be handled appropriately, like endpoints used to be handled in 1.3.1. |
@bmv126 yes, I'm going to fix it. |
I see the PR fix cites misconfigured ports. But the pods it was trying to use for this shouldn't have been included, because they are internal service workers. Is there a way to define those so the EndpointSlice doesn't pick them up? Either way, yay for the speedy fix and thanks all! |
@tombokombo how frequently are releases tagged? Is there a config fix I can apply in the meantime? |
@angelsk please share your application's service, endpointslices, deployment and ingress.
The patch should fix your problem as well. According to the backends from the controller that you already provided, you have two valid endpoints with the older controller, while with v1.4 there is a bunch of endpoints with port equal to 0. Endpoints with a zero port were a bug; they will disappear. |
@tombokombo YAML dump incoming! If there's any way I can get this working with 1.4 before the patch (as I don't know how long the release process is), then any help would be gladly accepted. Otherwise I might just try with whichever Helm chart version has the 1.3.x controller in it - as that was pre-slice :)
NOTE: Unset here is intentional - those pods are NOT mapped to ports or domains.
|
I have successfully upgraded (I think) to Helm chart 4.2.5 and controller 1.3.1, so I'm waiting on a bug fix or workaround for the 0-port issue :) |
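For anyone needing the same stop-gap, a sketch of pinning the chart (release name, namespace and values file are placeholders):

# Chart 4.2.5 ships controller v1.3.1, which still builds upstreams from Endpoints
helm upgrade ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx \
  --version 4.2.5 \
  -f values.yaml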
@angelsk at first you need to get rid of |
@tombokombo I inherited this project; we don't have the bandwidth to redesign, so I think I'll just wait for the fix. Thanks anyway - I've added your notes to the backlog. |
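For the backlog, a rough sketch of one way to keep the port-less worker pods out of the Service, using a hypothetical app.kubernetes.io/component label (the API Deployment's pods would need the matching api value added as well):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: our-service-worker
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: our-service
      app.kubernetes.io/component: worker   # hypothetical label the Service will not select
  template:
    metadata:
      labels:
        app.kubernetes.io/name: our-service
        app.kubernetes.io/component: worker
    spec:
      containers:
        - name: worker
          image: example/worker:1.0.0        # placeholder
---
apiVersion: v1
kind: Service
metadata:
  name: our-service
  namespace: our-namespace
spec:
  selector:
    app.kubernetes.io/name: our-service
    app.kubernetes.io/component: api         # only pods labelled as the API are selected
  ports:
    - name: http
      port: 80
      targetPort: http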
What happened:
I'm trying to upgrade from helm-chart 3.41.0 with ingress-controller 0.51.0 to helm-chart 4.3.0 with ingress-controller 1.4.0, on kubernetes 1.21.14 in GCP.
And I get an error in the lua/balancer.lua file, on line 348, about ports:
[error] 31#31: *2450 [lua] balancer.lua:348: balance(): error while setting current upstream peer 10.0.0.2:0: invalid port while connecting to upstream, client: 10.x.x.x, server: OUR_URL, request: "GET / HTTP/1.1", host: "OUR_URL"
What you expected to happen:
I expected it to just work :)
NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version):
Kubernetes version (use kubectl version): 1.21.14
Environment:
Kernel (e.g. uname -a): Linux ingress-nginx-controller-5bf7cf4684-v5hg6 5.4.202+ #1 SMP Sat Jul 16 10:06:38 PDT 2022 x86_64 Linux
Install tools:
Please mention how/where the cluster was created, e.g. kubeadm/kops/minikube/kind etc.
Basic cluster related info:
kubectl version: v1.21.14-gke.2700
kubectl get nodes -o wide: 3 nodes, Container-Optimized OS from Google, Kernel version 5.4.202+
How was the ingress-nginx-controller installed:
helm ls -A | grep -i ingress
helm -n <ingresscontrollernamespace> get values <helmreleasename>
This is for our staging instance
HELM_VERSION=3.9.0
This is how we install it in the action
Response headers file:
kubectl describe ingressclasses
kubectl -n <ingresscontrollernamespace> get all -A -o wide
kubectl -n <ingresscontrollernamespace> describe po <ingresscontrollerpodname>
kubectl -n <ingresscontrollernamespace> describe svc <ingresscontrollerservicename>
kubectl -n <appnnamespace> get all,ing -o wide
kubectl -n <appnamespace> describe ing <ingressname>
If applicable, then your complete and exact curl/grpcurl command (redacted if required) and the response to the curl/grpcurl command with the -v flag
Others:
kubectl describe ... of any custom configmap(s) created and in use
These are the YAML files from GCP for both the current (3.41) and the upgrade attempt (4.3) for the 4 "resources": the ingress-controller pod, the 2 services, and the ingress app. They have been sanitised for potentially private data.
Archive.zip
How to reproduce this issue:
Anything else we need to know:
I can provide all the other config we have - but it's pretty basic: here are our pods, and here are their ports and domain names.
It works perfectly with the old version, but we need to be able to upgrade to the new APIs, as we want to upgrade our k8s cluster past 1.21.
Tried asking in the Slack channel but no one has any information.
Couldn't find anything pertinent in the documentation re: upgrading from Helm chart 3 to 4.
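For concreteness, a minimal sketch of that "pods, ports and domain names" setup (hosts, names and ports are placeholders, not the real config):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: our-service
  namespace: our-namespace
spec:
  ingressClassName: nginx
  rules:
    - host: api.example.com          # placeholder domain
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: our-service
                port:
                  number: 80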