Ingress NGINX v1.10.2 & v1.11.0 throw core dumps #11588
Cluster getting core dumps with 1.11.0 (OVH):
Cluster with zero dumps so far (self-hosted):
Sadly the cluster where I could easily access core dumps is the one that's stable. |
Setup:
What makes the problem worse in our case is that the core dumps completely filled our nodes' disks. We already have 9 upvotes on this one. |
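For anyone who wants to gauge how much disk the dumps are taking before a node fills up, here is a minimal sketch. It assumes the dumps land as regular files named `core` or `core.<suffix>` in a single directory; the naming and location depend on the node's `core_pattern` setting, so both are assumptions, and `core_dump_usage` is a hypothetical helper name.

```python
from pathlib import Path


def core_dump_usage(directory: str) -> tuple[int, int]:
    """Return (file_count, total_bytes) for core-dump-like files in `directory`.

    Assumption: dumps are regular files named "core" or "core.<suffix>".
    Adjust the name check to match the node's actual core_pattern.
    """
    count = 0
    total = 0
    for path in Path(directory).iterdir():
        looks_like_core = path.name == "core" or path.name.startswith("core.")
        if path.is_file() and looks_like_core:
            count += 1
            total += path.stat().st_size
    return count, total
```

Calling `core_dump_usage("/some/dump/dir")` on a node (the path is a placeholder) returns the number of matching files and their combined size in bytes.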
Kernel version appears to affect the bug - I'm unsure of the exact cutoff but all working nodes have version 6.x and all broken have 5.x (typically 5.15) in my testing. |
Sorry to rain on this, but my cluster with kernel 6.9.7 was failing in the same way. Debian Trixie (testing), RKE2, K8s 1.30.2, containerd 1.7.17. |
Does the same happen with v1.10.2? That would help narrow down the root cause. |
Yes, it did on my cluster. 1.10.1 OK, 1.10.2 and 1.11.0 both bad. |
Yes. |
Ok, thanks! That really helps as we "only" introduced patches to v1.10.2. |
I have the following environments: 1 (onprem): 2 (onprem, not exactly the same as 1 however no significant differences..): 3 (Azure): |
Thanks, I'll discuss pulling the release on GitHub with @Gacko, but we cannot remove the 1.10.2 and 1.11.0 images from the Kubernetes registry. |
I’m seeing the same when using 1.11.0 on my hybrid ARM64/AMD64 cluster running k3s 1.29.6, kernel 6.1:
|
We updated the release notes with a warning.
|
I am in transit right now, but it is running on kind on my laptop.
|
The issue as I experienced it is that the pods remain running and healthy, but requests through them end up crashing NGINX with a segfault. This doesn't happen on every request, but at least 50% of the time. Here is a sample of what came out of my nodes'
|
Running into this as well in my cluster, both with 1.10.2 and 1.11.0. Rolling back to 1.10.1 fixed the issue.
|
Could this be related to the SSL patches applied in 1.10.2? |
I tried pushing just an extra 1.10.2 deployment with a different class and shared config to the cluster that's crashing, using the bug template repro service and ingress. No core dumps, so triggering this might require a more complex setup than just one ingress with TLS and curling to localhost with proxy-protocol. |
I couldn't verify it yet, but it might be interesting to know whether some of you are using TLS offloading / pure HTTP and therefore do not face this issue. |
/triage accepted |
FWIW I’m doing TLS termination inside ingress-nginx (TLS 1.2/1.3), the cluster is behind a load balancer using proxy protocol v2. Again, kernel 6.1, arm64/amd64, k3s 1.29.6 on bare metal Armbian/Debian. |
Hello, is there any chance that someone can attach even one core dump here? |
I deployed an ingress with TLS via cert-manager with OCSP enabled, and it core dumped on me; as soon as I disabled OCSP, the ingress worked just fine. I am going to update the nginx base and remove that last change (#11590), test it in this same cluster, and see if that fixes the issue. It could still be the patches and the nginx version. Thank you @thomaspeitz for pointing us in the right direction. |
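To check whether a given controller has OCSP turned on, a rough sketch follows. It assumes OCSP stapling is toggled through the `enable-ocsp` key of the controller ConfigMap (the comment above does not say how OCSP was disabled, so this is an assumption), and that the ConfigMap's `data` section has already been loaded into a plain dict. `ocsp_enabled` is a hypothetical helper name.

```python
def ocsp_enabled(configmap_data: dict[str, str]) -> bool:
    """Return True if the ConfigMap data appears to enable OCSP stapling.

    Assumption: the relevant key is "enable-ocsp" and a missing key means
    the feature is off. Only the literal value "true" (any letter case)
    is treated as enabled in this sketch.
    """
    value = configmap_data.get("enable-ocsp", "false")
    return value.strip().lower() == "true"
```

A cluster where this returns `True` on v1.10.2 or v1.11.0 would match the setup that crashed in the comment above.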
Apparently there is a bug in the OCSP code of the latest Lua or LuaJIT:
|
Yep, I see the same thing in several other dumps. We need to revert the lua-nginx-module version upgrade (#11470) and open an issue at https://github.com/openresty/lua-nginx-module/issues
|
/reopen Didn't mean to auto close this until we confirm the new nginx build fixes the issues. |
I think we found out why it passed in CI: the test was skipped because of a bug a little while ago. Adding the test back and trying again in #11606. |
Looks like we disabled the OCSP tests a while back because of a bug. Judging from the failures, cfssl needs sqlite-dev. This PR turns on the OCSP e2e tests and adds sqlite-dev back into our testing. I'm going to run the tests on the nginx:0.0.8 version, which has the newer version of the lua nginx module, and see if that catches whatever is causing the core dump, mainly so we can put in an issue in the lua module repo. If the tests in #11606 pass, we will move forward with the revert and wait for a release of the lua module. |
We just released controller v1.11.1 & v1.10.3 with chart v4.11.1 & v4.10.3. These releases should ship a fix for this issue. |
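For scripting a quick check across clusters, here is a minimal sketch that flags the controller versions reported as crashing in this thread (v1.10.2 and v1.11.0). The set is taken only from the reports above, not from an official advisory, and `is_affected` and `parse_version` are hypothetical helper names.

```python
# Controller versions reported in this thread as producing core dumps.
AFFECTED_VERSIONS = {(1, 10, 2), (1, 11, 0)}


def parse_version(tag: str) -> tuple[int, int, int]:
    """Parse a tag such as "v1.11.0" or "1.11.0" into (major, minor, patch)."""
    major, minor, patch = tag.strip().lstrip("v").split(".")
    return int(major), int(minor), int(patch)


def is_affected(tag: str) -> bool:
    """Return True if the controller version is one of the reported bad ones."""
    return parse_version(tag) in AFFECTED_VERSIONS
```

For example, `is_affected("v1.11.0")` is true, while `is_affected("v1.10.1")` and `is_affected("v1.11.1")` are false, matching the rollback and fix reports in this issue.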
To confirm, I have rolled out |
So, a couple of things; we also discussed this at the community meeting. The e2e test for OCSP, which would have caught this issue, is fixed. We learned that we should stick with released versions of lua-nginx-module and not commits. I have also reviewed the e2e tests to make sure we are not skipping others. Thank you to @thomaspeitz for finding the root cause, and to the others who helped confirm it or provided more details. We opened an issue for the lua-nginx-module folks to review at openresty/lua-nginx-module#2339. We apologize for causing this issue and will continue to work on making releases stable. There is a lot of third-party software that goes into making ingress-nginx work, and we do our best to test all the components. If you are interested in helping us out, please join us every other Thursday at 11 am Eastern in our community meetings or on #ingress-nginx-dev on kubernetes.slack.com. |
By the way, there is a patch awaiting feedback: openresty/lua-nginx-module#2339 (comment) |
What happened:
I was updating ingress-nginx from 1.10.1 to 1.11.0 using:
helm upgrade ingress-nginx /helm/ingress-nginx --install -f /helm/ingress-nginx/custom-values.yaml
All five ingresses assigned to that ingress-nginx instance went unresponsive, came up a few times but then crashed constantly.
When opening the ingress hosts in the browser, I see lots of messages such as:
A fallback to 1.10.1 fixed the issue.
What you expected to happen:
A normal (rolling) deployment with responsive hosts and no core dumps.
NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version.):
NGINX Ingress controller
Release: v1.11.0
Build: 96dea88
Repository: https://github.com/kubernetes/ingress-nginx
nginx version: nginx/1.25.5
Kubernetes version (use kubectl version):
Client Version: v1.28.1
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.6
Environment:
Cloud provider or hardware configuration: AWS
OS (e.g. from /etc/os-release): Ubuntu 20.04.6 LTS / containerd://1.7.18
Kernel (e.g. uname -a): 5.15.0-107-generic
Basic cluster related info:
kubectl version: Already described, see above...
kubectl get nodes -o wide: Already described, see above...
How was the ingress-nginx-controller installed:
helm ls -A | grep -i ingress
If helm was used then please show output of
helm -n <ingresscontrollernamespace> get values <helmreleasename>
values.json
If you have more than one instance of the ingress-nginx-controller installed in the same cluster, please provide details for all the instances
Current State of the controller:
kubectl describe ingressclasses
ingressclasses.json
kubectl -n <ingresscontrollernamespace> get all -A -o wide
getallwide.log
kubectl -n <ingresscontrollernamespace> describe po <ingresscontrollerpodname>
pod-describe.log
kubectl -n <ingresscontrollernamespace> describe svc <ingresscontrollerservicename>
svc-describe.log
kubectl -n <ingresscontrollernamespace> logs <ingresscontrollerservicename>
ingress-nginx-controller-5bd44bf869-4k9kf.log
How to reproduce this issue:
Anything else we need to know:
Attached logs:
values.json
ingressclasses.json
getallwide.log
pod-describe.log
svc-describe.log
ingress-nginx-controller-5bd44bf869-4k9kf.log