Pods restart loop with error "[emerg] 23#23: bind() to 0.0.0.0:80 failed (13: Permission denied)" in latest chart/version for daemonset #3932

Closed
brian-provenzano opened this issue May 22, 2023 · 15 comments


@brian-provenzano

brian-provenzano commented May 22, 2023

Describe the bug
Using the latest image and Helm chart, upgrading from v2.4.2, I am getting permission-denied errors in the nginx pods, which causes constant restarts. The issue appears to revolve around the recent securityContext changes in PR 3722 and PR 3573.

To Reproduce
Steps to reproduce the behavior:

  1. Deploy v3.1.1 (chart 0.17.1) in the daemonset configuration using helm template... followed by kubectl apply (see the attached sample values.yaml for our settings).
  2. Pods do not start successfully; they restart continuously.
  3. View the logs on a restarting pod; you will see 2023/05/22 17:08:45 [emerg] 23#23: bind() to 0.0.0.0:80 failed (13: Permission denied)
  4. If I change daemonset.spec.template.spec.containers.securityContext.allowPrivilegeEscalation to true (the chart template currently sets it to false) and restart the DaemonSet, it works fine and the pods start. This appears to be the same setting that was present in v2.4.2, which we currently run without issue.
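
For reference, here is a minimal sketch of the relevant part of the container securityContext as the chart renders it, with the field the workaround flips (values paraphrased from chart 0.17.1; surrounding fields abbreviated):

securityContext:
  allowPrivilegeEscalation: false   # chart default; flipping this to true is the workaround
  runAsUser: 101                    # non-root nginx user
  capabilities:
    drop:
      - ALL
    add:
      - NET_BIND_SERVICE            # intended to let the non-root process bind ports 80/443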

Expected behavior
I expect the pods to start successfully even with the new securityContext in place.

Your environment

  • Version of the Ingress Controller - v3.1.1 with Chart 0.17.1
  • Version of Kubernetes - 1.23
  • Kubernetes platform (e.g. Mini-kube or GCP) - EKS
  • Using NGINX or NGINX Plus : NGINX

Additional context
I can provide more information if needed. I would adjust daemonset.spec.template.spec.containers.securityContext.allowPrivilegeEscalation to true to fix this ourselves (albeit reverting to the less secure setup that was present in v2.4.2), but that parameter is not configurable in the chart.
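
As an illustration only, the field could be flipped out-of-band with a JSON patch against the rendered DaemonSet (the namespace, resource name, and container index here are assumptions about this particular setup, and a re-apply of the rendered manifests would revert it):

kubectl -n nginx-ingress patch daemonset nginx-ingress --type=json \
  -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/securityContext/allowPrivilegeEscalation", "value": true}]'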

v3.1.1 images tried: nginx/nginx-ingress:3.1.1-ubi and public.ecr.aws/nginx/nginx-ingress:3.1.1-ubi (we use the AWS ECR Public image due to Docker Hub throttling).

test-values.yaml.txt

@github-actions

Hi @brian-provenzano thanks for reporting!

Be sure to check out the docs and the Contributing Guidelines while you wait for a human to take a look at this 🙂

Cheers!

@brian-provenzano brian-provenzano changed the title Pods restart loop with error 2023/05/22 17:08:45 [emerg] 23#23: bind() to 0.0.0.0:80 failed (13: Permission denied) in latest chart/version for daemonset Pods restart loop with error "[emerg] 23#23: bind() to 0.0.0.0:80 failed (13: Permission denied)" in latest chart/version for daemonset May 22, 2023
@vepatel (Contributor)

vepatel commented May 23, 2023

Hi @brian-provenzano, I tested this with NGINX Ingress Controller v3.1.1 on k8s 1.27:

/nginx/kubernetes-ingress/deployments/helm-chart|72473392⚡ ⇒  k logs test-release-nginx-ingress-controller-4vkdg | grep Version=
NGINX Ingress Controller Version=3.1.1 Commit=72473392d14cb0971de4b916a8db9bb675a16634 Date=2023-05-04T23:50:20Z DirtyState=false Arch=linux/amd64 Go=go1.20.3

/nginx/kubernetes-ingress/deployments/helm-chart|72473392⚡ ⇒  k get pods
NAME                                          READY   STATUS    RESTARTS   AGE
test-release-nginx-ingress-controller-4vkdg   1/1     Running   0          5m21s
test-release-nginx-ingress-controller-9ckjh   1/1     Running   0          5m21s
test-release-nginx-ingress-controller-lt6mj   1/1     Running   0          5m21s

/nginx/kubernetes-ingress/deployments/helm-chart|72473392⚡ ⇒  k get pods test-release-nginx-ingress-controller-4vkdg -o yaml | grep allowPrivilegeEscalation
      allowPrivilegeEscalation: false

Can you please make sure you're on the correct release tag when running helm install... or kubectl apply...?

Helm command used:

helm install test-release \
  --set controller.kind=daemonset \
  --set controller.nginxplus=false \
  --set controller.image.repository=nginx/nginx-ingress \
  --set controller.image.tag="3.1.1" \
  --set controller.image.pullPolicy=Always .

@brianehlert (Collaborator)

Specifically, there is tuning to the net-bind service in the 3.1.1 patch: https://docs.nginx.com/nginx-ingress-controller/releases/#nginx-ingress-controller-311
The Helm chart / manifests must therefore match the container version.
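
One way to verify the container side is to check the file capability on the nginx binary inside a running pod (a sketch only: getcap is not necessarily present in the image, and the binary path and namespace are assumptions based on the official images):

kubectl -n nginx-ingress exec <controller-pod> -- getcap /usr/sbin/nginx
# expected when the capability is in place:
# /usr/sbin/nginx cap_net_bind_service=ep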

@brian-provenzano (Author)

We run helm template... then kubectl apply (it is actually run through Spinnaker). I used 3.1.1-ubi from Docker Hub and the same image from ECR Public. I have corrected the version in the original post.

I will double-check my work to be sure, though, and get back to you ASAP...

@brian-provenzano (Author)

brian-provenzano commented May 23, 2023

OK - I tried the nginx/nginx-ingress:3.1.1 and nginx/nginx-ingress:3.1.1-ubi images. I have attached a copy of the DaemonSet I tried that uses the nginx/nginx-ingress:3.1.1 image, which still does not work for us (the pods throw the permission error described previously).

Testing process: I edited the DaemonSet on the cluster to use the nginx/nginx-ingress:3.1.1 image (which launched new pods), but I still get the permission error in the pod logs and the pods restart constantly. If I change allowPrivilegeEscalation to true, all is fine.

Could this be an issue with how our nodes are configured (AMI, OS, etc.)? We are using custom Ubuntu CIS AMIs rather than the official AWS EKS-optimized AMIs.

Logs from a pod that successfully starts/runs once I change to allowPrivilegeEscalation: true:

NGINX Ingress Controller Version=3.1.1 Commit=72473392d14cb0971de4b916a8db9bb675a16634 Date=2023-05-04T23:50:20Z DirtyState=false Arch=linux/amd64 Go=go1.20.3
I0523 16:51:05.622911       1 flags.go:294] Starting with flags: ["-nginx-plus=false" "-nginx-reload-timeout=60000" "-enable-app-protect=false" "-enable-app-protect-dos=false" "-nginx-configmaps=nginx-ingress/nginx-config" "-default-server-tls-secret=nginx-ingress/nginx-ingress-secret" "-ingress-class=nginx" "-health-status=false" "-health-status-uri=/nginx-health" "-nginx-debug=false" "-v=1" "-nginx-status=false" "-report-ingress-status" "-external-service=nginx-ingress-external" "-enable-leader-election=true" "-leader-election-lock-name=kdp-core-nginx-ingress-leader-election" "-enable-prometheus-metrics=false" "-prometheus-metrics-listen-port=9113" "-prometheus-tls-secret=" "-enable-service-insight=false" "-service-insight-listen-port=9114" "-service-insight-tls-secret=" "-enable-custom-resources=true" "-enable-snippets=true" "-include-year=false" "-disable-ipv6=false" "-enable-tls-passthrough=false" "-enable-preview-policies=false" "-enable-cert-manager=false" "-enable-oidc=false" "-enable-external-dns=false" "-ready-status=true" "-ready-status-port=8081" "-enable-latency-metrics=false"]
I0523 16:51:05.629088       1 main.go:234] Kubernetes version: 1.23.17
I0523 16:51:05.635203       1 main.go:380] Using nginx version: nginx/1.23.4
I0523 16:51:05.739233       1 main.go:776] Pod label updated: nginx-ingress-q2bvf
2023/05/23 16:51:05 [notice] 18#18: using the "epoll" event method
2023/05/23 16:51:05 [notice] 18#18: nginx/1.23.4
2023/05/23 16:51:05 [notice] 18#18: built by gcc 11.2.1 20220127 (Red Hat 11.2.1-9) (GCC)
2023/05/23 16:51:05 [notice] 18#18: OS: Linux 5.4.0-1100-aws
2023/05/23 16:51:05 [notice] 18#18: getrlimit(RLIMIT_NOFILE): 1048576:1048576
2023/05/23 16:51:05 [notice] 18#18: start worker processes
2023/05/23 16:51:05 [notice] 18#18: start worker process 22
2023/05/23 16:51:05 [notice] 18#18: start worker process 23
2023/05/23 16:51:05 [notice] 18#18: start worker process 24
2023/05/23 16:51:05 [notice] 18#18: start worker process 25
2023/05/23 16:51:05 [notice] 18#18: start worker process 26
2023/05/23 16:51:05 [notice] 18#18: start worker process 27
2023/05/23 16:51:05 [notice] 18#18: start worker process 28
2023/05/23 16:51:05 [notice] 18#18: start worker process 29
2023/05/23 16:51:05 [notice] 18#18: start worker process 30
2023/05/23 16:51:05 [notice] 18#18: start worker process 31
2023/05/23 16:51:05 [notice] 18#18: start worker process 32
2023/05/23 16:51:05 [notice] 18#18: start worker process 33
2023/05/23 16:51:05 [notice] 18#18: start worker process 34
2023/05/23 16:51:05 [notice] 18#18: start worker process 35
2023/05/23 16:51:05 [notice] 18#18: start worker process 36
2023/05/23 16:51:05 [notice] 18#18: start worker process 37
...

Logs from a pod when allowPrivilegeEscalation: false (pod does not start/restarts constantly):

NGINX Ingress Controller Version=3.1.1 Commit=72473392d14cb0971de4b916a8db9bb675a16634 Date=2023-05-04T23:50:20Z DirtyState=false Arch=linux/amd64 Go=go1.20.3
I0523 16:49:08.587514       1 flags.go:294] Starting with flags: ["-nginx-plus=false" "-nginx-reload-timeout=60000" "-enable-app-protect=false" "-enable-app-protect-dos=false" "-nginx-configmaps=nginx-ingress/nginx-config" "-default-server-tls-secret=nginx-ingress/nginx-ingress-secret" "-ingress-class=nginx" "-health-status=false" "-health-status-uri=/nginx-health" "-nginx-debug=false" "-v=1" "-nginx-status=false" "-report-ingress-status" "-external-service=nginx-ingress-external" "-enable-leader-election=true" "-leader-election-lock-name=kdp-core-nginx-ingress-leader-election" "-enable-prometheus-metrics=false" "-prometheus-metrics-listen-port=9113" "-prometheus-tls-secret=" "-enable-service-insight=false" "-service-insight-listen-port=9114" "-service-insight-tls-secret=" "-enable-custom-resources=true" "-enable-snippets=true" "-include-year=false" "-disable-ipv6=false" "-enable-tls-passthrough=false" "-enable-preview-policies=false" "-enable-cert-manager=false" "-enable-oidc=false" "-enable-external-dns=false" "-ready-status=true" "-ready-status-port=8081" "-enable-latency-metrics=false"]
I0523 16:49:08.593176       1 main.go:234] Kubernetes version: 1.23.17
I0523 16:49:08.601693       1 main.go:380] Using nginx version: nginx/1.23.4
I0523 16:49:08.635197       1 main.go:776] Pod label updated: nginx-ingress-dbwbh
2023/05/23 16:49:08 [emerg] 24#24: bind() to 0.0.0.0:80 failed (13: Permission denied)

nginx-ingress-ds.yaml.txt

@brianehlert (Collaborator)

We have had issues with Helm upgrades in the past where changes to rbac.yaml (or scc.yaml on OpenShift) are not processed properly due to how Helm performs the upgrade.
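
If that is a suspect here, one generic way to spot such drift is to diff the freshly rendered manifests against the live cluster state (standard kubectl; the template arguments below are illustrative):

helm template test-release . --set controller.kind=daemonset | kubectl diff -f -
# non-empty output flags objects whose live state differs from the chart render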

I see that you are using a daemonset instead of a deployment...
Do you get a different result if you use a deployment? I am curious.

@brian-provenzano (Author)

OK - I will give that a try and report back - shouldn't take long to test

@brian-provenzano (Author)

Same issue - no change in behavior as a deployment. A copy of the deployment is attached.

pod logs when deployed as a deployment (same as before):

NGINX Ingress Controller Version=3.1.1 Commit=72473392d14cb0971de4b916a8db9bb675a16634 Date=2023-05-04T23:50:20Z DirtyState=false Arch=linux/amd64 Go=go1.20.3
I0523 20:47:37.302872       1 flags.go:294] Starting with flags: ["-nginx-plus=false" "-nginx-reload-timeout=60000" "-enable-app-protect=false" "-enable-app-protect-dos=false" "-nginx-configmaps=nginx-ingress/nginx-config" "-default-server-tls-secret=nginx-ingress/nginx-ingress-secret" "-ingress-class=nginx" "-health-status=false" "-health-status-uri=/nginx-health" "-nginx-debug=false" "-v=1" "-nginx-status=false" "-report-ingress-status" "-external-service=nginx-ingress-external" "-enable-leader-election=true" "-leader-election-lock-name=kdp-core-nginx-ingress-leader-election" "-enable-prometheus-metrics=false" "-prometheus-metrics-listen-port=9113" "-prometheus-tls-secret=" "-enable-service-insight=false" "-service-insight-listen-port=9114" "-service-insight-tls-secret=" "-enable-custom-resources=true" "-enable-snippets=true" "-include-year=false" "-disable-ipv6=false" "-enable-tls-passthrough=false" "-enable-preview-policies=false" "-enable-cert-manager=false" "-enable-oidc=false" "-enable-external-dns=false" "-ready-status=true" "-ready-status-port=8081" "-enable-latency-metrics=false"]
I0523 20:47:37.393542       1 main.go:234] Kubernetes version: 1.23.17
I0523 20:47:37.400536       1 main.go:380] Using nginx version: nginx/1.23.4
I0523 20:47:37.432189       1 main.go:776] Pod label updated: nginx-ingress-77d64565d8-mttlk
2023/05/23 20:47:37 [emerg] 24#24: bind() to 0.0.0.0:80 failed (13: Permission denied)

Again, if I change to allowPrivilegeEscalation: true, it works fine.

NGINX Ingress Controller Version=3.1.1 Commit=72473392d14cb0971de4b916a8db9bb675a16634 Date=2023-05-04T23:50:20Z DirtyState=false Arch=linux/amd64 Go=go1.20.3
I0523 20:55:42.888299       1 flags.go:294] Starting with flags: ["-nginx-plus=false" "-nginx-reload-timeout=60000" "-enable-app-protect=false" "-enable-app-protect-dos=false" "-nginx-configmaps=nginx-ingress/nginx-config" "-default-server-tls-secret=nginx-ingress/nginx-ingress-secret" "-ingress-class=nginx" "-health-status=false" "-health-status-uri=/nginx-health" "-nginx-debug=false" "-v=1" "-nginx-status=false" "-report-ingress-status" "-external-service=nginx-ingress-external" "-enable-leader-election=true" "-leader-election-lock-name=kdp-core-nginx-ingress-leader-election" "-enable-prometheus-metrics=false" "-prometheus-metrics-listen-port=9113" "-prometheus-tls-secret=" "-enable-service-insight=false" "-service-insight-listen-port=9114" "-service-insight-tls-secret=" "-enable-custom-resources=true" "-enable-snippets=true" "-include-year=false" "-disable-ipv6=false" "-enable-tls-passthrough=false" "-enable-preview-policies=false" "-enable-cert-manager=false" "-enable-oidc=false" "-enable-external-dns=false" "-ready-status=true" "-ready-status-port=8081" "-enable-latency-metrics=false"]
I0523 20:55:42.895868       1 main.go:234] Kubernetes version: 1.23.17
I0523 20:55:42.907961       1 main.go:380] Using nginx version: nginx/1.23.4
I0523 20:55:42.935903       1 main.go:776] Pod label updated: nginx-ingress-86bfb79447-4pnh6
2023/05/23 20:55:42 [notice] 25#25: using the "epoll" event method
2023/05/23 20:55:42 [notice] 25#25: nginx/1.23.4
2023/05/23 20:55:42 [notice] 25#25: built by gcc 10.2.1 20210110 (Debian 10.2.1-6)
2023/05/23 20:55:42 [notice] 25#25: OS: Linux 5.4.0-1100-aws
2023/05/23 20:55:42 [notice] 25#25: getrlimit(RLIMIT_NOFILE): 1048576:1048576
2023/05/23 20:55:42 [notice] 25#25: start worker processes
2023/05/23 20:55:42 [notice] 25#25: start worker process 26
2023/05/23 20:55:42 [notice] 25#25: start worker process 27
2023/05/23 20:55:42 [notice] 25#25: start worker process 28
2023/05/23 20:55:42 [notice] 25#25: start worker process 29
2023/05/23 20:55:42 [notice] 25#25: start worker process 30
2023/05/23 20:55:42 [notice] 25#25: start worker process 31
2023/05/23 20:55:42 [notice] 25#25: start worker process 32
2023/05/23 20:55:42 [notice] 25#25: start worker process 33
2023/05/23 20:55:42 [notice] 25#25: start worker process 34
2023/05/23 20:55:42 [notice] 25#25: start worker process 35
2023/05/23 20:55:42 [notice] 25#25: start worker process 36
2023/05/23 20:55:42 [notice] 25#25: start worker process 37
2023/05/23 20:55:42 [notice] 25#25: start worker process 38
2023/05/23 20:55:42 [notice] 25#25: start worker process 39
2023/05/23 20:55:42 [notice] 25#25: start worker process 40
2023/05/23 20:55:42 [notice] 25#25: start worker process 41

nginx-ingress-deployment.yaml.txt

@vepatel (Contributor)

vepatel commented May 24, 2023

Weird; this works for me with default values on both GKE and AKS with Helm chart 0.17.1.
GKE: #3932 (comment)
AKS: in this scenario I performed an upgrade from 2.4.2 to 3.1.1:

/nginx/kubernetes-ingress/deployments/helm-chart|72473392⚡ ⇒  k get pods
NAME                                          READY   STATUS    RESTARTS   AGE
test-release-nginx-ingress-controller-2bn59   1/1     Running   0          12s
test-release-nginx-ingress-controller-5kj6z   1/1     Running   0          12s
test-release-nginx-ingress-controller-w596l   1/1     Running   0          12s

/nginx/kubernetes-ingress/deployments/helm-chart|72473392⚡ ⇒  k describe daemonsets.apps test-release-nginx-ingress-controller 
Name:           test-release-nginx-ingress-controller
Selector:       app.kubernetes.io/instance=test-release,app.kubernetes.io/name=nginx-ingress
Node-Selector:  <none>
Labels:         app.kubernetes.io/instance=test-release
                app.kubernetes.io/managed-by=Helm
                app.kubernetes.io/name=nginx-ingress
                app.kubernetes.io/version=3.1.1
                helm.sh/chart=nginx-ingress-0.17.1
Annotations:    deprecated.daemonset.template.generation: 1
                meta.helm.sh/release-name: test-release
                meta.helm.sh/release-namespace: default

/nginx/kubernetes-ingress/deployments/helm-chart|72473392⚡ ⇒  k get pods test-release-nginx-ingress-controller-jrlm6 -o yaml | grep allowPrivilegeEscalation 
      allowPrivilegeEscalation: false                

I'll try EKS with the official EKS-optimized Amazon Linux 2 instances later.

@brian-provenzano (Author)

brian-provenzano commented May 24, 2023

Alright, I am starting to think it is something unique to our environment.

I did the following:

  • Spun up a new EKS cluster with eksctl running k8s v1.23 on the Amazon EKS AMIs. This is the same k8s version we are using.
  • Ran helm install test-release oci://ghcr.io/nginxinc/charts/nginx-ingress --version 0.17.1 --values values-test-nginx.yaml --create-namespace --namespace nginx-ingress on the new test cluster. The exact values file I used is attached.
  • Checked status; all pods run fine with no errors.

One other possible variable: besides the fact that we are not using the official AWS EKS AMIs, our container runtime is still Docker on 1.23. I think the current EKS AMIs built for 1.23 use containerd...?
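
For what it's worth, the runtime each node reports is easy to confirm with standard kubectl, since the wide output includes a CONTAINER-RUNTIME column:

kubectl get nodes -o wide
# CONTAINER-RUNTIME shows e.g. docker://20.10.x or containerd://1.6.x per node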

Anyway, I am going to try another test on one of our 1.23 clusters created with our IaC (Terraform rather than eksctl; our custom Ubuntu AMI with the Docker runtime), but it appears to be an issue on my end. Sorry about the wild goose chase here :(

I am guessing we can close this for now and I can report back if anything changes...

values-test-nginx.yaml.txt

@brianehlert (Collaborator)

It is fine to leave this open until you resolve it. I think we all learn from these kinds of things.

@vepatel (Contributor)

vepatel commented May 25, 2023

Thanks @brian-provenzano for checking, I'll close this for now 👍🏼

@vepatel vepatel closed this as completed May 25, 2023
@justbert

We're running into the same error on 3.3.2. We're building our own image to include some extra modules/capabilities. When the image is built with Docker, this issue does not happen; when it is built with Kaniko, it does.

@vepatel (Contributor)

vepatel commented Feb 20, 2024

@justbert we'll be adding an option to modify the securityContext via Helm in 3.5.0, so hopefully that will solve your issue.
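
If it ships that way, the override would presumably look something like this in values.yaml (the exact key name is an assumption; check the released 3.5.0 chart's values schema):

controller:
  securityContext:                   # hypothetical key until the 3.5.0 chart confirms it
    allowPrivilegeEscalation: true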

@justbert

Found the issue! (I should have updated my comment.) It seems Kaniko doesn't copy over extended file attributes, whereas Docker does, which means the CAP_NET_BIND_SERVICE file capability was missing from the binary. Xattr handling is not a well-defined part of the COPY command, which (as we can see) causes issues.
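
A quick post-build check for this (a sketch: the binary path is taken from the official images, and getcap must exist in the image):

docker run --rm --entrypoint getcap <your-image> /usr/sbin/nginx
# with the xattr intact:  /usr/sbin/nginx cap_net_bind_service=ep
# with it stripped (as with Kaniko's COPY here): no output, and nginx
# cannot bind ports 80/443 as a non-root user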
