
failed to reload NGINX: failed to send the HUP signal to NGINX main: operation not permitted #1055

Closed
poneding opened this issue Sep 13, 2023 · 12 comments · Fixed by #1063
Labels
bug Something isn't working

poneding commented Sep 13, 2023

Describe the bug
I deployed nginx-gateway and the demo by following the guide, but I couldn't get the expected results.
I get this error: failed to reload NGINX: failed to send the HUP signal to NGINX main: operation not permitted

I have no idea why this problem occurs or how to solve it. Looking for help.

Gateway.Status:

status:
  addresses:
    - type: IPAddress
      value: 10.244.0.63
  conditions:
    - lastTransitionTime: '2023-09-13T06:09:28Z'
      message: Gateway is accepted
      observedGeneration: 1
      reason: Accepted
      status: 'True'
      type: Accepted
    - lastTransitionTime: '2023-09-13T06:09:28Z'
      message: >-
        The Gateway is not programmed due to a failure to reload nginx with the
        configuration
      observedGeneration: 1
      reason: Invalid
      status: 'False'
      type: Programmed
  listeners:
    - attachedRoutes: 0
      conditions:
        - lastTransitionTime: '2023-09-13T06:09:28Z'
          message: Listener is accepted
          observedGeneration: 1
          reason: Accepted
          status: 'True'
          type: Accepted
        - lastTransitionTime: '2023-09-13T06:09:28Z'
          message: All references are resolved
          observedGeneration: 1
          reason: ResolvedRefs
          status: 'True'
          type: ResolvedRefs
        - lastTransitionTime: '2023-09-13T06:09:28Z'
          message: No conflicts
          observedGeneration: 1
          reason: NoConflicts
          status: 'False'
          type: Conflicted
        - lastTransitionTime: '2023-09-13T06:09:28Z'
          message: >-
            The Listener is not programmed due to a failure to reload nginx with
            the configuration
          observedGeneration: 1
          reason: Invalid
          status: 'False'
          type: Programmed
      name: http
      supportedKinds:
        - group: gateway.networking.k8s.io
          kind: HTTPRoute

nginx-gateway pod error logs:

{
  "level": "error",
  "ts": "2023-09-13T06:09:51Z",
  "logger": "eventHandler",
  "msg": "Failed to update NGINX configuration",
  "error": "failed to reload NGINX: failed to send the HUP signal to NGINX main: operation not permitted",
  "stacktrace": "github.com/nginxinc/nginx-kubernetes-gateway/internal/mode/static.(*eventHandlerImpl).HandleEventBatch
    /home/runner/work/nginx-kubernetes-gateway/nginx-kubernetes-gateway/internal/mode/static/handler.go:95
   github.com/nginxinc/nginx-kubernetes-gateway/internal/framework/events.(*EventLoop).Start.func1.1
    /home/runner/work/nginx-kubernetes-gateway/nginx-kubernetes-gateway/internal/framework/events/loop.go:68"
}

Info:

  • OS: linux (amd64)
  • OS Image: Ubuntu 22.04 LTS
  • Kernel version: 5.15.0-56-generic
  • Container runtime: docker:20.10.13
  • Kubelet version: v1.27.2
  • Gateway API: 0.8.0 standard-install
  • Nginx Gateway: 0.6

sjberman (Contributor) commented Sep 13, 2023

Hi @poneding, did you install version 0.6 using the latest released manifest or helm chart? https://github.com/nginxinc/nginx-kubernetes-gateway/tree/v0.6.0/deploy

poneding (Author)

Hi @sjberman, I installed both v0.6 and the main branch with the manifest, following docs/installation.md, but got the same error in both cases.

sjberman (Contributor)

Can you provide the deployment manifest that you used to install? Also, are you running this in a cloud-provided kubernetes environment or in a local kubernetes environment?

poneding (Author)

In a cloud virtual machine. This is the manifest I used:

apiVersion: v1
kind: Namespace
metadata:
  name: nginx-gateway
---
# Source: nginx-kubernetes-gateway/templates/rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: nginx-gateway
  namespace: nginx-gateway
  labels:
    app.kubernetes.io/name: nginx-gateway
    app.kubernetes.io/instance: nginx-gateway
    app.kubernetes.io/version: "0.6.0"
  annotations:
    {}
---
# Source: nginx-kubernetes-gateway/templates/rbac.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: nginx-gateway
  labels:
    app.kubernetes.io/name: nginx-gateway
    app.kubernetes.io/instance: nginx-gateway
    app.kubernetes.io/version: "0.6.0"
rules:
- apiGroups:
  - ""
  resources:
  - namespaces
  - services
  - secrets
  verbs:
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - events
  verbs:
  - create
  - patch
- apiGroups:
  - discovery.k8s.io
  resources:
  - endpointslices
  verbs:
  - list
  - watch
- apiGroups:
  - gateway.networking.k8s.io
  resources:
  - gatewayclasses
  - gateways
  - httproutes
  - referencegrants
  verbs:
  - list
  - watch
- apiGroups:
  - gateway.networking.k8s.io
  resources:
  - httproutes/status
  - gateways/status
  - gatewayclasses/status
  verbs:
  - update
- apiGroups:
  - gateway.nginx.org
  resources:
  - nginxgateways
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - gateway.nginx.org
  resources:
  - nginxgateways/status
  verbs:
  - update
---
# Source: nginx-kubernetes-gateway/templates/rbac.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: nginx-gateway
  labels:
    app.kubernetes.io/name: nginx-gateway
    app.kubernetes.io/instance: nginx-gateway
    app.kubernetes.io/version: "0.6.0"
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: nginx-gateway
subjects:
- kind: ServiceAccount
  name: nginx-gateway
  namespace: nginx-gateway
---
# Source: nginx-kubernetes-gateway/templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-gateway
  namespace: nginx-gateway
  labels:
    app.kubernetes.io/name: nginx-gateway
    app.kubernetes.io/instance: nginx-gateway
    app.kubernetes.io/version: "0.6.0"
spec:
  # We only support a single replica for now
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: nginx-gateway
      app.kubernetes.io/instance: nginx-gateway
  template:
    metadata:
      labels:
        app.kubernetes.io/name: nginx-gateway
        app.kubernetes.io/instance: nginx-gateway
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9113"
    spec:
      containers:
      - args:
        - static-mode
        - --gateway-ctlr-name=gateway.nginx.org/nginx-gateway-controller
        - --gatewayclass=nginx
        - --config=nginx-gateway-config
        - --metrics-port=9113
        env:
        - name: POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        - name: MY_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        image: ghcr.io/nginxinc/nginx-kubernetes-gateway:0.6.0
        imagePullPolicy: IfNotPresent
        name: nginx-gateway
        ports:
        - name: metrics
          containerPort: 9113
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            add:
            - KILL
            drop:
            - ALL
          readOnlyRootFilesystem: true
          runAsUser: 102
          runAsGroup: 1001
        volumeMounts:
        - name: nginx-conf
          mountPath: /etc/nginx/conf.d
        - name: nginx-secrets
          mountPath: /etc/nginx/secrets
        - name: nginx-run
          mountPath: /var/run/nginx
      - image: ghcr.io/nginxinc/nginx-kubernetes-gateway/nginx:0.6.0
        imagePullPolicy: IfNotPresent
        name: nginx
        ports:
        - containerPort: 80
          name: http
        - containerPort: 443
          name: https
        securityContext:
          capabilities:
            add:
            - NET_BIND_SERVICE
            drop:
            - ALL
          readOnlyRootFilesystem: true
          runAsUser: 101
          runAsGroup: 1001
        volumeMounts:
        - name: nginx-conf
          mountPath: /etc/nginx/conf.d
        - name: nginx-secrets
          mountPath: /etc/nginx/secrets
        - name: nginx-run
          mountPath: /var/run/nginx
        - name: nginx-cache
          mountPath: /var/cache/nginx
        - name: nginx-lib
          mountPath: /var/lib/nginx
      serviceAccountName: nginx-gateway
      shareProcessNamespace: true
      securityContext:
        fsGroup: 1001
        runAsNonRoot: true
      volumes:
      - name: nginx-conf
        emptyDir: {}
      - name: nginx-secrets
        emptyDir: {}
      - name: nginx-run
        emptyDir: {}
      - name: nginx-cache
        emptyDir: {}
      - name: nginx-lib
        emptyDir: {}
---
# Source: nginx-kubernetes-gateway/templates/gatewayclass.yaml
apiVersion: gateway.networking.k8s.io/v1beta1
kind: GatewayClass
metadata:
  name: nginx
  labels:
    app.kubernetes.io/name: nginx-gateway
    app.kubernetes.io/instance: nginx-gateway
    app.kubernetes.io/version: "0.6.0"
spec:
  controllerName: gateway.nginx.org/nginx-gateway-controller
---
# Source: nginx-kubernetes-gateway/templates/nginxgateway.yaml
apiVersion: gateway.nginx.org/v1alpha1
kind: NginxGateway
metadata:
  name: nginx-gateway-config
  namespace: nginx-gateway
  labels:
    app.kubernetes.io/name: nginx-gateway
    app.kubernetes.io/instance: nginx-gateway
    app.kubernetes.io/version: "0.6.0"
spec:
  logging:
    level: info
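In this manifest the nginx-gateway container runs as UID 102 and the nginx container as UID 101, and shareProcessNamespace: true lets the first see the second's processes. Under the Linux kill(2) rule, an unprivileged sender may signal a target only if the sender's real or effective UID equals the target's real or saved set-user-ID; otherwise the sender needs CAP_KILL, which is presumably why the manifest adds KILL to the nginx-gateway container. The sketch below models that rule in a simplified form (user namespaces and the SIGCONT special case are ignored); the creds type and canSignal function are illustrative names, and the UIDs are taken from runAsUser above on the assumption that the NGINX master keeps them.

```go
package main

import "fmt"

// creds holds the IDs that kill(2) consults (simplified model).
type creds struct {
	RealUID, EffectiveUID, SavedUID int
	EffectiveCapKill                bool // CAP_KILL present in the effective capability set
}

// canSignal reports whether sender may signal target under the kill(2) permission rule.
func canSignal(sender, target creds) bool {
	if sender.EffectiveCapKill {
		return true
	}
	return sender.RealUID == target.RealUID ||
		sender.RealUID == target.SavedUID ||
		sender.EffectiveUID == target.RealUID ||
		sender.EffectiveUID == target.SavedUID
}

func main() {
	nginxMaster := creds{RealUID: 101, EffectiveUID: 101, SavedUID: 101}
	gatewayNoCap := creds{RealUID: 102, EffectiveUID: 102, SavedUID: 102}
	gatewayWithCap := creds{RealUID: 102, EffectiveUID: 102, SavedUID: 102, EffectiveCapKill: true}

	// UIDs differ, so without CAP_KILL the kernel would answer EPERM.
	fmt.Println("without effective CAP_KILL:", canSignal(gatewayNoCap, nginxMaster))
	fmt.Println("with effective CAP_KILL:", canSignal(gatewayWithCap, nginxMaster))
}
```

Listing KILL under capabilities.add only helps if the capability actually lands in the process's effective set, and that may depend on the container runtime and on settings such as allowPrivilegeEscalation.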

poneding (Author)

I installed Kubernetes with kubeadm.

sjberman (Contributor) commented Sep 13, 2023

If you attempt to deploy v0.5.0 using that branch and installation guide, does it succeed? I would also be curious if the v0.6.0 deployment worked for you in a local kind cluster, for example. Just trying to narrow down if there's an environmental permissions issue or not.

poneding (Author)

OK, let me try it and I'll report the result later 🤩

poneding (Author) commented Sep 14, 2023

I deployed release-0.5, and this time I got the correct result:

root@cloud ~$ k describe gateways.gateway.networking.k8s.io gateway
Name:         gateway
Namespace:    default
Labels:       domain=k8s-gateway.nginx.org
Annotations:  <none>
API Version:  gateway.networking.k8s.io/v1beta1
Kind:         Gateway
Metadata:
  Creation Timestamp:  2023-09-14T01:14:11Z
  Generation:          1
  Resource Version:    15010726
  UID:                 3d4e252e-4ede-4c9d-8181-bc8572be63e2
Spec:
  Gateway Class Name:  nginx
  Listeners:
    Allowed Routes:
      Namespaces:
        From:  Same
    Hostname:  *.example.com
    Name:      http
    Port:      80
    Protocol:  HTTP
Status:
  Addresses:
    Type:   IPAddress
    Value:  10.244.0.100
  Conditions:
    Last Transition Time:  2023-09-14T01:17:20Z
    Message:               Gateway is accepted
    Observed Generation:   1
    Reason:                Accepted
    Status:                True
    Type:                  Accepted
    Last Transition Time:  2023-09-14T01:17:20Z
    Message:               Gateway is programmed
    Observed Generation:   1
    Reason:                Programmed
    Status:                True
    Type:                  Programmed
  Listeners:
    Attached Routes:  2
    Conditions:
      Last Transition Time:  2023-09-14T01:17:20Z
      Message:               Listener is accepted
      Observed Generation:   1
      Reason:                Accepted
      Status:                True
      Type:                  Accepted
      Last Transition Time:  2023-09-14T01:17:20Z
      Message:               Listener is programmed
      Observed Generation:   1
      Reason:                Programmed
      Status:                True
      Type:                  Programmed
      Last Transition Time:  2023-09-14T01:17:20Z
      Message:               All references are resolved
      Observed Generation:   1
      Reason:                ResolvedRefs
      Status:                True
      Type:                  ResolvedRefs
      Last Transition Time:  2023-09-14T01:17:20Z
      Message:               No conflicts
      Observed Generation:   1
      Reason:                NoConflicts
      Status:                False
      Type:                  Conflicted
    Name:                    http
    Supported Kinds:
      Group:  gateway.networking.k8s.io
      Kind:   HTTPRoute
Events:       <none>
root@cloud ~$ curl --resolve cafe.example.com:$GW_PORT:$GW_IP http://cafe.example.com:$GW_PORT/coffee
Server address: 10.244.0.98:8080
Server name: coffee-7dd75bc79b-2fldc
Date: 14/Sep/2023:01:19:23 +0000
URI: /coffee
Request ID: 6afc2955400ccfc25574c9696a9fa792

Then I retried deploying the latest main branch, but got the same result as before: the error appears in the log and the application cannot be accessed.

sjberman (Contributor)

OK, the security contexts for the Pod/containers in our manifest have changed since our last release. Using the v0.6 release, could you try adjusting them? For example, set runAsNonRoot to false or allowPrivilegeEscalation to true.

poneding (Author)

I get the correct result when I set allowPrivilegeEscalation to true; changing runAsNonRoot does not affect the result. Will allowPrivilegeEscalation be set to true by default in future releases?

mpstefan added this to the v1.0.0 milestone Sep 14, 2023
mpstefan added the bug label Sep 14, 2023

sjberman (Contributor)

Here is our plan for addressing this issue:

  • attempt to reproduce the permissions issue in other k8s environments; we currently do not see this issue on kind clusters or GKE
  • unless this turns out to be a common issue in other environments, we'll likely keep allowPrivilegeEscalation: false as the default, since it is the more secure option
  • allow the value to be configured in our helm chart
  • document that users of the direct manifest (non-helm users) should update this value if they see permissions issues

sjberman self-assigned this Sep 14, 2023
sjberman moved this from 🆕 New to 🏗 In Progress in NGINX Gateway Fabric Sep 14, 2023

sjberman (Contributor)

@poneding In initial testing, we aren't seeing the permissions issues in other k8s environments, so I'm curious if there is a setting that you have configured in your kubeadm deployment that would require privilege escalation for system calls.
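One way to narrow this down is to inspect what the gateway process actually ends up with. On Linux, /proc/&lt;pid&gt;/status lists the capability sets as hexadecimal masks (CapEff, CapBnd, ...) plus a NoNewPrivs field, and CAP_KILL is capability number 5. Setting allowPrivilegeEscalation: false makes Kubernetes set no_new_privs on the container process, and under no_new_privs an execve() does not grant privileges through file capabilities; so if the effective set lacks CAP_KILL while the bounding set has it, that would be consistent with the behaviour reported above. The sketch below is a hypothetical diagnostic helper, not something the project ships: parseStatus and hasCap are illustrative names, and main inspects its own process rather than the gateway.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// capKill is the Linux capability number of CAP_KILL (see capabilities(7)).
const capKill = 5

// procCaps holds the /proc/<pid>/status fields relevant to this issue.
type procCaps struct {
	CapEff     uint64
	CapBnd     uint64
	NoNewPrivs bool
}

// parseStatus extracts CapEff, CapBnd and NoNewPrivs from /proc/<pid>/status text.
func parseStatus(text string) (procCaps, error) {
	var pc procCaps
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		key, value, ok := strings.Cut(sc.Text(), ":")
		if !ok {
			continue
		}
		value = strings.TrimSpace(value)
		var err error
		switch key {
		case "CapEff":
			pc.CapEff, err = strconv.ParseUint(value, 16, 64)
		case "CapBnd":
			pc.CapBnd, err = strconv.ParseUint(value, 16, 64)
		case "NoNewPrivs":
			pc.NoNewPrivs = value == "1"
		}
		if err != nil {
			return pc, fmt.Errorf("parsing %s: %w", key, err)
		}
	}
	return pc, sc.Err()
}

// hasCap reports whether capability number c is set in mask.
func hasCap(mask uint64, c uint) bool {
	return mask&(1<<c) != 0
}

func main() {
	// Reads this process's own status; another process's file could be used instead.
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		fmt.Println("cannot read /proc/self/status:", err)
		return
	}
	pc, err := parseStatus(string(data))
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("CAP_KILL bounding=%v effective=%v NoNewPrivs=%v\n",
		hasCap(pc.CapBnd, capKill), hasCap(pc.CapEff, capKill), pc.NoNewPrivs)
}
```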
