
EndpointSlices causes crash (panic) with headless service #9606

Closed
iMartyn opened this issue Feb 11, 2023 · 29 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. needs-priority needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments

@iMartyn

iMartyn commented Feb 11, 2023

What happened:

I want to create some namespace separation for a service, so I have a service in one namespace and a headless service with EndpointSlices in another.

"Actual" Service in cluster-ingress namespace :

apiVersion: v1
kind: Service
metadata:
  labels:
    app: ssh
  name: remote-jelly
  namespace: cluster-ingress
spec:
  ports:
  - port: 8096
    protocol: TCP
    targetPort: 8096
  selector:
    app: ssh
  type: ClusterIP

Service in the jellyfin namespace (the one the Ingress points to):

apiVersion: v1
kind: Service
metadata:
  name: jellyfin
  namespace: jellyfin
spec:
  ports:
  - port: 8096

Ingress in the jellyfin namespace:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt
    kubernetes.io/ingress.class: nginx
  name: jellyfin-ingress
  namespace: jellyfin
spec:
  rules:
  - host: #redacted actual hostname
    http:
      paths:
      - backend:
          service:
            name: jellyfin
            port:
              number: 8096
        path: /
        pathType: Prefix
  tls:
  - hosts:
    - #redacted actual hostname
    secretName: cert-tls

Logs of the pod:

-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:       v1.5.1
  Build:         d003aae913cc25f375deb74f898c7f3c65c06f05
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.21.6

-------------------------------------------------------------------------------

W0211 15:12:48.862945       7 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0211 15:12:48.863208       7 main.go:209] "Creating API client" host="https://10.32.0.1:443"
I0211 15:12:48.902012       7 main.go:253] "Running in Kubernetes cluster" major="1" minor="23" git="v1.23.6" state="clean" commit="ad3338546da947756e8a88aa6822e9c11e7eac22" platform="linux/amd64"
I0211 15:12:49.329221       7 main.go:104] "SSL fake certificate created" file="/etc/ingress-controller/ssl/default-fake-certificate.pem"
I0211 15:12:49.411054       7 ssl.go:533] "loading tls certificate" path="/usr/local/certificates/cert" key="/usr/local/certificates/key"
I0211 15:12:49.478145       7 nginx.go:260] "Starting NGINX Ingress controller"
I0211 15:12:49.507640       7 event.go:285] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"cluster-ingress", Name:"cluster-ingress-ingress-nginx-controller", UID:"083aab75-7262-41f1-8d60-51b893ba1794", APIVersion:"v1", ResourceVersion:"30084222357", FieldPath:""}): type: 'Normal' reason: 'CREATE' ConfigMap cluster-ingress/cluster-ingress-ingress-nginx-controller
I0211 15:12:50.594728       7 store.go:430] "Found valid IngressClass" ingress="uptime/uptime-kuma" ingressclass="nginx"
I0211 15:12:50.595747       7 backend_ssl.go:67] "Adding secret to local store" name="uptime/uptime-tls"
I0211 15:12:50.595805       7 store.go:430] "Found valid IngressClass" ingress="link/links-littlelink-server" ingressclass="nginx"
I0211 15:12:50.597333       7 event.go:285] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"uptime", Name:"uptime-kuma", UID:"2734dc5a-2556-45c8-9e04-0ff96ace7cfd", APIVersion:"networking.k8s.io/v1", ResourceVersion:"30084222777", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync
I0211 15:12:50.597366       7 event.go:285] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"link", Name:"links-littlelink-server", UID:"7ef2eeae-83ae-4131-ac02-7004a40c3912", APIVersion:"networking.k8s.io/v1", ResourceVersion:"30084222779", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync
I0211 15:12:50.598666       7 backend_ssl.go:67] "Adding secret to local store" name="link/links-tls"
I0211 15:12:50.598751       7 store.go:430] "Found valid IngressClass" ingress="jellyfin/jellyfin-ingress" ingressclass="nginx"
I0211 15:12:50.599623       7 backend_ssl.go:67] "Adding secret to local store" name="jellyfin/flix-tls"
I0211 15:12:50.599868       7 event.go:285] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"jellyfin", Name:"jellyfin-ingress", UID:"3998cade-b793-4cef-8544-c8ea1bef3530", APIVersion:"networking.k8s.io/v1", ResourceVersion:"30084222780", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync
I0211 15:12:50.679646       7 nginx.go:303] "Starting NGINX process"
I0211 15:12:50.680177       7 leaderelection.go:248] attempting to acquire leader lease cluster-ingress/cluster-ingress-ingress-nginx-leader...
I0211 15:12:50.680812       7 nginx.go:323] "Starting validation webhook" address=":8443" certPath="/usr/local/certificates/cert" keyPath="/usr/local/certificates/key"
E0211 15:12:50.681956       7 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 141 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x1809060?, 0x293cf10})
	k8s.io/[email protected]/pkg/util/runtime/runtime.go:75 +0x99
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x173a860?})
	k8s.io/[email protected]/pkg/util/runtime/runtime.go:49 +0x75
panic({0x1809060, 0x293cf10})
	runtime/panic.go:884 +0x212
k8s.io/ingress-nginx/internal/ingress/controller.getEndpointsFromSlices(0xc0004b9da8, 0xc000489648, {0x1a4bc03, 0x3}, 0xc000489630)
	k8s.io/ingress-nginx/internal/ingress/controller/endpointslices.go:115 +0xede
k8s.io/ingress-nginx/internal/ingress/controller.(*NGINXController).serviceEndpoints(0xc0001b03c0, {0xc0004aac00, 0x11}, {0xc00072f800, 0x4})
	k8s.io/ingress-nginx/internal/ingress/controller/controller.go:1110 +0x5ec
k8s.io/ingress-nginx/internal/ingress/controller.(*NGINXController).createUpstreams(0xc0001b03c0, {0xc0003dace0, 0x3, 0x4?}, 0xc0000ef860)
	k8s.io/ingress-nginx/internal/ingress/controller/controller.go:1016 +0x19e5
k8s.io/ingress-nginx/internal/ingress/controller.(*NGINXController).getBackendServers(0xc0001b03c0, {0xc0003dace0?, 0x3, 0x4})
	k8s.io/ingress-nginx/internal/ingress/controller/controller.go:608 +0x8e
k8s.io/ingress-nginx/internal/ingress/controller.(*NGINXController).getConfiguration(0xc0001b03c0, {0xc0003dace0, 0x3, 0x0?})
	k8s.io/ingress-nginx/internal/ingress/controller/controller.go:513 +0x4e
k8s.io/ingress-nginx/internal/ingress/controller.(*NGINXController).syncIngress(0xc0001b03c0, {0x174d640, 0x1})
	k8s.io/ingress-nginx/internal/ingress/controller/controller.go:155 +0x9a
k8s.io/ingress-nginx/internal/task.(*Queue).worker(0xc0004c53e0)
	k8s.io/ingress-nginx/internal/task/queue.go:129 +0x446
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x0?)
	k8s.io/[email protected]/pkg/util/wait/wait.go:157 +0x3e
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x0?, {0x1cce3a0, 0xc000330750}, 0x1, 0xc000494480)
	k8s.io/[email protected]/pkg/util/wait/wait.go:158 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0x0?)
	k8s.io/[email protected]/pkg/util/wait/wait.go:135 +0x89
k8s.io/apimachinery/pkg/util/wait.Until(...)
	k8s.io/[email protected]/pkg/util/wait/wait.go:92
k8s.io/ingress-nginx/internal/task.(*Queue).Run(0x0?, 0x0?, 0x0?)
	k8s.io/ingress-nginx/internal/task/queue.go:61 +0x45
created by k8s.io/ingress-nginx/internal/ingress/controller.(*NGINXController).Start
	k8s.io/ingress-nginx/internal/ingress/controller/nginx.go:306 +0x3bc
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x15a627e]

goroutine 141 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x173a860?})
	k8s.io/[email protected]/pkg/util/runtime/runtime.go:56 +0xd7
panic({0x1809060, 0x293cf10})
	runtime/panic.go:884 +0x212
k8s.io/ingress-nginx/internal/ingress/controller.getEndpointsFromSlices(0xc0004b9da8, 0xc000489648, {0x1a4bc03, 0x3}, 0xc000489630)
	k8s.io/ingress-nginx/internal/ingress/controller/endpointslices.go:115 +0xede
k8s.io/ingress-nginx/internal/ingress/controller.(*NGINXController).serviceEndpoints(0xc0001b03c0, {0xc0004aac00, 0x11}, {0xc00072f800, 0x4})
	k8s.io/ingress-nginx/internal/ingress/controller/controller.go:1110 +0x5ec
k8s.io/ingress-nginx/internal/ingress/controller.(*NGINXController).createUpstreams(0xc0001b03c0, {0xc0003dace0, 0x3, 0x4?}, 0xc0000ef860)
	k8s.io/ingress-nginx/internal/ingress/controller/controller.go:1016 +0x19e5
k8s.io/ingress-nginx/internal/ingress/controller.(*NGINXController).getBackendServers(0xc0001b03c0, {0xc0003dace0?, 0x3, 0x4})
	k8s.io/ingress-nginx/internal/ingress/controller/controller.go:608 +0x8e
k8s.io/ingress-nginx/internal/ingress/controller.(*NGINXController).getConfiguration(0xc0001b03c0, {0xc0003dace0, 0x3, 0x0?})
	k8s.io/ingress-nginx/internal/ingress/controller/controller.go:513 +0x4e
k8s.io/ingress-nginx/internal/ingress/controller.(*NGINXController).syncIngress(0xc0001b03c0, {0x174d640, 0x1})
	k8s.io/ingress-nginx/internal/ingress/controller/controller.go:155 +0x9a
k8s.io/ingress-nginx/internal/task.(*Queue).worker(0xc0004c53e0)
	k8s.io/ingress-nginx/internal/task/queue.go:129 +0x446
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x0?)
	k8s.io/[email protected]/pkg/util/wait/wait.go:157 +0x3e
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x0?, {0x1cce3a0, 0xc000330750}, 0x1, 0xc000494480)
	k8s.io/[email protected]/pkg/util/wait/wait.go:158 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0x0?)
	k8s.io/[email protected]/pkg/util/wait/wait.go:135 +0x89
k8s.io/apimachinery/pkg/util/wait.Until(...)
	k8s.io/[email protected]/pkg/util/wait/wait.go:92
k8s.io/ingress-nginx/internal/task.(*Queue).Run(0x0?, 0x0?, 0x0?)
	k8s.io/ingress-nginx/internal/task/queue.go:61 +0x45
created by k8s.io/ingress-nginx/internal/ingress/controller.(*NGINXController).Start
	k8s.io/ingress-nginx/internal/ingress/controller/nginx.go:306 +0x3bc

What you expected to happen:

I expect Ingress controller to work with headless services.

NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version.):
(from the logs, because the controller is no longer running due to the crash loop)

NGINX Ingress controller
  Release:       v1.5.1
  Build:         d003aae913cc25f375deb74f898c7f3c65c06f05
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.21.6

Kubernetes version (use kubectl version):
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.6", GitCommit:"ad3338546da947756e8a88aa6822e9c11e7eac22", GitTreeState:"clean", BuildDate:"2022-04-14T08:43:11Z", GoVersion:"go1.17.9", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration: scaleway
  • OS (e.g. from /etc/os-release): /shrug - whatever "kapsule" uses
  • Kernel (e.g. uname -a): Linux ssh-54c8746bdc-6r84w 5.4.0-96-generic #109-Ubuntu SMP Wed Jan 12 16:49:16 UTC 2022 x86_64 Linux
  • Install tools:
    • "Kapsule"
  • Basic cluster related info:
    • stated above.
    • kubectl get nodes -o wide
NAME                                             STATUS   ROLES    AGE    VERSION   INTERNAL-IP     EXTERNAL-IP      OS-IMAGE                        KERNEL-VERSION     CONTAINER-RUNTIME
scw-drache-default-a98fe14dddfe457e8c0d5a2fb49   Ready    <none>   213d   v1.23.6   10.197.120.53   <<redacted>>   Ubuntu 20.04.3 LTS 7f923d9929   5.4.0-96-generic   containerd://1.5.10
  • How was the ingress-nginx-controller installed:
    ArgoCD helm chart. App manifest:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cluster-ingress
  namespace: argocd
operation:
  initiatedBy:
    username: admin
  sync:
    revision: 4.4.2
spec:
  destination:
    namespace: cluster-ingress
    server: https://kubernetes.default.svc
  project: default
  source:
    chart: ingress-nginx
    helm:
      parameters:
      - name: controller.kind
        value: DaemonSet
      - name: controller.service.type
        value: NodePort
      - name: controller.hostNetwork
        value: "true"
      values: "controller:\n  dnsPolicy: ClusterFirst\n  service:\n    externalIPs:
        \n    - \"<<redacted>>\""
    repoURL: https://kubernetes.github.io/ingress-nginx
    targetRevision: 4.4.2
  • Current State of the controller:
    • kubectl describe ingressclasses
Name:         nginx
Labels:       app.kubernetes.io/component=controller
              app.kubernetes.io/instance=cluster-ingress
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=ingress-nginx
              app.kubernetes.io/part-of=ingress-nginx
              app.kubernetes.io/version=1.5.1
              argocd.argoproj.io/instance=cluster-ingress
              helm.sh/chart=ingress-nginx-4.4.2
Annotations:  <none>
Controller:   k8s.io/ingress-nginx
Events:       <none>
  • kubectl -n <ingresscontrollernamespace> get all -A -o wide
NAME                                                       READY   STATUS             RESTARTS      AGE    IP              NODE                                             NOMINATED NODE   READINESS GATES
pod/cluster-ingress-ingress-nginx-admission-create-whz7j   0/1     Completed          0             23m    100.64.0.77     scw-drache-default-a98fe14dddfe457e8c0d5a2fb49   <none>           <none>
pod/cluster-ingress-ingress-nginx-controller-5wqp5         0/1     CrashLoopBackOff   9 (20s ago)   21m    10.197.120.53   scw-drache-default-a98fe14dddfe457e8c0d5a2fb49   <none>           <none>

NAME                                                         TYPE        CLUSTER-IP      EXTERNAL-IP      PORT(S)                      AGE    SELECTOR
service/cluster-ingress-ingress-nginx-controller             NodePort    10.35.34.238    <redacted>   80:31196/TCP,443:31838/TCP   212d   app.kubernetes.io/component=controller,app.kubernetes.io/instance=cluster-ingress,app.kubernetes.io/name=ingress-nginx
service/cluster-ingress-ingress-nginx-controller-admission   ClusterIP   10.47.205.173   <none>           443/TCP                      212d   app.kubernetes.io/component=controller,app.kubernetes.io/instance=cluster-ingress,app.kubernetes.io/name=ingress-nginx
service/remote-jelly                                         ClusterIP   10.32.58.244    <none>           8096/TCP                     124m   app=ssh

NAME                                                      DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE   CONTAINERS   IMAGES                                                                                                                    SELECTOR
daemonset.apps/cluster-ingress-ingress-nginx-controller   1         1         0       1            0           kubernetes.io/os=linux   52m   controller   registry.k8s.io/ingress-nginx/controller:v1.5.1@sha256:4ba73c697770664c1e00e9f968de14e08f606ff961c76e5d7033a4a9c593c629   app.kubernetes.io/component=controller,app.kubernetes.io/instance=cluster-ingress,app.kubernetes.io/name=ingress-nginx

NAME                                                       COMPLETIONS   DURATION   AGE   CONTAINERS   IMAGES                                                                                                                                            SELECTOR
job.batch/cluster-ingress-ingress-nginx-admission-create   1/1           9s         23m   create       registry.k8s.io/ingress-nginx/kube-webhook-certgen:v20220916-gd32f8c343@sha256:39c5b2e3310dc4264d638ad28d9d1d96c4cbb2b2dcfb52368fe4e3c63f61e10f   controller-uid=09ef771f-7a25-43ed-8867-fe01fcff5f57
  • kubectl -n <ingresscontrollernamespace> describe po <ingresscontrollerpodname>
Name:         cluster-ingress-ingress-nginx-controller-5wqp5
Namespace:    cluster-ingress
Priority:     0
Node:         scw-drache-default-a98fe14dddfe457e8c0d5a2fb49/10.197.120.53
Start Time:   Sat, 11 Feb 2023 16:01:36 +0100
Labels:       app.kubernetes.io/component=controller
              app.kubernetes.io/instance=cluster-ingress
              app.kubernetes.io/name=ingress-nginx
              controller-revision-hash=995d77bfc
              pod-template-generation=2
Annotations:  <none>
Status:       Running
IP:           10.197.120.53
IPs:
  IP:           10.197.120.53
Controlled By:  DaemonSet/cluster-ingress-ingress-nginx-controller
Containers:
  controller:
    Container ID:  containerd://06987f336f5190d6c8b7cc86c0459ca623b3ea3b2dce0c20938a2c97d71c6622
    Image:         registry.k8s.io/ingress-nginx/controller:v1.5.1@sha256:4ba73c697770664c1e00e9f968de14e08f606ff961c76e5d7033a4a9c593c629
    Image ID:      registry.k8s.io/ingress-nginx/controller@sha256:4ba73c697770664c1e00e9f968de14e08f606ff961c76e5d7033a4a9c593c629
    Ports:         80/TCP, 443/TCP, 8443/TCP
    Host Ports:    80/TCP, 443/TCP, 8443/TCP
    Args:
      /nginx-ingress-controller
      --publish-service=$(POD_NAMESPACE)/cluster-ingress-ingress-nginx-controller
      --election-id=cluster-ingress-ingress-nginx-leader
      --controller-class=k8s.io/ingress-nginx
      --ingress-class=nginx
      --configmap=$(POD_NAMESPACE)/cluster-ingress-ingress-nginx-controller
      --validating-webhook=:8443
      --validating-webhook-certificate=/usr/local/certificates/cert
      --validating-webhook-key=/usr/local/certificates/key
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Sat, 11 Feb 2023 16:23:03 +0100
      Finished:     Sat, 11 Feb 2023 16:23:05 +0100
    Ready:          False
    Restart Count:  9
    Requests:
      cpu:      100m
      memory:   90Mi
    Liveness:   http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=5
    Readiness:  http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment:
      POD_NAME:       cluster-ingress-ingress-nginx-controller-5wqp5 (v1:metadata.name)
      POD_NAMESPACE:  cluster-ingress (v1:metadata.namespace)
      LD_PRELOAD:     /usr/local/lib/libmimalloc.so
    Mounts:
      /usr/local/certificates/ from webhook-cert (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zm964 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  webhook-cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cluster-ingress-ingress-nginx-admission
    Optional:    false
  kube-api-access-zm964:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/network-unavailable:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  24m                    default-scheduler  Successfully assigned cluster-ingress/cluster-ingress-ingress-nginx-controller-5wqp5 to scw-drache-default-a98fe14dddfe457e8c0d5a2fb49
  Normal   Started    23m (x4 over 23m)      kubelet            Started container controller
  Normal   Pulled     22m (x5 over 23m)      kubelet            Container image "registry.k8s.io/ingress-nginx/controller:v1.5.1@sha256:4ba73c697770664c1e00e9f968de14e08f606ff961c76e5d7033a4a9c593c629" already present on machine
  Normal   Created    22m (x5 over 23m)      kubelet            Created container controller
  Warning  BackOff    3m57s (x100 over 23m)  kubelet            Back-off restarting failed container
  • kubectl -n <ingresscontrollernamespace> describe svc <ingresscontrollerservicename>
Name:                     cluster-ingress-ingress-nginx-controller
Namespace:                cluster-ingress
Labels:                   app.kubernetes.io/component=controller
                          app.kubernetes.io/instance=cluster-ingress
                          app.kubernetes.io/managed-by=Helm
                          app.kubernetes.io/name=ingress-nginx
                          app.kubernetes.io/part-of=ingress-nginx
                          app.kubernetes.io/version=1.5.1
                          argocd.argoproj.io/instance=cluster-ingress
                          helm.sh/chart=ingress-nginx-4.4.2
                          k8s.scaleway.com/cluster=6667679f-c59f-4f99-a1ae-b9dec7a410aa
                          k8s.scaleway.com/kapsule=
                          k8s.scaleway.com/managed-by-scaleway-cloud-controller-manager=
Annotations:              service.beta.kubernetes.io/scw-loadbalancer-id: fr-par-1/ec92f1d7-6323-42aa-b8b2-7e1bcdccb0d2
Selector:                 app.kubernetes.io/component=controller,app.kubernetes.io/instance=cluster-ingress,app.kubernetes.io/name=ingress-nginx
Type:                     NodePort
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       10.35.34.238
IPs:                      10.35.34.238
External IPs:             <redacted>
Port:                     http  80/TCP
TargetPort:               http/TCP
NodePort:                 http  31196/TCP
Endpoints:
Port:                     https  443/TCP
TargetPort:               https/TCP
NodePort:                 https  31838/TCP
Endpoints:
Session Affinity:         None
External Traffic Policy:  Cluster
Events:                   <none>
  • Current state of ingress object, if applicable:

    • kubectl -n <appnnamespace> get all,ing -o wide
    • kubectl -n <appnamespace> describe ing <ingressname>
    • If applicable, then your complete and exact curl/grpcurl command (redacted if required) and the response to the curl/grpcurl command with the -v flag
  • Others:

    • Any other related information like:
      • copy/paste of the snippet (if applicable)
      • kubectl describe ... of any custom configmap(s) created and in use
      • Any other related information that may help

How to reproduce this issue:

  • Create an application (e.g. nginx) and a service in one namespace, and look up its service IP (which wouldn't be necessary if ExternalName didn't completely crash the cluster, because the ingress controller eats resources instead of just doing the lookup).
  • Create a headless service and an EndpointSlice in another namespace, pointing at the IP of the first service (a hedged sketch of such a slice follows this list).
  • Restart the nginx ingress controller if necessary to trigger the panic.
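
The EndpointSlice manifest itself was not included in the report (commenters ask for it further down), so purely as a hedged illustration, a manually created slice for the setup described above might look roughly like the following. The object name, the unnamed port, and the use of remote-jelly's ClusterIP 10.32.58.244 (visible in the get all output above) are assumptions, not the reporter's actual manifest:

apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: jellyfin-manual                     # hypothetical name; the real manifest was never shared
  namespace: jellyfin
  labels:
    kubernetes.io/service-name: jellyfin    # associates the slice with the selectorless jellyfin Service
addressType: IPv4
ports:
- port: 8096                                # left unnamed to match the unnamed port on the jellyfin Service
  protocol: TCP
endpoints:
- addresses:
  - "10.32.58.244"                          # ClusterIP of remote-jelly in cluster-ingress (assumed target)
  conditions:
    ready: true                             # whether the reporter set this is unknown; see the #9550 discussion below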
@iMartyn iMartyn added the kind/bug Categorizes issue or PR as related to a bug. label Feb 11, 2023
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority labels Feb 11, 2023
@longwuyuan
Contributor

longwuyuan commented Feb 11, 2023

@iMartyn the controller pod is in a CrashLoopBackOff, the service type is NodePort, and the ArgoCD config shows hostNetwork enabled. These are not default configurations. Does it work if you do an install as per the documented procedures, without hostNetwork or a NodePort service?
/remove-kind bug

@k8s-ci-robot k8s-ci-robot added needs-kind Indicates a PR lacks a `kind/foo` label and requires one. and removed kind/bug Categorizes issue or PR as related to a bug. labels Feb 11, 2023
@iMartyn
Author

iMartyn commented Feb 11, 2023

The controller pod is in CrashLoopBackOff because of the issue in question. The controller works perfectly if I do not have a headless service. I knew you would try to dismiss this as a configuration error, which is why I dutifully gave all the completely irrelevant information. Please read the issue again.

@iMartyn
Author

iMartyn commented Feb 11, 2023

/kind bug

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. and removed needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels Feb 11, 2023
@longwuyuan
Contributor

I edited my comments in the previous post. Sorry for the incorrect comments (now edited) earlier.

@iMartyn
Author

iMartyn commented Feb 11, 2023

There's no world in which the hostNetwork setup would cause a crash when talking to the API while the controller works normally for other services, so although I have not tested this, it would be a waste of time to do so. Either the controller can read from the k8s API or it cannot; if it couldn't, it would not work for other service types. I have had to move everything into the same namespace and point at a normal service, and that is working, so the controller is simply not handling headless services correctly.

@longwuyuan
Contributor

I don't think the CI tests against headless services. But before diving into that, can you post the output of kubectl -n jellyfin describe svc jellyfin?

@longwuyuan
Contributor

longwuyuan commented Feb 11, 2023

And "YES", there was a change to use endpointslices which means you can also test and report status on using a version of the controller before the endpointslices implementation #8890

@iMartyn
Author

iMartyn commented Feb 11, 2023

I cannot currently retest this because to do so takes my whole cluster offline. The service was working with port-forward and from other pods.

@longwuyuan
Contributor

kubectl -n jellyfin describe svc jellyfin

@iMartyn
Author

iMartyn commented Feb 11, 2023

Again, my system isn't in a state where I can do this now. I had to fix my system by moving everything into one namespace, so I cannot give you the output of that. I can tell you the service was working both with port-forward and from other pods by name.

@strongjz
Member

I wonder if it's related to this issue: #9550

@longwuyuan
Contributor

longwuyuan commented Feb 13, 2023

@iMartyn were you using a Service of type ExternalName, or just one with a selector?

@iMartyn
Author

iMartyn commented Feb 13, 2023

No. ExternalName kills clusters using the controller because it can't resolve external names (I experienced this previously, hence creating the headless service). The headless service contains no selector because it refers to IPs in the other namespace; that's why you have to create the EndpointSlices manually.

@longwuyuan
Contributor

@iMartyn are the namespace of the Ingress and the namespace of the backend Service that the Ingress routes to different?

@iMartyn
Author

iMartyn commented Feb 13, 2023

This is so frustrating... I've already given this information in the initial report.
Namespace A: a pod running and a Service that works.
Namespace B: a headless Service and an EndpointSlice pointing at the IP of the Service in namespace A, and an Ingress pointing at the Service (in namespace B).

The whole point of this is for separation.

@longwuyuan
Contributor

longwuyuan commented Feb 13, 2023

Community folks volunteer their free time to move towards a resolution, so frustration is not a great direction to head in, as this is different from a paid-support type of engagement (to state the obvious).

The reason for asking about namespaces is the lack of explicit clarification that the Ingress and the backend Service are in the same namespace, as the spec requires.

% k explain ingress.spec.rules.http.paths.backend.service
KIND:     Ingress
VERSION:  networking.k8s.io/v1

RESOURCE: service <Object>

DESCRIPTION:
     Service references a Service as a Backend. This is a mutually exclusive
     setting with "Resource".

     IngressServiceBackend references a Kubernetes Service as a Backend.

FIELDS:
   name <string> -required-
     Name is the referenced service. The service must exist in the same
     namespace as the Ingress object.

   port <Object>
     Port of the referenced service. A port name or port number is required for
     a IngressServiceBackend.

And manually creating EndpointSlices is also not a normal use case. So fleshing out the issue with an explicit, clear description of the problem helps to complete the triage.

@schoentoon
Contributor

@strongjz I could see it being related, yes; the stack trace seems to point to the same thing at least.
If OP could kubectl describe endpointslice so we could see whether the Ready condition is set or not, that would help determine whether it's related.

@longwuyuan
Contributor

@schoentoon were you trying to set up an EndpointSlice manually that pointed to an IP address from a different namespace? As in, the Ingress was in one namespace and the EndpointSlice was intended to contain an IP address not from the same namespace?

@schoentoon
Contributor

schoentoon commented Feb 15, 2023

@longwuyuan kind of. I was manually creating it pointing at an IP address outside of the cluster entirely. Ironically, similar to OP, it points to the host where I run my Jellyfin.

@tombokombo
Contributor

@iMartyn could you please share the EndpointSlice YAML? And do I understand correctly that you crafted the EndpointSlice by hand?

@bmv126

bmv126 commented Feb 21, 2023

Are we expecting this issue to be fixed by #9550?

@strongjz
Member

A headless service without selectors does not create EndpointSlices, so we should validate that and stop it in the admission controller, since ingress-nginx uses endpoints to identify the backends. https://kubernetes.github.io/ingress-nginx/user-guide/miscellaneous/#why-endpoints-and-not-services

https://kubernetes.io/docs/concepts/services-networking/service/#without-selectors

The way the Ingress and the Service are configured is what caused the panic: no EndpointSlices are created automatically since the jellyfin Service has no selectors. So the ingress is working fine, apart from the pointer issue; the endpoint Ready condition was nil since no endpoints are ready to serve traffic.
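
For contrast with the slice sketched under the reproduction steps above, a hedged illustration of a slice whose Ready condition is left unset; conditions is optional in discovery.k8s.io/v1, and an omitted conditions.ready is a nil value on the API object, which per this comment and the #9550 discussion appears to be what the controller dereferenced. Names and addresses are hypothetical, as before:

apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: jellyfin-manual                     # hypothetical
  namespace: jellyfin
  labels:
    kubernetes.io/service-name: jellyfin
addressType: IPv4
ports:
- port: 8096
  protocol: TCP
endpoints:
- addresses:
  - "10.32.58.244"
  # no conditions block: conditions.ready stays unset (nil), not true or false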

As far as the panic, I do believe that it is fixed in #9550

We need to add a check for this to the admission controller, and for security reasons we need to add a config option to allow cross-namespace traffic like this. Namespaces are used to segment the cluster for multiple reasons. Allowing a service in one namespace to use the endpoints of a service in another namespace is a security concern that should have to be explicitly allowed.

@iMartyn
Author

iMartyn commented Feb 22, 2023

I did create EndpointSlices manually, but until the panic is resolved I can't retest this because it will bring down the entire cluster.

As I have stated multiple times, the services were ready and accepting traffic from other pods and using kubectl port-forward.

Once there's a release containing #9550 I will see if I can recreate the situation on that version.

@strongjz
Member

This is still a security concern; how does ingress know you should have access to those endpoints?

Service A is controlled by Team A.
Endpoints B, created by Team B or even a malicious actor, point to A.

If an admin allows this, fine.

Network policies might be the actual answer here, though.

Maybe this is out of the purview of ingress?
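
Purely as a hedged sketch of the NetworkPolicy idea above (no concrete policy is proposed in this thread), the namespace that owns the backend pods could restrict which namespaces may reach them, independent of who creates Services or EndpointSlices pointing at those pods. All names and labels below are hypothetical:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: limit-cross-namespace               # hypothetical
  namespace: team-a                          # namespace that owns the backend pods (hypothetical)
spec:
  podSelector:
    matchLabels:
      app: backend                           # hypothetical pod label
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: team-b   # only this namespace may connect
    ports:
    - protocol: TCP
      port: 8096

This relies on a CNI that enforces NetworkPolicy and on the automatic kubernetes.io/metadata.name namespace label.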

@github-actions

This is stale, but we won't close it automatically, just bear in mind the maintainers may be busy with other tasks and will get to your issue ASAP. If you have any questions or a request to prioritize this, please reach out to #ingress-nginx-dev on Kubernetes Slack.

@github-actions github-actions bot added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Mar 25, 2023
@redbaron

Maybe this is out of the purview of ingress?

Yes, K8S in general dealt with it by controlling access to EndpointSlices with RBAC: kubernetes/kubernetes#103675
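
For context, a hedged sketch of that RBAC-based control: whether someone can hand-craft EndpointSlices in a namespace at all is governed by ordinary Role/RoleBinding objects, so a cluster admin decides who may point slices at arbitrary IPs. The names and the subject below are hypothetical:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: endpointslice-editor                # hypothetical
  namespace: jellyfin
rules:
- apiGroups: ["discovery.k8s.io"]
  resources: ["endpointslices"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: endpointslice-editor                # hypothetical
  namespace: jellyfin
subjects:
- kind: User
  name: team-b-admin                        # hypothetical subject
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: endpointslice-editor
  apiGroup: rbac.authorization.k8s.io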

@longwuyuan
Contributor

Hi,

Reading this again after so many months and wanting to take it further.

Even though there have been discussions, there is a fundamental aspect that needs to be stated as the obvious: the Ingress API spec wants the backend destination to be in the same namespace as the Ingress. The intricate use of EndpointSlices, RBAC, ServiceAccounts, etc. is all geared to work with this tenet.

Since the final goal as described in the original issue description is to traverse namespaces, there are no resources available in the project to work on that kind of functionality. The shortage of resources has required the project to deprecate features that are far away from the implications of the Ingress API, as the focus is to secure the controller by default as well as implement the Gateway API.

Although some users use Services of type ExternalName, that by itself has its own complications. It is one example of a feature that users have the option of using but that is too far from the Ingress API for the project to allocate resources to. Similarly, headless Services and other complicated approaches to cross-namespace routing of external traffic to internal workloads are not something the project has resources to allocate to.

As such, this issue is adding to the tally of open issues without tracking any real action item, so I will close the issue for now.

/close

@k8s-ci-robot
Contributor

@longwuyuan: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
