Endpointslice detection fails if servicename is too long #9240

Closed
WolfspiritM opened this issue Nov 1, 2022 · 3 comments · Fixed by #9245
Assignees
Labels
area/stabilization Work for increasing stabilization of the ingress-nginx codebase kind/bug Categorizes issue or PR as related to a bug. needs-priority triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@WolfspiritM

What happened:
Since updating to version 1.4.0, services with long names no longer work with ingress-nginx. Accessing such a service results in a 503 error, and the controller logs the following:

W1028 10:07:44.588930       7 endpointslices.go:81] Error obtaining Endpoints for Service "vcluster-dev-bbbbbb/dev-bbbbbb-product-manual-x-dev-bbbbbb-product-manua-30210cf06f": no object matching key "vcluster-dev-bbbbbb/dev-bbbbbb-product-manual-x-dev-bbbbbb-product-manua-30210cf06f" in local store
W1028 10:07:44.588956       7 controller.go:1112] Service "vcluster-dev-bbbbbb/dev-bbbbbb-product-manual-x-dev-bbbbbb-product-manua-30210cf06f" does not have any active Endpoint.

Moving back to the previous version makes everything work again.
This is a particularly big issue with vcluster, as vcluster generates long names by default.
See also:
loft-sh/vcluster#800

What you expected to happen:
No 503 error but successful proxy to the endpoint/pod.

The cause seems to be the new EndpointSlice detection here:

for _, listKey := range s.ListKeys() {
	if !strings.HasPrefix(listKey, key) {
		continue
	}

It seems to match by "prefix" first and then check whether the label matches.

If the service name is long (I guess it starts at around 55-60 characters), the EndpointSlice name gets truncated and a generated suffix is appended at the end, so the prefix check no longer works. See:

# kubectl get services -n mail
NAME                                                             TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)   AGE
mail-backend-http-test-http-test-http-test-http-test-http-test   ClusterIP   10.43.6.144   <none>        80/TCP    4m48s
# kubectl get ep -n mail
NAME                                                             ENDPOINTS         AGE
mail-backend-http-test-http-test-http-test-http-test-http-test   10.42.26.212:80   4m51s
# kubectl get endpointslices -n mail
NAME                                                              ADDRESSTYPE   PORTS   ENDPOINTS      AGE
mail-backend-http-test-http-test-http-test-http-test-http-vhk88   IPv4          80      10.42.26.212   4m52s

The EndpointSlice name mail-backend-http-test-http-test-http-test-http-test-http-vhk88 does NOT start with the service name mail-backend-http-test-http-test-http-test-http-test-http-test, so the prefix check fails, but the slice does have the correct kubernetes.io/service-name label.
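The failing comparison can be reproduced outside the controller with the two names from the output above. This is only a minimal sketch: `matchesByPrefix` is a hypothetical helper that mirrors the quoted `strings.HasPrefix(listKey, key)` call, the `namespace/name` key format is taken from the log lines, and the short `mail-backend-x7k2p` slice name is made up for contrast.

```go
package main

import (
	"fmt"
	"strings"
)

// matchesByPrefix is a hypothetical helper mirroring the quoted check:
// it reports whether the EndpointSlice key starts with the Service key.
func matchesByPrefix(sliceKey, serviceKey string) bool {
	return strings.HasPrefix(sliceKey, serviceKey)
}

func main() {
	// Names copied from the kubectl output in this issue.
	serviceKey := "mail/mail-backend-http-test-http-test-http-test-http-test-http-test"
	sliceKey := "mail/mail-backend-http-test-http-test-http-test-http-test-http-vhk88"
	fmt.Println("long name matches: ", matchesByPrefix(sliceKey, serviceKey))

	// A short, made-up service whose slice name keeps the full service name.
	fmt.Println("short name matches:", matchesByPrefix("mail/mail-backend-x7k2p", "mail/mail-backend"))
}
```

For the long name the check reports no match, so the loop skips the slice before the label is ever inspected.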

# kubectl describe endpointslices -n mail
Name:         mail-backend-http-test-http-test-http-test-http-test-http-vhk88
Namespace:    mail
Labels:       
              endpointslice.kubernetes.io/managed-by=endpointslice-controller.k8s.io
              kubernetes.io/service-name=mail-backend-http-test-http-test-http-test-http-test-http-test
Annotations:  <none>
AddressType:  IPv4
Ports:
  Name                                               Port  Protocol
  ----                                               ----  --------
  http-test-http-test-http-test-http-test-http-test  80    TCP
Endpoints:
  - Addresses:  10.42.26.212
    Conditions:
      Ready:    true
    Hostname:   <unset>
    TargetRef:  Pod/mail-backend-79d4d445f4-8g7lj
    NodeName:   mile-agent-small-zjg
    Zone:       nbg1-dc3
Events:         <none>

This results in

W1101 00:31:29.424118       7 endpointslices.go:81] Error obtaining Endpoints for Service "mail/mail-backend-http-test-http-test-http-test-http-test-http-test": no object matching key "mail/mail-backend-http-test-http-test-http-test-http-test-http-test" in local store
W1101 00:31:29.424131       7 controller.go:1112] Service "mail/mail-backend-http-test-http-test-http-test-http-test-http-test" does not have any active Endpoint.
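Because the slice carries the kubernetes.io/service-name label, ownership could in principle be decided from the label alone, independent of how the name was generated. The following is a rough sketch of that idea, not the controller's code: the `slice` struct is a simplified stand-in for the real EndpointSlice type, and `slicesForService` is a hypothetical helper.

```go
package main

import "fmt"

// Label shown in the `kubectl describe endpointslices` output above.
const serviceNameLabel = "kubernetes.io/service-name"

// slice is a simplified, hypothetical stand-in for an EndpointSlice object;
// only the fields needed for matching are modelled.
type slice struct {
	Namespace string
	Name      string
	Labels    map[string]string
}

// slicesForService returns the slices in the given namespace whose
// service-name label equals the service name. The slice name is ignored.
func slicesForService(all []slice, namespace, service string) []slice {
	var owned []slice
	for _, s := range all {
		if s.Namespace == namespace && s.Labels[serviceNameLabel] == service {
			owned = append(owned, s)
		}
	}
	return owned
}

func main() {
	svc := "mail-backend-http-test-http-test-http-test-http-test-http-test"
	all := []slice{
		{Namespace: "mail", Name: "mail-backend-http-test-http-test-http-test-http-test-http-vhk88",
			Labels: map[string]string{serviceNameLabel: svc}},
		{Namespace: "mail", Name: "other-abcde",
			Labels: map[string]string{serviceNameLabel: "other"}},
	}
	for _, s := range slicesForService(all, "mail", svc) {
		fmt.Println(s.Name)
	}
}
```

The trade-off is that every slice in the store has to be inspected, which is presumably what the prefix check was meant to avoid.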

NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version.):
NGINX Ingress controller
Release: v1.4.0
Build: 50be2bf
Repository: https://github.com/kubernetes/ingress-nginx
nginx version: nginx/1.19.10

Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.7", GitCommit:"e6f35974b08862a23e7f4aad8e5d7f7f2de26c15", GitTreeState:"clean", BuildDate:"2022-10-12T10:57:14Z", GoVersion:"go1.18.7", Compiler:"gc", Platform:"windows/amd64"}
Kustomize Version: v4.5.4
Server Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.7+k3s1", GitCommit:"7af5b16788afe9ce1718d7b75b35eafac7454705", GitTreeState:"clean", BuildDate:"2022-10-25T19:31:34Z", GoVersion:"go1.18.7", Compiler:"gc", Platform:"linux/amd64"}

How was the ingress-nginx-controller installed:
Helm:
ngx kube-system 13 2022-11-01 00:18:32.613367459 +0000 UTC deployed ingress-nginx-4.3.0 1.4.0

How to reproduce this issue:

  1. Create a service with a 63-character name.
  2. Try to access it through an ingress.
  3. The request fails with a 503 error.
@WolfspiritM WolfspiritM added the kind/bug Categorizes issue or PR as related to a bug. label Nov 1, 2022
@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority labels Nov 1, 2022
@longwuyuan
Contributor

/triage accepted
/area stabilization

cc @tao12345666333 @rikatz

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. area/stabilization Work for increasing stabilization of the ingress-nginx codebase and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 1, 2022
@tao12345666333
Member

/assign

Let me pick it up.

@tombokombo
Contributor

That HasPrefix check is just an optimization to discard non-matching slices early. It's there because we have to loop through all slices, checking whether the EndpointSlice name has the service name as a prefix. Later on, there is a real ownership check against the slice labels.
As a solution, we could truncate both sides to the same number of characters before checking the prefix.
@tao12345666333 I will provide a fix by truncating to the same length in HasPrefix, OK?
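One possible reading of this suggestion is sketched below; it is not the code from the linked fix. Cutting both names to the same length would still leave part of the random suffix in the comparison for the example in this issue, so the sketch compares only the leading part of the service name that is assumed to survive truncation. The 58-character limit (a 63-character name limit minus a 5-character generated suffix) is an assumption, and `couldBelongTo` and `maxSharedPrefix` are hypothetical names. The label check would still have to follow.

```go
package main

import (
	"fmt"
	"strings"
)

// maxSharedPrefix is an assumption: generated names are taken to keep at
// most the first 58 characters of the base name (63 minus a 5-char suffix).
const maxSharedPrefix = 58

// couldBelongTo is a hypothetical pre-filter. It compares the slice name
// against at most the first maxSharedPrefix characters of the service name,
// so a truncated, suffixed slice name is not discarded too early.
func couldBelongTo(sliceName, serviceName string) bool {
	n := len(serviceName)
	if n > maxSharedPrefix {
		n = maxSharedPrefix
	}
	return strings.HasPrefix(sliceName, serviceName[:n])
}

func main() {
	svc := "mail-backend-http-test-http-test-http-test-http-test-http-test"
	eps := "mail-backend-http-test-http-test-http-test-http-test-http-vhk88"
	fmt.Println(couldBelongTo(eps, svc))
	fmt.Println(couldBelongTo("other-abcde", svc))
}
```

A pre-filter like this can only rule slices out; two long service names that share their first 58 characters would both pass it, which is why the label comparison remains the deciding check.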
