
Support Kubernetes EndpointSlices #6017

Closed
raravena80 opened this issue Aug 13, 2020 · 30 comments · Fixed by #8890
Labels
kind/feature: Categorizes issue or PR as related to a new feature.
lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.
priority/backlog: Higher priority than priority/awaiting-more-evidence.
triage/accepted: Indicates an issue or PR is ready to be actively worked on.

Comments

@raravena80

raravena80 commented Aug 13, 2020

Support Kubernetes EndpointSlices, a newer Kubernetes feature that allows restricting or customizing where traffic is sent in a cluster.

Background:

https://stackoverflow.com/questions/63399080/kubernetes-1-18-6-servicetopology-and-ingress-support

Not that I know of

K8s 1.17 and above (Beta): https://kubernetes.io/docs/concepts/services-networking/endpoint-slices
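
For readers unfamiliar with the API, here is a minimal sketch in Go of the EndpointSlice shape, using the k8s.io/api/discovery/v1 types (the slice name, Service name, address and zone are purely illustrative). The point is that a slice groups a subset of a Service's backends and carries per-endpoint topology data that the classic Endpoints object does not expose:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	discoveryv1 "k8s.io/api/discovery/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	ready := true
	proto := corev1.ProtocolTCP
	portName, port, zone := "http", int32(8080), "us-east-1a"

	// One EndpointSlice holds a subset of a Service's backends plus
	// per-endpoint topology (zone) that plain Endpoints lacks.
	slice := discoveryv1.EndpointSlice{
		ObjectMeta: metav1.ObjectMeta{
			Name:   "demo-svc-abc12", // hypothetical; normally generated by the controller
			Labels: map[string]string{discoveryv1.LabelServiceName: "demo-svc"},
		},
		AddressType: discoveryv1.AddressTypeIPv4,
		Endpoints: []discoveryv1.Endpoint{{
			Addresses:  []string{"10.0.1.5"},
			Conditions: discoveryv1.EndpointConditions{Ready: &ready},
			Zone:       &zone,
		}},
		Ports: []discoveryv1.EndpointPort{{Name: &portName, Port: &port, Protocol: &proto}},
	}

	fmt.Printf("%d endpoint(s) for Service %s\n",
		len(slice.Endpoints), slice.Labels[discoveryv1.LabelServiceName])
}
```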

/kind feature

@raravena80 added the kind/feature label Aug 13, 2020
@aledbf
Member

aledbf commented Aug 13, 2020

@raravena80 what are you trying to do exactly?

@aledbf
Member

aledbf commented Aug 13, 2020

K8s 1.17 and above (Beta): https://kubernetes.io/docs/concepts/services-networking/endpoint-slices

Right, but for some context, the majority of users are still running k8s < 1.16, even 1.13.
Adding a feature like this one only adds complexity to the project.

Without a clear problem this feature could solve, I don't see a reason to add support, at least not until users run k8s > 1.17.

@raravena80
Author

@aledbf this is based on this Stackoverflow question: https://stackoverflow.com/questions/63399080/kubernetes-1-18-6-servicetopology-and-ingress-support

Thanks!

@aledbf
Member

aledbf commented Aug 13, 2020

this is based on this Stackoverflow question: https://stackoverflow.com/questions/63399080/kubernetes-1-18-6-servicetopology-and-ingress-support

Interesting.

The question itself, about service topology, can be solved using the annotation service-upstream.
That said, the source of the connection will be ingress-nginx, delegating the topology part to the k8s service topology feature (topologyKeys). But then you cannot have custom LB algorithms or sticky sessions.

The EndpointSlices part makes sense when you have services with a lot of endpoints (> 100).
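
As a reference, a minimal hedged sketch (not from this thread; client-go usage, the Ingress name demo-ingress and the default namespace are illustrative) of turning the annotation on for an existing Ingress. With service-upstream, ingress-nginx proxies to the Service's ClusterIP rather than to individual pod IPs, so kube-proxy (and topologyKeys) decides where traffic lands, at the cost of sticky sessions and custom load-balancing algorithms:

```go
package main

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the local kubeconfig (path and object names are illustrative).
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Merge-patch the ingress-nginx service-upstream annotation onto an Ingress.
	patch := []byte(`{"metadata":{"annotations":{"nginx.ingress.kubernetes.io/service-upstream":"true"}}}`)
	if _, err := client.NetworkingV1().Ingresses("default").Patch(
		context.TODO(), "demo-ingress", types.MergePatchType, patch, metav1.PatchOptions{}); err != nil {
		panic(err)
	}
}
```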

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Nov 11, 2020
@raravena80
Author

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label Nov 11, 2020
@ltagliamonte-dd

EndpointSlices are a game changer, not only for the scalability benefits they bring to services with a lot of endpoints; they also bring performance improvements and cost savings in cloud environments like AWS.

It is possible to group endpoints per availability zone and, based on the identity of the nginx pod, prefer the endpoints in your own zone over those in other zones.
This saves money and boosts performance because traffic stays within the same availability zone.
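
A rough sketch of that zone preference (illustrative only, not ingress-nginx code; the zone names and the fall-back-to-all behaviour are assumptions):

```go
package main

import (
	"fmt"

	discoveryv1 "k8s.io/api/discovery/v1"
)

// preferLocalZone keeps only the endpoints in the proxy's own zone and falls
// back to the full set when the local zone has no endpoints, to avoid
// blackholing traffic.
func preferLocalZone(endpoints []discoveryv1.Endpoint, localZone string) []discoveryv1.Endpoint {
	var local []discoveryv1.Endpoint
	for _, ep := range endpoints {
		if ep.Zone != nil && *ep.Zone == localZone {
			local = append(local, ep)
		}
	}
	if len(local) == 0 {
		return endpoints
	}
	return local
}

func main() {
	zoneA, zoneB := "us-east-1a", "us-east-1b"
	eps := []discoveryv1.Endpoint{
		{Addresses: []string{"10.0.1.5"}, Zone: &zoneA},
		{Addresses: []string{"10.0.2.7"}, Zone: &zoneB},
	}
	fmt.Println(len(preferLocalZone(eps, "us-east-1a"))) // prints 1
}
```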

@ecktom

ecktom commented Jan 11, 2021

this is based on this Stackoverflow question: https://stackoverflow.com/questions/63399080/kubernetes-1-18-6-servicetopology-and-ingress-support

Interesting.

The question itself, about service topology, can be solved using the annotation service-upstream.
That said, the source of the connection will be ingress-nginx, delegating the topology part to the k8s service topology feature (topologyKeys). But then you cannot have custom LB algorithms or sticky sessions.

The EndpointSlices part makes sense when you have services with a lot of endpoints (> 100).

Please correct me @aledbf, but I believe it would make sense to consider EndpointSlices and topology-aware routing in this project as well. K8s Services are kind of difficult to use in an HTTP/2 context, e.g. when using gRPC, due to its connection reuse/multiplexing.
There is the possibility of using a headless Service and DNS-based client-side load balancing, but this also comes with some issues, e.g. getting notified about new pods (which can be mitigated by lowering the TTL).
The only clean solution here is working on endpoints directly. On the client side, there is a project for this, https://github.com/sercand/kuberesolver, even though it does not have support for EndpointSlices and topology-aware routing yet.

So if we want to have topology-aware routing (which does make sense for many cases, especially cost reduction in a multi-AZ environment) for HTTP/2, we might need to include some logic working on EndpointSlices and certain routing preferences.

See also:
zalando/skipper#1446
linkerd/linkerd2#4780
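
To make the "work on endpoints directly" idea concrete, a hedged sketch (in-cluster client-go config; the default namespace and the Service name grpc-backend are illustrative) of resolving a Service's pod IPs straight from its EndpointSlices, roughly what a kuberesolver-style gRPC resolver would do:

```go
package main

import (
	"context"
	"fmt"

	discoveryv1 "k8s.io/api/discovery/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// EndpointSlices for a Service are discoverable via the
	// kubernetes.io/service-name label, so a resolver can skip the Service
	// ClusterIP and balance HTTP/2 or gRPC connections across pod IPs directly.
	slices, err := client.DiscoveryV1().EndpointSlices("default").List(context.TODO(), metav1.ListOptions{
		LabelSelector: discoveryv1.LabelServiceName + "=grpc-backend",
	})
	if err != nil {
		panic(err)
	}

	for _, s := range slices.Items {
		for _, ep := range s.Endpoints {
			if ep.Conditions.Ready != nil && *ep.Conditions.Ready {
				fmt.Println(ep.Addresses[0]) // one ready backend address per endpoint
			}
		}
	}
}
```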

@aledbf
Member

aledbf commented Jan 11, 2021

So if we want to have topology-aware routing (which does make sense for many cases, especially cost reduction in a multi-AZ environment) for HTTP/2, we might need to include some logic working on EndpointSlices and certain routing preferences.

We have a KEP to add support for zone-aware routing, but such a feature requires massive changes on the Lua side of the controller.

Using topology-aware routing (from k8s) means you lose several features from ingress-nginx, like sticky sessions, due to the use of the k8s Service abstraction instead of endpoints.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Apr 11, 2021
@raravena80
Author

/remove-lifecycle stale.

@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label May 11, 2021
@raravena80
Author

/remove-lifecycle rotten.

@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ltagliamonte-dd

Let's not forget that Endpoints resources are going to be deprecated very soon.

@tosi3k
Member

tosi3k commented Jun 20, 2022

/reopen

This is still not fixed, and one can hit K8s control plane availability problems when there is high churn on large Services in the cluster and many ingress-nginx-controller replicas: the apiserver needs to send notifications about Endpoints changes to lots of watchers, which often ends up overloading it.

@k8s-ci-robot
Contributor

@tosi3k: Reopened this issue.

In response to this:

/reopen

This is still not fixed, and one can hit K8s control plane availability problems when there is high churn on large Services in the cluster and many ingress-nginx-controller replicas: the apiserver needs to send notifications about Endpoints changes to lots of watchers, which often ends up overloading it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot reopened this Jun 20, 2022
@k8s-ci-robot added the needs-triage and needs-priority labels Jun 20, 2022
@ottoyiu

ottoyiu commented Jun 29, 2022

The lack of an EndpointSlices implementation is unfortunately impacting production for us now. Since Kubernetes v1.22, for Services that exceed 1000 Pods/network endpoints, the Endpoints object is truncated to a maximum of 1000 items.
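
A small hedged sketch (client-go; the Service name big-service and the default namespace are illustrative) of detecting that truncation via the over-capacity annotation the Endpoints controller sets on truncated objects since v1.22; anything past the first 1000 backends stays invisible to consumers that still read Endpoints:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Fetch the (possibly truncated) Endpoints object for a large Service.
	ep, err := client.CoreV1().Endpoints("default").Get(context.TODO(), "big-service", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}

	// The controller marks truncated Endpoints with this annotation.
	if ep.Annotations["endpoints.kubernetes.io/over-capacity"] == "truncated" {
		fmt.Println("Endpoints truncated at 1000 addresses; only EndpointSlices see every backend")
	}
}
```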

@strongjz
Member

strongjz commented Jul 7, 2022

/priority backlog
/triage accepted
/project Stabilization Project

@k8s-ci-robot
Contributor

@strongjz: You must be a member of the kubernetes/ingress-nginx github team to set the project and column.

In response to this:

/priority backlog
/triage accepted
/project Stabilization Project

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot added the priority/backlog and triage/accepted labels and removed the needs-priority and needs-triage labels Jul 7, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tosi3k
Member

tosi3k commented Aug 8, 2022

/reopen

@k8s-ci-robot
Contributor

@tosi3k: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot reopened this Aug 8, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tosi3k
Member

tosi3k commented Sep 7, 2022

/reopen
/remove-lifecycle rotten
/lifecycle frozen

@k8s-ci-robot reopened this Sep 7, 2022
@k8s-ci-robot
Contributor

@tosi3k: Reopened this issue.

In response to this:

/reopen
/remove-lifecycle rotten
/lifecycle frozen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot added the lifecycle/frozen label and removed the lifecycle/rotten label Sep 7, 2022
@strongjz
Member

strongjz commented Sep 7, 2022

#8890 is currently working on this feature

/lifecycle frozen
