diff --git a/keps/sig-network/3786-migrate-shared-kube-proxy-code-to-staging/README.md b/keps/sig-network/3786-migrate-shared-kube-proxy-code-to-staging/README.md new file mode 100644 index 00000000000..dfade5b024c --- /dev/null +++ b/keps/sig-network/3786-migrate-shared-kube-proxy-code-to-staging/README.md @@ -0,0 +1,301 @@ + +# KEP-3786: Migrate Shared kube-proxy Code into Staging +# Index + + + +- [Release Signoff Checklist](#release-signoff-checklist) +- [Summary](#summary) +- [Motivation](#motivation) + - [Goals](#goals) + - [Non-Goals](#non-goals) +- [Proposal](#proposal) + - [User Stories (Optional)](#user-stories-optional) + - [Story 1](#story-1) + - [Story 2](#story-2) + - [Story 3](#story-3) + - [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional) + - [Risks and Mitigations](#risks-and-mitigations) +- [Design Details](#design-details) + - [Test Plan](#test-plan) + - [Unit tests](#unit-tests) + - [Integration tests](#integration-tests) + - [e2e tests](#e2e-tests) + - [Graduation Criteria](#graduation-criteria) +- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire) + - [Dependencies](#dependencies) + - [Scalability](#scalability) + - [Troubleshooting](#troubleshooting) +- [Implementation History](#implementation-history) +- [Drawbacks](#drawbacks) +- [Alternatives](#alternatives) +- [Infrastructure Needed (Optional)](#infrastructure-needed-optional) + + +## Release Signoff Checklist + + + +Items marked with (R) are required *prior to targeting to a milestone / release*. 
+ +- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR) +- [ ] (R) KEP approvers have approved the KEP status as `implementable` +- [ ] (R) Design details are appropriately documented +- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors) + - [ ] e2e Tests for all Beta API Operations (endpoints) + - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) + - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free +- [ ] (R) Graduation criteria is in place + - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) +- [ ] (R) Production readiness review completed +- [ ] (R) Production readiness review approved +- [ ] "Implementation History" section is up-to-date for milestone +- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io] +- [ ] Supporting documentation, e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes + + + +[kubernetes.io]: https://kubernetes.io/ +[kubernetes/enhancements]: https://git.k8s.io/enhancements +[kubernetes/kubernetes]: https://git.k8s.io/kubernetes +[kubernetes/website]: https://git.k8s.io/website + +## Summary + +After about a year and a half of testing a [new service proxy implementation](https://github.com/kubernetes-sigs/kpng/) and +collecting sig-network and community feedback, it is clear that the community needs a shared library (referred to as the kube-proxy staging library in this document) designed to make building new service proxies easier. 
Specifically, it is desired by members of the Kubernetes networking community who are attempting to build specialized networking tools. However, in order to prevent the community from having to maintain +two separate sets of code (the existing kube-proxy code and the aforementioned new library) while also ensuring the stability of the existing proxies, it makes sense to work incrementally towards this goal in the kube-proxy +staging library. + +This KEP describes the **first step** towards the ultimate end goal of providing +a shared library of core kube-proxy functionality which can be easily consumed by the community. +Specifically, it will work to move the existing [shared kube-proxy code](https://github.com/kubernetes/kubernetes/tree/master/pkg/proxy) and associated network utilities into [staging](https://github.com/kubernetes/kube-proxy). + +This initial work will open the door for future kube-proxy code improvements to be made in a +safe and incremental way. + +## Motivation + +There have been several presentations, issues, and projects dedicated to reusing kube-proxy logic while extending it to embrace +different backend technologies (e.g. nftables, eBPF, and Open vSwitch). This KEP attempts to work towards making a library which will facilitate +this type of work, ultimately making it much easier to write and maintain new proxy implementations. + +A previous attempt at a broad solution to this problem was explored in the [KPNG project](https://github.com/kubernetes-sigs/kpng/), which exhibits many properties that allow for such goals to be accomplished. However, because it introduced many new features and would result in large breaking changes if it were +to be incorporated back in-tree, it became clear we needed to decompose this large task into smaller pieces. 
Therefore, we've decided to keep things simple and start by moving the existing shared kube-proxy code into staging, where it can be iterated on and augmented in a safe, consumable, and incremental manner. + +### Goals + +- Move the [shared kube-proxy code k/k/pkg/proxy](https://github.com/kubernetes/kubernetes/tree/master/pkg/proxy), and relevant +networking utilities (e.g. `pkg/util/iptables`) into the kube-proxy [staging repository](https://github.com/kubernetes/kube-proxy). +- Ensure all existing kube-proxy unit and e2e coverage remains the same or is improved. +- Ensure the shared code can be easily vendored by external users to help write new out-of-tree service proxies. +- Write documentation detailing how external consumers can utilize the kube-proxy staging library. + +### Non-Goals + +- Write any more new "in-tree" service proxies. +- Make any incompatible architectural or deployment changes to the existing kube-proxy implementations. +- Tackle any complex new deployment scenarios (this is solved by [KPNG](https://github.com/kubernetes-sigs/kpng/)). + +## Proposal + +We propose to build a new library in the [kube-proxy staging repository](https://github.com/kubernetes/kube-proxy). This repository will be vendored by the core implementations and by developers who want to build new service proxy implementations, providing them with: + +- A vendorable Go library that defines a few interfaces which a new service proxy can easily implement in order to respond to EndpointSlice and Service changes. +- Documentation on how to build a service proxy with the library, based on [So You Want To Write A Service Proxy...](https://github.com/kubernetes-sigs/kpng/blob/master/doc/service-proxy.md) and other similar resources. 
+ +Not only will this make writing new backends easier, but through incremental changes and optimizations to the new library we hope to also improve the existing proxies, making [legacy bugs](https://github.com/kubernetes/kubernetes/issues/112604) easier to fix in the future. + +### User Stories (Optional) + +#### Story 1 + +As a networking technology startup, I want to easily make a new service proxy implementation without maintaining the logic of talking to the APIServer, caching its data, or calculating an abbreviated/proxy-focused representation of the Kubernetes networking state space. I'd like a wholesale framework I can simply plug my custom dataplane-oriented logic into. + +#### Story 2 + +As a service proxy maintainer, I don't want to have to understand the in-tree internals of a networking backend in order to simulate or write core updates to the logic of the kube-proxy locally. + +#### Story 3 + +As a Kubernetes developer, I want to make maintaining the shared proxy code easier, and allow for updates to that code to be completed in a more incremental and well-tested way. + +### Notes/Constraints/Caveats (Optional) + +TBD + +### Risks and Mitigations + +Since this KEP involves moving core code, there are some obvious risks; however, they will be mitigated by ensuring +all existing unit and e2e test coverage is kept stable and/or improved throughout the process. + +## Design Details + +**NOTE: This section is still under active development; please comment with any further ideas.** + +The implementation of this KEP will begin by moving the various networking utilities (i.e. `pkg/util/iptables`, `pkg/util/conntrack`, +`pkg/util/ipset`, `pkg/util/ipvs`) used by `pkg/proxy` to the staging repo, using @danwinship's [previous attempt](https://github.com/kubernetes/kubernetes/pull/112886) as a guide. 
Additionally, we will need to [revisit](https://github.com/kubernetes/utils/pull/165) moving `pkg/util/async` out of `k8s.io/kubernetes` and into [`k8s/utils`](https://github.com/kubernetes/utils). + +Following this initial work, the [shared kube-proxy code](https://github.com/kubernetes/kubernetes/tree/master/pkg/proxy) as it stands today will be moved into the kube-proxy staging repo. Throughout this process it's crucial that all unit and e2e test coverage +is either maintained or improved to ensure stability of the existing in-tree proxies. + +Finally, documentation will be written to help users consume the now vendorable kube-proxy code. + +Additional steps (possibly described in further detail with a follow-up KEP) will include: + +- Building up more tooling around testing and use of the library for external consumers. +- Analysis of possible improvements and updates to the shared library code, using the POC done in [KPNG](https://github.com/kubernetes-sigs/kpng) as a reference, to make writing new out-of-tree service proxy implementations easier. + +### Test Plan + +##### Unit tests + +All existing kube-proxy and associated library unit test coverage **MUST** be maintained or improved. + +##### Integration tests + +All existing kube-proxy and associated library integration test coverage **MUST** be maintained or improved. + +##### e2e tests + +All existing kube-proxy and associated library e2e test coverage **MUST** be maintained or improved. + + +### Graduation Criteria + +N/A + +## Production Readiness Review Questionnaire + +### Dependencies + +N/A + +### Scalability + +N/A. The core functionality will remain the same. + +###### Will enabling / using this feature result in any new API calls? + +No + +###### Will enabling / using this feature result in introducing new API types? + +No, it won't result in new K8s APIs. + +###### Will enabling / using this feature result in any new calls to the cloud provider? 
+ +No + +###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components? + +No + +### Troubleshooting + +###### How does this feature react if the API server and/or etcd is unavailable? + +If the APIServer goes down, this library will generally stop working as expected, since in normal operation all incoming +Kubernetes networking data is polled from the APIServer. + +###### What are other known failure modes? + +###### What steps should be taken if SLOs are not being met to determine the problem? + +## Implementation History + +- [Add kep to move kubeproxy out of tree (to staging)](https://github.com/kubernetes/enhancements/pull/2635) +- ["Move kube-proxy networking utilities to staging"](https://github.com/kubernetes/kubernetes/pull/112886) +- [Import BoundedFrequencyRunner from k8s.io/kubernetes](https://github.com/kubernetes/utils/pull/165) +- ["Librarification" PR into KPNG](https://github.com/kubernetes-sigs/kpng/pull/389) + +## Drawbacks + +## Alternatives + +We could retain the existing kube-proxy shared code in core Kubernetes and simply better document the data structures and Go maps used for Kubernetes client operations and client-side caching. However, that would still require external proxy implementations to copy and paste large amounts of code. The other option is to not tackle this problem in-tree and to move forward solely with the development of external projects like KPNG as the overall framework for solving these problems. The drawbacks of this include the large additional maintenance burden and the fact that KPNG is opinionated towards a raw gRPC implementation, whereas other users (e.g. those using xDS) may want something more decoupled. This realization has inspired this KEP. 
+ +## Infrastructure Needed (Optional) + +None \ No newline at end of file diff --git a/keps/sig-network/3786-migrate-shared-kube-proxy-code-to-staging/kep.yaml b/keps/sig-network/3786-migrate-shared-kube-proxy-code-to-staging/kep.yaml new file mode 100644 index 00000000000..68eec3c08b5 --- /dev/null +++ b/keps/sig-network/3786-migrate-shared-kube-proxy-code-to-staging/kep.yaml @@ -0,0 +1,59 @@ +title: rework kube-proxy architecture +kep-number: 3786 +authors: + - "@mcluseau" + - "@rajaskakodar" + - "@astoycos" + - "@jayunit100" + - "@nehalohia" +owning-sig: sig-network +participating-sigs: + - sig-network +status: provisional|implementable|implemented|deferred|rejected|withdrawn|replaced +creation-date: 2020-10-10 +reviewers: + - "@thockin" + - "@danwinship" +approvers: + - "@thockin" + - "@danwinship" + +##### WARNING !!! ###### +# prr-approvers has been moved to its own location +# You should create your own in keps/prod-readiness +# Please make a copy of keps/prod-readiness/template/nnnn.yaml +# to keps/prod-readiness/sig-xxxxx/00000.yaml (replace with kep number) +#prr-approvers: + +see-also: + - https://github.com/kubernetes/enhancements/pull/2094 + - https://github.com/kubernetes/enhancements/pull/3649 + +# The target maturity stage in the current dev cycle for this KEP. +stage: stable + +# The most recent milestone for which work toward delivery of this KEP has been +# done. This can be the current (upcoming) milestone, if it is being actively +# worked on. +latest-milestone: "v1.28" + +# The milestone at which this feature was, or is targeted to be, at each stage. 
+# (astoycos) This is an internal code change; it does not need to follow the release milestones +# milestone: +# alpha: "v1.19" +# beta: "v1.20" +# stable: "v1.22" + +# The following PRR answers are required at alpha release +# List the feature gate name and the components for which it must be enabled +# (astoycos) This is an internal code change; it does not need to use feature-gates +# feature-gates: +# - name: MyFeature +# components: +# - kube-apiserver +# - kube-controller-manager +# disable-supported: true + +# The following PRR answers are required at beta release +# metrics: +# - my_feature_metric