diff --git a/keps/prod-readiness/sig-api-machinery/3352.yaml b/keps/prod-readiness/sig-api-machinery/3352.yaml index 6b204c391767..caffcbade463 100644 --- a/keps/prod-readiness/sig-api-machinery/3352.yaml +++ b/keps/prod-readiness/sig-api-machinery/3352.yaml @@ -1,5 +1,3 @@ kep-number: 3352 alpha: approver: "@deads2k" -beta: - approver: "@deads2k" diff --git a/keps/sig-api-machinery/3352-aggregated-discovery/README.md b/keps/sig-api-machinery/3352-aggregated-discovery/README.md index cec79e907cf8..73ccf19114dc 100644 --- a/keps/sig-api-machinery/3352-aggregated-discovery/README.md +++ b/keps/sig-api-machinery/3352-aggregated-discovery/README.md @@ -1,80 +1,74 @@ - +- [ ] **Fill out this file as best you can.** At minimum, you should + fill in the "Summary" and "Motivation" sections. These should be + easy if you've preflighted the idea of the KEP with the appropriate + SIG(s). +- [ ] **Create a PR for this KEP.** Assign it to people in the SIG who + are sponsoring this process. +- [ ] **Merge early and iterate.** Avoid getting hung up on specific + details and instead aim to get the goals of the KEP clarified and + merged quickly. The best way to do this is to just start with the + high-level sections and fill out details incrementally in subsequent + PRs. + +Just because a KEP is merged does not mean it is complete or approved. +Any KEP marked as `provisional` is a working document and subject to +change. You can denote sections that are under active debate as +follows: + +``` <<[UNRESOLVED optional short context or usernames ]>> Stuff that +is being argued. <<[/UNRESOLVED]>> ``` + +When editing KEPS, aim for tightly-scoped, single-topic PRs to keep +discussions focused. If you disagree with what is already in a +document, open a new PR with suggested changes. + +One KEP corresponds to one "feature" or "enhancement" for its whole +lifecycle. You do not need a new KEP to move from beta to GA, for +example. If new details emerge that belong in the KEP, edit the KEP. +Once a feature has become "implemented", major changes should get new +KEPs. + +The canonical place for the latest set of instructions (and the likely +source of this file) is [here](/keps/NNNN-kep-template/README.md). + +**Note:** Any PRs to move a KEP to `implementable`, or significant +changes once it is marked `implementable`, must be approved by each of +the KEP approvers. If none of those approvers are still appropriate, +then changes to that list should be approved by the remaining +approvers and/or the owning SIG (or SIG Architecture for cross-cutting +KEPs). --> # KEP-3352: Aggregated Discovery - + - +Ensure the TOC is wrapped with <!-- toc --&rt;<!-- /toc + --&rt; tags, and then generate with `hack/update-toc.sh`. --> - [Release Signoff Checklist](#release-signoff-checklist) @@ -115,40 +109,52 @@ tags, and then generate with `hack/update-toc.sh`. ## Release Signoff Checklist - +Check these off as they are completed for the Release Team to track. +These checklist items _must_ be updated for the enhancement to be +released. --> -Items marked with (R) are required *prior to targeting to a milestone / release*. +Items marked with (R) are required *prior to targeting to a milestone +/ release*. -- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR) -- [ ] (R) KEP approvers have approved the KEP status as `implementable` +- [ ] (R) Enhancement issue in release milestone, which links to KEP + dir in [kubernetes/enhancements] (not the initial KEP PR) +- [ ] (R) KEP approvers have approved the KEP status as + `implementable` - [ ] (R) Design details are appropriately documented -- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors) +- [ ] (R) Test plan is in place, giving consideration to SIG + Architecture and SIG Testing input (including test refactors) - [ ] e2e Tests for all Beta API Operations (endpoints) - - [ ] (R) Ensure GA e2e tests for meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) - - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free + - [ ] (R) Ensure GA e2e tests for meet requirements for [Conformance + Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) + - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake + free - [ ] (R) Graduation criteria is in place - - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) + - [ ] (R) [all GA + Endpoints](https://github.com/kubernetes/community/pull/1806) + must be hit by [Conformance + Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) - [ ] (R) Production readiness review completed - [ ] (R) Production readiness review approved - [ ] "Implementation History" section is up-to-date for milestone -- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io] -- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes +- [ ] User-facing documentation has been created in + [kubernetes/website], for publication to [kubernetes.io] +- [ ] Supporting documentation—e.g., additional design documents, + links to mailing list discussions/SIG meetings, relevant + PRs/issues, release notes - + [kubernetes.io]: https://kubernetes.io/ [kubernetes/enhancements]: https://git.k8s.io/enhancements @@ -157,82 +163,108 @@ Items marked with (R) are required *prior to targeting to a milestone / release* ## Summary - -## Motivation - - -All clients and users of Kubernetes APIs usually first need to “discover” what the available APIs are and how they can be used. These APIs are described through a mechanism called “Discovery” which is typically queried to then build the requests to correct APIs. Unfortunately, the “Discovery” API is made of lots of small objects that need to be queried individually, causing possibly a lot of delay due to the latency of each individual request (up to 80 requests, with most objects being less than 1,024 bytes). The more complex the APIs provided by the Kubernetes cluster, the more requests need to be performed. - -The most well known Kubernetes client that uses the discovery mechanism is `kubectl`, and more specifically the `CachedDiscoveryClient` in `client-go`. To mitigate some of this latency, kubectl has implemented a (10 minute timer) 6 hours(!) timer during which the discovery API is not refreshed. The drawback of this approach is that the freshness of the cache is doubtful and the entire discovery API needs to be refreshed after 6 hours, even if it hasn’t expired. +## Motivation -This not only impacts kubectl, but all clients of kubernetes. We can do better. + + +All clients and users of Kubernetes APIs usually first need to +“discover” what the available APIs are and how they can be used. These +APIs are described through a mechanism called “Discovery” which is +typically queried to then build the requests to correct APIs. +Unfortunately, the “Discovery” API is made of lots of small objects +that need to be queried individually, causing possibly a lot of delay +due to the latency of each individual request (up to 80 requests, with +most objects being less than 1,024 bytes). The more numerous the APIs +provided by the Kubernetes cluster, the more requests need to be +performed. + +The most well known Kubernetes client that uses the discovery +mechanism is `kubectl`, and more specifically the +`CachedDiscoveryClient` in `client-go`. To mitigate some of this +latency, kubectl has implemented a 6 hour timer during which the +discovery API is not refreshed. The drawback of this approach is that +the freshness of the cache is doubtful and the entire discovery API +needs to be refreshed after 6 hours, even if it hasn’t expired. + +This not only impacts kubectl, but all clients of kubernetes. We can +do better. ### Goals - Fix the discovery storm issue currently present in kubectl - Aggregate the discovery documents for all Kubernetes types - + ### Non-Goals - + -- Since the current discovery separated by group-version is already GA, removal of the endpoint will be difficult. This KEP will solely focus on introducing the new aggregated endpoint and will not cover deprecation. +Since the current discovery separated by group-version is already GA, +removal of the endpoint will not be attempted. This KEP will solely +focus on introducing the new aggregated endpoint. ## Proposal -We are proposing a new endpoint `/discovery` as an aggregated endpoint for all discovery documents. Discovery documents can currently be found under `apis//` and `/api/v1` for the legacy group version. This discovery endpoint will support publishing an ETag so clients who already have the latest version of the aggregated discovery can avoid redownloading the document. +We are proposing a new endpoint `/discovery/v1` as an aggregated +endpoint for all discovery documents. Discovery documents can +currently be found under `apis//` and `/api/v1` for +the legacy group version. This discovery endpoint will support +publishing an ETag so clients who already have the latest version of +the aggregated discovery can avoid redownloading the document. -We will add a new controller responsible for aggregating the discovery documents when a resource on the cluster changes. There will be no conflicts when aggregating since each discovery document is self-contained. +We will add a new controller responsible for aggregating the discovery +documents when a resource on the cluster changes. There will be no +conflicts when aggregating since each discovery document is +self-contained. ### Notes/Constraints/Caveats (Optional) - + ### Risks and Mitigations - + We will expose a endpoint `/discovery` that will support JSON, and -protobuf. The endpoint will serve the aggregated discovery document -for all types that a Kubernetes cluster supports. +protobuf and gzip compression. The endpoint will serve the aggregated +discovery document for all types that a Kubernetes cluster supports. ### API @@ -271,99 +302,178 @@ the new endpoint that we publish. The endpoint will also publish an ETag calculated based on a hash of the data for clients. +This is approximately what the new API will look like (conflicting names will be renamed) + +```go +// APIGroupList is a list of APIGroup, to allow clients to discover the API at +// /apis. +type APIGroupList struct { + TypeMeta `json:",inline"` + // groups is a list of APIGroup. + Groups []APIGroup `json:"groups" protobuf:"bytes,1,rep,name=groups"` +} + +// APIGroup contains the name, the supported versions, and the preferred version +// of a group. +type APIGroup struct { + TypeMeta `json:",inline"` + // name is the name of the group. + Name string `json:"name" protobuf:"bytes,1,opt,name=name"` + // versions are the versions supported in this group. + // This will be sorted in descending order based on the preferred version + Versions []GroupVersionForDiscovery `json:"versions" protobuf:"bytes,2,rep,name=versions"` + // PreferredVersion will be removed for the new Discovery API + // The GroupVersionForDiscovery will be sorted based on the preferred version + // PreferredVersion GroupVersionForDiscovery `json:"preferredVersion,omitempty" protobuf:"bytes,3,opt,name=preferredVersion"` + + // ServerAddresssByClientCIDRs will be removed for the new Discovery API + //ServerAddressByClientCIDRs []ServerAddressByClientCIDR `json:"serverAddressByClientCIDRs,omitempty" protobuf:"bytes,4,rep,name=serverAddressByClientCIDRs"` +} + +// GroupVersion contains the "group/version" and "version" string of a version. +// It is made a struct to keep extensibility. +type GroupVersionForDiscovery struct { + // groupVersion specifies the API group and version in the form "group/version" + GroupVersion string `json:"groupVersion" protobuf:"bytes,1,opt,name=groupVersion"` + // version specifies the version in the form of "version". This is to save + // the clients the trouble of splitting the GroupVersion. + Version string `json:"version" protobuf:"bytes,2,opt,name=version"` + // resources contains the name of the resources and if they are namespaced. + APIResources []APIResource `json:"resources" protobuf:"bytes,2,rep,name=resources"` +} + +// APIResource specifies the name of a resource and whether it is namespaced. +type APIResource struct { + // name is the plural name of the resource. + Name string `json:"name" protobuf:"bytes,1,opt,name=name"` + // singularName is the singular name of the resource. This allows clients to handle plural and singular opaquely. + // The singularName is more correct for reporting status on a single item and both singular and plural are allowed + // from the kubectl CLI interface. + SingularName string `json:"singularName" protobuf:"bytes,6,opt,name=singularName"` + // namespaced indicates if a resource is namespaced or not. + Namespaced bool `json:"namespaced" protobuf:"varint,2,opt,name=namespaced"` + // group is the preferred group of the resource. Empty implies the group of the containing resource list. + // For subresources, this may have a different value, for example: Scale". + Group string `json:"group,omitempty" protobuf:"bytes,8,opt,name=group"` + // version is the preferred version of the resource. Empty implies the version of the containing resource list + // For subresources, this may have a different value, for example: v1 (while inside a v1beta1 version of the core resource's group)". + Version string `json:"version,omitempty" protobuf:"bytes,9,opt,name=version"` + // kind is the kind for the resource (e.g. 'Foo' is the kind for a resource 'foo') + Kind string `json:"kind" protobuf:"bytes,3,opt,name=kind"` + // verbs is a list of supported kube verbs (this includes get, list, watch, create, + // update, patch, delete, deletecollection, and proxy) + Verbs Verbs `json:"verbs" protobuf:"bytes,4,opt,name=verbs"` + // shortNames is a list of suggested short names of the resource. + ShortNames []string `json:"shortNames,omitempty" protobuf:"bytes,5,rep,name=shortNames"` + // categories is a list of the grouped resources this resource belongs to (e.g. 'all') + Categories []string `json:"categories,omitempty" protobuf:"bytes,7,rep,name=categories"` + + // StorageVersionHash will be removed in the new Discovery API + // StorageVersionHash string `json:"storageVersionHash,omitempty" protobuf:"bytes,10,opt,name=storageVersionHash"` +} +``` + ### Aggregation -For the aggregation layer on the server, a new controller will be created to aggregate discovery for built-in types, apiextensions types (CRDs), and types from aggregated api servers. +For the aggregation layer on the server, a new controller will be +created to aggregate discovery for built-in types, apiextensions types +(CRDs), and types from aggregated api servers. + +A post start hook will be added and the kube-apiserver health check should only pass if the discovery document is ready. + ### Client -The `client-go` interface will be modified to add a new method to retrieve the aggregated discovery document and `kubectl` will be the initial candidate. As a starting point, `kubectl api-resources` should use the aggregated discovery document instead of sending a storm of requests. +The `client-go` interface will be modified to add a new method to +retrieve the aggregated discovery document and `kubectl` will be the +initial candidate. As a starting point, `kubectl api-resources` should +use the aggregated discovery document instead of sending a storm of +requests. ### Test Plan - +[testing-guidelines]: https://git.k8s.io/community/contributors/devel/sig-testing/testing.md --> -[x] I/we understand the owners of the involved components may require updates to -existing tests to make this code solid enough prior to committing the changes necessary -to implement this enhancement. +[x] I/we understand the owners of the involved components may require +updates to existing tests to make this code solid enough prior to +committing the changes necessary to implement this enhancement. ##### Prerequisite testing updates - + ##### Unit tests - - - + + This will be implemented in a new package in kube-aggregator. ##### Integration tests - +For Beta and GA, add links to added tests together with links to +k8s-triage for those tests: +https://storage.googleapis.com/k8s-triage/index.html --> -For alpha, integration tests will be added to exercise the new aggregated discovery code path. +For alpha, integration tests will be added to exercise the new +aggregated discovery code path. ##### e2e tests - +We expect no non-infra related flakes in the last month as a GA +graduation criteria. --> -For alpha, tests will be added to exercise the new aggregated discovery code path for kubectl, both on the server and client side. +For alpha, tests will be added to exercise the new aggregated +discovery code path for kubectl, both on the server and client side. ### Graduation Criteria - +Below are some examples to consider, in addition to the aforementioned +[maturity levels][maturity-levels]. --> #### Alpha - Feature implemented behind a feature flag - Initial e2e tests completed and enabled -- At least one client (kubectl) has an implementation to use the aggregated discovery feature +- At least one client (kubectl) has an implementation to use the + aggregated discovery feature #### Beta @@ -396,12 +507,12 @@ Below are some examples to consider, in addition to the aforementioned [maturity - TBD -**Note:** Generally we also wait at least two releases between beta and -GA/stable, because there's no opportunity for user feedback, or even bug reports, -in back-to-back releases. +**Note:** Generally we also wait at least two releases between beta +and GA/stable, because there's no opportunity for user feedback, or +even bug reports, in back-to-back releases. -**For non-optional features moving to GA, the graduation criteria must include -[conformance tests].** +**For non-optional features moving to GA, the graduation criteria must +include [conformance tests].** [conformance tests]: https://git.k8s.io/community/contributors/devel/sig-architecture/conformance-tests.md @@ -410,7 +521,8 @@ in back-to-back releases. ### Upgrade / Downgrade Strategy -Aggregated discovery will be behind a feature gate. It is an in-memory feature and upgrade/downgrade is not a problem. +Aggregated discovery will be behind a feature gate. It is an in-memory +feature and upgrade/downgrade is not a problem. ### Version Skew Strategy @@ -418,30 +530,30 @@ Aggregated discovery will be behind a feature gate. It is an in-memory feature a +[#prod-readiness](https://kubernetes.slack.com/archives/CPNHUMN74) +channel if you need any help or guidance. --> ### Feature Enablement and Rollback - ###### How can this feature be enabled / disabled in a live cluster? @@ -456,303 +568,293 @@ No ###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)? - -Yes, the feature may be disabled by reverting the feature flag. +--> Yes, the feature may be disabled by reverting the feature flag. ###### What happens if we reenable the feature if it was previously rolled back? -The feature does not depend on state, and can be disabled/enabled at will. +The feature does not depend on state, and can be disabled/enabled at +will. ###### Are there any tests for feature enablement/disablement? - -n/a +A test will be added to ensure that the RESTMapper functionality works +properly both when the feature is enabled and disabled. ### Rollout, Upgrade and Rollback Planning - ###### How can a rollout or rollback fail? Can it impact already running workloads? - +feature flags will be enabled on some API servers and not others +during the rollout. Similarly, consider large clusters and how +enablement/disablement will rollout across nodes. --> ###### What specific metrics should inform a rollback? - + + +High latency or failure of a metric in the newly added discovery +aggregation controller. If the `/discovery/v1` endpoint is not +reachable or if there are errors from the endpoint, that could be a +sign of rollback as well. + ###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested? - ###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.? - + ### Monitoring Requirements - +For GA, this section is required: approvers should be able to confirm +the previous answers based on experience in the field. --> ###### How can an operator determine if the feature is in use by workloads? - + ###### How can someone using this feature know that it is working for their instance? - + -- [ ] Events - - Event Reason: -- [ ] API .status - - Condition name: - - Other field: -- [ ] Other (treat as last resort) - - Details: +`/discovery/v1` endpoint is populated with discovery information, and all expected group-versions are present. ###### What are the reasonable SLOs (Service Level Objectives) for the enhancement? - +These goals will help you determine what you need to measure (SLIs) in +the next question. --> ###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service? - + -- [ ] Metrics - - Metric name: - - [Optional] Aggregation method: - - Components exposing the metric: -- [ ] Other (treat as last resort) - - Details: +- [x] Metrics + - Metric name: `aggregator_discovery_aggregation_duration` + - Components exposing the metric: `kube-server` + - This is a metric for exposing the time it took to aggregate all the ###### Are there any missing metrics that would be useful to have to improve observability of this feature? - + + +Yes. A metric for the regeneration count of the discovery document. `aggregator_discovery_aggregation_count` ### Dependencies - ###### Does this feature depend on any specific services running in the cluster? - + - Impact of its degraded performance or high-error rates on the +feature: --> ### Scalability - +For GA, this section is required: approvers should be able to confirm +the previous answers based on experience in the field. --> ###### Will enabling / using this feature result in any new API calls? - + - API calls that may be triggered by changes of some Kubernetes + resources (e.g. update of object X triggers new updates of object + Y) + - periodic API calls to reconcile state (e.g. periodic fetching + state, heartbeats, leader election, etc.) --> ###### Will enabling / using this feature result in introducing new API types? - + - Supported number of objects per namespace (for namespace-scoped +objects) --> ###### Will enabling / using this feature result in any new calls to the cloud provider? - + - Estimated increase: --> ###### Will enabling / using this feature result in increasing size or count of the existing API objects? - + - Estimated amount of new objects: (e.g., new Object X for every +existing Pod) --> ###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs? - +[existing SLIs/SLOs]: https://git.k8s.io/community/sig-scalability/slos/slos.md#kubernetes-slisslos --> ###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components? - +[supported limits]: https://git.k8s.io/community//sig-scalability/configs-and-limits/thresholds.md --> ### Troubleshooting - ###### How does this feature react if the API server and/or etcd is unavailable? ###### What are other known failure modes? - + - Detection: How can it be detected via metrics? Stated another + way: how can an operator troubleshoot without logging into a + master or worker node? + - Mitigations: What can be done to stop the bleeding, especially + for already running user workloads? + - Diagnostics: What are the useful log messages and their required + logging levels that could help debug the issue? Not required + until feature graduated to beta. + - Testing: Are there any tests for failure mode? If not, describe +why. --> ###### What steps should be taken if SLOs are not being met to determine the problem? ## Implementation History - +- the first Kubernetes release where an initial version of the KEP was + available +- the version of Kubernetes where the KEP graduated to general + availability +- when the KEP was retired or superseded --> ## Drawbacks -With aggregation, the size of the aggregated discovery document could be an issue in the future since clients will need to download the entire document on any resource update. At the moment, even with 3000 CRDs (already very unlikely), the total size is still smaller than 1MB. +With aggregation, the size of the aggregated discovery document could +be an issue in the future since clients will need to download the +entire document on any resource update. At the moment, even with 3000 +CRDs (already very unlikely), the total size is still smaller than +1MB. ## Alternatives - + + +An alternative that was considered is [Discovery Cache +Busting](https://docs.google.com/document/d/1AulBtUYjVcc4s809YSQq4bdRdDO3byY7ew9za4Ortj4). +Cache-busting allows clients to know if the files need to be +downloaded at all, and in most cases the download can be skipped +entirely. This typically works by including a hash of the resource +content in its name, while marking the objects as never-expiring using +cache control headers. Clients can then recognize if the names have +changed or stayed the same, and re-use files that have kept the same +name without downloading them again. + +Aggregated Discovery was selected because of the amount of requests that are saved both on startup and on changes involving multiple group versions. For a full comparison between Discovery Cache Busting and Aggregated Discovery, please refer to the [Google Doc](https://docs.google.com/document/d/1sdf8nz5iTi86ErQy9OVxvQh_0RWfeU3Vyu0nlA10LNM).