diff --git a/keps/prod-readiness/sig-api-machinery/3352.yaml b/keps/prod-readiness/sig-api-machinery/3352.yaml
index caffcbade463..6b204c391767 100644
--- a/keps/prod-readiness/sig-api-machinery/3352.yaml
+++ b/keps/prod-readiness/sig-api-machinery/3352.yaml
@@ -1,3 +1,5 @@
 kep-number: 3352
 alpha:
   approver: "@deads2k"
+beta:
+  approver: "@deads2k"
diff --git a/keps/sig-api-machinery/3352-aggregated-discovery/README.md b/keps/sig-api-machinery/3352-aggregated-discovery/README.md
index b7712b3a2c83..7d671ac388e9 100644
--- a/keps/sig-api-machinery/3352-aggregated-discovery/README.md
+++ b/keps/sig-api-machinery/3352-aggregated-discovery/README.md
@@ -616,6 +616,7 @@ main focus will be on kubectl and golang clients.
 #### Beta

 - kubectl uses the aggregated discovery feature by default
+- Metrics are added

 #### GA

@@ -678,7 +679,12 @@ channel if you need any help or guidance.
 -->

 ###### Does enabling the feature change any default behavior?

-No
+Clients using client-go version 1.25 and up will use the aggregated
+discovery endpoint rather than the unaggregated discovery endpoint.
+This is handled automatically in client-go and clients should see fewer
+requests to the API server when fetching discovery information. Client
+versions older than 1.25 will continue to use the old unaggregated
+discovery endpoint without any changes.

 ###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

@@ -690,7 +696,13 @@ and restarting the component. No other changes should be necessary to disable the
 feature.

 NOTE: Also set `disable-supported` to `true` or `false` in `kep.yaml`.
---> Yes, the feature may be disabled by reverting the feature flag.
+-->
+
+Yes, the feature may be disabled on the apiserver by reverting the
+feature flag.
+The feature may also be turned off client side by users
+of client-go via the
+[WithLegacy()](https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/client-go/discovery/discovery_client.go#L80)
+toggle.

 ###### What happens if we reenable the feature if it was previously rolled back?

@@ -731,6 +743,18 @@ feature flags will be enabled on some API servers and not others during the
 rollout. Similarly, consider large clusters and how enablement/disablement
 will rollout across nodes.
 -->
+During a rollout, some apiservers may support aggregated discovery and
+some may not. It is recommended that clients request the
+aggregated discovery document with a fallback to the unaggregated
+discovery format. This can be achieved by setting the Accept header to
+fall back to the default GVK of the `/apis` and `/api` endpoints.
+For example, to request the aggregated discovery type and fall back to
+the unaggregated discovery, the following header can be sent:
+`Accept: application/json;as=APIGroupDiscoveryList;v=v2beta1;g=apidiscovery.k8s.io,application/json`
+
+This kind of fallback is already implemented in client-go and this
+note is intended for non-golang clients.
+
 ###### What specific metrics should inform a rollback?

+n/a.
+
 ###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

+With aggregated discovery enabled by default, the new API is
+slightly different from the unaggregated version. The
+StorageVersionHash field is removed from resources in the aggregated
+discovery API. The storage version migrator will have an additional
+flag when initializing the discovery client to continue using the
+unaggregated API.
+
 ### Monitoring Requirements

 Kubernetes API (e.g., checking if there are objects with field X set) may be a last resort. Avoid logs or events for this purpose.
 -->
+Operators can check whether an aggregated discovery request can be
+made by sending a request to `/apis` with
+`application/json;as=APIGroupDiscoveryList;v=v2beta1;g=apidiscovery.k8s.io,application/json`
+as the Accept header and looking at the `Content-Type` response
+header. A `Content-Type` response header of
+`application/json;g=apidiscovery.k8s.io;v=v2beta1;as=APIGroupDiscoveryList`
+indicates that aggregated discovery is supported, and a `Content-Type`
+of `application/json` indicates that aggregated discovery is not
+supported. They can also check for the presence of aggregated
+discovery related metrics: `aggregator_discovery_aggregation_count`
+
 ###### How can someone using this feature know that it is working for their instance?

 - [x] Metrics
   - Metric name: `aggregator_discovery_aggregation_duration`
   - Components exposing the metric: `kube-server`
-  - This is a metric for exposing the time it took to aggregate all the
+  - This is a metric for exposing the time it took to aggregate all the
   api resources.
+
+  - Metric name: `aggregator_discovery_aggregation_count`
+  - Components exposing the metric: `kube-server`
+  - This is a metric for the number of times that the discovery document has been aggregated.

 ###### Are there any missing metrics that would be useful to have to improve observability of this feature?

-Yes. A metric for the regeneration count of the discovery document. `aggregator_discovery_aggregation_count`
+No.

 ### Dependencies

@@ -833,6 +881,12 @@ cluster-level services (e.g. DNS):
 - Impact of its degraded performance or high-error rates on the feature:
 -->
+No, but if aggregated apiservers are present, the feature will attempt
+to contact and aggregate the data published from the aggregated
+apiserver at a set interval. If the error rate is high, stale data
+may be returned because the latest data could not be fetched
+from the aggregated apiserver.
+
 ### Scalability

+No.
+Enabling this feature should reduce the total number of API calls
+for client discovery. Instead of clients sending a discovery request
+to all group versions (`/apis//`), they will only need
+to send a request to the aggregated endpoint to obtain all resources
+that the cluster supports.
+
 ###### Will enabling / using this feature result in introducing new API types?

+Yes, but these API types are not persisted.
+
 ###### Will enabling / using this feature result in any new calls to the cloud provider?

+No.
+
 ###### Will enabling / using this feature result in increasing size or count of the existing API objects?

 - Estimated amount of new objects: (e.g., new Object X for every existing Pod)
 -->
+No.
+
 ###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?

+No.
+
 ###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?

+No.
+
 ### Troubleshooting

 ###### What steps should be taken if SLOs are not being met to determine the problem?

+The feature can be rolled back by setting the AggregatedDiscoveryEndpoint feature flag to false.
+
 ## Implementation History