
UVIP: Add beta criteria #4441

Closed
wants to merge 2 commits into from

Conversation

richabanker
Contributor

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory labels Jan 25, 2024
@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 25, 2024
@richabanker richabanker marked this pull request as draft January 25, 2024 21:29
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 25, 2024
@k8s-ci-robot
Contributor

@richabanker: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-enhancements-test | cfb02c5 | link | true | /test pull-enhancements-test |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

necessary that discovery exactly matches the capability of the system. So,
we will use the storage version objects to reconstruct a merged discovery
document and serve that in all apiservers.
- We will proxy the discovery request to the newest available apiserver in the cluster. In doing so, we will ensure that all discovery requests are handled with the future state of the cluster in mind.
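
For illustration, a minimal sketch of what routing discovery to the newest apiserver could look like, assuming the peer apiservers and their versions are already known (the `peer` type and helper names below are illustrative, not from the KEP):

```go
package uvipsketch

import (
	"net/http"
	"net/http/httputil"
	"net/url"

	"golang.org/x/mod/semver"
)

// peer describes an apiserver known to this server, e.g. learned from
// StorageVersion objects or peer endpoint information (illustrative only).
type peer struct {
	version string // e.g. "v1.31.0"
	url     *url.URL
}

// newestPeer picks the apiserver advertising the highest version.
// Assumes at least one peer is known.
func newestPeer(peers []peer) peer {
	newest := peers[0]
	for _, p := range peers[1:] {
		if semver.Compare(p.version, newest.version) > 0 {
			newest = p
		}
	}
	return newest
}

// discoveryProxy forwards discovery requests (/api, /apis) to the newest
// available apiserver instead of serving the local, possibly older, document.
func discoveryProxy(peers []peer) http.Handler {
	return httputil.NewSingleHostReverseProxy(newestPeer(peers).url)
}
```
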
Contributor
@jpbetz jpbetz Jan 25, 2024

This seems like a good simplification. The complexity of merging across versions struck me as something that would complicate the openapi code more than I want, and make caching optimizations even harder.

Note also that the compatibility version KEP requires introducing the ability to produce a discovery document for a version different than the binary version. The implementation will require conditionally showing APIs in discovery based on compatibility version. This feels like it could be extended to achieve what we want to achieve by merging. I'd rather not have two approaches to do basically the same thing in our codebase, so I'm in favor of letting the compatibility version KEP progress and keeping this feature simple (and since this is targeting a 1.31 implementation, we have some time). If we decide later that we need to do more around discovery, we can leverage the compatibility version work.

Thanks for working through the below list of garbage collection and namespace controller use cases.

Member
@liggitt liggitt Jan 26, 2024

I can't tell if this is proposing always redirecting discovery to the newest server, or only when servers disagree about the contents of discovery. I commented on both the GC and namespace lifecycle sections, but pointing both of those controllers at the new server in an upgrade that is stopping serving an API lets both of them orphan things. I thought we were trying to fix that gap.

Contributor Author

I was thinking the former, i.e. always redirecting discovery to the latest apiserver.
Thanks for clarifying my understanding of the GC and NLC behaviors. Yeah, I need to think more about how to handle the scenario where an API is not served by the latest apiserver.

Contributor Author

So, thinking more on how/whether to handle discovery requests as part of UVIP, I see the following options:
a) implement a merged discovery doc as was originally proposed
b) route discovery to the newest apiserver
c) do nothing (drop discovery handling from UVIP)

Going by #4441 (comment), the compatibility version feature will make it possible to implement merging far more simply. With compatibility version we would be able to track historical API information, and if we use that to serve discovery, we also avoid the problem of GC and NLC becoming unaware of previously-present-now-absent APIs (which happens if we route discovery to the newest apiserver as part of UVIP).

Maybe we just leave discovery as it is now and don't address it for UVIP? That way we avoid introducing significant changes that may not all make sense given the changes compatibility versions will make to discovery, and we avoid the code complexity and developer time required to implement merging if we can get it mostly for free once compatibility version finishes.

TL;DR: how about going with c)?

Contributor
@alexzielenski alexzielenski Jan 31, 2024

> Note also that the compatibility version KEP requires introducing the ability to produce a discovery document for a version different than the binary version

My initial assumption is this wouldn't require a code change. If the compatibility version KEP is changing the storage version and enabling/disabling certain versions before API registration, then I'd expect the discovery document to only reflect the installed/enabled APIs; but I'd need to review the logic again to say for sure.

> c) do nothing (drop discovery handling from UVIP)
> TL;DR: how about going with c)?

I've seen controllers use the discovery information as the source of truth for which GVRs are available. IMO it would be surprising to be able to send a request to an apiserver and have it proxied/served, but have that resource not be listed in discovery when I query the same service. Would controllers that rely on complete discovery be required to be aware of UVIP?
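
For context, this is the pattern being described, sketched with client-go (the kubeconfig path is a placeholder): a controller enumerating available GVRs from the discovery document and treating that list as the source of truth.

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/discovery"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Placeholder kubeconfig path; a real controller would typically use
	// in-cluster configuration instead.
	cfg, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig")
	if err != nil {
		panic(err)
	}
	dc, err := discovery.NewDiscoveryClientForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Enumerate the GVRs the cluster claims to serve, straight from discovery.
	lists, err := dc.ServerPreferredResources()
	if err != nil {
		// Partial discovery failures still return usable results.
		fmt.Println("partial discovery failure:", err)
	}
	for _, list := range lists {
		gv, err := schema.ParseGroupVersion(list.GroupVersion)
		if err != nil {
			continue
		}
		for _, r := range list.APIResources {
			fmt.Println(gv.WithResource(r.Name))
		}
	}
}
```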

Contributor Author

So, once we make UVIP compatibility version aware, we will route client requests only to those apiservers that are compatible with the client. That is also one of the beta criteria added here. And the compatibility version KEP will provide discovery for a compatibility range. Agreed that there will be some gap until compatibility-version-aware discovery happens (UVIP should remain beta till then), but eventually we should be in a state where UVIP proxying and discovery are aligned.


* Version Aware Proxying:

- With [compatibility versions](https://github.com/kubernetes/enhancements/pull/4395) information available for Kubernetes components, we can leverage it to perform version-aware proxying between API servers, ensuring that a request is proxied to an apiserver that is compatible with the client that generated it. Version-aware proxying lets us ensure that requests from clients (possibly) expecting to use a feature introduced in a later Kubernetes version are not incorrectly routed to apiservers at older versions.
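
For illustration, a rough sketch of the eligibility check that version-aware proxying implies, assuming the client's declared compatibility version and each apiserver's version are already known (the type and helper below are illustrative, not the KEP's design):

```go
package uvipsketch

import "golang.org/x/mod/semver"

// apiServer describes a peer apiserver and its advertised version
// (illustrative only, not the KEP's data model).
type apiServer struct {
	name    string
	version string // e.g. "v1.31"
}

// eligibleServers returns the apiservers whose version is at least the
// client's declared compatibility version, i.e. servers that can be expected
// to understand everything such a client may request.
func eligibleServers(clientVersion string, servers []apiServer) []apiServer {
	var eligible []apiServer
	for _, s := range servers {
		if semver.Compare(s.version, clientVersion) >= 0 {
			eligible = append(eligible, s)
		}
	}
	return eligible
}
```
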
Contributor

Do we need to describe how clients would declare their version to servers here?

Contributor Author

Added a suggestion for specifying this information in client request headers. Also added a TODO section in Design Details to flesh out those details.
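
For illustration, one way a client could attach such a header, sketched as a standard `http.RoundTripper` wrapper; the header name below is hypothetical and would be pinned down in Design Details:

```go
package uvipsketch

import "net/http"

// versionHeader is a hypothetical header name for illustration only; the
// actual header (if any) would be defined in the KEP's Design Details.
const versionHeader = "X-Kubernetes-Client-Compatibility-Version"

// compatVersionRoundTripper stamps each outgoing request with the client's
// compatibility version so a proxying apiserver could route it appropriately.
type compatVersionRoundTripper struct {
	version string
	next    http.RoundTripper
}

func (rt compatVersionRoundTripper) RoundTrip(req *http.Request) (*http.Response, error) {
	// Clone the request before mutating headers, per the RoundTripper contract.
	r := req.Clone(req.Context())
	r.Header.Set(versionHeader, rt.version)
	return rt.next.RoundTrip(r)
}
```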

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Richabanker
Once this PR has been reviewed and has the lgtm label, please assign deads2k for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

This proposal will cause the GC to behave in the following way for the scenarios described:

- Case 1: if the resource is being deleted in the newer version
- Any actions taken on this resource during the upgrade will be rendered useless after the upgrade, when these objects are orphaned, so we can act fast and proxy this request to the newer apiserver. Garbage collection will be triggered for a deleted resource, which is correct in this case
Member

GC does not trigger for an absent resource type ... discovery routing to the new server would mean GC would think the type did not exist any more and would hang cleaning up children of those objects. I'm not sure that's good.

Contributor Author

Oh, my initial (incorrect) understanding was that the newest apiserver would respond with the list of resources available in the new state, and GC would assume that resources not found in this list were safe to delete.

But if the GC will only act on resources from the list returned by the newest apiserver, then yes, the objects of the deleted API will remain hanging, waiting for something to claim them. :/
I need to find more details on how garbage collection works for K8s objects. Will update here once I have more clarity. Thanks for the review though!
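
For reference, a simplified client-go sketch of how the garbage collector's working set of resource types can be derived from discovery (the real logic lives in the in-tree garbagecollector package); the key point is that a type absent from discovery is invisible to GC:

```go
package main

import (
	"fmt"

	"k8s.io/client-go/discovery"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Placeholder kubeconfig path for the sketch.
	cfg, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig")
	if err != nil {
		panic(err)
	}
	dc, err := discovery.NewDiscoveryClientForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// GC only tracks types it can list, watch, and delete, and everything it
	// learns about comes from the discovery document.
	all, err := dc.ServerPreferredResources()
	if err != nil {
		// The real controller tolerates partial discovery failures.
		fmt.Println("partial discovery failure:", err)
	}
	deletable := discovery.FilteredBy(
		discovery.SupportsAllVerbs{Verbs: []string{"delete", "list", "watch"}},
		all,
	)
	for _, list := range deletable {
		for _, r := range list.APIResources {
			fmt.Printf("%s %s\n", list.GroupVersion, r.Name)
		}
	}
}
```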

- Case 1: if the resource is being deleted in the newer version
- Any actions taken on this resource during the upgrade will be rendered useless after the upgrade, when these objects are orphaned, so we can act fast and proxy this request to the newer apiserver. Garbage collection will be triggered for a deleted resource, which is correct in this case
- Case 2: if the resource is being introduced in the newer version
- If the resource is being introduced in the newer version, proxying the discovery request to the newer apiserver is the right thing to do anyway. Garbage collection will be blocked, which is also correct
Member

I'm not sure what "will be blocked" means... can you clarify that?

Contributor Author

Ah, I just meant that GC would see that objects of the newly introduced API are available (as the discovery doc returned by the newest apiserver would reflect that), and if such objects are being referenced, it would not attempt to delete them.

This proposal will cause the NLC to behave in the following way for the scenarios described:

- Case 1: if the resource is being deleted in the newer version
- If the resource is actually being deleted in the newer version, namespace deletion will be triggered, which makes sense for the eventual state of the cluster.
Member

Routing discovery to the new server means the namespace lifecycle controller will not be aware the resource type exists via discovery, and will orphan the resources rather than cleaning them up when handling a namespace deletion. That doesn't seem good.

Contributor Author

Updated my understanding of the NLC's behavior. Though it's unclear now how serving the latest discovery doc will solve this problem.

- Case 1: if the resource is being deleted in the newer version
- If the resource is actually being deleted in the newer version, namespace deletion will be triggered, which makes sense for the eventual state of the cluster.
- Case 2: If the resource is being introduced in the newer version
- namespace deletion is not triggered, as it should not be, since these resources exist in the newer version of the cluster
Member

> namespace deletion is not triggered, as it should not be

I don't understand this... routing discovery to the new server would make the namespace controller aware of the new resource type, so when it is cleaning up any namespace, it will check for instances of that new type and delete them, like we want it to.
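
For reference, a simplified client-go sketch of the discovery call that namespace cleanup depends on; if a resource type is missing from this list, its instances in a terminating namespace are never enumerated or deleted:

```go
package main

import (
	"fmt"

	"k8s.io/client-go/discovery"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Placeholder kubeconfig path for the sketch.
	cfg, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig")
	if err != nil {
		panic(err)
	}
	dc, err := discovery.NewDiscoveryClientForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// During namespace deletion, the controller enumerates namespaced types
	// from discovery and deletes their instances in the namespace; a type
	// missing from this list is simply skipped, i.e. its objects are orphaned.
	namespaced, err := dc.ServerPreferredNamespacedResources()
	if err != nil {
		fmt.Println("partial discovery failure:", err)
	}
	for _, list := range namespaced {
		for _, r := range list.APIResources {
			fmt.Printf("%s %s\n", list.GroupVersion, r.Name)
		}
	}
}
```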

Contributor Author

Updated. Thanks.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 30, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle rotten
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 30, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closed this PR.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
