UVIP: Add beta criteria #4441
Conversation
> necessary that discovery exactly matches the capability of the system. So,
> we will use the storage version objects to reconstruct a merged discovery
> document and serve that in all apiservers.
> - We will proxy the discovery request to the newest available apiserver in the cluster. In doing so, we will ensure that all discovery requests are handled keeping the future state of the cluster in mind.
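For illustration, routing discovery to the newest available apiserver reduces to a version comparison across the known peers. The sketch below is a minimal stand-alone model in plain Go; the `peer` type and `newestPeer` helper are hypothetical and not part of the KEP or the Kubernetes codebase:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// peer models an apiserver known to the proxy, identified by its
// advertised binary version (hypothetical type for illustration).
type peer struct {
	Addr    string
	Version string // e.g. "1.31.0"
}

// parseVersion splits "major.minor.patch" into integers.
func parseVersion(v string) [3]int {
	var out [3]int
	for i, part := range strings.SplitN(v, ".", 3) {
		n, _ := strconv.Atoi(part)
		out[i] = n
	}
	return out
}

// less reports whether version a sorts before version b.
func less(a, b [3]int) bool {
	for i := 0; i < 3; i++ {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

// newestPeer returns the peer with the highest version; discovery
// requests would be proxied there so clients see the future state of
// the cluster during an upgrade.
func newestPeer(peers []peer) peer {
	best := peers[0]
	for _, p := range peers[1:] {
		if less(parseVersion(best.Version), parseVersion(p.Version)) {
			best = p
		}
	}
	return best
}

func main() {
	peers := []peer{
		{Addr: "10.0.0.1:6443", Version: "1.30.2"},
		{Addr: "10.0.0.2:6443", Version: "1.31.0"},
	}
	fmt.Println(newestPeer(peers).Addr)
}
```

In a real implementation the peer versions would come from the storage version objects or lease metadata already tracked by UVIP, not from a hardcoded list.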
This seems like a good simplification. The complexity of merging across versions struck me as something that would complicate the openapi code more than I want, and make caching optimizations even harder.
Note also that the compatibility version KEP requires introducing the ability to produce a discovery document for a version different than the binary version. The implementation will require conditionally showing APIs in discovery based on compatibility version. This feels like it could be extended to achieve what we want from merging. I'd rather not have two approaches that do basically the same thing in our codebase, so I'm in favor of letting the compatibility version KEP progress and keeping this feature simple (and since this is targeting a 1.31 implementation, we have some time). If we decide later that we need to do more around discovery, we can leverage the compatibility version work.
Thanks for working through the below list of garbage collection and namespace controller use cases.
I can't tell if this is proposing always redirecting discovery to the newest server, or only when servers disagree about the contents of discovery. I commented on both the GC and namespace lifecycle sections, but pointing both of those controllers at the new server in an upgrade that is stopping serving an API lets both of them orphan things. I thought we were trying to fix that gap.
I was thinking the former, i.e. always redirecting discovery to the latest apiserver.
Thanks for clarifying my understanding of the GC and NLC behaviors. Yeah, I need to think more about how to handle the scenario where an API is not served by the latest apiserver.
So, thinking more about how/whether to handle discovery requests as part of UVIP, I see the following options:
a) implement a merged discovery doc as originally proposed
b) route discovery to the newest apiserver
c) do nothing (drop discovery handling from UVIP)
Going by #4441 (comment), the compatibility version feature will make it possible to implement merging far more simply. With compatibility version we would be able to track historical API information, and if we use that to serve discovery, we also avoid the problem of GC and NLC becoming unaware of the previously-present-now-absent APIs (which happens if we route discovery to newest apiserver as a part of UVIP).
Maybe we just leave discovery as it is now and don't address it in UVIP? That way we avoid introducing significant changes that may not all make sense given the future changes that compatibility versions will make to discovery. It would also spare us the code complexity and developer time required to implement merging, since we can get it mostly for free once compatibility version finishes.
TL;DR: how about going with (c)?
> Note also that the compatibility version KEP requires introducing the ability to produce a discovery document for a version different than the binary version
My initial assumption is that this wouldn't require a code change. If the compatibility version KEP is changing the storage version and enabling/disabling certain versions before API registration, then I'd expect the discovery document to only reflect the installed/enabled APIs; but I'd need to review the logic again to say for sure.
> c) do nothing (drop discovery handling from UVIP)
>
> TL;DR: how about going with (c)?
I've seen controllers use the discovery information as the source of truth as to what GVRs are available. IMO it would be surprising to be able to send a request to an apiserver and have it proxied/served, but that resource not be listed in discovery when I query the same service. Would controllers that rely on complete discovery be required to be aware of UVIP?
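The concern above — a resource that an apiserver will serve (directly or via UVIP proxying) while discovery omits it — can be modeled as a simple set difference. This is an illustrative plain-Go sketch, not real controller code; real controllers use `schema.GroupVersionResource` and the client-go discovery client:

```go
package main

import "fmt"

// gvr is a minimal stand-in for a group/version/resource identifier
// (hypothetical simplification for this sketch).
type gvr string

// missingFromDiscovery returns resources the cluster will actually
// serve but that do not appear in the discovery document a controller
// consults — the gap that surprises discovery-driven controllers.
func missingFromDiscovery(served, discovered []gvr) []gvr {
	seen := make(map[gvr]bool, len(discovered))
	for _, d := range discovered {
		seen[d] = true
	}
	var missing []gvr
	for _, s := range served {
		if !seen[s] {
			missing = append(missing, s)
		}
	}
	return missing
}

func main() {
	// A proxied resource that is served but absent from discovery.
	served := []gvr{"apps/v1/deployments", "example.com/v1/widgets"}
	discovered := []gvr{"apps/v1/deployments"}
	fmt.Println(missingFromDiscovery(served, discovered))
}
```

Any controller that treats discovery as the source of truth for available GVRs will simply never see the resources in that difference.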
So, once we make UVIP compatibility-version aware, we will route client requests only to those apiservers that are compatible with the client. That is also one of the beta criteria added here. And the compatibility version KEP will provide discovery for a compatibility range. Agreed that there will be a gap until compatibility-version-aware discovery lands (UVIP should remain beta until then), but eventually we should reach a state where UVIP proxying and discovery are aligned.
> * Version Aware Proxying:
>
> - With [compatibility versions](https://github.com/kubernetes/enhancements/pull/4395) information available for kubernetes components, we can leverage that to perform version aware proxying between API servers, and ensure that a request is proxied to an eligible apiserver given that it was generated by a client compatible with the apiserver(s) present in a cluster. Version aware proxying lets us ensure that requests from clients (possibly) expecting to use a feature introduced in later K8s versions are not incorrectly routed to apiservers at older versions.
Do we need to describe how clients would declare their version to servers here?
Added a suggestion to specify this information in client request headers. Also added a TODO section to flesh out the details in Design Details.
> This proposal will cause the GC to behave in the following way for the scenarios described:
>
> - Case 1: if the resource is being deleted in the newer version
>   - Any actions taken on this resource during the upgrade will be rendered useless after the upgrade when these objects are orphaned, so we can act fast and proxy this request to the newer apiserver. Garbage collection will be triggered for a deleted resource, which is correct in this case
GC does not trigger for an absent resource type ... discovery routing to the new server would mean GC would think the type did not exist any more and would hang cleaning up children of those objects. I'm not sure that's good.
Oh, my initial (incorrect) understanding was that the newest apiserver would respond with the list of resources available in the new state, and GC would assume that resources not found in this list are safe to delete.
But if GC will only act on resources from the list returned by the newest apiserver, then yes, the objects of the deleted API will remain hanging, waiting for something to claim them. :/
I need to find more details on how garbage collection works for K8s objects. Will update here once I have more clarity. Thanks for the review though!
> - Case 1: if the resource is being deleted in the newer version
>   - Any actions taken on this resource during the upgrade will be rendered useless after the upgrade when these objects are orphaned, so we can act fast and proxy this request to the newer apiserver. Garbage collection will be triggered for a deleted resource, which is correct in this case
> - Case 2: if the resource is being introduced in the newer version
>   - If the resource is being introduced in the newer version, proxying the discovery request to the newer apiserver is the right thing to do anyway. Garbage collection will be blocked, which is also correct
I'm not sure what "will be blocked" means... can you clarify that?
Ah, I just meant that GC would see that objects of the newly introduced API are available (since the discovery doc returned by the newest apiserver would reflect that). And if such objects are being referenced, it would not attempt to delete them.
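The GC failure mode described in this thread — children of a vanished owner type are neither collected nor released — can be sketched with a toy model. This is a deliberate simplification in plain Go (the `object` type and `pendingAfterGC` helper are hypothetical), not the real garbage collector's graph builder:

```go
package main

import "fmt"

// object models a stored child object with a reference to its owner's
// resource type (a hypothetical simplification of ownerReferences).
type object struct {
	Name      string
	OwnerType string
}

// pendingAfterGC models the behavior discussed above: GC only acts on
// resource types present in discovery, so children whose owner type
// disappeared from the discovery document hang forever, waiting for
// something to claim them.
func pendingAfterGC(objs []object, discovered map[string]bool) []string {
	var pending []string
	for _, o := range objs {
		if !discovered[o.OwnerType] {
			pending = append(pending, o.Name)
		}
	}
	return pending
}

func main() {
	// The newest apiserver's discovery no longer lists the widgets API.
	discovered := map[string]bool{"apps/v1/deployments": true}
	objs := []object{
		{Name: "child-of-deployment", OwnerType: "apps/v1/deployments"},
		{Name: "child-of-removed-api", OwnerType: "example.com/v1beta1/widgets"},
	}
	fmt.Println(pendingAfterGC(objs, discovered))
}
```

The child of the removed API type ends up in the pending set, which is the "hang cleaning up children" outcome raised in the review.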
> This proposal will cause the NLC to behave in the following way for the scenarios described:
>
> - Case 1: if the resource is being deleted in the newer version
>   - If the resource is actually being deleted in the newer version, namespace deletion will be triggered, which makes sense for the eventual state of the cluster.
Routing discovery to the new server means the namespace lifecycle controller will not be aware the resource type exists via discovery, and will orphan the resources rather than cleaning them up when handling a namespace deletion. That doesn't seem good.
Updated my understanding of the NLC's behavior. Though it's still unclear to me how serving the latest discovery doc will solve this problem.
> - Case 1: if the resource is being deleted in the newer version
>   - If the resource is actually being deleted in the newer version, namespace deletion will be triggered, which makes sense for the eventual state of the cluster.
> - Case 2: if the resource is being introduced in the newer version
>   - Namespace deletion is not triggered, as it should not be, since these resources exist in the newer version of the cluster
> namespace deletion is not triggered, as it should not
I don't understand this... routing discovery to the new server would make namespace controller aware of the new resource type, so when it is cleaning up any namespace, it will check for instances of that new type and delete them, like we want it to.
Updated. Thanks.
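The namespace lifecycle controller behavior debated in this thread can also be sketched as a toy sweep: delete instances of every type discovery reports, and orphan everything else. This plain-Go model (the `cleanupNamespace` helper and its types are hypothetical) is only an illustration of the orphaning risk, not the real controller:

```go
package main

import "fmt"

// cleanupNamespace models the NLC's sweep during namespace deletion:
// it enumerates resource types from the discovery document it trusts
// and deletes instances of those types, while instances whose type is
// missing from that document are silently orphaned — the failure mode
// raised in the comments above.
func cleanupNamespace(contents map[string][]string, discovered map[string]bool) (deleted, orphaned []string) {
	for resType, names := range contents {
		for _, name := range names {
			if discovered[resType] {
				deleted = append(deleted, name)
			} else {
				orphaned = append(orphaned, name)
			}
		}
	}
	return deleted, orphaned
}

func main() {
	// Discovery from the newest apiserver no longer lists widgets.
	discovered := map[string]bool{"v1/configmaps": true}
	contents := map[string][]string{
		"v1/configmaps":               {"cm-1"},
		"example.com/v1beta1/widgets": {"widget-1"},
	}
	del, orph := cleanupNamespace(contents, discovered)
	fmt.Println(del, orph)
}
```

In the upgrade scenario, the widget instance survives the namespace deletion as an orphan, which is exactly the gap routing discovery to the newest server would create for APIs being removed.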
@k8s-triage-robot: The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. /lifecycle stale
@k8s-triage-robot: /lifecycle rotten
@k8s-triage-robot: /close — Closed this PR.
One-line PR description: Add beta criteria for UVIP
Issue link: Unknown Version Interoperability Proxy #4020