virt-handler: Restrict write access to nodes #246

# Overview

A reported [advisory](https://github.com/kubevirt/kubevirt/security/advisories/GHSA-cp96-jpmq-xrr2) outlines how our virt-handler component can be abused to escalate local privileges when a node running virt-handler is compromised. The flow and further details can be found in the reported [issue](https://github.com/kubevirt/kubevirt/issues/9109).

This proposal outlines a mitigation in the form of an internal change.

## Motivation

KubeVirt should be secure by default. While a mitigation is available, it is not part of KubeVirt.

## Goals

Mitigate the advisory by default.

## Non Goals

## Definition of Users

Not a user-facing change.

## User Stories

Not a user-facing change.

## Repos

kubevirt/kubevirt

# Design | ||

One alternative design that was proposed in the sig-scale call, and that may solve the same problem, is as follows:

This could be a generic solution that could help all daemonsets that are deployed with privileged RBAC, and could be a separate project altogether.

I don't think that this is addressing the problem fully. There are two things which need to be fixed:

(2) still allows me to easily lure in privileged CP components and then just steal their credentials. We could definitely create unique roles for each node, and in addition deploy a node webhook which would ensure that a service account can only modify its own node (that is also what the apiserver does internally for each kubelet), but that has the severe disadvantage that we would now need a required webhook on a core k8s resource. If our node webhook is ever unavailable, no kubelet, or anything else, can actually continue to modify the node. That's a no-go. There are other positive side-effects if we fix it with the "shadow nodes":

One may now think that virt-controller can still modify all nodes, and that is true. However, now we only have one component which can do that instead of n.

alaypatel07 thanks for bringing this up.

@rmohr we have to address a third element:

IOW 3. Restrict the daemonset to only modify KubeVirt-owned CRs related to its own node.

Yes, that third element is true, but I would qualify it as a follow-up step, not needed to address the CVE. Now virt-handler cannot blindly modify the node topology anymore. It can only modify hardware capability availability. It can not lure in privileged or sensitive workloads anymore by just applying labels which they are waiting for, or by making a node a control-plane node. The follow-up would be either cert-based auth for virt-handler, or maybe we can still use tokens and get the IP in the webhooks for verification. Something like this ... Let's add it to the doc as a follow-up.

@fabiand AFAIK the only KubeVirt CR virt-handler can update is

@rmohr I will add a

Is that alright with you?

@RamLavi yes - if this is the only one, then yes. And, generally speaking: handler should just be able to touch anything which a) it should be permitted to touch and b) only if it's on its "own node".

ACK. DONE.

(This should be brief and concise. We want just enough to get the point across)

To me a high-level design is missing here. It could be something like: The security issue is being addressed by introducing ShadowNodes for every Kubernetes node. virt-handlers only interact with ShadowNodes - which allows us to drop any node access privileges from the handler's SA - and a controller will then map these ShadowNode requests to the real Kubernetes nodes, concentrating the required permissions on the controller.

thanks! DONE.

To ease review and make clear what needs to change, the first section describes the different usages of the Node object in virt-handler.

## Node usage

1. [markNodeAsUnschedulable](https://github.com/kubevirt/kubevirt/blob/5b61a18f4c3b6c549d50fd201664c21720e3dbe2/cmd/virt-handler/virt-handler.go#L185) is used at the start and at the termination (by SIGTERM only) of virt-handler to indicate (best-effort) that the node is not ready to handle VMs.

2. Parts of Heartbeat
   1. [labelNodeUnschedulable](https://github.com/kubevirt/kubevirt/blob/5b61a18f4c3b6c549d50fd201664c21720e3dbe2/pkg/virt-handler/heartbeat/heartbeat.go#L96) is, similarly to markNodeAsUnschedulable, used only when virt-handler is stopping.

   2. As part of [do](https://github.com/kubevirt/kubevirt/blob/5b61a18f4c3b6c549d50fd201664c21720e3dbe2/pkg/virt-handler/heartbeat/heartbeat.go#L119) we make two calls:
      [Get](https://github.com/kubevirt/kubevirt/blob/5b61a18f4c3b6c549d50fd201664c21720e3dbe2/pkg/virt-handler/heartbeat/heartbeat.go#L139) - irrelevant, as this is read access.
      [Patch](https://github.com/kubevirt/kubevirt/blob/5b61a18f4c3b6c549d50fd201664c21720e3dbe2/pkg/virt-handler/heartbeat/heartbeat.go#L139) - updates the NodeSchedulable, CPUManager and KSMEnabledLabel labels and the VirtHandlerHeartbeat and KSMHandlerManagedAnnotation annotations once per minute.

3. As part of the node labeller
   1. [run](https://github.com/kubevirt/kubevirt/blob/5b61a18f4c3b6c549d50fd201664c21720e3dbe2/pkg/virt-handler/node-labeller/node_labeller.go#L195) - irrelevant, as this is read access.

   2. [patchNode](https://github.com/kubevirt/kubevirt/blob/5b61a18f4c3b6c549d50fd201664c21720e3dbe2/pkg/virt-handler/node-labeller/node_labeller.go#L254) - the node labeller patches the node with labels for various pieces of information which are mostly static in nature: CPU model, hyperv features, CPU timer, SEV/-ES, RT...
      This is triggered both on an interval (3 min) and on changes of the cluster config. (A sketch of the kind of patch involved follows this list.)
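
For orientation, here is a minimal sketch of the kind of metadata-only patch these call sites issue against the Node, assuming a client-go clientset; the helper name and the label map are illustrative, not the actual KubeVirt code:

```go
package nodeaccess

import (
	"context"
	"encoding/json"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// patchNodeLabels applies a strategic-merge patch that only touches
// metadata.labels, mirroring how the heartbeat and node labeller update a Node.
// The label keys passed in stand in for the KubeVirt label constants.
func patchNodeLabels(ctx context.Context, client kubernetes.Interface, nodeName string, labels map[string]string) error {
	payload := map[string]interface{}{
		"metadata": map[string]interface{}{
			"labels": labels,
		},
	}
	patch, err := json.Marshal(payload)
	if err != nil {
		return fmt.Errorf("marshalling node patch: %w", err)
	}
	_, err = client.CoreV1().Nodes().Patch(ctx, nodeName, types.StrategicMergePatchType, patch, metav1.PatchOptions{})
	return err
}
```

This is the write access the proposal wants to take away from virt-handler and concentrate in virt-controller.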

I wonder if the consumers of each of the above node labels and annotations should be mentioned alongside? For example, the heartbeat in point 2 is consumed by the NodeController in virt-controller-app to check if the node is healthy and raise an event.

I think adding the consumers will not necessarily add clarity here. I did rephrase to make it more high level.

## Shadow node

@fabiand I was not present during the discussions at the time on how other projects solved this issue. For obvious reasons, this is not something a project will advertise, so perhaps @xpivarc and @rmohr could shed more light on this. However, I did come across what I think is cilium's fix on the matter, and it looks like they went with the CRD approach, same as us.

FWIW I'd expect any author of such a feature to do some field research. Thus, yes, maybe you joined later, but that should not stop you from doing your own research :)

@xpivarc iirc this idea was then taken to the kube api sig - was it? What was the outcome of their discussion?

The proposed solution is to introduce a new CRD, "ShadowNode", that shadows a node. virt-handler will write to the ShadowNode, and virt-controller will have the responsibility to sync the allowed information to the Node.

## API Examples

```yaml
apiVersion: <apiversion>
kind: ShadowNode
metadata:
  name: <Equivalent to existing name>
  # Labels and Annotations only
spec: {}   # Empty, allows further extension
status: {} # Empty, allows further extension
```
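
To make the controller side concrete, here is a minimal sketch of how virt-controller could sync the allowed labels and annotations from a ShadowNode onto the matching Node. The ShadowNode struct, the allow-list contents and the function names below are assumptions for illustration, not the actual implementation:

```go
package shadownode

import (
	"context"
	"encoding/json"
	"strings"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// ShadowNode is a stand-in for the proposed CRD: only metadata
// (labels and annotations) is consumed by the sync controller.
type ShadowNode struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`
}

// allowedPrefixes is a hypothetical allow-list; the real list would cover
// only the KubeVirt-owned labels and annotations.
var allowedPrefixes = []string{"kubevirt.io/"}

// filterAllowed keeps only the keys the controller is permitted to mirror.
func filterAllowed(in map[string]string) map[string]string {
	out := map[string]string{}
	for k, v := range in {
		for _, prefix := range allowedPrefixes {
			if strings.HasPrefix(k, prefix) {
				out[k] = v
			}
		}
	}
	return out
}

// syncNode copies the allowed labels and annotations from a ShadowNode to the
// Kubernetes Node of the same name via a metadata-only patch.
func syncNode(ctx context.Context, client kubernetes.Interface, shadow *ShadowNode) error {
	payload := map[string]interface{}{
		"metadata": map[string]interface{}{
			"labels":      filterAllowed(shadow.Labels),
			"annotations": filterAllowed(shadow.Annotations),
		},
	}
	patch, err := json.Marshal(payload)
	if err != nil {
		return err
	}
	_, err = client.CoreV1().Nodes().Patch(ctx, shadow.Name, types.StrategicMergePatchType, patch, metav1.PatchOptions{})
	return err
}
```

With this split, only virt-controller keeps RBAC permission to patch Nodes, while virt-handler's service account is limited to ShadowNodes.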

## Scalability

There are a few aspects to consider.

1. The number of ShadowNodes will be equivalent to the number of nodes - a negligible space overhead.

2. The number of writes could double. Here it is important to look at the usages above and sort each case into two categories (low- and high-frequency writes). The first usage is clearly low-frequency as long as virt-handler operates as designed.
   The second usage consists of two cases which might seem different but are in fact the same most of the time, because the NodeSchedulable, CPUManager and KSMEnabledLabel labels and the KSMHandlerManagedAnnotation annotation are mostly static. What is left is the VirtHandlerHeartbeat, which is not necessary to sync to the Node (this requires re-working the node controller).
   The third case is propagating labels that are also mostly static.
   With all of the above, doubling the number of writes is unlikely; a rough estimate follows this list.
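
As an illustrative back-of-the-envelope estimate (based only on the frequencies stated above, so treat the numbers as rough): today each virt-handler writes to its Node roughly once per minute for the heartbeat plus once every three minutes for the labeller, i.e. about 1.3 writes per node per minute. With ShadowNodes those writes move to the ShadowNode, and virt-controller only needs to patch the real Node when an allowed label or annotation actually changes; since those values are mostly static (and the heartbeat timestamp does not have to be mirrored), the additional writes against the Node stay close to zero rather than doubling.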

## Update/Rollback Compatibility

virt-handler is going to continue writing to the Node object, therefore the update should not have compatibility issues.

When the upgrade rolls out, virt-controllers are upgraded first, then the daemonsets, followed by virt-api. There will be a time period where new virt-controllers (with ShadowNode code) will work with old virt-handlers and an old apiserver without the ShadowNode API. Can we mention here clear steps for how:

Hey @alaypatel07, I'll be taking over addressing the issue this proposal is trying to mitigate. I plan to take the same course this proposal is choosing. I will answer your question in the next reply, but if you think I should re-submit the proposal with the answers addressed in it - please tell me and I will do so.

In regard to your question, I think the upgrade order is a bit different. Please keep me honest, but according to this - the virt-operator upgrade order is like this:

Also note that:

Having said the above, here is a specific mention of each upgrade scenario - before and after the upgrade itself. During upgrade rollout:

After upgrade rollout:

My suggestion, if possible, would be the following:

There may be some negligible overlap between virt-handlers picking up the config change and the RBAC removal then, but that should, worst case, just result in a few RBAC permission errors which we should then not mask.

@rmohr, I respectfully disagree. Using SelfSubjectAccessReview isn't just masking errors; it's a direct and intentional way of asking the RBAC server, "Can I patch the node?" rather than just hoping things are right. It is cheaper than patching the actual node and then failing, and it ensures actions are only taken when permitted, adding a layer of security. I believe relying on the upgrade mechanism to signal a status field once ready could prove to be more costly in the long run, considering the potential corner scenarios mentioned earlier. How about if I stop checking with the RBAC server after the first time I see that the node patch is forbidden (i.e. SelfSubjectAccessReview returns Allowed=false)? This way each virt-handler will stop re-sending SelfSubjectAccessReview requests soon after node patching is disabled.
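
For context, a SelfSubjectAccessReview probe of the kind described here could look roughly like the following sketch, assuming a client-go clientset; the helper name is illustrative and this is not the code proposed in this thread:

```go
package nodeaccess

import (
	"context"

	authv1 "k8s.io/api/authorization/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// canPatchNode asks the API server whether the current service account is
// still allowed to patch the given Node, instead of attempting the patch and
// interpreting a Forbidden error.
func canPatchNode(ctx context.Context, client kubernetes.Interface, nodeName string) (bool, error) {
	review := &authv1.SelfSubjectAccessReview{
		Spec: authv1.SelfSubjectAccessReviewSpec{
			ResourceAttributes: &authv1.ResourceAttributes{
				Verb:     "patch",
				Resource: "nodes",
				Name:     nodeName,
			},
		},
	}
	result, err := client.AuthorizationV1().SelfSubjectAccessReviews().Create(ctx, review, metav1.CreateOptions{})
	if err != nil {
		return false, err
	}
	return result.Status.Allowed, nil
}
```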

I am not sure. I tried to express a knowledge gap which we try to overcome by checking for a forbidden error or with the SelfSubjectAccessReview. In both cases we don't know if it is intentional and correct that we don't have the permission. With a clear config switch we know: now I should do both, and from now on I should do one. Having a very brief period of eventual consistency, where maybe one or two update requests would fail, is fine (like a low amount of conflicting REST calls is fine too). The information gap is what I consider "masking". Therefore the reviews add quite a big front of additional uncertainties without making anything better (actually imo much more complex).

I am not sure what "cheap" means here, and giving additional permissions which may leak info which we did not have access to before opens security questions which I don't even want to have to think about.

Ah, you and I interpret "masking errors" differently. I get what you mean now. "Masking errors" in my book is just the fact that we are ignoring a specific error, which is in general not good practice.

I take back the "cheap" part - we are not sending the entire node object on the SET action, so the difference should be negligible. @rmohr I am still against using the upgrade as a sign to know if we can stop patching the node. The upgrade process is complex and stateful, and relying on it may introduce corner cases. In a way, Lubo's "try and ignore if it fails on a specific error" is much more stateless and clean.

@alaypatel07 I updated the upgrade scenarios in the proposal's body as you suggested.

It should basically be a check like this:

where

## Functional Testing Approach

Existing functional tests should be enough to cover this change.

# Implementation Phases

(How/if this design will get broken up into multiple phases)

Can we please expand this to add explicit goals? I think @rmohr summarized the goals clearly in the comment.

+1 please expand

DONE