virt-handler: Restrict write access to nodes #246

Merged May 10, 2024 (2 commits)
86 changes: 86 additions & 0 deletions design-proposals/shadow-node.md
@@ -0,0 +1,86 @@
# Overview

A reported [Advisory](https://github.com/kubevirt/kubevirt/security/advisories/GHSA-cp96-jpmq-xrr2) outlines how our virt-handler component can be abused to escalate local privileges when a node running virt-handler is compromised. A flow and more details can be found in the reported [issue](https://github.com/kubevirt/kubevirt/issues/9109).

This proposal outlines a mitigation in the form of an internal change.

## Motivation

KubeVirt should be secure by default. While a mitigation is available, it is not part of KubeVirt.

## Goals

Mitigate the advisory by default.


Can we please expand this to add explicit goals? I think @rmohr summarized the goals clearly in the comment.

Member

+1 please expand

Contributor

DONE


## Non Goals

## Definition of Users

Not a user-facing change

## User Stories

Not a user-facing change

## Repos

kubevirt/kubevirt

# Design


One alternative design, proposed in the sig-scale call, that may solve the same problem is as follows:

  1. Introduce a new controller, let's call it token-distributor-controller.
  2. The token-distributor-controller has an RBAC template CR that creates a unique ClusterRole, ClusterRoleBinding and ServiceAccount for each virt-handler pod. Example of the template, based on the kubevirt-handler ClusterRole:

```yaml
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - list
  - watch
  - get
- apiGroups:
  - ""
  resources:
  - nodes
  resourceNames:
  - <node-name>
  verbs:
  - patch
```

  3. The token-distributor-controller will then create a token for the pod via the TokenRequest API [1][2] and deliver the token to the node.
  4. The virt-handler pod will use a local volume mount to receive the token. An init container could be used to make sure the file is delivered by the token-distributor-controller.
  5. virt-handler will start with restricted permissions that could be carefully modified so that it works on all the objects that are related to the node.

This could be a generic solution that could help all daemonsets that are deployed with privileged RBAC, and could be a separate project altogether.

Href:

  1. https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/#manually-create-an-api-token-for-a-serviceaccount
  2. https://kubernetes.io/docs/reference/kubernetes-api/authentication-resources/token-request-v1/

Member

@rmohr rmohr Jan 26, 2024

I don't think that this is addressing the problem fully. There are two things which need to be fixed:

  1. Being able to modify other nodes (that can be addressed like above).
  2. Not being able to modify non-kubevirt-owned fields on the node (this includes not being able to change the spec or any label/annotation which is not strictly kubevirt owned).

(2) still allows me to easily lure in privileged CP components and then just steal their credentials.

We could definitely create unique roles for each node, and in addition deploy a node webhook which would ensure that a service account can only modify its own node (that is also what the apiserver does internally for each kubelet), but that has the severe disadvantage that we would now need a required webhook on a core k8s resource. If our node webhook is ever unavailable, no kubelet, or anything else, can actually continue to modify the node. That's a no-go.

There are other positive side-effects if we fix it with the "shadow nodes":

  • The heartbeat can be moved to the shadow node, meaning we create much less load on all the controllers watching nodes.
  • We finally have a clear API to express our node-needs and don't need to press our API into annotations and labels on nodes.

One may now think that virt-controller still can modify all nodes, and that is true. However, now we only have one component which can do that instead of n. Further, a token-distributor-controller which would create those RBAC roles has to have the exact same permissions as well, otherwise it would not be able to create the roles in the first place ...


@rmohr thanks for summarizing the reasons.

@xpivarc if it is helpful, please add this to the alternative designs considered, for future us to refer back to.

Member

Can we include this in the goals section?

@RamLavi @xpivarc ?

Contributor

@alaypatel07 thanks for bringing this up.
Added the alternative to the proposal body.
Added to the goals.

Member

@rmohr we have to address a third element:

  3. Prevent the daemonset from being able to modify other KubeVirt-owned CRs related to other nodes.

IOW: 3. Restrict the daemonset to only modify KubeVirt-owned CRs related to its own node.

Member

Yes, that third element is true, but I would qualify it as a follow-up step, not needed to address the CVE. Now virt-handler cannot blindly modify the node topology anymore. It can only modify hardware capability availability. It cannot lure in privileged or sensitive workloads anymore by just applying labels which they are waiting for, or by making a node a control-plane node.

The follow up would be either cert based auth for virt-handler, or maybe we can still use tokens and get the IP in the webhooks for verification. Something like this ...

Let's add it to the doc as follow-up.

Contributor

@fabiand AFAIK the only kubevirt CR virt-handler can update is virtualmachineinstances. Is this what you are referring to?

@rmohr I will add a Follow-up Steps section in the end of the doc and include:

  • (the one just suggested) Restrict the virt-handler daemonset to only modify KubeVirt-owned CRs related to its own node
  • Restrict the virt-handler daemonset from being able to update/patch other nodes

Is that alright with you?

Member

@RamLavi yes - if this is the only one, then yes. And, generally speaking: handler should only be able to touch things which a) it is permitted to touch and b) are on its own node.

Contributor

ACK. DONE.

(This should be brief and concise. We want just enough to get the point across)
Member

To me a high level design is missing here.

It could be something like: the security issue is being addressed by introducing ShadowNodes for every Kubernetes node. virt-handlers only interact with ShadowNodes - which allows us to drop any node access privileges from the handler's SA - and a controller will then map these ShadowNode requests to the real Kubernetes nodes, concentrating the required permissions in the controller.

Contributor

thanks! DONE.


In order to ease review and understand what needs to change, the first section describes the different usages of the Node object in virt-handler.

## Node usage
1. [markNodeAsUnschedulable](https://github.com/kubevirt/kubevirt/blob/5b61a18f4c3b6c549d50fd201664c21720e3dbe2/cmd/virt-handler/virt-handler.go#L185) is used at the start and at the termination (by SIGTERM only) of virt-handler to indicate (best-effort) that the node is not ready to handle VMs.

2. Parts of the heartbeat:
   1. [labelNodeUnschedulable](https://github.com/kubevirt/kubevirt/blob/5b61a18f4c3b6c549d50fd201664c21720e3dbe2/pkg/virt-handler/heartbeat/heartbeat.go#L96) is, similarly to markNodeAsUnschedulable, used only when virt-handler is stopping.

   2. As part of [do](https://github.com/kubevirt/kubevirt/blob/5b61a18f4c3b6c549d50fd201664c21720e3dbe2/pkg/virt-handler/heartbeat/heartbeat.go#L119) we make 2 calls:
      [Get](https://github.com/kubevirt/kubevirt/blob/5b61a18f4c3b6c549d50fd201664c21720e3dbe2/pkg/virt-handler/heartbeat/heartbeat.go#L139) - irrelevant as this is read access.
      [Patch](https://github.com/kubevirt/kubevirt/blob/5b61a18f4c3b6c549d50fd201664c21720e3dbe2/pkg/virt-handler/heartbeat/heartbeat.go#L139) - updates the NodeSchedulable, CPUManager and KSMEnabledLabel labels and the VirtHandlerHeartbeat and KSMHandlerManagedAnnotation annotations once per minute.

3. As part of the node labeller:
   1. [run](https://github.com/kubevirt/kubevirt/blob/5b61a18f4c3b6c549d50fd201664c21720e3dbe2/pkg/virt-handler/node-labeller/node_labeller.go#L195) - irrelevant as this is read access.

   2. [patchNode](https://github.com/kubevirt/kubevirt/blob/5b61a18f4c3b6c549d50fd201664c21720e3dbe2/pkg/virt-handler/node-labeller/node_labeller.go#L254) - the node labeller patches the node with labels for various information which is mostly static in nature: CPU model, hyperv features, CPU timer, SEV/-ES, RT...
      This is triggered both on an interval (3 min) and on changes of the cluster config.


I wonder if the consumers of each of the above node labels and annotations should be mentioned alongside?

For example, the heartbeat in point 2 is consumed by the NodeController in virt-controller-app to check if the node is healthy and raise an event.

Contributor

I think adding the consumers will not necessarily add clarity here. I did rephrase to make it more high level.


## Shadow node
Member

@rmohr @xpivarc @RamLavi do we know how other projects have solved this problem?

Contributor

@fabiand I was not present during the discussions on how other projects solved the issue. For obvious reasons, this is not something a project will advertise, so perhaps @xpivarc and @rmohr could shed more light on this. However, I did come across what I think is Cilium's fix on the matter, and it looks like they went with the CRD approach, same as us.

Member

FWIW I'd expect any author of such a feature to do some field research. Thus, yes, maybe you joined later, but that should not stop you from doing your own research :)

Member

@xpivarc IIRC this idea was then taken to kube sig-api - has it been? What was the outcome of their discussion?


The proposed solution is to introduce a new CRD, "ShadowNode", that will shadow a node. The virt-handler will write to the ShadowNode, and virt-controller will have the responsibility of syncing allowed information to the Node.

## API Examples
```yaml
apiVersion: <apiversion>
kind: ShadowNode
metadata:
  name: <Equivalent to existing name>
  # Labels and Annotations only
spec: {}   # Empty, allows further extension
status: {} # Empty, allows further extension
```
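To make the sync direction concrete, here is a minimal, hypothetical sketch of the virt-controller side: only kubevirt-owned labels and annotations from the ShadowNode are propagated to the real Node. The `kubevirt.io/` substring check stands in for whatever ownership rule the implementation settles on.

```go
package main

import (
	"fmt"
	"strings"
)

// syncAllowedMetadata sketches the virt-controller sync: it returns the node's
// labels with only the kubevirt-owned keys from the shadow node merged in.
// The "kubevirt.io/" substring check is a hypothetical ownership rule.
func syncAllowedMetadata(shadow, node map[string]string) map[string]string {
	out := make(map[string]string, len(node))
	for k, v := range node {
		out[k] = v
	}
	for k, v := range shadow {
		if strings.Contains(k, "kubevirt.io/") {
			out[k] = v // kubevirt-owned: propagate to the node
		}
		// everything else (spec changes, foreign labels) is deliberately dropped
	}
	return out
}

func main() {
	node := map[string]string{"topology.kubernetes.io/zone": "a"}
	shadow := map[string]string{
		"kubevirt.io/schedulable":               "true",
		"node-role.kubernetes.io/control-plane": "", // must NOT reach the node
	}
	fmt.Println(syncAllowedMetadata(shadow, node))
}
```

This is the piece that defeats the lure attack from the advisory: a compromised handler can write anything into its ShadowNode, but only kubevirt-owned keys ever reach the real Node.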

## Scalability

There are a few aspects to consider.

1. The number of ShadowNodes will be equivalent to the number of Nodes, so the space overhead is negligible.

2. The number of writes could double. Here it is important to look at the usage and sort each case into 2 categories (low- and high-frequency writes). The first usage is clearly low-frequency as long as virt-handler operates as designed.
The second usage consists of two cases which might seem different but are in fact the same most of the time, because the NodeSchedulable, CPUManager and KSMEnabledLabel labels and the KSMHandlerManagedAnnotation annotation are mostly static. What is left is VirtHandlerHeartbeat, which is not necessary to sync to the Node (this requires re-working the node controller).
The third case is propagating labels that are also mostly static.
With all the above, doubling the number of writes is unlikely.


## Update/Rollback Compatibility

Virt-handler is going to continue writing to the Node object, therefore the update should be free of compatibility issues.


When the upgrade rolls out, virt-controllers are upgraded first, then the daemonsets, followed by virt-api. There will be a time period where new virt-controllers (with ShadowNode code) will work with old virt-handlers and an old apiserver without the ShadowNode API. Can we mention clear steps here for how:

  1. new virt-controllers will work with older virt-handlers and old virt-api
  2. new virt-controllers and newer virt-handlers will work with older virt-api

Contributor

Hey @alaypatel07, I'll be taking over addressing the issue this proposal is trying to mitigate. I plan to take the same course this proposal is choosing. I will answer your question in the next reply, but if you think I should re-submit the proposal with the answers addressed in it - please tell me and I will do so.

Contributor

@RamLavi RamLavi Feb 19, 2024

In regard to your question, I think the upgrade order is a bit different. Please keep me honest, but according to this - virt-operator upgrade order is like this:

CRDs --> RBACs --> Daemonsets (=virt-handler) --> Controller Deployments (=virt-controller)--> API Deployments (=virt-API)

Also note that:

  • As mentioned above, the virt-handler will still keep trying to patch the node alongside the shadowNode, and at some point during the upgrade it would start being blocked due to the removal of the patch permission. In order to mitigate this, the node patch will start ignoring the RBAC "forbidden" error.
  • Normally, new CRDs are blocked until the upgrade rollout is over. The shadowNode CRD is going to be excluded in order to avoid upgrade issues that could cause a heartbeat timeout.

Having said the above, here is a specific mention of each upgrade scenario, before and after the upgrade itself:

During upgrade rollout:

  • New CRD; old RBAC; old virt-handler; old virt-controller; old virt-API:
    • Old RBAC: no upgrade issue
    • Old virt-handler: no upgrade issue
    • Old virt-controllers: no upgrade issue
    • Old Virt-API: no upgrade issue
  • New CRD, new RBAC; old virt-handler; old virt-controller; old virt-API:
    • Old virt-handler: during upgrade, the old RBACs are kept as backup until upgrade is done. Hence, patching node should not fail so there is no upgrade issue.
    • Old virt-controllers: no upgrade issue, virt controller RBAC is not changed due to this solution.
    • Old Virt-API: no upgrade issue.
  • New CRD; new RBAC; new virt-handler; old virt-controller; old virt-API:
    • New virt-handler: the shadowNode is written to but is not synced to the node yet. However, virt-handler keeps patching the node directly, so there is no upgrade issue.
    • [same as before] Old virt-controllers: no upgrade issue, virt-controller RBAC is not changed.
    • [same as before] Old Virt-API: no upgrade issue.
  • New CRD; new RBAC; new virt-handler; new virt-controller; old virt-API:
    • New virt-handler: no upgrade issue. virt-handler patches both node and shadow node.
    • New controllers: will start getting shadowNode requests once they are created. no upgrade issue.
    • [same as before] Old Virt-API: no upgrade issue.

After upgrade rollout:
virt-handler will keep trying (and failing) to patch the node since the backup RBACs are now removed. Virt-handler will ignore these cases by ignoring RBAC "forbidden" errors. This behavior should be kept until we no longer support upgrades from the current release.

Member

My suggestion, if possible, would be the following:

  • Introduce a new status field in the KubeVirt CR
  • Let virt-handlers continue patching the node as well
  • After virt-handlers and virt-controller are rolled over, set in the kubevirt CR a status field indicating that shadow nodes are now the way to go
  • virt-handler would pick it up right away over our config logic
  • finally remove the RBAC permissions.

Member

There may be some negligible overlap between virt-handlers picking up the config change and the RBAC removal, but that should, worst case, just result in a few RBAC permission errors, which we should then not mask.

Contributor

@RamLavi RamLavi Feb 28, 2024

@rmohr, I respectfully disagree. Using SelfSubjectAccessReview isn't just masking errors; it's a direct and intentional way of asking the RBAC server, "Can I patch the node?" rather than just hoping things are right. It's cheaper than patching the actual node and then failing, and it ensures actions are only taken when permitted, adding a layer of security.

I believe relying on the upgrade mechanism to signal a status field once ready could prove to be more costly in the long run, considering the potential corner scenarios mentioned earlier.

How about if I stop checking with the RBAC server after the first time I see that the node patch is forbidden (i.e. SelfSubjectAccessReview returns Allowed=false). --> This way each virt-handler will stop re-sending SelfSubjectAccessReview requests soon after node-patching is disabled.
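The proposed "stop asking after the first denial" logic could look roughly like this. In a real implementation, `canPatch` would issue an `authorizationv1.SelfSubjectAccessReview` through client-go; here it is abstracted as a callback so the sketch is self-contained.

```go
package main

import "fmt"

// patchGate remembers the first denial from a SelfSubjectAccessReview-style
// check, so no further review requests are sent once node patching is revoked.
type patchGate struct {
	denied bool
}

// shouldPatchNode consults canPatch (a stand-in for sending a
// SelfSubjectAccessReview) until it reports Allowed=false, then never again.
func (g *patchGate) shouldPatchNode(canPatch func() bool) bool {
	if g.denied {
		return false // remembered denial: no more review requests
	}
	if !canPatch() {
		g.denied = true
		return false
	}
	return true
}

func main() {
	g := &patchGate{}
	calls := 0
	check := func() bool { calls++; return false } // review returns Allowed=false
	g.shouldPatchNode(check)
	g.shouldPatchNode(check)
	fmt.Println(calls)
}
```

The denial is sticky for the life of the process, so each virt-handler stops re-sending review requests soon after node patching is disabled.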

Member

@rmohr, I respectfully disagree. Using SelfSubjectAccessReview isn't just masking errors; it's a direct and intentional way of asking the RBAC server, "Can I patch the node?" rather than just hoping things are right.

I am not sure. I tried to express a knowledge gap which we try to overcome by checking for the forbidden error or the SelfSubjectAccessReview. In both cases we don't know if it is intentional and correct that we don't have the permission. With a clear config switch we know: now I should do both, and from now on I should do one. Having a very brief period of eventual consistency is fine, where maybe one or two update requests would fail (just like a low number of conflicting REST calls is fine too).

The information gap is what I consider "masking". Therefore the reviews add up quite a big front of additional uncertainties without making anything better (actually imo much more complex).

It cheaper than patching the actual node then failing, and ensures actions are only taken when permitted, adding a layer of security.

I am not sure what "cheap" means here, and giving additional permissions which may leak info which we did not have access to before opens security questions which I don't even want to have to think about.

Contributor

@RamLavi RamLavi Mar 3, 2024

The information gap is what I consider "masking". Therefore the reviews add up quite a big front of additional uncertainties without making anything better (actually imo much more complex).

ah you and I interpret "masking errors" differently. I get what you mean now. "masking errors" in my book is just the fact that we are ignoring a specific error, which is in general no good practice.

I am not sure what "cheap" means here, and giving additional permissions which may leak info which we did not have access to before opens security questions which I don't even want to have to think about.

I take back the "cheap" part - we are not sending the entire node object on the SET action, so the difference should be negligible.
I get your concern in adding a new permission that may leak info in the future.

@rmohr I am still against using the upgrade as a sign to know if we can stop patching the node. The upgrade process is complex and stateful, and relying on it may introduce corner cases. In a way, Lubo's "try and ignore if it fails on a specific error" approach is much more stateless and clean.
Can we agree to continue with his approach?

Contributor

@alaypatel07 I updated the upgrade scenarios on the proposal's body as you suggested.

Member

@rmohr I am still against using the upgrade as a sign to know if we can stop patching the node. The upgrade process is complex and state-full, and relying on it may introduce corner cases. In a way, Lubo's "try and ignore if fails on specific error" is much more stateless and clean.
Can we agree to continue with his approach?

It should basically be a check like this:

```go
if freshInstall && supportsShadowNodes() {
	kubevirtCR.Status.UseShadowNode = true
}

if fullyInstalled && !switchedToShadowNode && supportsShadowNodes() {
	kubevirtCR.Status.UseShadowNode = true
} else {
	kubevirtCR.Status.UseShadowNode = false
}
```

where supportsShadowNodes() can basically just check if the install strategy of the current version includes the ShadowNode CR, to ensure that the version we are upgrading or downgrading to has it.


## Functional Testing Approach

Existing functional tests should be enough to cover this change.

# Implementation Phases
(How/if this design will get broken up into multiple phases)