OCP: SELinux issue on OpenShift 4.9 running SGX on the intel-device-plugins framework #762
Comments
@Walnux thanks!
we need to focus on the OLM path:
@mythi readOnlyRootFilesystem is very useful to protect the rootfs and we should keep it.
And I agree with you @mythi: after we can manually start the operator and figure out the potential issues, we should now work on the bundle image and run the operator in OLM. That will apply another set of privilege settings.
Also filed the same bug to track this on Red Hat Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2026086
The privileged SCC needs to be used to run the privileged SGX device plugin container on OpenShift Container Platform. For details, please see intel#762. Signed-off-by: MartinXu <[email protected]>
#787 is sent for review.
According to the feedback from Peter Hunt ([email protected]),
… can be used to allow a Pod to access the host filesystem without running the pod with privileged rights.
@Walnux is …
It also works on the NFD source dir. I am submitting the PR.
I believe … Also, I am wondering if anyone would mind helping me figure out why adding … Here's the pod I used:

```yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2022-01-17T19:53:15Z"
  labels:
    app: sleepytereshkova
  name: sleepytereshkova
spec:
  containers:
    - command:
        - top
      image: docker.io/library/alpine:latest
      name: sleepytereshkova
      volumeMounts:
        - name: nfd-source-hooks
          mountPath: '/etc/kubernetes/node-feature-discovery/source.d/:z'
  volumes:
    - name: nfd-source-hooks
      hostPath:
        path: /etc/kubernetes/node-feature-discovery/source.d/
        type: DirectoryOrCreate
```

If someone would be willing, I would be interested in seeing the output of …
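For anyone following along: what's being asked for here is the runtime's actual mount entry for that hostPath volume. Once any shell with grep/cat is available in the image, a quick way to see it (path taken from the pod spec above) is:

```shell
# Show the mount entry for the hostPath volume; the options column would
# include any SELinux context= option the runtime applied while relabeling.
# Falls back to the first few entries if the volume is not mounted here.
mounts=$(grep node-feature-discovery /proc/mounts || head -n 3 /proc/mounts)
echo "$mounts"
```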
We also tested that with spc_t we don't have to use :z.
Yeah, that makes sense. However, CRI-O should not be processing …
For instance, if you change it to … or even differently: without spc_t, does it work?
The /dev directory with :z doesn't work.
According to … Furthermore, if we have to use spc_t, we have to carefully inspect these two container images and make sure we don't include any extra binaries that are not needed, which would increase the potential security attack surface. Since all the certified images on OCP have to be based on a UBI image, we quickly went through the UBI base images; the smallest one we could find is UBI-micro, which is ~30 MB decompressed. See https://catalog.redhat.com/software/containers/ubi8-micro/601a84aadd19c7786c47c8ea We are using …
My point is that …
Also, what installs the device? Or better yet, do you have access to a node with this device that I can play around with? I would be happy to investigate a solution for y'all (ideally one that doesn't give spc_t).
It's an in-tree kernel driver; RHEL 8.4+ has it as a tech preview. Would that already cover the rules part automatically? I'll work with @Walnux to check if we can get you access to a node with SGX.
I have tried it and it can work without spc_t. :)
What is … for? As I believe all it's doing is creating a new directory there.
It is not easy for me to use the current upstream container image, which is based on gcr.io/distroless/static, to debug and acquire /proc/mounts. I will try the UBI-micro based image and check whether I can easily acquire /proc/mounts.
@Walnux you can build toybox with cat support pretty easily:
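The snippet itself did not survive in this thread; a plausible sketch of building a static toybox (whose default applet set includes cat) for dropping into a distroless image, assuming a local clone of https://github.com/landley/toybox:

```shell
# Hedged sketch: build a static toybox binary providing `cat`, to copy into
# the distroless plugin image so /proc/mounts can be read for debugging.
# Guarded so it only runs where a toybox checkout and toolchain exist.
if [ -d toybox ] && command -v make >/dev/null 2>&1; then
  ( cd toybox &&
    make defconfig &&               # enable the default applet set (incl. cat)
    LDFLAGS="--static" make &&      # static binary: no runtime library deps
    ./toybox cat /proc/version )    # smoke test the built-in cat
  built=yes
else
  built=no                          # clone https://github.com/landley/toybox first
fi
echo "toybox build attempted: $built"
```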
@haircommander I think you are right: :z should just create a new directory there, and it actually just hides the issue. I also checked the host, and the hook file is not installed there.
daemonset yaml without :z
Log:
daemonset yaml with :z
log:
daemonset yaml
We are trying to figure out a way to let @haircommander access the node with SGX support, but that needs some time and effort. Before that, I can just work as the proxy for @haircommander and try to figure out a proper solution. :)
@haircommander Any updates?
What's …?
I guess what you are talking about is the host OS.
Currently that version of container-selinux isn't targeted at 4.10, so I am guessing at the earliest it would make 4.11. We have to be careful about bumps to container-selinux so close to GA, and the fix may not qualify for backport. For the …
@haircommander I will try that solution. Also, is there a way to test the upcoming fix for container-selinux in our cluster? Is there a link to a branch or build for us to test? Thanks
Hi @haircommander @rhatdan,
Are you planning on using an operator to install on OCP? Maybe it could install machine configs that enable that policy.
Yes, an Operator should be the right way to install it. Can you point us to some details about the machine config? I think this should not be a requirement just from us; it should be a general request. :)
If these devices get added to a container, then there is no need to label; the devices will get the label of the container. If you are volume mounting them into the container, then they would not be allowed access. @haircommander How do you add a device to a container with k8s?
That's the problem: this is a container that enables other containers to add devices. There's no way to do so without a device plugin, but we're putting together the device plugin...
If you need to volume mount them in, and want the containers to have access, then you could just … To make this permanent, you could execute something like:
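The command block here did not survive the page extraction; a hedged sketch of the approach being described (immediate relabel with chcon, persistence via semanage fcontext so restorecon reapplies it), with the /dev/sgx_enclave and /dev/sgx_provision device paths assumed:

```shell
# Hedged sketch (device paths assumed; adjust for your node):
# relabel the SGX device nodes so container processes may access them,
# and register the label so it survives device re-creation at boot.
if command -v semanage >/dev/null 2>&1 && [ -e /dev/sgx_enclave ]; then
  chcon -t container_file_t /dev/sgx_enclave /dev/sgx_provision
  semanage fcontext -a -t container_file_t '/dev/sgx_(enclave|provision)'
  restorecon -v /dev/sgx_enclave /dev/sgx_provision
  labeled=yes
else
  labeled=no   # not on an SELinux node with the SGX driver loaded
fi
echo "relabel attempted: $labeled"
```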
Then when the devices get created at reboot they would be labeled correctly.
@rhatdan thanks! Can these be managed by a …?
Not my area of expertise, but I think MachineConfig should be able to do it.
Typically the way to do it on RHCOS is to create a MachineConfig that creates a systemd unit file that runs the commands; those kinds of state changes often don't persist across reboots otherwise. The SGX operator could create said MachineConfig and trigger a reboot; then the device would be available on the next reboot.
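A sketch of what such a MachineConfig could look like, with assumed names (unit name, device paths) and a one-shot systemd unit carrying the relabel command; verify the Ignition version and role label against your cluster before applying:

```shell
# Hypothetical MachineConfig (names assumed): relabel SGX device nodes via a
# one-shot systemd unit on every boot of the worker nodes.
cat > 99-worker-sgx-selinux.yaml <<'EOF'
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-sgx-selinux
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
        - name: sgx-device-relabel.service
          enabled: true
          contents: |
            [Unit]
            Description=Relabel SGX device nodes for container access
            ConditionPathExists=/dev/sgx_enclave

            [Service]
            Type=oneshot
            ExecStart=/usr/bin/chcon -t container_file_t /dev/sgx_enclave /dev/sgx_provision

            [Install]
            WantedBy=multi-user.target
EOF
# Applying it triggers a rolling reboot of the worker pool:
# oc apply -f 99-worker-sgx-selinux.yaml
```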
@haircommander OK, this sounds like a reasonable workaround until the problem gets fixed in the next/future release. However, I think we should try to leverage the SRO+MCO operators for this and not add the functionality into the device plugins operator.
Gotcha, then maybe SRO would be a good fit for this. Is there a registration of the SGX plugin in the SRO? Ideally this unit would only run when the SGX device is enabled and being installed.
@haircommander if we use SRO, should we install the policy from a container? If we can package the policy into a container and install it through a container, we can use the standard way to release and install policy on OCP. I am trying to do that. Do you know if anyone has tried it before?
Not for policy, but I do know privileged containers are used to configure things on the node. However, it's usually on startup from what I know. Something to think about: if we're using a privileged container to create a file on the host, is that much different from having the SGX plugin container itself be privileged?
@haircommander we currently have 6 plugins supported by the operator, so I guess having one centralized one run as privileged is better than having to run all those 6 privileged. AFAIU, this would also be a stopgap until it's possible to deploy plugins without having to configure these labels separately.
Good points, makes sense to me. @rhatdan if an SELinux policy is configured, does the node need to be rebooted for it to take effect? (It's possible RHCOS also behaves differently in this case, in which case we may need the reboot anyway.)
No, SELinux does not require a reboot, as long as it was enabled in the first place. Policy is instantly applied, and labels are placed on disk by restorecon.
@rhatdan One other issue we encountered: it looks like socket communication is not allowed between containers. Our plugins use this to communicate, and we had to manually create an SELinux policy to allow it. Is there a way to allow this without deploying a custom SELinux policy? We used a policy something like: #============= container_t ============== …
What is running as container_runtime_t? The intel-device-plugin?
The SGX plugin is running as container_t. We got that policy from audit2allow.
The allow rule above shows a container attempting to connect to (connectto) a process running as container_runtime_t, which is the label of the container engine, like Podman or CRI-O.
That's strange. We saw the log below in the audit log, ran audit2allow, and it gave that rule.
I just checked: the plugin is container_t. It's strange the rule came out as container_runtime_t.

```
sh-4.4# ps -AZ | grep intel_sgx
```
See if you can create the AVC again. It might have been an older test.
I tried it several times but the policy audit2allow gives is the same:

```
type=AVC msg=audit(1649881114.712:151954): avc: denied { connectto } for pid=3736904 comm="intel_sgx_devic" path="/var/lib/kubelet/device-plugins/kubelet.sock" scontext=system_u:system_r:intelplugins_t:s0:c131,c171 tcontext=system_u:system_r:container_runtime_t:s0 tclass=unix_stream_socket permissive=1
```

The new policy looks something like this:
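The module text itself did not survive here; a hedged reconstruction from the AVC message above (type names taken from the scontext/tcontext fields; treat it as illustrative, not the exact policy that shipped):

```shell
# Hedged sketch of the custom policy module implied by the AVC above:
# allow the plugin domain to connect to kubelet's unix socket, which carries
# the container engine's domain label (container_runtime_t).
cat > intelplugins.te <<'EOF'
module intelplugins 1.0;

require {
    type intelplugins_t;
    type container_runtime_t;
    class unix_stream_socket connectto;
}

# plugin registers with kubelet over /var/lib/kubelet/device-plugins/kubelet.sock
allow intelplugins_t container_runtime_t:unix_stream_socket connectto;
EOF
# Build and install on the node (requires checkmodule/semodule tooling):
# checkmodule -M -m -o intelplugins.mod intelplugins.te
# semodule_package -o intelplugins.pp -m intelplugins.mod
# semodule -i intelplugins.pp
```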
This issue has been fixed in containers/container-selinux#178 |
The issue is:
If I enable SELinux as below on my worker node, then my init container will run into a "permission access denied" issue on all the volumes mounted in the pod.
If I disable SELinux as below, the operator can be up and running properly.
You can reproduce the issue using the steps below.
Reproduce Steps
First, I have to apply the patches below to set up the SCC according to these documents:
SCC in OCP-4.9
Guide to UID, GID
Run operator manually
Then start the intel device plugins framework using the command:

```shell
oc apply -k intel-device-plugins-for-kubernetes/deployments/operator/default/
```

and start the SGX plugin DaemonSet with:

```shell
oc apply -f intel-device-plugins-for-kubernetes/deployments/operator/samples/deviceplugin_v1_sgxdeviceplugin.yaml
```

The intel device plugins framework comes up and runs, and the SGX plugin DaemonSet is also up and running. But the init container in the pod runs into the "permission access denied" issue when trying to access the directory /etc/kubernetes/node-feature-discovery/source.d/
Run operator through OLM
You can also run the operator through OLM:

```shell
operator-sdk run bundle docker.io/walnuxdocker/intel-device-plugins-operator-bundle:0.22.0
```

The result is the same as running manually.
This is the volume mounted in the pod:
Analysis:
You can see that I assigned the SCC as hostmount-anyuid.
After I disabled SELinux on worker node 1 with the command

```shell
sudo setenforce 0
```

the operator came up and ran on that node. But I left SELinux enabled on worker node 0, and the "permission access denied" issue was still there.
After I set the SCC as hostaccess, no matter whether I disabled or enabled SELinux, the permission access denied issue always happened.
The proper way to access shared directory in pod
Using mountPath: '/etc/kubernetes/node-feature-discovery/source.d/:z' and the SCC hostmount-anyuid, it looks like the above issue can be resolved: the init container can work with SELinux in enforcing mode.
The root cause might be (according to https://www.redhat.com/sysadmin/user-namespaces-selinux-rootless-containers):
The container engine, Podman, launches each container with a unique process SELinux label (usually container_t) and labels all of the container content with a single label (usually container_file_t). We have rules that state that container_t can read and write all content labeled container_file_t. This simple idea has blocked major file system exploits.
Everything works perfectly until the user attempts a volume mount. The problem with volumes is that they are usually just bind mounts from the host. They bring in the labels from the host, which the SELinux policy does not allow the process label to interact with, and the container blows up.
However, the sgxplugin container runs into a permission access denied issue. The error is:

```
E1130 05:11:07.898395 1 sgx_plugin.go:75] No SGX enclave file available: stat /dev/sgx_enclave: permission denied
```
Try to resolve the above issue
Using the same approach, mounting /dev/sgx_enclave with :z, it runs into the error below:

```
sgx_plugin.go:75] No SGX enclave file available: stat /dev/sgx_enclave: no such file or directory
```
The proper way to access host devices from the container
After I use the privileged SCC and set privileged: true, the above issue can be resolved.
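For concreteness, a minimal sketch of the securityContext change described above (pod name and image tag assumed; the operator's generated manifest may differ):

```shell
# Hypothetical pod snippet showing the privileged securityContext the
# SGX plugin ends up needing to stat /dev/sgx_enclave on the host.
cat > sgx-plugin-privileged.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: intel-sgx-plugin-demo
spec:
  containers:
    - name: intel-sgx-plugin
      image: intel/intel-sgx-plugin:0.22.0   # assumed tag
      securityContext:
        privileged: true   # grants access to host devices like /dev/sgx_enclave
EOF
# oc apply -f sgx-plugin-privileged.yaml    # requires the privileged SCC
```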
According to https://kubernetes.io/docs/concepts/policy/pod-security-policy/:
a "privileged" container is given access to all devices on the host. This allows the container nearly all the same access as processes running on the host. This is useful for containers that want to use linux capabilities like manipulating the network stack and accessing devices.
I am concerned about using this level of privilege, and others have similar concerns and have requested a new feature in K8s; see kubernetes/kubernetes#60748.
However, since the SGX device plugin has to access the host's SGX devices, it looks like we can only use a privileged container.
@mythi What's your comments? :)
Reference to similar projects like SRO
In the Special Resource Operator, it looks like a similar security policy is applied:
https://github.com/openshift/special-resource-operator/blob/master/charts/xilinx/fpga-xrt-driver-4.7.11/templates/1000-driver-container.yaml#L17
https://github.com/openshift/special-resource-operator/blob/master/charts/xilinx/fpga-xrt-driver-4.7.11/templates/1000-driver-container.yaml#L70