OCP: SELinux issue on OpenShift-4.9 to run SGX on intel-device-plugins framework #762

Closed · Tracked by #777
Walnux opened this issue Nov 22, 2021 · 65 comments

Walnux commented Nov 22, 2021

  • I'd like to open this issue to discuss and track the SCC and SELinux setting issues on the OpenShift-4.9 platform when enabling the intel-device-plugins framework and the SGX plugin.
  • If I mount the directory with :z and run SELinux in enforcing mode, the issue below can be resolved, but the pod then runs into an access-denied issue on the host devices (/dev/sgx_x); for details, see the section "The proper way to access shared directory in pod".
  • After I run the device plugin as a privileged container with the privileged SCC, the issue is resolved; see the section "The proper way to access host devices in container/pod".

The issue is:

If I enable SELinux as below on my worker node:

sh-4.4# sestatus
SELinux status:                 enabled
SELinuxfs mount:                /sys/fs/selinux
SELinux root directory:         /etc/selinux
Loaded policy name:             targeted
Current mode:                   enforcing
Mode from config file:          enforcing
Policy MLS status:              enabled
Policy deny_unknown status:     allowed
Memory protection checking:     actual (secure)
Max kernel policy version:      33

Then my init container runs into a "permission denied" issue on all the volumes mounted in the pod.
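
The denials show up as AVC records in the node's audit log; for reference, they can be inspected from a node debug shell (a quick sketch; assumes the standard audit tooling is present on RHCOS):

sh-4.4# getenforce                   # confirm the current mode
Enforcing
sh-4.4# ausearch -m avc -ts recent   # list recent SELinux denials
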
If I switch SELinux to permissive mode as below:

sh-4.4# sestatus
SELinux status:                 enabled
SELinuxfs mount:                /sys/fs/selinux
SELinux root directory:         /etc/selinux
Loaded policy name:             targeted
Current mode:                   permissive
Mode from config file:          enforcing
Policy MLS status:              enabled
Policy deny_unknown status:     allowed
Memory protection checking:     actual (secure)
Max kernel policy version:      33

the operator comes up and runs properly.
You can reproduce the issue using the steps below.

Reproduction Steps

First, I have to apply the patches below to set up the SCC according to these documents:
SCC in OCP-4.9
Guide to UID, GID

Author: MartinXu <[email protected]>
Date:   Thu Nov 18 22:29:51 2021 -0500

    Add SCC hostaccess to manager-role on OpenShift

    So the default SA (Service Account) has the privilege to create pods that access all
    host namespaces, but still requires pods to be run with a UID and SELinux context that are
    allocated to the namespace.

    For detail
    see https://docs.openshift.com/container-platform/4.9/authentication/managing-security-context-constraints.html

diff --git a/deployments/operator/rbac/role.yaml b/deployments/operator/rbac/role.yaml
index 8d19b7a..dd93674 100644
--- a/deployments/operator/rbac/role.yaml
+++ b/deployments/operator/rbac/role.yaml
@@ -176,3 +176,11 @@ rules:
   - get
   - list
   - watch
+- apiGroups:
+  - security.openshift.io
+  resources:
+  - securitycontextconstraints
+  resourceNames:
+  -  hostmount-anyuid
+  verbs:
+  - use
commit 9e3106cef687a7f83ed7daed90575f7e16b16993
Author: Xu <[email protected]>
Date:   Thu Nov 18 19:27:25 2021 -0500

    Dropoff securityContext from manager deployment

    OpenShift SCC (Security Context Constraints) is used to manage security
    context. See
    https://cloud.redhat.com/blog/a-guide-to-openshift-and-uids
    https://docs.openshift.com/container-platform/4.9/authentication/managing-security-context-constraints.html

    By default the restricted SCC is used to ensure that pods cannot be run as privileged.
    So this commit drops the securityContext that runs as a non-root user.

diff --git a/deployments/operator/default/manager_auth_proxy_patch.yaml b/deployments/operator/default/manager_auth_proxy_patch.yaml
index 8ba668c..082782f 100644
--- a/deployments/operator/default/manager_auth_proxy_patch.yaml
+++ b/deployments/operator/default/manager_auth_proxy_patch.yaml
@@ -19,11 +19,11 @@ spec:
         ports:
         - containerPort: 8443
           name: https
-        securityContext:
-          runAsNonRoot: true
-          runAsUser: 1000
-          runAsGroup: 1000
-          readOnlyRootFilesystem: true
+          #securityContext:
+          #runAsNonRoot: true
+          #runAsUser: 1000
+          #runAsGroup: 1000
+          #readOnlyRootFilesystem: true
       - name: manager
         args:
         - "--metrics-addr=127.0.0.1:8080"
diff --git a/deployments/operator/manager/manager.yaml b/deployments/operator/manager/manager.yaml
index db335d3..9ee0a94 100644
--- a/deployments/operator/manager/manager.yaml
+++ b/deployments/operator/manager/manager.yaml
@@ -33,11 +33,11 @@ spec:
           requests:
             cpu: 100m
             memory: 20Mi
-        securityContext:
-          runAsNonRoot: true
-          runAsUser: 65532
-          runAsGroup: 65532
-          readOnlyRootFilesystem: true
+        #securityContext:
+        #runAsNonRoot: true
+        #runAsUser: 65532
+        #runAsGroup: 65532
+        #readOnlyRootFilesystem: true
         env:
           - name: DEVICEPLUGIN_NAMESPACE
             valueFrom:
commit fbf8bd8b120ab65fc456d4778fb156214230ffac
Author: MartinXu <[email protected]>
Date:   Thu Nov 18 20:45:51 2021 -0500

    Backport https://github.com/intel/intel-device-plugins-for-kubernetes/pull/756

diff --git a/deployments/operator/rbac/role.yaml b/deployments/operator/rbac/role.yaml
index 3e490e5..8d19b7a 100644
--- a/deployments/operator/rbac/role.yaml
+++ b/deployments/operator/rbac/role.yaml
@@ -143,6 +143,12 @@ rules:
   - patch
   - update
   - watch
+- apiGroups:
+  - deviceplugin.intel.com
+  resources:
+  - sgxdeviceplugins/finalizers
+  verbs:
+  - update
 - apiGroups:
   - deviceplugin.intel.com
   resources:
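
As a side note, an equivalent ad-hoc way to grant an SCC to the operator's service account, without patching role.yaml, is oc adm policy (the service account and namespace names below are placeholders):

$ oc adm policy add-scc-to-user hostmount-anyuid -z default -n <operator-namespace>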

Run the operator manually

Then start the intel-device-plugins framework using the command
$ oc apply -k intel-device-plugins-for-kubernetes/deployments/operator/default/
and start the SGX plugin DaemonSet with
$ oc apply -f intel-device-plugins-for-kubernetes/deployments/operator/samples/deviceplugin_v1_sgxdeviceplugin.yaml

The intel-device-plugins framework comes up and runs, and the SGX plugin DaemonSet is also up and running.
But the init container in the pod runs into the "permission denied" issue when it tries to access the directory
/etc/kubernetes/node-feature-discovery/source.d/

Run the operator through OLM

You can also run the operator through OLM:
$ operator-sdk run bundle docker.io/walnuxdocker/intel-device-plugins-operator-bundle:0.22.0
The result is the same as when running manually.
This is the volume mount in the pod:

 nodeSelector:
    feature.node.kubernetes.io/custom-intel.sgx: 'true'
    kubernetes.io/arch: amd64
  restartPolicy: Always
  initContainers:
    - name: intel-sgx-initcontainer
      image: 'intel/intel-sgx-initcontainer:0.22.0'
      resources: {}
      volumeMounts:
        - name: nfd-source-hooks
          mountPath: /etc/kubernetes/node-feature-discovery/source.d/
        - name: kube-api-access-nkpq6
          readOnly: true
          mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      imagePullPolicy: IfNotPresent
      securityContext:
        capabilities:
          drop:
            - MKNOD
        readOnlyRootFilesystem: true

Analysis:

You can see that I assigned the SCC hostmount-anyuid.
After I disabled SELinux on worker node 1 with the command
$ sudo setenforce 0
the operator came up and ran on that node.
But I left SELinux enabled on worker node 0, and the permission denied issue was still there.

After I set the SCC to hostaccess, the permission denied issue always happens, no matter whether I disable or enable SELinux.

The proper way to access shared directory in pod

With mountPath: '/etc/kubernetes/node-feature-discovery/source.d/:z' and the SCC hostmount-anyuid, the above issue appears to be resolved: the init container can work with SELinux in enforcing mode.
According to https://www.redhat.com/sysadmin/user-namespaces-selinux-rootless-containers,
the root cause might be:

The container engine, Podman, launches each container with a unique process SELinux label (usually container_t) and labels all of the container content with a single label (usually container_file_t). We have rules that state that container_t can read and write all content labeled container_file_t. This simple idea has blocked major file system exploits.

Everything works perfectly until the user attempts a volume mount. The problem with volumes is that they usually only bind mounts on the host. They bring in the labels from the host, which the SELinux policy does not allow the process label to interact with, and the container blows up.
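
For context, the relabeling the article describes is Podman behavior: with Podman, :z asks the engine to relabel the volume content with the shared container_file_t label before mounting it. An illustrative sketch (the host path is a placeholder; CRI-O is not documented to honor :z inside a Kubernetes mountPath):

$ podman run --rm -v /opt/hooks:/hooks:z registry.access.redhat.com/ubi8/ubi ls -lZ /hooks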

However, the sgx-plugin container runs into a permission denied issue:

 initContainers:
    - name: intel-sgx-initcontainer
      image: 'intel/intel-sgx-initcontainer:0.22.0'
      resources: {}
      volumeMounts:
        - name: nfd-source-hooks
          mountPath: '/etc/kubernetes/node-feature-discovery/source.d/:z'
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      imagePullPolicy: IfNotPresent
      securityContext:
        readOnlyRootFilesystem: false

The error is:
E1130 05:11:07.898395 1 sgx_plugin.go:75] No SGX enclave file available: stat /dev/sgx_enclave: permission denied

To try to resolve the above issue, I mounted /dev/sgx_enclave with :z in a similar way:

  containers:
    - resources: {}
      terminationMessagePath: /dev/termination-log
      name: intel-sgx-plugin
      securityContext:
        readOnlyRootFilesystem: false
      imagePullPolicy: IfNotPresent
      volumeMounts:
        - name: sgxdevices
          mountPath: /dev/sgx
        - name: sgx-enclave
          mountPath: '/dev/sgx_enclave:z'

It runs into the error below:
sgx_plugin.go:75] No SGX enclave file available: stat /dev/sgx_enclave: no such file or directory

The proper way to access host devices from the container

After I use the privileged SCC and set privileged: true:

 containers:
        - resources: {}
          terminationMessagePath: /dev/termination-log
          name: intel-sgx-plugin
          securityContext:
            privileged: true

the above issue is resolved.

According to https://kubernetes.io/docs/concepts/policy/pod-security-policy/:
a "privileged" container is given access to all devices on the host. This allows the container nearly all the same access as processes running on the host. This is useful for containers that want to use linux capabilities like manipulating the network stack and accessing devices.

I am concerned about using this privilege.
Others also have similar concerns and have requested a new feature in K8s:
See kubernetes/kubernetes#60748

However, since the SGX device plugin has to access the host's SGX devices, it looks like we can only use a privileged container.
@mythi What's your comments? :)

Reference to similar projects like SRO

In the Special Resource Operator, it looks like a similar security policy is applied:
https://github.com/openshift/special-resource-operator/blob/master/charts/xilinx/fpga-xrt-driver-4.7.11/templates/1000-driver-container.yaml#L17

https://github.com/openshift/special-resource-operator/blob/master/charts/xilinx/fpga-xrt-driver-4.7.11/templates/1000-driver-container.yaml#L70

mythi commented Nov 23, 2021

@Walnux thanks!

Then start the intel-device-plugins framework using the command

We need to focus on the OLM path.

Walnux commented Nov 23, 2021

@mythi readOnlyRootFilesystem is very useful for protecting the rootfs, and we should keep it.

Walnux commented Nov 23, 2021

And I agree with you @mythi: after we can manually start the operator and figure out the potential issues, we should work on the bundle image and run the operator in OLM. That will apply another set of privilege settings.

Walnux commented Nov 23, 2021

I also filed a bug on Red Hat Bugzilla to track this: https://bugzilla.redhat.com/show_bug.cgi?id=2026086

@Walnux Walnux changed the title SCC and SELinux issue on OpenShift-4.9 to run SGX on intel-device-plugins framework OCP: SCC and SELinux issue on OpenShift-4.9 to run SGX on intel-device-plugins framework Nov 30, 2021
Walnux pushed a commit to Walnux/intel-device-plugins-for-kubernetes that referenced this issue Dec 3, 2021
the privileged SCC needs to be used to run the privileged SGX device plugin
container on OpenShift Container Platform

for details, please see
intel#762

Signed-off-by: MartinXu <[email protected]>
Walnux commented Dec 3, 2021

#787 has been sent for review.

Walnux commented Jan 7, 2022

According to feedback from Peter Hunt ([email protected]),

  securityContext:
    seLinuxOptions:
      type: "spc_t"

can be used to allow the pod to access the host filesystem without running the pod with privileged rights.
It has been verified to work properly. I will submit a PR to fix the issue.
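
For illustration, this is roughly where the option would land in the plugin container spec (a sketch only; the image name and surrounding fields are assumed to follow the DaemonSet snippets earlier in this issue):

  containers:
    - name: intel-sgx-plugin
      image: 'intel/intel-sgx-plugin:0.22.0'
      securityContext:
        seLinuxOptions:
          type: "spc_t"
        readOnlyRootFilesystem: true
      volumeMounts:
        - name: sgx-enclave
          mountPath: /dev/sgx_enclave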

mythi commented Jan 7, 2022

@Walnux is spc_t the only type label that works, or can we go with the one the NFD source dir is labeled with?

Walnux commented Jan 10, 2022

It also works with the NFD source dir's label. I am submitting the PR.

mythi commented Jan 11, 2022

spc_t probably works but are there other *_t types that'd fit better?

haircommander commented

I believe spc_t is the right option. The container-selinux module doesn't really give much granularity (other than special types for containers that are init processes, and Kata containers).

Also, I am wondering if anyone would mind helping me figure out why adding :z helps anything. From my testing, all I can see it doing is appending :z to the destination path.

Here's the pod I used:

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2022-01-17T19:53:15Z"
  labels:
    app: sleepytereshkova
  name: sleepytereshkova
spec:
  containers:
  - command:
    - top
    image: docker.io/library/alpine:latest
    name: sleepytereshkova
    volumeMounts:
    - name: nfd-source-hooks
      mountPath: '/etc/kubernetes/node-feature-discovery/source.d/:z'
  volumes:
    - name: nfd-source-hooks
      hostPath:
        path: /etc/kubernetes/node-feature-discovery/source.d/
        type: DirectoryOrCreate

If someone would be willing, I would be interested in seeing the output of /proc/mounts for the pod that has :z in its mount
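
For anyone reproducing this, the mount table can be read from the running container directly, assuming the image carries cat (the pod name is a placeholder):

$ oc exec <pod-name> -- cat /proc/mounts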

Walnux commented Jan 17, 2022

We also tested that with spc_t we don't have to use :z.
You can see from the document below why we tried :z:
https://www.redhat.com/sysadmin/user-namespaces-selinux-rootless-containers

haircommander commented

We also tested that with spc_t we don't have to use :z.
You can see from the document below why we tried :z:
https://www.redhat.com/sysadmin/user-namespaces-selinux-rootless-containers

Yeah, that makes sense. However, CRI-O should not be processing :z, whereas Podman is expected to. I see why you'd try it, but I mostly want to figure out why it worked (and possibly stop it from working, if it looks like a bug).

haircommander commented

For instance, if you change it to
mountPath: '/dev/sgx_enclave:Z'

or even differently:
mountPath: '/dev/sgx_enclave_other'

without spc_t, does it work?

Walnux commented Jan 17, 2022

Mounting the /dev directory with :z doesn't work;
only a normal directory,
like mountPath: '/etc/kubernetes/node-feature-discovery/source.d/:z',
works without spc_t.

Walnux commented Jan 17, 2022

According to
https://developers.redhat.com/blog/2014/11/06/introducing-a-super-privileged-container-concept#
spc_t is still a super privileged container type which "only" applies to the mount namespace (please correct me).
I feel this is still pretty privileged for us.
First, for the init container we only need to copy the NFD hook from the container into /etc/kubernetes/node-feature-discovery/source.d/ on the host. I don't think we have to use spc_t. Personally, I like the :z solution.
Second, for the SGX plugin container, we only need to access the /dev/sgx_x device interfaces on the host from the container. I think running as spc_t assigns too many privileges to the container. Finer-grained control might be needed instead of directly running as a pretty privileged container.
I actually like the idea of
kubernetes/kubernetes#60748

Furthermore, if we have to use spc_t, we have to carefully inspect these two container images and make sure we didn't include any extra binaries that are not needed and would increase the potential attack surface.

Since all the certified images on OCP have to be based on a UBI image, we have quickly gone through the UBI base images; the smallest one we can find is UBI-micro, which is ~30 MB decompressed. See https://catalog.redhat.com/software/containers/ubi8-micro/601a84aadd19c7786c47c8ea

We are using
#852
to track the UBI-based image task.

haircommander commented

First, for the init container we only need to copy the NFD hook from the container into /etc/kubernetes/node-feature-discovery/source.d/ on the host. I don't think we have to use spc_t. Personally, I like the :z solution.

My point is that the :z solution shouldn't work, and if it does, I want to stop it from working.
Can you try

      volumeMounts:
        - name: nfd-source-hooks
          mountPath: '/etc/kubernetes/node-feature-discovery/source.d/:Z'

Also, what installs the device /dev/sgx_x? Maybe an SELinux rule could be added to allow containers access to it?

Or better yet, do you have access to a node with this device that I can play around with? I would be happy to investigate a solution for y'all (ideally one that doesn't give spc_t).

mythi commented Jan 18, 2022

also, what installs the device /dev/sgx_x? maybe a selinux rule could be added to allow containers access to it?

It's an in-tree kernel driver; RHEL 8.4+ has it as a tech preview. Would that already cover the rules part automatically? I'll work with @Walnux to check if we can get you access to a node with SGX.
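
For reference, on an SGX-enabled node the device nodes and their current SELinux labels can be checked from a debug shell (a quick sketch):

sh-4.4# ls -lZ /dev/sgx_enclave /dev/sgx_provision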

Walnux commented Jan 18, 2022

  volumeMounts:
    - name: nfd-source-hooks
      mountPath: '/etc/kubernetes/node-feature-discovery/source.d/:Z'

I have tried it, and it works without spc_t. :)

haircommander commented

I have tried it, and it works without spc_t. :)

What is /proc/mounts inside the container that has this mounted? I believe you have just mounted the literal directory /etc/kubernetes/node-feature-discovery/source.d/:Z. As a final piece of experimentation, can you try

  volumeMounts:
    - name: nfd-source-hooks
      mountPath: '/etc/kubernetes/node-feature-discovery/source.d/sources'

as I believe all it's doing is creating a new directory there.

Walnux commented Jan 19, 2022

It is not easy for me to use the current upstream container image, which is based on gcr.io/distroless/static, to debug and acquire /proc/mounts. I will try the UBI-micro based image and check whether I can easily acquire /proc/mounts.

mythi commented Jan 19, 2022

@Walnux you can build toybox with cat support pretty easily:

$ git diff
diff --git a/build/docker/toybox-config b/build/docker/toybox-config
index df9e6d3..f415aa8 100644
--- a/build/docker/toybox-config
+++ b/build/docker/toybox-config
@@ -21,7 +21,7 @@ CONFIG_TOYBOX_GETRANDOM=y
 #
 # CONFIG_BASENAME is not set
 # CONFIG_CAL is not set
-# CONFIG_CAT is not set
+CONFIG_CAT=y
 # CONFIG_CAT_V is not set
 # CONFIG_CATV is not set
 # CONFIG_CHGRP is not set
$ make intel-sgx-initcontainer
...
$ docker run --entrypoint "" intel/intel-sgx-initcontainer:devel cat /proc/mounts
# (change the initContainer command to cat instead of the default entrypoint)

uMartinXu commented Jan 20, 2022

@haircommander I think you are right: :z should just create a new directory there, which actually just hides the issue. I also checked the host; the hook file is not installed there.
Thanks!
I'll still paste the logs here.

DaemonSet YAML without :z

      initContainers:
        - resources: {}
          terminationMessagePath: /dev/termination-log
          name: intel-sgx-initcontainer
          command:
            - sh
            - '-c'
            - >-
              cat /proc/mounts && cp -a /usr/local/bin/sgx-sw/intel-sgx-epchook
              /etc/kubernetes/node-feature-discovery/source.d/
          securityContext:
            readOnlyRootFilesystem: false
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: nfd-source-hooks
              mountPath: /etc/kubernetes/node-feature-discovery/source.d/

Log:

overlay / overlay rw,context="system_u:object_r:container_file_t:s0:c88,c734",relatime,lowerdir=/var/lib/containers/storage/overlay/l/CGEOKTEVBSWVW37STEBG7DSUZK:/var/lib/containers/storage/overlay/l/ZZ7PBK43SV6NRMUKXNTMLRJ2DG,upperdir=/var/lib/containers/storage/overlay/1ab5e921e8cba30560e8838dbb8b635553715681c5e743fe370045fe3b03e2ba/diff,workdir=/var/lib/containers/storage/overlay/1ab5e921e8cba30560e8838dbb8b635553715681c5e743fe370045fe3b03e2ba/work 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
tmpfs /dev tmpfs rw,context="system_u:object_r:container_file_t:s0:c88,c734",nosuid,size=65536k,mode=755 0 0
devpts /dev/pts devpts rw,context="system_u:object_r:container_file_t:s0:c88,c734",nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666 0 0
mqueue /dev/mqueue mqueue rw,seclabel,nosuid,nodev,noexec,relatime 0 0
sysfs /sys sysfs ro,seclabel,nosuid,nodev,noexec,relatime 0 0
tmpfs /sys/fs/cgroup tmpfs rw,context="system_u:object_r:container_file_t:s0:c88,c734",nosuid,nodev,noexec,relatime,mode=755 0 0
cgroup /sys/fs/cgroup/systemd cgroup ro,seclabel,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd 0 0
cgroup /sys/fs/cgroup/pids cgroup ro,seclabel,nosuid,nodev,noexec,relatime,pids 0 0
cgroup /sys/fs/cgroup/cpuset cgroup ro,seclabel,nosuid,nodev,noexec,relatime,cpuset 0 0
cgroup /sys/fs/cgroup/cpu,cpuacct cgroup ro,seclabel,nosuid,nodev,noexec,relatime,cpu,cpuacct 0 0
cgroup /sys/fs/cgroup/blkio cgroup ro,seclabel,nosuid,nodev,noexec,relatime,blkio 0 0
cgroup /sys/fs/cgroup/memory cgroup ro,seclabel,nosuid,nodev,noexec,relatime,memory 0 0
cgroup /sys/fs/cgroup/net_cls,net_prio cgroup ro,seclabel,nosuid,nodev,noexec,relatime,net_cls,net_prio 0 0
cgroup /sys/fs/cgroup/rdma cgroup ro,seclabel,nosuid,nodev,noexec,relatime,rdma 0 0
cgroup /sys/fs/cgroup/perf_event cgroup ro,seclabel,nosuid,nodev,noexec,relatime,perf_event 0 0
cgroup /sys/fs/cgroup/devices cgroup ro,seclabel,nosuid,nodev,noexec,relatime,devices 0 0
cgroup /sys/fs/cgroup/freezer cgroup ro,seclabel,nosuid,nodev,noexec,relatime,freezer 0 0
cgroup /sys/fs/cgroup/hugetlb cgroup ro,seclabel,nosuid,nodev,noexec,relatime,hugetlb 0 0
shm /dev/shm tmpfs rw,context="system_u:object_r:container_file_t:s0:c88,c734",nosuid,nodev,noexec,relatime,size=65536k 0 0
tmpfs /etc/resolv.conf tmpfs rw,seclabel,nosuid,nodev,noexec,mode=755 0 0
tmpfs /etc/hostname tmpfs rw,seclabel,nosuid,nodev,mode=755 0 0
/dev/sda4 /etc/hosts xfs rw,seclabel,relatime,attr2,inode64,logbufs=8,logbsize=32k,prjquota 0 0
/dev/sda4 /dev/termination-log xfs rw,seclabel,relatime,attr2,inode64,logbufs=8,logbsize=32k,prjquota 0 0
tmpfs /run/secrets tmpfs rw,seclabel,nosuid,nodev,mode=755 0 0
**/dev/sda4 /etc/kubernetes/node-feature-discovery/source.d xfs rw,seclabel,relatime,attr2,inode64,logbufs=8,logbsize=32k,prjquota 0 0**
tmpfs /var/run/secrets/kubernetes.io/serviceaccount tmpfs ro,seclabel,relatime,size=262629092k 0 0
proc /proc/bus proc ro,nosuid,nodev,noexec,relatime 0 0
proc /proc/fs proc ro,nosuid,nodev,noexec,relatime 0 0
proc /proc/irq proc ro,nosuid,nodev,noexec,relatime 0 0
proc /proc/sys proc ro,nosuid,nodev,noexec,relatime 0 0
proc /proc/sysrq-trigger proc ro,nosuid,nodev,noexec,relatime 0 0
tmpfs /proc/acpi tmpfs ro,context="system_u:object_r:container_file_t:s0:c88,c734",relatime 0 0
tmpfs /proc/kcore tmpfs rw,context="system_u:object_r:container_file_t:s0:c88,c734",nosuid,size=65536k,mode=755 0 0
tmpfs /proc/keys tmpfs rw,context="system_u:object_r:container_file_t:s0:c88,c734",nosuid,size=65536k,mode=755 0 0
tmpfs /proc/timer_list tmpfs rw,context="system_u:object_r:container_file_t:s0:c88,c734",nosuid,size=65536k,mode=755 0 0
tmpfs /proc/sched_debug tmpfs rw,context="system_u:object_r:container_file_t:s0:c88,c734",nosuid,size=65536k,mode=755 0 0
tmpfs /proc/scsi tmpfs ro,context="system_u:object_r:container_file_t:s0:c88,c734",relatime 0 0
tmpfs /sys/firmware tmpfs ro,context="system_u:object_r:container_file_t:s0:c88,c734",relatime 0 0
cp: /etc/kubernetes/node-feature-discovery/source.d//intel-sgx-epchook: Permission denied

DaemonSet YAML with :z

      initContainers:
        - resources: {}
          terminationMessagePath: /dev/termination-log
          name: intel-sgx-initcontainer
          command:
            - sh
            - '-c'
            - >-
              cat /proc/mounts && cp -a /usr/local/bin/sgx-sw/intel-sgx-epchook
              /etc/kubernetes/node-feature-discovery/source.d/
          securityContext:
            readOnlyRootFilesystem: false
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: nfd-source-hooks
              mountPath: '/etc/kubernetes/node-feature-discovery/source.d/:z'

log:

overlay / overlay rw,context="system_u:object_r:container_file_t:s0:c235,c809",relatime,lowerdir=/var/lib/containers/storage/overlay/l/CGEOKTEVBSWVW37STEBG7DSUZK:/var/lib/containers/storage/overlay/l/ZZ7PBK43SV6NRMUKXNTMLRJ2DG,upperdir=/var/lib/containers/storage/overlay/44d494b8811e741dc3321a54bd84864f0e55a9c934b4a995dae042006c4b5e54/diff,workdir=/var/lib/containers/storage/overlay/44d494b8811e741dc3321a54bd84864f0e55a9c934b4a995dae042006c4b5e54/work 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
tmpfs /dev tmpfs rw,context="system_u:object_r:container_file_t:s0:c235,c809",nosuid,size=65536k,mode=755 0 0
devpts /dev/pts devpts rw,context="system_u:object_r:container_file_t:s0:c235,c809",nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666 0 0
mqueue /dev/mqueue mqueue rw,seclabel,nosuid,nodev,noexec,relatime 0 0
sysfs /sys sysfs ro,seclabel,nosuid,nodev,noexec,relatime 0 0
tmpfs /sys/fs/cgroup tmpfs rw,context="system_u:object_r:container_file_t:s0:c235,c809",nosuid,nodev,noexec,relatime,mode=755 0 0
cgroup /sys/fs/cgroup/systemd cgroup ro,seclabel,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd 0 0
cgroup /sys/fs/cgroup/pids cgroup ro,seclabel,nosuid,nodev,noexec,relatime,pids 0 0
cgroup /sys/fs/cgroup/cpuset cgroup ro,seclabel,nosuid,nodev,noexec,relatime,cpuset 0 0
cgroup /sys/fs/cgroup/cpu,cpuacct cgroup ro,seclabel,nosuid,nodev,noexec,relatime,cpu,cpuacct 0 0
cgroup /sys/fs/cgroup/blkio cgroup ro,seclabel,nosuid,nodev,noexec,relatime,blkio 0 0
cgroup /sys/fs/cgroup/memory cgroup ro,seclabel,nosuid,nodev,noexec,relatime,memory 0 0
cgroup /sys/fs/cgroup/net_cls,net_prio cgroup ro,seclabel,nosuid,nodev,noexec,relatime,net_cls,net_prio 0 0
cgroup /sys/fs/cgroup/rdma cgroup ro,seclabel,nosuid,nodev,noexec,relatime,rdma 0 0
cgroup /sys/fs/cgroup/perf_event cgroup ro,seclabel,nosuid,nodev,noexec,relatime,perf_event 0 0
cgroup /sys/fs/cgroup/devices cgroup ro,seclabel,nosuid,nodev,noexec,relatime,devices 0 0
cgroup /sys/fs/cgroup/freezer cgroup ro,seclabel,nosuid,nodev,noexec,relatime,freezer 0 0
cgroup /sys/fs/cgroup/hugetlb cgroup ro,seclabel,nosuid,nodev,noexec,relatime,hugetlb 0 0
shm /dev/shm tmpfs rw,context="system_u:object_r:container_file_t:s0:c235,c809",nosuid,nodev,noexec,relatime,size=65536k 0 0
tmpfs /etc/resolv.conf tmpfs rw,seclabel,nosuid,nodev,noexec,mode=755 0 0
tmpfs /etc/hostname tmpfs rw,seclabel,nosuid,nodev,mode=755 0 0
/dev/sda4 /etc/hosts xfs rw,seclabel,relatime,attr2,inode64,logbufs=8,logbsize=32k,prjquota 0 0
/dev/sda4 /dev/termination-log xfs rw,seclabel,relatime,attr2,inode64,logbufs=8,logbsize=32k,prjquota 0 0
tmpfs /run/secrets tmpfs rw,seclabel,nosuid,nodev,mode=755 0 0
**/dev/sda4 /etc/kubernetes/node-feature-discovery/source.d/:z xfs rw,seclabel,relatime,attr2,inode64,logbufs=8,logbsize=32k,prjquota 0 0**
tmpfs /var/run/secrets/kubernetes.io/serviceaccount tmpfs ro,seclabel,relatime,size=262629092k 0 0
proc /proc/bus proc ro,nosuid,nodev,noexec,relatime 0 0
proc /proc/fs proc ro,nosuid,nodev,noexec,relatime 0 0
proc /proc/irq proc ro,nosuid,nodev,noexec,relatime 0 0
proc /proc/sys proc ro,nosuid,nodev,noexec,relatime 0 0
proc /proc/sysrq-trigger proc ro,nosuid,nodev,noexec,relatime 0 0
tmpfs /proc/acpi tmpfs ro,context="system_u:object_r:container_file_t:s0:c235,c809",relatime 0 0
tmpfs /proc/kcore tmpfs rw,context="system_u:object_r:container_file_t:s0:c235,c809",nosuid,size=65536k,mode=755 0 0
tmpfs /proc/keys tmpfs rw,context="system_u:object_r:container_file_t:s0:c235,c809",nosuid,size=65536k,mode=755 0 0
tmpfs /proc/timer_list tmpfs rw,context="system_u:object_r:container_file_t:s0:c235,c809",nosuid,size=65536k,mode=755 0 0
tmpfs /proc/sched_debug tmpfs rw,context="system_u:object_r:container_file_t:s0:c235,c809",nosuid,size=65536k,mode=755 0 0
tmpfs /proc/scsi tmpfs ro,context="system_u:object_r:container_file_t:s0:c235,c809",relatime 0 0
tmpfs /sys/firmware tmpfs ro,context="system_u:object_r:container_file_t:s0:c235,c809",relatime 0 0

DaemonSet YAML with the mountPath changed to source.d/sources

      initContainers:
        - resources: {}
          terminationMessagePath: /dev/termination-log
          name: intel-sgx-initcontainer
          command:
            - sh
            - '-c'
            - >-
              cat /proc/mounts && cp -a /usr/local/bin/sgx-sw/intel-sgx-epchook
              /etc/kubernetes/node-feature-discovery/source.d/
          securityContext:
            readOnlyRootFilesystem: false
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: nfd-source-hooks
              mountPath: /etc/kubernetes/node-feature-discovery/source.d/sources

log:
...
/dev/sda4 /etc/kubernetes/node-feature-discovery/source.d/sources xfs rw,seclabel,relatime,attr2,inode64,logbufs=8,logbsize=32k,prjquota 0 0
...

uMartinXu commented Jan 20, 2022

It's an in-tree kernel driver; RHEL 8.4+ has it as a tech preview. Would that already cover the rules part automatically? I'll work with @Walnux to check if we can get you access to a node with SGX.

We are trying to figure out a way to let @haircommander access a node with SGX support, but that needs some time and effort. Until then, I can act as a proxy for @haircommander and try to figure out a proper solution. :)

uMartinXu commented

@haircommander Any updates?
Thanks!

haircommander commented

What's ls -lZd /etc/kubernetes/node-feature-discovery/source.d/? Either we have to change the label of that directory so all containers can access it, or we need to make this plugin privileged.

uMartinXu commented

I guess what you're talking about is the host OS.
I use the following way to access the node:
[jxu36@jfz1r09h07 ~]$ oc debug node/worker-1
Starting pod/worker-1-debug ...
To use host binaries, run chroot /host
Pod IP: 172.16.9.2
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# ls
bin boot dev etc home lib lib64 media mnt opt ostree proc root run sbin srv sys sysroot tmp usr var
sh-4.4# ls -lZd /etc/kubernetes/node-feature-discovery/source.d/
drwxr-xr-x. 2 root root system_u:object_r:kubernetes_file_t:s0 31 Jan 21 20:40 /etc/kubernetes/node-feature-discovery/source.d/
sh-4.4# ls -lZd /dev
drwxr-xr-x. 20 root root system_u:object_r:device_t:s0 3360 Nov 30 06:09 /dev

haircommander commented

Currently that version of containers-selinux isn't targeted at 4.10, so I am guessing the earliest it would make is 4.11. We have to be careful about bumps to containers-selinux so close to GA, and the fix may not qualify for a backport.

For the /etc/kubernetes/node-feature-discovery/source.d/ piece, have you tried the solution in #762 (comment)?

mregmi commented Mar 7, 2022

@haircommander I will try that solution. Also, is there a way to test the upcoming fix for container-selinux in our cluster? Is there a link to a branch or build for us to test? Thanks

Walnux commented Mar 17, 2022

Hi @haircommander @rhatdan,
Is it possible to define an SELinux policy/label that assigns only the permissions the container really needs?
Our SGX device plugin only accesses /dev/sgx_provision and /dev/sgx_epc, so we could define a label called intel_sgx_t that grants access to only these two device files, with access to all other device files denied.
I think that is the best way to protect security.
I know we can define our own SELinux policy on RHEL, but how to deploy the policy on OCP is a problem.

haircommander commented

Are you planning on using an operator to install on OCP? Maybe it could install machine configs that enable that policy.

Walnux commented Mar 20, 2022

Yes, an operator should be the right way to install it. Can you point us to some details about machine configs? I think this should not be a requirement only from us; it should be a general request. :)

rhatdan commented Mar 22, 2022

If these devices get added to a container, then there is no need to label them; the devices will get the label of the container. If you are volume mounting them into the container, then access would not be allowed. @haircommander How do you add a device to a container with k8s?

haircommander commented

That's the problem: this is a container that enables other containers to add devices. There's no way to do so without a device plugin, but we're putting together the device plugin...

rhatdan commented Mar 22, 2022

If you need to volume mount them in, and want the containers to have access, then you could just
chcon -t container_file_t /dev/sgx*

To make this permanent, you could execute something like:

semanage fcontext -a -t container_file_t '/dev/sgx.*'
restorecon -R -v /dev/sgx*

Then when the devices get created at reboot, they will be labeled correctly.

mythi commented Mar 23, 2022

To make this permanent, you could execute something like:

semanage fcontext -a -t container_file_t '/dev/sgx.*'
restorecon -R -v /dev/sgx*

@rhatdan thanks! Can these be managed by a MachineConfig, or do we need the Special Resource Operator to run them?

rhatdan commented Mar 23, 2022

Not my area of expertise, but I think MachineConfig should be able to do it.

haircommander commented

Typically the way to do it on RHCOS is to create a machine config that creates a systemd unit file that runs the commands; those kinds of state changes often don't persist across reboots otherwise. The SGX operator could create said MachineConfig and trigger a reboot; then the device would be available on the next reboot.
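
A minimal sketch of that approach, assuming rhatdan's relabeling above (the MachineConfig and unit names are placeholders; chcon is used directly since semanage may not be present on RHCOS):

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-sgx-device-label
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
        - name: sgx-device-label.service
          enabled: true
          contents: |
            [Unit]
            Description=Relabel SGX device nodes for container access
            # assumption: the SGX device nodes already exist at this point in boot
            After=local-fs.target

            [Service]
            Type=oneshot
            ExecStart=/usr/bin/chcon -t container_file_t /dev/sgx_enclave /dev/sgx_provision
            RemainAfterExit=yes

            [Install]
            WantedBy=multi-user.target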

mythi commented Mar 24, 2022

the sgx operator could create said machineconfig and trigger a reboot, then the device would be available on the next reboot

@haircommander OK, this sounds like a reasonable workaround until the problem gets fixed in a future release. However, I think we should try to leverage the SRO+MCO operators for this and not add the functionality into the device plugins operator.

haircommander commented

Gotcha, then maybe SRO would be a good fit for this. Is there a registration of the SGX plugin in the SRO? Ideally this unit would only run when the SGX device is enabled and installing.

@Walnux Walnux changed the title OCP: SCC and SELinux issue on OpenShift-4.9 to run SGX on intel-device-plugins framework OCP: SELinux issue on OpenShift-4.9 to run SGX on intel-device-plugins framework Mar 29, 2022
Walnux commented Mar 30, 2022

@haircommander if we use SRO, should we install the policy from a container? If we can package the policy into a container and install it that way, we can use the standard way to release and install a policy on OCP. I am trying to do that. Do you know anyone who has tried it before?
It also looks like most people currently suggest installing the policy from an RPM package; in that case, we can leverage MCO.

haircommander commented

Not for policy, but I do know privileged containers are used to configure things on the node. However, it's usually on startup from what I know. Something to think about: if we're using a privileged container to create a file on the host, is that much different from having the SGX plugin container be privileged?

mythi commented Mar 30, 2022

Something to think about: if we're using a privileged container to create a file on the host, is that much different
from having the SGX plugin container being privileged?

@haircommander we currently have 6 plugins supported by the operator, so I guess having one centralized one run as privileged is better than having to run all 6 that way. AFAIU, this would also be a stopgap until it's possible to deploy plugins without having to configure these labels separately.

haircommander commented

Good points, makes sense to me. @rhatdan if an SELinux policy is configured, does the node need to be rebooted for it to take effect? (It's possible RHCOS also behaves differently in this case, in which case we may need the reboot anyway.)

rhatdan commented Mar 30, 2022

No, SELinux does not require a reboot, as long as it was enabled in the first place. Policy is applied instantly, and labels are placed on disk by restorecon.

mregmi commented Apr 4, 2022

@rhatdan One other issue we encountered: it looks like socket communication between containers is not allowed. Our plugins use this to communicate, and we had to manually create an SELinux policy to allow it. Is there a way to allow this without deploying a custom SELinux policy? We used a policy something like:

#============= container_t ==============
allow container_t container_runtime_t:unix_stream_socket connectto;

rhatdan commented Apr 4, 2022

What is running as container_runtime_t? The intel-device-plugin?

mregmi commented Apr 4, 2022

The SGX plugin is running as container_t. We got that policy from audit2allow.

rhatdan commented Apr 4, 2022

The allow rule above shows a container attempting to connectto a process running as container_runtime_t, which is the label of the container engine like Podman or CRI-O.

mregmi commented Apr 4, 2022

That's strange. We saw the log below in the audit log, ran audit2allow, and it gave that rule.

/var/log/audit/audit.log.1:type=AVC msg=audit(1648502191.123:87396): avc: denied { connectto } for pid=1514382 comm="intel_sgx_devic" path="/var/lib/kubelet/device-plugins/kubelet.sock" scontext=system_u:system_r:container_t:s0:c149,c701 tcontext=system_u:system_r:container_runtime_t:s0 tclass=unix_stream_socket permissive=0

I just checked: the plugin is container_t. It's strange that the rule came out with container_runtime_t.

sh-4.4# ps -AZ | grep intel_sgx
system_u:system_r:container_t:s0:c612,c793 3927534 ? 00:00:39 intel_sgx_devic
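
For reference, a sketch of the standard audit2allow flow that produces rules like these (assuming the audit tools are installed on the node; the module name is a placeholder):

sh-4.4# ausearch -m avc -c intel_sgx_devic | audit2allow -M intelplugins   # generate intelplugins.te and intelplugins.pp from the denials
sh-4.4# semodule -i intelplugins.pp                                        # load the generated module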

rhatdan commented Apr 4, 2022

See if you can create the AVC again. It might have been an older test.

mregmi commented Apr 14, 2022

I tried it several times, but the policy audit2allow gives is the same.
We have modified the policy a bit: we created a new domain/process label for our plugin and gave the permissions to that label.
sh-4.4# ps -AZ | grep intel
system_u:system_r:container_t:s0:c19,c27 706721 ? 00:08:45 intel_deviceplu
system_u:system_r:intelplugins_t:s0:c545,c815 3769827 ? 00:00:00 intel_sgx_devic

type=AVC msg=audit(1649881114.712:151954): avc: denied { connectto } for pid=3736904 comm="intel_sgx_devic" path="/var/lib/kubelet/device-plugins/kubelet.sock" scontext=system_u:system_r:intelplugins_t:s0:c131,c171 tcontext=system_u:system_r:container_runtime_t:s0 tclass=unix_stream_socket permissive=1

The new policy looks something like this:

policy_module(intelplugins, 1.0)

gen_require(`
        type container_file_t;
        type device_t;
')

container_domain_template(intelplugins)

#============= intelplugins_t ==============
allow intelplugins_t container_runtime_t:unix_stream_socket connectto;
allow intelplugins_t device_t:chr_file getattr;
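
For completeness, a sketch of how a .te module like this is typically compiled and loaded (assuming the selinux-policy-devel toolchain is available where the module is built):

$ make -f /usr/share/selinux/devel/Makefile intelplugins.pp   # compile intelplugins.te into a policy package
$ sudo semodule -i intelplugins.pp                            # install the module

The pod would then request the new domain with securityContext.seLinuxOptions.type: "intelplugins_t" (hypothetical wiring, matching the ps output above).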

Walnux commented May 2, 2022

This issue has been fixed in containers/container-selinux#178,
so I'm closing it. :)
