Fix logic on volume SELinux labels #196

Merged: 3 commits into hashicorp:main on Nov 15, 2022

Conversation

@jdoss (Contributor) commented Oct 20, 2022

This will fix #184 by not applying SELinux contexts to privileged containers (CSI plugins run as privileged containers), and it also allows CSI plugins to correctly apply SELinux contexts when mounting storage into a container on hosts where SELinux is set to enforcing.

This should also close #66, since this PR prevents SELinux contexts from being applied to CSI mounts, which would otherwise be unusable on hosts that have SELinux set to enforcing.
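
To make the intended behavior concrete, here is a minimal, self-contained sketch of the kind of check this change is about. It is not the actual diff, and `taskConfig` / `bindMountOptions` are illustrative names only, not the driver's real identifiers:

```go
// Illustration only: append the configured SELinux label (e.g. "z") to a
// bind mount's options only when the task is NOT privileged.
package main

import "fmt"

type taskConfig struct {
	Privileged   bool
	SELinuxLabel string // driver-level setting, e.g. "z"
}

func bindMountOptions(cfg taskConfig) string {
	opts := "rw"
	// Privileged containers (like CSI plugins) run without SELinux
	// confinement and mount paths such as /dev that must not be relabeled,
	// so the label is skipped for them.
	if cfg.SELinuxLabel != "" && !cfg.Privileged {
		opts += "," + cfg.SELinuxLabel
	}
	return opts
}

func main() {
	fmt.Println(bindMountOptions(taskConfig{Privileged: true, SELinuxLabel: "z"}))  // rw
	fmt.Println(bindMountOptions(taskConfig{Privileged: false, SELinuxLabel: "z"})) // rw,z
}
```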

I have tested this with https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver and I am able to read and write to volume mounts from this CSI plugin as expected:

# ls -lahZ /opt/nomad/data/client/csi/node/gcp-pd-csi/per-alloc/ec76605e-24c6-8fe4-1a84-a6d8e6a9a6ff/nginx-selinux-test/rw-file-system-single-node-writer/                          
total 20K
drwxr-xr-x. 3 root root system_u:object_r:container_file_t:s0 4.0K Oct 20 19:04 .
drwx------. 3 root root system_u:object_r:var_t:s0              47 Oct 20 19:24 ..
-rw-r--r--. 1 root root system_u:object_r:container_file_t:s0    0 Oct 20 19:04 bye.txt
-rw-r--r--. 1 root root system_u:object_r:container_file_t:s0    0 Oct 20 14:11 hello.txt
drwx------. 2 root root system_u:object_r:container_file_t:s0  16K Oct 20 11:31 lost+found

@jdoss (Contributor, Author) commented Oct 20, 2022

Fixes #65

@jdoss (Contributor, Author) commented Oct 21, 2022

I will note that I saw this error when trying to start a CSI controller:

Oct 21, '22 13:39:14 -0500 	Driver Failure 	rpc error: code = Unknown desc = failed to start task, could not start container: cannot start container, status code: 500: {"cause":"SELinux relabeling of /dev is not allowed","message":"SELinux relabeling of /dev is not allowed","response":500}

which I thought was odd, because the CSI disk documentation says that only the node plugins need to run as privileged, but it seems that both node and controller plugins mount /dev/ as rw when being registered as CSI plugins.

Running both the controller and the node tasks with privileged = true allows everything to work correctly with this driver when SELinux is enabled.

@tgross should CSI controllers need rw access to /dev? If not, that feels like a Nomad specific bug and I can open an issue on the Nomad repo. Since controllers are not doing the mounting, they shouldn't need access to /dev right?

Here is my GCP PD job file:

job "csi-gcp-pd-plugin" {
  datacenters = ["us-central1"]
  type = "system"

  group "controller" {
    task "gcp-controller-plugin" {
      driver = "podman"
      config {
        image = "k8s.gcr.io/cloud-provider-gcp/gcp-compute-persistent-disk-csi-driver:v1.7.2"
        args = [
          "--endpoint=unix:///csi/csi.sock",
          "--v=6",
          "--logtostderr",
          "--run-node-service=false"
        ]
        privileged = true
      }
      env {
        GOOGLE_APPLICATION_CREDENTIALS = "/secrets/gcp-pd-csi.json"
      }
      template {
        destination = "secrets/gcp-pd-csi.json"
        data        = <<-EOF
        {{ "my-base64-encoded-gcp-service-account-json" | base64Decode }}
        EOF
      }

      csi_plugin {
        id        = "gcp-pd-csi"
        type      = "controller"
        mount_dir = "/csi"
      }
      resources {
        memory = 256
      }
    }
  }
  group "node" {
    task "gcp-node-plugin" {
      driver = "podman"
      config {
        image = "k8s.gcr.io/cloud-provider-gcp/gcp-compute-persistent-disk-csi-driver:v1.7.2"

        args = [
          "--endpoint=unix:///csi/csi.sock",
          "--v=6",
          "--logtostderr",
          "--run-controller-service=false"
        ]
        privileged = true
      }
      env {
        GOOGLE_APPLICATION_CREDENTIALS = "/secrets/gcp-pd-csi.json"
      }
      template {
        destination = "secrets/gcp-pd-csi.json"
        data        = <<-EOF
        {{ "my-base64-encoded-gcp-service-account-json" | base64Decode }}
        EOF
      }
      csi_plugin {
        id        = "gcp-pd-csi"
        type      = "node"
        mount_dir = "/csi"
      }
      resources {
        memory = 256
      }

    }

  }
}

@tgross (Member) commented Oct 21, 2022

> @tgross should CSI controllers need rw access to /dev? If not, that feels like a Nomad specific bug and I can open an issue on the Nomad repo. Since controllers are not doing the mounting, they shouldn't need access to /dev right?

The specification is ambiguous on what "controller publish" actually means and leaves it up to the plugin, so yeah it's entirely possible we've hit a controller plugin that requires it in the case of selinux relabeling.

@lgfa29 (Contributor) left a comment

Thanks for the PR and all the context provided in the linked issues @jdoss!

I'm not well versed in SELinux, so apologies in advance for my ignorance and anything wrong I say 😅

From what I understood, the root of the problem is that the host path of volumes on file systems that support SELinux must be mounted with the z label so that the runtime can properly set their policy to allow the container to access them.

By using a driver-level config of volumes.selinuxlabel we don't have enough fine-grained control to disable this when necessary (such as when running CSI plugins, or when using plugins that don't support SELinux). I did some quick research on how Kubernetes handles this, and it seems to hardcode which plugins support it and which don't, which doesn't seem like a good approach for us.

Using the privileged flag as the check doesn't seem ideal either, as it's only loosely correlated with the problem.

Another thing I noticed is that this logic is pretty close to the Docker driver's, so I think there's a broader issue to be solved here.

Indeed, hashicorp/nomad#9123 points to a similar problem to the one you described in #185, and hashicorp/nomad#7094 has a similar change to the one proposed in #66.

You made an interesting suggestion in #185:

> Ideally, it would be cool to just add selinuxlabel = "z" to volume_mount blocks so we can handle SELinux labeling [...] on a per mount basis.

This could be one way forward. We have volume.mount_options, so I was wondering if we could add a new field to store task-driver-specific flags, or something like that.
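
Purely as an illustration of that idea (nothing like this exists in Nomad or this driver today, and `volumeMount` / `SELinuxLabel` are made-up names), a per-mount label could flow into the mount spec roughly like this:

```go
// Hypothetical sketch of a per-mount SELinux label, as discussed above.
// This only shows how a per-mount setting could replace the single
// driver-wide selinuxlabel config; it is not an existing API.
package main

import "fmt"

type volumeMount struct {
	HostPath     string
	TaskPath     string
	SELinuxLabel string // e.g. "z", "Z", or "" to disable relabeling
}

func mountSpec(m volumeMount) string {
	spec := fmt.Sprintf("%s:%s:rw", m.HostPath, m.TaskPath)
	if m.SELinuxLabel != "" {
		spec += "," + m.SELinuxLabel
	}
	return spec
}

func main() {
	fmt.Println(mountSpec(volumeMount{HostPath: "/srv/data", TaskPath: "/data", SELinuxLabel: "z"}))
	fmt.Println(mountSpec(volumeMount{HostPath: "/opt/nomad/csi", TaskPath: "/csi"})) // no relabeling
}
```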

But coming back to the issue at hand, does the procedure you described in #184 work in general? Maybe we can document it for now until we come up with a better solution.

@jdoss (Contributor, Author) commented Nov 15, 2022

Hey @lgfa29, I agree that there is a better path forward w/r/t SELinux and volume mounts in general, but it will be a more involved fix.

The issue is that some jobs need to be run as privileged containers, and using the selinuxlabel = "z" setting in the plugin config applies this context to every mount. This is problematic because the --privileged flag adds mounts such as /dev that cannot be relabeled. Also, privileged containers don't need this label added, because they run without the constraints that SELinux applies.

> Executing container engines with the --privileged flag tells the engine to launch the container process without any further "security" lockdown.

https://www.redhat.com/sysadmin/privileged-flag-container-engines

I have been running this patch in production for weeks now and it works as expected. I hope we can get this merged so it unblocks the usage of CSI plugins with this driver until a better solution is implemented that gives us finer control over how selinuxlabel is applied to volume mounts. I do think that a container run with --privileged should be excluded anyway, because it doesn't need anything from selinuxlabel; it already has all the privileges it needs from the --privileged flag.

@lgfa29 (Contributor) left a comment

Thank you so much for the detailed response @jdoss, I definitely learned something new today 😄

Now I understand the problem and this change looks good 👍

There's still some broader discussion we need to have around CSI and volume handling, but that's outside the scope of this change, which threw me off a bit while reviewing it.

As I mentioned earlier, the implementation here is very similar to our Docker driver, and so I imagine it suffers from the same problem.

Would you be interested in testing and submitting a similar fix there as well?

@lgfa29 merged commit 2a798c6 into hashicorp:main on Nov 15, 2022
@jdoss (Contributor, Author) commented Nov 15, 2022

I can push a PR up that should address the issue in the Docker driver, but I don't run Docker on my systems anymore so testing it would be an issue. I am also pretty backlogged right now so if you beat me to getting it patched that works too.

We can close issue #65 too, since it was the original issue and its PR was superseded by this one.

@lgfa29 (Contributor) commented Nov 15, 2022

No worries, I will see if I can repro the problem in Docker and submit a fix; it should be straightforward with all the information you provided us 👍

Successfully merging this pull request may close these issues:

Setting selinuxlabel="z" in the client driver config breaks CSI hostpath plugin job