Fix logic on volume SELinux labels #196

Merged: 3 commits into hashicorp:main on Nov 15, 2022

Conversation

@jdoss (Contributor) commented Oct 20, 2022

This will fix #184 by not applying SELinux contexts to privileged containers (CSI plugins run as privileged containers), and it also allows CSI plugins to correctly apply SELinux contexts when mounting storage into a container on hosts where SELinux is set to enforcing.

This should also close #66, since this PR prevents SELinux contexts from being applied to CSI mounts, which would otherwise be unusable on hosts that have SELinux set to enforcing.
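
To make the intended behavior concrete, here is a minimal, self-contained sketch of the kind of check this change is about. It is not the actual diff, and `taskConfig` / `bindMountOptions` are illustrative names only, not the driver's real identifiers:

```go
// Illustration only: append the configured SELinux label (e.g. "z") to a
// bind mount's options only when the task is NOT privileged.
package main

import "fmt"

type taskConfig struct {
	Privileged   bool
	SELinuxLabel string // driver-level setting, e.g. "z"
}

func bindMountOptions(cfg taskConfig) string {
	opts := "rw"
	// Privileged containers (like CSI plugins) run without SELinux
	// confinement and mount paths such as /dev that must not be relabeled,
	// so the label is skipped for them.
	if cfg.SELinuxLabel != "" && !cfg.Privileged {
		opts += "," + cfg.SELinuxLabel
	}
	return opts
}

func main() {
	fmt.Println(bindMountOptions(taskConfig{Privileged: true, SELinuxLabel: "z"}))  // rw
	fmt.Println(bindMountOptions(taskConfig{Privileged: false, SELinuxLabel: "z"})) // rw,z
}
```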

I have tested this with https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver and I am able to read and write to volume mounts from this CSI plugin as expected:

# ls -lahZ /opt/nomad/data/client/csi/node/gcp-pd-csi/per-alloc/ec76605e-24c6-8fe4-1a84-a6d8e6a9a6ff/nginx-selinux-test/rw-file-system-single-node-writer/                          
total 20K
drwxr-xr-x. 3 root root system_u:object_r:container_file_t:s0 4.0K Oct 20 19:04 .
drwx------. 3 root root system_u:object_r:var_t:s0              47 Oct 20 19:24 ..
-rw-r--r--. 1 root root system_u:object_r:container_file_t:s0    0 Oct 20 19:04 bye.txt
-rw-r--r--. 1 root root system_u:object_r:container_file_t:s0    0 Oct 20 14:11 hello.txt
drwx------. 2 root root system_u:object_r:container_file_t:s0  16K Oct 20 11:31 lost+found

@jdoss (Contributor, Author) commented Oct 20, 2022

Fixes #65

@jdoss (Contributor, Author) commented Oct 21, 2022

I will note that I saw this error when trying to start a CSI controller:

Oct 21, '22 13:39:14 -0500 	Driver Failure 	rpc error: code = Unknown desc = failed to start task, could not start container: cannot start container, status code: 500: {"cause":"SELinux relabeling of /dev is not allowed","message":"SELinux relabeling of /dev is not allowed","response":500}

which I thought was odd, because the CSI disk documentation says that only the node plugins need to run as privileged, but it seems that both node and controller plugins mount /dev/ as rw when being registered as CSI plugins.

Running both the controller and the node tasks with privileged = true allows everything to work correctly with this driver when SELinux is enabled.

@tgross should CSI controllers need rw access to /dev? If not, that feels like a Nomad specific bug and I can open an issue on the Nomad repo. Since controllers are not doing the mounting, they shouldn't need access to /dev right?

Here is my GCP PD job file:

job "csi-gcp-pd-plugin" {
  datacenters = ["us-central1"]
  type = "system"

  group "controller" {
    task "gcp-controller-plugin" {
      driver = "podman"
      config {
        image = "k8s.gcr.io/cloud-provider-gcp/gcp-compute-persistent-disk-csi-driver:v1.7.2"
        args = [
          "--endpoint=unix:///csi/csi.sock",
          "--v=6",
          "--logtostderr",
          "--run-node-service=false"
        ]
        privileged = true
      }
      env {
        GOOGLE_APPLICATION_CREDENTIALS = "/secrets/gcp-pd-csi.json"
      }
      template {
        destination = "secrets/gcp-pd-csi.json"
        data        = <<-EOF
        {{ "my-base64-encoded-gcp-service-account-json" | base64Decode }}
        EOF
      }

      csi_plugin {
        id        = "gcp-pd-csi"
        type      = "controller"
        mount_dir = "/csi"
      }
      resources {
        memory = 256
      }
    }
  }
  group "node" {
    task "gcp-node-plugin" {
      driver = "podman"
      config {
        image = "k8s.gcr.io/cloud-provider-gcp/gcp-compute-persistent-disk-csi-driver:v1.7.2"

        args = [
          "--endpoint=unix:///csi/csi.sock",
          "--v=6",
          "--logtostderr",
          "--run-controller-service=false"
        ]
        privileged = true
      }
      env {
        GOOGLE_APPLICATION_CREDENTIALS = "/secrets/gcp-pd-csi.json"
      }
      template {
        destination = "secrets/gcp-pd-csi.json"
        data        = <<-EOF
        {{ "my-base64-encoded-gcp-service-account-json" | base64Decode }}
        EOF
      }
      csi_plugin {
        id        = "gcp-pd-csi"
        type      = "node"
        mount_dir = "/csi"
      }
      resources {
        memory = 256
      }

    }

  }
}

@tgross (Member) commented Oct 21, 2022

> @tgross should CSI controllers need rw access to /dev? If not, that feels like a Nomad specific bug and I can open an issue on the Nomad repo. Since controllers are not doing the mounting, they shouldn't need access to /dev right?

The specification is ambiguous on what "controller publish" actually means and leaves it up to the plugin, so yeah it's entirely possible we've hit a controller plugin that requires it in the case of selinux relabeling.

@lgfa29 (Contributor) left a comment

Thanks for the PR and all the context provided in the linked issues @jdoss!

I'm not well versed in SELinux, so apologies in advance for my ignorance and anything wrong I say 😅

From what I understood, the root of the problem is that the host path of volumes on file systems that support SELinux must be mounted with the z label so that the runtime can properly set their policy to allow the container to access them.

By using a driver-level config of volumes.selinuxlabel we don't have enough fine-grained control to disable this when necessary (such as when running CSI plugins, or when using plugins that don't support SELinux). I did some quick research on how Kubernetes handles this, and it seems to hardcode which plugins support it and which don't, which doesn't seem like a good approach for us.

Using the privileged flag as the check doesn't seem ideal either, as it's only loosely correlated with the problem.

Another thing I noticed is that this logic is pretty close to the Docker driver's, so I think there's a broader issue to be solved here.

Indeed, hashicorp/nomad#9123 points to a similar problem to the one you described in #185, and hashicorp/nomad#7094 has a similar change to the one proposed in #66.

You made an interesting suggestion in #185:

> Ideally, it would be cool to just add selinuxlabel = "z" to volume_mount blocks so we can handle SELinux labeling [...] on a per mount basis.

This could be one way forward. We have volume.mount_options, so I was wondering if we could add a new field to store task-driver-specific flags, or something like that.
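
Purely as an illustration of that idea (nothing like this exists in Nomad or this driver today, and `volumeMount` / `SELinuxLabel` are made-up names), a per-mount label could flow into the mount spec roughly like this:

```go
// Hypothetical sketch of a per-mount SELinux label, as discussed above.
// This only shows how a per-mount setting could replace the single
// driver-wide selinuxlabel config; it is not an existing API.
package main

import "fmt"

type volumeMount struct {
	HostPath     string
	TaskPath     string
	SELinuxLabel string // e.g. "z", "Z", or "" to disable relabeling
}

func mountSpec(m volumeMount) string {
	spec := fmt.Sprintf("%s:%s:rw", m.HostPath, m.TaskPath)
	if m.SELinuxLabel != "" {
		spec += "," + m.SELinuxLabel
	}
	return spec
}

func main() {
	fmt.Println(mountSpec(volumeMount{HostPath: "/srv/data", TaskPath: "/data", SELinuxLabel: "z"}))
	fmt.Println(mountSpec(volumeMount{HostPath: "/opt/nomad/csi", TaskPath: "/csi"})) // no relabeling
}
```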

But coming back to the issue at hand, does the procedure you described in #184 work in general? Maybe we can document it for now until we come up with a better solution.

@jdoss (Contributor, Author) commented Nov 15, 2022

Hey @lgfa29, I agree that there is a better path forward w/r/t SELinux and volume mounts in general, but it will be a more involved fix.

The issue is that some jobs need to be run as privileged containers, and using the selinuxlabel = "z" setting in the plugin config applies this context to every mount. This is problematic because the --privileged flag adds mounts such as /dev that cannot be relabeled. Also, privileged containers don't need this label added, because they run without the constraints that SELinux applies.

> Executing container engines with the --privileged flag tells the engine to launch the container process without any further "security" lockdown.

https://www.redhat.com/sysadmin/privileged-flag-container-engines

I have been running this patch in production for weeks now and it works as expected. I hope we can get this merged so it unblocks the usage of CSI plugins with this driver until a better solution is implemented that gives us finer control over how selinuxlabel is applied to volume mounts. I do think that a container run with --privileged should be excluded anyway, because it doesn't need anything from selinuxlabel; it already has all the privileges it needs from the --privileged flag.

@lgfa29 (Contributor) left a comment

Thank you so much for the detailed response @jdoss, I definitely learned something new today 😄

Now I understand the problem and this change looks good 👍

There's still some broader discussion we need to have around CSI and volume handling, but that's outside the scope of this change, which threw me off a bit while reviewing it.

As I mentioned earlier, the implementation here is very similar to our Docker driver, and so I imagine it suffers from the same problem.

Would you be interested in testing and submitting a similar fix there as well?

@lgfa29 merged commit 2a798c6 into hashicorp:main on Nov 15, 2022
@jdoss (Contributor, Author) commented Nov 15, 2022

I can push a PR up that should address the issue in the Docker driver, but I don't run Docker on my systems anymore so testing it would be an issue. I am also pretty backlogged right now so if you beat me to getting it patched that works too.

We can close issue #65 too, since it was the original issue and its PR was superseded by this one.

@lgfa29 (Contributor) commented Nov 15, 2022

No worries, I will see if I can repro the problem in Docker and submit a fix; it should be straightforward with all the information you provided us 👍

Successfully merging this pull request may close these issues:

Setting selinuxlabel="z" in the client driver config breaks CSI hostpath plugin job