
Document kubeadm usage with SELinux #279

Closed
luxas opened this issue May 29, 2017 · 77 comments
Labels
area/ecosystem help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/documentation Categorizes issue or PR as related to documentation. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/backlog Higher priority than priority/awaiting-more-evidence.

@luxas
Member

luxas commented May 29, 2017

Is this a BUG REPORT or FEATURE REQUEST?


COMMUNITY REQUEST

Versions

All

We need e2e tests that ensure kubeadm works with SELinux on CentOS/Fedora (#215) and CoreOS (#269)

We might be able to add a job for it on kubernetes-anywhere @pipejakob ?

IIUC kubeadm is broken with SELinux enabled right now. The problem is that we don't have anyone (AFAIK) very experienced with SELinux on the kubeadm team (at least nobody has had time to look into it yet)

AFAIK, the problem is often when mounting hostPath volumes...

To get closer to production readiness, we should fix this and add a testing suite for it.
We should also work with CNI network providers to make sure they adopt the right SELinux policies as well.

Anyone want to take ownership here? I'm not very experienced with SELinux, so I'm probably gonna focus on other things.

@dgoodwin @aaronlevy @coeki @rhatdan @philips @bboreham @mikedanese @pipejakob

@luxas luxas added area/ecosystem area/releasing area/test kind/enhancement kind/postmortem priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. labels May 29, 2017
@luxas
Member Author

luxas commented May 29, 2017

@pipejakob I added the kind/postmortem label as it's in the same theme, we broke SELinux users again without noticing it...

@rhatdan

rhatdan commented May 30, 2017

I don't work with kubeadm but would be very willing to help whoever takes this on.

@luxas
Member Author

luxas commented May 30, 2017

@rhatdan Great! What I'm looking for is people who are familiar with SELinux and willing to help.
I might be able to coordinate the work, though.

A rough todo list would look like:

  • Make kubeadm work with SELinux enabled in v1.7
  • Make an e2e suite of CentOS/Fedora nodes that will notify us if there is a regression.
  • Look into the CoreOS issue and how the SELinux setup between CentOS and CoreOS differs.

@rhatdan Let's first try and get it working in v1.7, can be done in #215

@roberthbailey
Contributor

@timothysc

@coeki

coeki commented May 31, 2017

I will take it for now, since I raised it. I'll have some updates soon. @rhatdan, please advise me ;)

@timothysc
Member

@luxas , @jasonbrooks - does this still exist in fedora?

I think folks have patched policies on other channels.

/cc @eparis

@jasonbrooks

@timothysc I haven't tried w/ 1.7 yet, but w/ 1.6, CentOS worked w/ selinux but Fedora 25 didn't. I'll test w/ 1.7

@jasonbrooks

for reference, I just ran kubeadm 1.7 on f26 in permissive mode, and these are the denials I got:

[root@fedora-1 ~]# ausearch -m avc -ts recent
----
time->Tue Jul 11 13:03:50 2017
type=AVC msg=audit(1499792630.959:321): avc:  denied  { read } for  pid=2885 comm="kube-apiserver" name="apiserver.crt" dev="dm-0" ino=16820634 scontext=system_u:system_r:container_t:s0:c171,c581 tcontext=unconfined_u:object_r:cert_t:s0 tclass=file permissive=1
----
time->Tue Jul 11 13:03:50 2017
type=AVC msg=audit(1499792630.959:322): avc:  denied  { open } for  pid=2885 comm="kube-apiserver" path="/etc/kubernetes/pki/apiserver.crt" dev="dm-0" ino=16820634 scontext=system_u:system_r:container_t:s0:c171,c581 tcontext=unconfined_u:object_r:cert_t:s0 tclass=file permissive=1
----
time->Tue Jul 11 13:04:18 2017
type=AVC msg=audit(1499792658.917:331): avc:  denied  { read } for  pid=2945 comm="kube-controller" name="sa.key" dev="dm-0" ino=16820637 scontext=system_u:system_r:container_t:s0:c755,c834 tcontext=unconfined_u:object_r:cert_t:s0 tclass=file permissive=1
----
time->Tue Jul 11 13:04:18 2017
type=AVC msg=audit(1499792658.917:332): avc:  denied  { open } for  pid=2945 comm="kube-controller" path="/etc/kubernetes/pki/sa.key" dev="dm-0" ino=16820637 scontext=system_u:system_r:container_t:s0:c755,c834 tcontext=unconfined_u:object_r:cert_t:s0 tclass=file permissive=1

On CentOS 7, same thing, no denials.
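A quick way to see who is being denied what in output like the above is to pull the source domain and target type out of each AVC line (on the node itself, `ausearch -m avc -ts recent | audit2allow` does this and proposes rules). A minimal sketch, using one of the denial lines above as sample input:

```shell
# Sample AVC line (abridged from the audit log above).
avc='type=AVC msg=audit(1499792630.959:321): avc: denied { read } for pid=2885 comm="kube-apiserver" name="apiserver.crt" scontext=system_u:system_r:container_t:s0:c171,c581 tcontext=unconfined_u:object_r:cert_t:s0 tclass=file permissive=1'

# Source domain (who) and target type (what) are the third colon-separated
# field of scontext= and tcontext= respectively.
src=$(printf '%s\n' "$avc" | grep -o 'scontext=[^ ]*' | cut -d: -f3)
tgt=$(printf '%s\n' "$avc" | grep -o 'tcontext=[^ ]*' | cut -d: -f3)

echo "$src is not allowed to read $tgt"
# -> container_t is not allowed to read cert_t
```

Read this way, the denials above all say the same thing: the confined container domain (container_t) cannot read host files labeled cert_t, which is exactly what the following comments address.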

@rhatdan

rhatdan commented Jul 11, 2017

You are volume mounting in content from the host into a container. If you want an SELinux confined process inside the container to be able to read the content, it has to have an SELinux label that the container is allowed to read.

Mounting the object with :Z or :z would fix the issue. Note either of these would allow the container to write these objects. If you want to allow the container to read without writing then you could change the content on the host to something like container_share_t.
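For concreteness, the relabeling described here might look like the following on the node. This is a sketch, not a tested recipe: it assumes the container-selinux types are installed and would need to run as root; with DRYRUN=1 (the default here) it only prints the commands, so it can be rehearsed on a non-SELinux machine:

```shell
DRYRUN=${DRYRUN:-1}
run() { if [ "$DRYRUN" = 1 ]; then echo "would run: $*"; else "$@"; fi; }

# Writable by containers (the host-side equivalent of a :Z bind mount):
run chcon -R -t container_file_t /var/lib/etcd
# Readable but not writable by containers, per the suggestion above:
run chcon -R -t container_share_t /etc/kubernetes/pki
```

Note that a relabel (e.g. a full `restorecon`) can revert these labels, so they may need to be made persistent with `semanage fcontext` rather than plain `chcon`.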

@luxas
Member Author

luxas commented Jul 12, 2017

kubernetes/kubernetes#48607 will also help here as it starts making mounting everything but etcd read-only...

@timothysc
Member

@luxas @jasonbrooks - someone want to tinker with adjusting the manifests ( https://kubernetes.io/docs/tasks/configure-pod-container/security-context/ ) ?

@luxas
Member Author

luxas commented Jul 12, 2017 via email

@jasonbrooks

@rhatdan It looks like :Z is only used if the pod provides an selinux label. In my initial tests, container_runtime_t seems to work -- would that be an appropriate label? And then, I'm assuming in a system w/o selinux, this would just be ignored?

@rhatdan

rhatdan commented Jul 12, 2017

Yes, it will be ignored by non-SELinux systems. Running an app as container_runtime_t basically provides no SELinux confinement, since it is supposed to be the label of container runtimes like docker and CRI-O. If you are running the kubelet as this, that is probably fairly accurate.

@jasonbrooks

Right now, we're running the etcd container as spc_t -- would it be better to run that one as container_runtime_t too?

@jasonbrooks

It looks like this does it:

diff --git a/cmd/kubeadm/app/master/manifests.go b/cmd/kubeadm/app/master/manifests.go
index 55fe560c46..228f935cdd 100644
--- a/cmd/kubeadm/app/master/manifests.go
+++ b/cmd/kubeadm/app/master/manifests.go
@@ -96,6 +96,7 @@ func WriteStaticPodManifests(cfg *kubeadmapi.MasterConfiguration) error {
                        LivenessProbe: componentProbe(int(cfg.API.BindPort), "/healthz", api.URISchemeHTTPS),
                        Resources:     componentResources("250m"),
                        Env:           getProxyEnvVars(),
+                        SecurityContext: &api.SecurityContext{SELinuxOptions: &api.SELinuxOptions{Type: "container_runtime_t",}},
                }, volumes...),
                kubeControllerManager: componentPod(api.Container{
                        Name:          kubeControllerManager,
@@ -105,6 +106,7 @@ func WriteStaticPodManifests(cfg *kubeadmapi.MasterConfiguration) error {
                        LivenessProbe: componentProbe(10252, "/healthz", api.URISchemeHTTP),
                        Resources:     componentResources("200m"),
                        Env:           getProxyEnvVars(),
+                        SecurityContext: &api.SecurityContext{SELinuxOptions: &api.SELinuxOptions{Type: "container_runtime_t",}},
                }, volumes...),
                kubeScheduler: componentPod(api.Container{
                        Name:          kubeScheduler,

Would this be something to submit as PRs to the 1.7 branch and to master, or just to master? The source moved around a bit in master, the patch above is to the 1.7 branch.

@rhatdan

rhatdan commented Jul 13, 2017

I would actually prefer that it run as spc_t, or as a confined domain (container_t). etcd should easily be able to be confined by SELinux.

@jasonbrooks

I think spc_t should work. I tried w/ container_t and that didn't work. audit2allow says it needs:

allow container_t cert_t:file { open read };
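As an alternative to relabeling the files themselves, a rule like the one audit2allow reports can be packaged into a local policy module. A hedged sketch (the module name kubeadm_certs is invented for illustration; the commented build commands need the SELinux policy toolchain and root on the node):

```shell
# Write a tiny policy module containing the rule audit2allow reported.
# "kubeadm_certs" is a made-up module name, not an existing policy.
cat > kubeadm_certs.te <<'EOF'
module kubeadm_certs 1.0;

require {
        type container_t;
        type cert_t;
        class file { open read };
}

allow container_t cert_t:file { open read };
EOF

# Build and load it (requires checkmodule/semodule, run as root on the node):
# checkmodule -M -m -o kubeadm_certs.mod kubeadm_certs.te
# semodule_package -o kubeadm_certs.pp -m kubeadm_certs.mod
# semodule -i kubeadm_certs.pp
```

The downside of this route is that it widens container_t for every container on the host, not just the kubeadm control-plane pods, which is presumably why relabeling the specific directories is preferred below.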

@rhatdan

rhatdan commented Jul 13, 2017

Could we relabel the certs directory with container_file_t or container_share_t. then it would work.

@jasonbrooks

kubeadm creates an /etc/kubernetes/pki dir when you run kubeadm init, but when you kubeadm reset, it only empties that dir. If we created the pki dir when the rpm is installed, we could do the labeling at that point, by modding the spec file.

@jasonbrooks

For etcd, the container would need allow container_t container_var_lib_t:file { create lock open read unlink write }; for /var/lib/etcd on the host.

@jasonbrooks

I'm trying to figure out if it's legitimate to chcon directories in the rpm spec file -- I see many instances of it on GitHub (https://github.com/search?l=&p=1&q=chcon+extension%3Aspec) but I can't tell whether that's considered good packaging practice or not. We could either change kubeadm to run the components as spc_t (unconfined), or we could leave kubeadm alone and chcon the pki dir.

@randomvariable
Member

100% agreed.

I'm tempted to say we hold off until CentOS 8 goes GA so we have a consistent baseline with respect to the kernel version across distros.

@neolit123
Member

If we don't handle the testing situation properly, we might end up in a "works today, but might not work tomorrow" situation and angry users over claimed SELinux support in our documentation.

if selinux is something that strictly requires e2e tests, we cannot support it today.

problem is that the ecosystem has so many distros and flavors that we cannot test them all.
if we don't want to maintain code for selinux in kubeadm, we can still have a guide in our setup docs with a disclaimer "may not work" and this was my initial proposal.

@randomvariable
Member

randomvariable commented Jul 5, 2019

If we sort out #1379, this would take us a long way towards enabling users who want to have a stricter SELinux setup.

I will gather instructions on how you can do SELinux in an unsupported fashion today - either as a blogpost or a doc that can go on docs.k8s.io.

Additionally, @TheFoxAtWork did raise SELinux at CNCF SIG Security this week, so wondering if there's broader interest in getting this working.

@randomvariable
Member

problem is that the ecosystem has so many distros and flavors that we cannot test them all.

One thing that helps here is that the container-selinux package is shared across all of the distros, so maybe ok to add only one. Suggestion is to add CentOS 8 because it'll be on Linux 4.18, which is slightly ahead of Ubuntu 18.04 but not massive enough to start exhibiting other bugs. Amazon Linux 2 is also an option if we're testing on it elsewhere.

@neolit123
Member

neolit123 commented Jul 5, 2019

problem is that the ecosystem has so many distros and flavors that we cannot test them all.

One thing that helps here is that the container-selinux package is shared across all of the distros, so maybe ok to add only one. Suggestion is to add CentOS 8 because it'll be on Linux 4.18, which is slightly ahead of Ubuntu 18.04 but not massive enough to start exhibiting other bugs. Amazon Linux 2 is also an option if we're testing on it elsewhere.

i guess my point was more about the fact that selinux is not something that is really supported on Ubuntu, where AppArmor is the alternative for it.

https://security.stackexchange.com/a/141716

Now practically SElinux works better with Fedora and RHEL as it comes preshipped while AA works better on Ubuntu and SUSE which means it would be better to learn how to use SElinux on the former distros than going through the hassle of making AA work on them and vice versa.

this is the distro flavor mess that i don't want to get kubeadm into.

the kubeadm survey told us that 65% of our users use Ubuntu, so technically we should be prioritizing apparmor. has anyone tried kubeadm with apparmor?
xref https://kubernetes.io/docs/tutorials/clusters/apparmor/

If we sort out #1379, this would take us a long way towards enabling users who want to have a stricter SELinux setup.

yes, it feels to me we should just document some basic details and punt the rest to #1379.
but i'm also seeing demand for static pod configuration enhancements in the next v1betaX, because right now we cannot persist securitycontext modifications after upgrade.

@randomvariable
Member

randomvariable commented Jul 5, 2019

has anyone tried kubeadm with apparmor?

AppArmor is much more lightweight than SELinux and has a different security model. It's pretty much on by default these days for Docker on Ubuntu.

this is the distro flavor mess that i don't want to get kubeadm into.

Given AppArmor can't be used on CentOS and equivalents (other than AL2 which supports both), we're already there in saying some percentage of users can't make use of a Linux Security Module in a supported fashion.

@rcythr

rcythr commented Jul 5, 2019

this is the distro flavor mess that i don't want to get kubeadm into.

I definitely understand the desire to not get into the mess of distros and options -- it's a combinatorial explosion of test configurations. Personally, I believe if kubeadm were at least compatible with selinux it would have a larger share of non-ubuntu users, but I have no proof of that beyond the fact I'm one of those people. However, if the only distro/cri combination that's tested is ubuntu with docker, then that's really the only supported distro/cri.

If you don't want to support other configurations that's your choice, but at least be clear about that in the documentation and close this issue now. Telling centos/rhel/fedora users to disable selinux for their entire system because figuring out (and testing) the policy for an application is annoying is equivalent to telling them to disable their firewall because figuring out (and testing) the rules is annoying.

@neolit123
Member

@randomvariable

AppArmor is much more lightweight than SELinux and has a different security model. It's pretty much on by default these days for Docker on Ubuntu.

actually, i think it's already running in the prow/kubekins image.

@rcythr

If the only distro/cri combination that's tested is ubuntu with docker, then that's really the only supported distro/cri.

currently we are testing containerd and docker on Ubuntu.

Telling centos/rhel/fedora users to disable selinux for their entire system because figuring out (and testing) the policy for an application is annoying is the equivalent to telling them to disable their firewall because figuring out (and testing) the rules is annoying.

we need help with the selinux details. we already tell "CentOS, RHEL or Fedora" users to disable selinux completely:
https://github.com/kubernetes/website/blob/master/content/en/docs/setup/production-environment/tools/kubeadm/install-kubeadm.md#installing-kubeadm-kubelet-and-kubectl

this isn't desired and i still think we should have a document or a paragraph with some guiding steps.

@randomvariable
Member

Filed containerd/cri#1195 to log the fact that there's still work to be done to complete SELinux support in containerd.

@yann-soubeyrand

  1. I want to look into incorporating some of the changes by @randomvariable to more
    tightly confine some of the containers, where possible. I believe his change will allow us
    to tightly confine kube-scheduler and etcd; however, because we cannot relabel
    /etc/ssl/certs or /etc/pki without breaking the system, we cannot confine kube-apiserver
    or kube-controller-manager tighter than spc_t. We either need to create a custom type
    for them, stop using these two system directories, or just live with the spc_t type.

Not using system directories or creating custom types for components needing access to these system directories would allow us to further tighten the various components and avoid using spc_t type completely. This is clearly the best solution IMHO.

@rcythr

rcythr commented Jul 6, 2019

  1. I want to look into incorporating some of the changes by @randomvariable to more
    tightly confine some of the containers, where possible. I believe his change will allow us
    to tightly confine kube-scheduler and etcd; however, because we cannot relabel
    /etc/ssl/certs or /etc/pki without breaking the system, we cannot confine kube-apiserver
    or kube-controller-manager tighter than spc_t. We either need to create a custom type
    for them, stop using these two system directories, or just live with the spc_t type.

Not using system directories or creating custom types for components needing access to these system directories would allow us to further tighten the various components and avoid using spc_t type completely. This is clearly the best solution IMHO.

I agree. That's why I wanted to test it out and get it working.

Based on feedback from @neolit123, I doubt we'll see built-in changes to the code to automatically handle selinux compatibility anytime soon. Instead I'm going to make a doc page that describes how to use kubeadm on systems with selinux. It'll be a few steps longer than the usual kubeadm init process, but it should help anyone who wants to use kubeadm on selinux systems immediately.

On that page I'll present three options:

  1. Disable selinux: This is the least confined option, but the most supported. It may be necessary for some CNI plugins or user workloads.
  2. Keep using system directories /etc/pki and /etc/ssl/certs and use spc_t to avoid problems.
  3. Custom directories, chcon relabeling, and pretty tight confinement.

@qpehedm

qpehedm commented Jul 10, 2019

Hi, just wanted to comment that we have been running with selinux enabled for almost a year together with kubeadm and kubernetes+docker+calico on CentOS 7. Most workloads have no issues; the only issues I can recall were with Concourse.

However, we would like to do better enforcement at some point. Currently we run
'chcon -Rt container_file_t' on the directories below and make sure they are created before running kubeadm init (including /etc/kubernetes/pki/etcd):

/var/lib/etcd (etcd datadir)
/etc/kubernetes/pki ('certificatesDir' in kubeadm.conf)
/etc/cni/net.d
/opt/cni/bin/
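The pre-init preparation described in this comment could be scripted roughly as follows. A sketch based on the comment, not a tested recipe: by default it rehearses everything under a scratch prefix; on a real node it would run as root with PREFIX set to empty:

```shell
PREFIX=${PREFIX:-$(mktemp -d)}   # rehearsal prefix; use PREFIX= on the real host

# Create the directories before kubeadm init, as described above.
for d in /var/lib/etcd /etc/kubernetes/pki/etcd /etc/cni/net.d /opt/cni/bin; do
  mkdir -p "$PREFIX$d"
done

# Relabel them for container access; chcon fails harmlessly on filesystems
# without SELinux support, hence the || true.
for d in /var/lib/etcd /etc/kubernetes/pki /etc/cni/net.d /opt/cni/bin; do
  chcon -R -t container_file_t "$PREFIX$d" 2>/dev/null || true
done
```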

Member

@qpehedm If you could apply a JSONPatch to the static pod manifests, and also make sure kubeadm runs restorecon for each file/directory it writes, would that be sufficient to allow you to set stricter confinement?

@qpehedm

qpehedm commented Jul 11, 2019

@randomvariable Yes, adding a securityContext with spc_t for the k8s static pods as suggested is likely better than our current solution. I suppose the same needs to be done for CNI plugins or other infrastructure containers. Even better would be dedicated labels for this purpose, so that they only get permissions for the specific files needed?
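To make the JSONPatch idea concrete: assuming the patch mechanism proposed in #1379 (it later shipped as kubeadm's --patches flag), a per-component patch might look like the following. The file-name convention and the choice of spc_t are taken from this thread and kubeadm's later patch support, not from a tested configuration:

```shell
mkdir -p patches

# RFC 6902 JSON patch: give the kube-apiserver static pod an explicit
# SELinux type instead of whatever the runtime assigns.
cat > patches/kube-apiserver0+json.json <<'EOF'
[
  {
    "op": "add",
    "path": "/spec/containers/0/securityContext",
    "value": { "seLinuxOptions": { "type": "spc_t" } }
  }
]
EOF

# then, on the node:
# kubeadm init --patches patches/
```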

@neolit123 neolit123 added the kind/documentation Categorizes issue or PR as related to documentation. label Oct 13, 2019
@neolit123 neolit123 changed the title Decide owner for making sure kubeadm works with SELinux Document kubeadm usage with SELinux Oct 13, 2019
@neolit123 neolit123 added priority/backlog Higher priority than priority/awaiting-more-evidence. and removed area/releasing area/test kind/feature Categorizes issue or PR as related to a new feature. kind/postmortem priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. labels Oct 13, 2019
@neolit123
Member

neolit123 commented Jan 20, 2020

i'm going to close this ticket and here is my rationale, explained with bullet points:

/close

@k8s-ci-robot
Contributor

@neolit123: Closing this issue.

In response to this:

i'm going to close this ticket and here is my rationale, explained with bullet points:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
