Document running nested in docker/podman #284

Open
cgwalters opened this issue Aug 7, 2018 · 7 comments

Comments

@cgwalters
Collaborator

# podman run --rm -ti --security-opt seccomp=unconfined quay.io/cgwalters/coreos-assembler bwrap --unshare-pid --unshare-user --bind / / true
bwrap: Failed to mount tmpfs: Permission denied

This is actually SELinux. See this issue.

Now, this will work:

podman run --rm -ti --security-opt label=type:spc_t --security-opt seccomp=unconfined quay.io/cgwalters/coreos-assembler bwrap --unshare-pid --unshare-user --bind / / true

Note if one wants to pass through devices (e.g. --device /dev/kvm on the docker/podman side) you'll also want --dev-bind /dev /dev.
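To keep the two device options in sync, the invocation above can be assembled by a small helper. This is only a sketch: the function name `nested_bwrap_cmd` is made up for illustration, it only prints the command rather than running it, and every flag in it is taken from this thread.

```shell
#!/bin/sh
# Print the podman command line from this thread, optionally with a device
# passed through. Nothing is executed; review the output before running it.
# $1 (optional): host device to pass through, e.g. /dev/kvm
nested_bwrap_cmd() {
    device=${1:-}
    image=quay.io/cgwalters/coreos-assembler

    # Container-side options: SELinux type spc_t and seccomp disabled.
    set -- podman run --rm -ti \
        --security-opt label=type:spc_t \
        --security-opt seccomp=unconfined
    if [ -n "$device" ]; then
        set -- "$@" --device "$device"
    fi

    # bwrap-side options: new PID and user namespaces, host root bound in.
    set -- "$@" "$image" bwrap --unshare-pid --unshare-user --bind / /
    if [ -n "$device" ]; then
        # Expose the container's /dev (including the device) inside bwrap.
        set -- "$@" --dev-bind /dev /dev
    fi
    set -- "$@" true

    printf '%s\n' "$*"
}
```

Calling `nested_bwrap_cmd /dev/kvm` prints the variant with both `--device /dev/kvm` and `--dev-bind /dev /dev`; calling it with no argument prints the plain command shown above.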

Now the problem I'm hitting is around /proc. If one is using --unshare-pid, you really need to mount a fresh /proc, or all of the PIDs in /proc are wrong and things will get confused.

Adding --proc /proc gets me:
bwrap: Can't mount proc on /newroot/proc: Operation not permitted

I'm confused by this right now; why doesn't that work? It looks like our test suite does --bind /proc proc, but that gets me the same issue with incorrect PIDs.
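A quick way to see the "incorrect PIDs" symptom is to compare the PID a shell has in its own namespace with the PID that /proc reports for it. The helper below is a sketch, not part of bwrap or its test suite; the name `proc_pid_matches` is hypothetical. It relies on the first field of `/proc/<pid>/stat` being the PID as seen by that procfs instance.

```shell
#!/bin/sh
# Compare the PID recorded in a procfs "stat" file with the PID the caller
# believes it has.
# $1: path to a stat file (e.g. /proc/self/stat)
# $2: expected PID (e.g. "$$")
# Returns 0 if they agree, 1 if they differ, 2 if the file cannot be read.
proc_pid_matches() {
    # "read" is a shell builtin, so the redirection is opened by the shell
    # itself and /proc/self refers to the shell, not to a child process.
    read -r seen_pid _rest < "$1" || return 2
    [ "$seen_pid" = "$2" ]
}
```

Called from the top level of a script as `proc_pid_matches /proc/self/stat "$$"`, a non-zero result suggests /proc belongs to a different PID namespace than the shell. A zero result is weaker evidence, since small PIDs can coincide by chance in nested namespaces.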

@cgwalters
Collaborator Author

/cc @giuseppe

@giuseppe
Member

giuseppe commented Aug 8, 2018

The issue is that /proc in the container has masked/readonly paths, which prevents a user namespace from mounting a too "revealing" procfs.
The solution is either to not have these masked/readonly paths, or to avoid creating a new PID namespace in the container so that a bind mount works fine. Docker has recently added a way to modify the list of masked/readonly paths (at least in the API, not sure about the CLI), but for Podman I think --privileged is the only way to skip adding these paths.
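The masked and read-only paths can be inspected from inside the container, since they appear as separate mounts below /proc in `/proc/self/mountinfo` (the fifth field of each line is the mount point). The following is a minimal sketch; the function name `proc_overmounts` is an assumption for illustration.

```shell
#!/bin/sh
# List mount points that sit below /proc, i.e. the masked/readonly paths
# a container runtime has mounted over parts of procfs.
# $1 (optional): mountinfo file to read, defaults to /proc/self/mountinfo
proc_overmounts() {
    mountinfo=${1:-/proc/self/mountinfo}
    # Field 5 of each mountinfo line is the mount point.
    awk '$5 ~ /^\/proc\// { print $5 }' "$mountinfo"
}
```

If this prints nothing, no paths are mounted over /proc. If it lists entries, those are the likely reason `--proc /proc` is refused inside a new user namespace.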

@cgwalters
Collaborator Author

the issue is that /proc in the container has masked/readonly paths,

Ahh, right. Ugh. It feels like what we need is a "procfs-nolegacy" filesystem type or something that strips out all of the /proc/asound, /proc/bus, /proc/sysrq-trigger etc.

@cgwalters
Collaborator Author

Alternatively...audit everything in /proc and verify that it requires CAP_SYS_ADMIN to write. If the container then doesn't have CAP_SYS_ADMIN, we don't need the ro overmounts.
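A first pass at such an audit could simply list the files that have any write bit set. This is only a rough sketch with a hypothetical function name, `writable_files`. Mode bits say nothing about which capability the kernel checks on write, so each hit would still need to be looked up by hand.

```shell
#!/bin/sh
# List regular files under a directory that have a write bit set for
# owner, group, or other. Unreadable directories are skipped silently.
# $1: directory to scan, e.g. /proc/sys
writable_files() {
    find "$1" -type f \
        \( -perm -200 -o -perm -020 -o -perm -002 \) 2>/dev/null | sort
}
```

Running `writable_files /proc/sys` gives a candidate list to review. Scanning all of /proc also walks the per-PID directories, so a narrower starting point is probably more useful.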

Or yet another approach: An option for proc like ro-rw-pidfs that defaults every file to mode 0600 except what I'd call "pidfs", i.e. the process bits in /proc/$pid. And the semantics for this should include that CAP_DAC_OVERRIDE does not allow overriding permissions.

@cgwalters
Collaborator Author

Ah right: https://lkml.org/lkml/2018/5/11/155

@rhatdan
Member

rhatdan commented Aug 8, 2018

I have added container_userns_t, which allows some of the access that was denied by container_t, such as mounting a tmpfs. I would love to know if this would work for your use case, and leave SELinux in enforcing container separation.

@cgwalters
Collaborator Author

I would love to know if this would work for your use case, and leave SELinux in enforcing container separation.

Probably, I'll try it at some point, but the real blocker here is the /proc issue.


3 participants