Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

userns: prevent /sys/kernel/* paths in the container #2899

Conversation

giuseppe
Copy link
Member

when we run in a user namespace, there are cases where we have not
enough privileges to mount a fresh sysfs on /sys. To circumvent this
limitation, we rbind /sys from the host. This carries inside of the
container also some mounts we probably don't want to. We are also
limited by the kernel to use rbind instead of bind, as allowing a bind
would uncover paths that were not previously visible.

This is a slimmed down version of the intermediate mount namespace
logic we had before, where we only set /sys to slave, so the umounts
done to the storage by the cleanup process are propagated back to the
host. We also don't setup any new directory, so there is no
additional cleanup to do.

Signed-off-by: Giuseppe Scrivano [email protected]

@openshift-ci-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: giuseppe

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 11, 2019
@giuseppe giuseppe force-pushed the prevent-sys-fs-kernel-paths-in-userns branch from 3d0b90a to 507c640 Compare April 11, 2019 10:02
return errors.Wrapf(err, "cannot make /sys slave")
}

paths := []string{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we make this more future proof. Remove all submounts of /sys? Or at least anything begining with /sys/kernel?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed now

@@ -31,4 +31,12 @@ echo $rand | 0 | $rand
done < <(parse_table "$tests")
}

@test "podman run - uidmapping has no /sys/kernel mounts" {
run_podman $expected_rc run --uidmapping 0:100:10000 $IMAGE mount | grep /sys/kernel
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need --net=host here to make the current behaviour happen.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added another test with --net=host, so that we can test both cases

@giuseppe giuseppe force-pushed the prevent-sys-fs-kernel-paths-in-userns branch from 507c640 to 0a5df19 Compare April 11, 2019 11:09
@rhatdan
Copy link
Member

rhatdan commented Apr 11, 2019

LGTM
@TomSweeneyRedHat PTAL
@vrothberg @mheon @baude PTAL

@vrothberg
Copy link
Member

@giuseppe rootless tests aren't happy

Copy link
Member

@vrothberg vrothberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code LGTM (very elegant)

@giuseppe giuseppe force-pushed the prevent-sys-fs-kernel-paths-in-userns branch from 0a5df19 to 2ea8946 Compare April 11, 2019 12:25
@giuseppe
Copy link
Member Author

ah, this wouldn't work with rootless. An unprivileged user cannot umount these paths. @rhatdan is it an issue for buildah?

if err = unix.Unshare(unix.CLONE_NEWNS); err != nil {
return err
}
defer unix.Setns(int(fd.Fd()), unix.CLONE_NEWNS)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this defer needed? Seems to be a duplicate of line 112

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch! Fixed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need the one above.

@rhatdan
Copy link
Member

rhatdan commented Apr 11, 2019

@giuseppe Yes.
podman run --net=host fedora mount | grep /sys/kernel
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
configfs on /sys/kernel/config type configfs (rw,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,relatime,seclabel)

when we run in a user namespace, there are cases where we have not
enough privileges to mount a fresh sysfs on /sys.  To circumvent this
limitation, we rbind /sys from the host.  This carries inside of the
container also some mounts we probably don't want to.  We are also
limited by the kernel to use rbind instead of bind, as allowing a bind
would uncover paths that were not previously visible.

This is a slimmed down version of the intermediate mount namespace
logic we had before, where we only set /sys to slave, so the umounts
done to the storage by the cleanup process are propagated back to the
host.  We also don't setup any new directory, so there is no
additional cleanup to do.

Signed-off-by: Giuseppe Scrivano <[email protected]>
@giuseppe giuseppe force-pushed the prevent-sys-fs-kernel-paths-in-userns branch from 2ea8946 to b780088 Compare April 11, 2019 13:40
@rhatdan
Copy link
Member

rhatdan commented Apr 11, 2019

We might need to mask these in rootless mode. I am not sure if @TomSweeneyRedHat has tried buildah bud --isolation chroot inside of rootless podman yet.

We really need this for root running container anyways, especially where if you don't have SELinux running you can potentially modify the kernel.

In rootless mode this would obviously be blocked.

@giuseppe
Copy link
Member Author

@giuseppe Yes.
podman run --net=host fedora mount | grep /sys/kernel
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
configfs on /sys/kernel/config type configfs (rw,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,relatime,seclabel)

I think there is no way for an unprivileged user to drop that mounts. It cannot mount a new sysfs and it cannot umount anything that will reveal what is under the mount point. On the other hand, is it really a security issue? The unprivileged user won't have any new privilege that it didn't have on the host.

@rhatdan
Copy link
Member

rhatdan commented Apr 11, 2019

Right I am worried about information reveal though. The information in /sys/kernel/debug might be useful to a hacked container. If we can not umount we should mount a tmpfs over them.

@giuseppe
Copy link
Member Author

Right I am worried about information reveal though. The information in /sys/kernel/debug might be useful to a hacked container. If we can not umount we should mount a tmpfs over them.

we can mask it, altough /sys/kernel/debug has mode 0700, an unprivileged user cannot access it

@giuseppe giuseppe force-pushed the prevent-sys-fs-kernel-paths-in-userns branch from 1b60e4b to 1367c65 Compare April 11, 2019 13:55
@giuseppe giuseppe force-pushed the prevent-sys-fs-kernel-paths-in-userns branch from 1367c65 to 2c9c40d Compare April 11, 2019 13:55
@giuseppe
Copy link
Member Author

Right I am worried about information reveal though. The information in /sys/kernel/debug might be useful to a hacked container. If we can not umount we should mount a tmpfs over them.

I've added a patch to mask /sys/kernel if we are bind mounting /sys and we are rootless

@rhatdan
Copy link
Member

rhatdan commented Apr 11, 2019

Well this looks good.
find /sys/kernel/ | wc
find: '/sys/kernel/security': Permission denied
find: '/sys/kernel/debug': Permission denied
find: '/sys/kernel/config': Permission denied
5615 5615 213373
So I think for rootless we can ignore the mounts.

@rhatdan
Copy link
Member

rhatdan commented Apr 11, 2019

LGTM

@giuseppe
Copy link
Member Author

Well this looks good.
find /sys/kernel/ | wc
find: '/sys/kernel/security': Permission denied
find: '/sys/kernel/debug': Permission denied
find: '/sys/kernel/config': Permission denied
5615 5615 213373
So I think for rootless we can ignore the mounts.

should I drop the second patch or is it still a valid additional defense?

@mheon
Copy link
Member

mheon commented Apr 11, 2019

LGTM. I think we keep the masked path - defense in depth is good.

@mheon
Copy link
Member

mheon commented Apr 11, 2019

Holding /lgtm until tests go green

@mheon
Copy link
Member

mheon commented Apr 11, 2019

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Apr 11, 2019
@openshift-merge-robot openshift-merge-robot merged commit b281c34 into containers:master Apr 11, 2019
@edsantiago
Copy link
Member

This commit seems to have broken the new unbrivileged-access BATS test:

# bats test/system/400-unprivileged-access.bats
 ✗ podman container storage is not accessible by unprivileged users
   (in test file test/system/400-unprivileged-access.bats, line 74)
     `find $GRAPH_ROOT $RUN_ROOT \! -type l -perm -o+w -print | while read i; do' failed
   $ /home/esm/src/atomic/2018-02.podman/libpod/bin/podman rm --all --force
   $ /home/esm/src/atomic/2018-02.podman/libpod/bin/podman images --all --format {{.Repository}}:{{.Tag}} {{.ID}}
   quay.io/libpod/alpine_labels:latest   4fab981df737
   $ /home/esm/src/atomic/2018-02.podman/libpod/bin/podman run --name c_uidmap --uidmap 0:10000:10000 quay.io/libpod/alpine_labels:latest true
   $ /home/esm/src/atomic/2018-02.podman/libpod/bin/podman run --name c_uidmap_v --uidmap 0:10000:10000 -v foo:/foo quay.io/libpod/alpine_labels:latest true
   $ /home/esm/src/atomic/2018-02.podman/libpod/bin/podman run --name c_mount quay.io/libpod/alpine_labels:latest sh -c echo hi > /myfile;mkdir -p /mydir/mysubdir; chmod 777 /myfile /mydir /mydir/mysubdir
   $ /home/esm/src/atomic/2018-02.podman/libpod/bin/podman mount c_mount
   /var/lib/containers/storage/overlay/1440f9d7c7b5a8fc4bb1f4d699ef156960f89980663357b2481258ca7430b3d1/merged
   $ /home/esm/src/atomic/2018-02.podman/libpod/bin/podman info --format {{.store.GraphRoot}}
   /var/lib/containers/storage
   $ /home/esm/src/atomic/2018-02.podman/libpod/bin/podman info --format {{.store.RunRoot}}
   /var/run/containers/storage
   chmod: changing permissions of '/var/lib/containers/storage/overlay/71071cf29c423da6351954badf4aae660aa3ca1ebcdb8fedce8fda7c659b4599/merged': Operation not permitted
   chmod: changing permissions of '/var/lib/containers/storage/overlay/71071cf29c423da6351954badf4aae660aa3ca1ebcdb8fedce8fda7c659b4599/merged/tmp': Operation not permitted
   #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
   #| FAIL: Able to run 'ls /var/lib/containers/storage/overlay/71071cf29c423da6351954badf4aae660aa3ca1ebcdb8fedce8fda7c659b4599/merged/tmp' without error
   #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

@giuseppe any suggestions on how to fix it

@giuseppe
Copy link
Member Author

This commit seems to have broken the new unbrivileged-access BATS test:

I think it is related to the new tests that are using a mapping not correct for the unprivileged access tests

@giuseppe
Copy link
Member Author

I am surprised how that could happen though, directories under /var/lib/containers/storage/overlay have mode drwx------ that blocks completely access to other users

@edsantiago
Copy link
Member

Not any more. Immediately after this command, /var/lib/containers/storage becomes 711:

# ./bin/podman run --name c_uidmap   --uidmap 0:10000:10000 alpine_labels true 

Furthermore, I am now stuck in a system where I can no longer remove that directory:

# rm -rf /var/lib/containers/storage
rm: cannot remove '/var/lib/containers/storage/overlay/0edc5e399bf04596f445dfaf7073d15699e2327dc45e1f62dca43638fc4095b8/merged': Device or resource busy

(podman rm -a, podman ps -a, podman images -a, all show nothing).

master @ c1e2b58. I don't have time this evening to do a full investigateion but wanted to leave you with something you can check out tomorrow morning.

@giuseppe
Copy link
Member Author

how do directories under /var/lib/containers/storage/overlay look like?

$ sudo ls -l /var/lib/containers/storage/overlay
total 16
drwx------. 5 root root  4096 Apr 15 23:39 3f72eff9ab5617186824e3a431489b5bb70fadc3004734b3c66008da3f3f0724
drwx------. 6 root root  4096 Apr 15 23:26 a464c54f93a9e88fc1d33df1e0e39cca427d60145a360962e8f19a1dbf900da9
drwx------. 7  100 users 4096 Apr 15 23:11 df64d3292fd6194b7865d7326af5255db6d81e9df29f48adde61a918fbd8c332
drwx------. 2 root root  4096 Apr 16 09:39 l

They must have mode drwx------.

@edsantiago
Copy link
Member

Sorry for the late reply. I can no longer reproduce this on my system, and OSP10 is down so I can't get a new virt for testing. The only difference on my system is that I'm still stuck with the undeletable directory mentioned above, so instead of rm -rf /var/lib/containers/storage I've done mv ...{,.DELETE-ME}. There is otherwise no difference: no reboot, no dnf-upgrade. I've tried everything I can to reproduce.

@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 26, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 26, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants