Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not mount sysfs as rootless in more cases #8561

Merged
merged 1 commit into from
Dec 7, 2020

Conversation

mheon
Copy link
Member

@mheon mheon commented Dec 2, 2020

We can't mount sysfs as rootless unless we manage the network namespace. Problem: slirp4netns is now creating and managing a network namespace separate from the OCI runtime, so we can't mount sysfs in many circumstances. The crun OCI runtime will automatically handle this by falling back to a bind mount, but runc will not, so we didn't notice until RHEL gating tests ran on the new branch.

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 2, 2020
@edsantiago
Copy link
Member

Just curious: don't the Ubuntu CI VMs use runc? I'm wondering how/if we could have caught this earlier, and how we can test that this will work in RHEL.

@mheon
Copy link
Member Author

mheon commented Dec 2, 2020

@giuseppe PTAL

@edsantiago That is an excellent question. I think they do. Possibly a newer version than RHEL runc, that has its own workaround for this?

@mheon
Copy link
Member Author

mheon commented Dec 2, 2020

Memory tests are definitely broken, they're looking at things in /sys/fs/cgroup...

@@ -165,7 +165,7 @@ func SpecGenToOCI(ctx context.Context, s *specgen.SpecGenerator, rt *libpod.Runt
inUserNS = true
}
}
if inUserNS && s.NetNS.IsHost() {
if inUserNS && s.NetNS.NSMode != specgen.NoNetwork {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't it work when the NSMode is set to private?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect it doesn't because we still make the namespace in Podman in that case

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll check again to verify.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are entirely correct, it does work.

@mheon
Copy link
Member Author

mheon commented Dec 3, 2020

Re-pushed with fix.

@mvollmer
Copy link

mvollmer commented Dec 3, 2020

Thanks for this fix! We ran into this while testing cockpit-podman on our rhel-8-4 image, which should be representative for a generic RHEL 8.4 installation.

And now I am confused about runc vs crun. What should be in use in a bone stock RHEL 8 installation, runc or crun? In other words, would a normal RHEL 8 podman user run into this bug, or not?

I ask because I am worried that we are unknowingly testing some less interesting configuration of RHEL 8.

Thanks!

@mheon
Copy link
Member Author

mheon commented Dec 3, 2020

The default is runc - crun only became available with 8.3.0, AFAIK, and there's no way we're making it default before 9.0

@mvollmer
Copy link

mvollmer commented Dec 4, 2020

The default is runc - crun only became available with 8.3.0, AFAIK, and there's no way we're making it default before 9.0

Thanks, so we are doing it right!

@@ -165,7 +165,7 @@ func SpecGenToOCI(ctx context.Context, s *specgen.SpecGenerator, rt *libpod.Runt
inUserNS = true
}
}
if inUserNS && s.NetNS.IsHost() {
if inUserNS && !(s.NetNS.NSMode == specgen.NoNetwork || s.NetNS.NSMode == specgen.Private) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure this is correct? And I think we should be using the functions in namespaces? I am not sure if all cases are handled.

NSMode == "default"
NSMode == ""
are the same as
NSMode == "private"

Copy link
Member Author

@mheon mheon Dec 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, they're not. We explicitly do not want to trigger this with slirp4netns networking, and both of those can trigger slirp.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If private also triggers slirp, then we'll need to explicitly disable it as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to revert back to NoNetwork. It's guaranteed to be safe. We can loosen this later if we need to.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we need to have a check added to indicate whether or not we are using slirp4netns networking.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm actually not sure if CNI + user namespaces is safe either. It's probably better to leave disabled.

@mheon
Copy link
Member Author

mheon commented Dec 4, 2020

Re-pushed with fixes for tests - I disabled networking on a lot of tests that accessed /sys/fs/cgroup as cgroupsv2 to bypass the sysfs problem.

We can't mount sysfs as rootless unless we manage the network
namespace. Problem: slirp4netns is now creating and managing a
network namespace separate from the OCI runtime, so we can't
mount sysfs in many circumstances. The `crun` OCI runtime will
automatically handle this by falling back to a bind mount, but
`runc` will not, so we didn't notice until RHEL gating tests ran
on the new branch.

Signed-off-by: Matthew Heon <[email protected]>
@rhatdan
Copy link
Member

rhatdan commented Dec 5, 2020

@giuseppe PTAL

@mheon
Copy link
Member Author

mheon commented Dec 7, 2020

@containers/podman-maintainers PTAL, this is needed for 2.2.1

Copy link
Member

@saschagrunert saschagrunert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@openshift-ci-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mheon, saschagrunert

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [mheon,saschagrunert]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@giuseppe
Copy link
Member

giuseppe commented Dec 7, 2020

/lgtm

@giuseppe
Copy link
Member

this PR caused a regression as reported here: systemd/systemd#17902 (comment)

podman run  --rm fedora grep cgroup /proc/self/mountinfo
1893 1891 0:27 /../../../../../.. /sys/fs/cgroup rw,nosuid,nodev,noexec,relatime - cgroup2 cgroup2 rw,seclabel,nsdelegate

Do you remember what was the failure we saw without this commit?

@giuseppe
Copy link
Member

@mvollmer do you have a reproducer for the issue you were seeing?

@mheon
Copy link
Member Author

mheon commented Jan 12, 2021

@giuseppe Was causing failures to run rootless Podman on cgroupsv1 + runc (most notably on the RHEL8 test suite, but Ubuntu and other v1 distros were also affected).

@giuseppe
Copy link
Member

I've opened a PR to see if we catch any of these failures in the CI: #8949

@jonasbb
Copy link

jonasbb commented Jan 14, 2021

@giuseppe Would #8949 also help with #8712? #8712 has steps to reproduce.

@giuseppe
Copy link
Member

@giuseppe Would #8949 also help with #8712? #8712 has steps to reproduce.

yes

@jonasbb
Copy link

jonasbb commented Jan 14, 2021

Glad to hear. Then I hope this PR or a similar one can be merged for the next release.

edsantiago added a commit to edsantiago/libpod that referenced this pull request Feb 2, 2021
RHEL8 rootless gating tests are inconsistently failing with:

   $ podman diff --format json -l
   #
   {"changed":["/etc"],"added":["/sys/fs","/sys/fs/cgroup","/pMOm1Q0fnN"],"deleted":["/etc/services"]}
   # #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
   # #|     FAIL: added
   # #| expected: '/pMOm1Q0fnN'
   # #|   actual: '/sys/fs'
   # #|         > '/sys/fs/cgroup'
   # #|         > '/pMOm1Q0fnN'
   # #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Reason: PR containers#8561, I think (something to do with /sys on RHEL).

Workaround: ignore '/sys/fs' in diffs.

Signed-off-by: Ed Santiago <[email protected]>
edsantiago added a commit to edsantiago/libpod that referenced this pull request Feb 3, 2021
RHEL8 rootless gating tests are inconsistently failing with:

   $ podman diff --format json -l
   #
   {"changed":["/etc"],"added":["/sys/fs","/sys/fs/cgroup","/pMOm1Q0fnN"],"deleted":["/etc/services"]}
   # #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
   # #|     FAIL: added
   # #| expected: '/pMOm1Q0fnN'
   # #|   actual: '/sys/fs'
   # #|         > '/sys/fs/cgroup'
   # #|         > '/pMOm1Q0fnN'
   # #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Reason: PR containers#8561, I think (something to do with /sys on RHEL).

Workaround: ignore '/sys/fs' in diffs.

Signed-off-by: Ed Santiago <[email protected]>
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 23, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 23, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants