mount sysfs in namespace #3831

gabibeyer · 2019-08-15T23:43:12Z

mount sysfs so the process running in the container
can see the correct /sys/class/net entries

Signed-off-by: Gabi Beyer [email protected]

openshift-ci-robot · 2019-08-15T23:43:16Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: gabibeyer
To complete the pull request process, please assign jwhonce
You can assign the PR to them by writing /assign @jwhonce in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci-robot · 2019-08-15T23:43:32Z

Hi @gabibeyer. Thanks for your PR.

I'm waiting for a containers or openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

rhatdan · 2019-08-16T10:57:18Z

/ok-to-test
Thanks @gabibeyer
@mheon @giuseppe PTAL

mheon · 2019-08-16T12:43:21Z

Hmm. If we just need this in the container, shouldn't we add the mount in the OCI spec?

…

On Fri, Aug 16, 2019, 06:57 Daniel J Walsh ***@***.***> wrote: /ok-to-test Thanks @gabibeyer <https://github.com/gabibeyer> @mheon <https://github.com/mheon> @giuseppe <https://github.com/giuseppe> PTAL — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#3831>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AB3AOCB5GUN5PEUYH5QV5Q3QE2BZFANCNFSM4IMCX4XQ> .

amshinde · 2019-08-16T17:45:49Z

@mheon This is something that may be required for the OCI runtime itself, not just the container.
We see that this is currently handled with rootlesskit for rootless docker, would be nice if this is behaviour is implemented in podman as well.

giuseppe · 2019-08-19T06:23:09Z

pkg/netns/netns_linux.go

+
+		// mount sysfs so the process running in the container can see the
+		// correct /sys/class/net entries
+		err = unix.Mount("sysfs", "/sys", "sysfs", unix.MS_RDONLY, "")


The goroutine is running in a new network namespace but it is sharing the mount namespace with the host. How is the new mount visible from the container?

I may be very off in my understanding of namespaces, but by default doesn't podman create a user namespace that all containers share (unless explicitly stated). The network namespace is being created inside that user namespace, and since mount namespaces require an owner user namespace all that information would be accessible by the container if it enters the network namespace?

blah disregard that. I think what's happening is that it is being mounted correctly, but it needs to do a mount sooner, right after the network namespace creation in order to the /sys/class/net to be generated correctly for the oci runtime to use.

@gabibeyer Perhaps, the commit message needs to be rephrased a bit, to specify that the OCI runtime needs to access the /sys.
@giuseppe We need this behaviour for the runtime. In case of Kata, we need to inspect the network namespace to get the type of network interfaces created in order to connect them to the virtual machine.
Most of this information can be obtained from netlink API.
But we may need to get this from sysfs in some cases. In case of tap interfaces, we do not get the information whether a tuntap device is actually "tap" or "tun" on older kernels from netlink. In that case we need to fallback to the sysfs, and expect the sysfs to be remounted to reflect the correct /sys/class/net information corresponding to the new network namespace created by podman.

I hope that makes things a bit clearer. A related PR submitted to netlink:
vishvananda/netlink@db99c04

rh-atomic-bot · 2019-08-21T18:08:00Z

☔ The latest upstream changes (presumably #2940) made this pull request unmergeable. Please resolve the merge conflicts.

mount sysfs so the process running in the container can see the correct /sys/class/net entries Signed-off-by: Gabi Beyer <[email protected]>

giuseppe · 2019-10-16T10:40:14Z

@gabibeyer is it still needed?

gabibeyer · 2019-10-16T10:49:56Z

@giuseppe I believe so. @amshinde @marcov would you happen to know more?

giuseppe · 2019-10-16T11:21:10Z

do we have an example of issue we are trying to solve with this PR? Still it is not clear to me how mounting /sys again can make any difference. Don't we also leak a mount each time the code runs?

marcov · 2019-10-17T14:21:07Z

@amshinde explained here why access to sysfs is needed, #3831 (comment)

@gabibeyer: is this something specific for rootless?

amshinde · 2019-10-17T16:58:31Z

@giuseppe As @marcov mentioned, I have added an explanation in an earlier comment.
@marcov This isnt specific to rootless as such, its generally a good idea to have the sysfs remounted for a new network namespace, but the tap interfaces created in case of rootless makes this a requirement.

marcov · 2019-10-17T19:15:23Z

Got it @amshinde, then maybe you could consider only bind mounting the path necessary to look up the tap interfaces, and not the whole /sys

giuseppe · 2019-10-21T07:14:00Z

@giuseppe As @marcov mentioned, I have added an explanation in an earlier comment.

@amshinde thanks, I've read that comment but it is still not clear for me, isn't the mount namespace at this point the same as on the host? Isn't /sys already mounted there?

Is it needed for the rootless case?

amshinde · 2019-11-04T21:02:41Z

Let me see if I can explain this better.

I've read that comment but it is still not clear for me, isn't the mount namespace at this point the same as on the host? Isn't /sys already mounted there?

@giuseppe Yes, the mount namespace is the same on the host, /sys is already mounted there, but /sys is not tied to mount namespaces. The /sys on entering the new network namespace still points to the host network namespace.

A little background on the issue: https://lwn.net/Articles/295587/
Any sysfs directory can be associated with (at most) one type of tag, where that type identifies which type of namespace controls how that directory is viewed. Thus, for example, /sys/class/net would have a tag type identifying the network namespace subsystem as the one which is in control there.
When podman creates the network namespace, once Kata runtime enters the network namespace to inspect it, the sysfs entries in /sys/class/net still refer to the network interfaces in the host network namespace.

An issue that describes this in some more detail:
https://unix.stackexchange.com/questions/457025/switching-into-a-network-namespace-does-not-change-sys-class-net

I see that a similar project rootless-kit addressing this here:
https://github.com/rootless-containers/rootlesskit/blob/2cb5ec5819fddecb93aeea843ecdf29d6ecaf9af/pkg/child/child.go#L131

Let me know if you have any further questions, maybe we can discuss this on the crio call this week if you are on it.

amshinde · 2019-11-19T22:35:05Z

@giuseppe @mheon Any updates on this PR?

giuseppe · 2019-11-20T07:21:59Z

@giuseppe @mheon Any updates on this PR?

I've verified it locally and it creates a new RDONLY /sys mount on the host . I don't think that should happen.

If /sys is not already mounted, we should probably raise an error and ask the users to fix it.

giuseppe · 2020-01-08T14:33:20Z

@gabibeyer @amshinde is this patch still needed? If you'd like I can help here, can we discuss the issue you are having?

giuseppe · 2020-01-20T13:36:44Z

as there is no progress, I am closing the PR. Please re-open if it is still needed

openshift-ci-robot requested review from giuseppe and rhatdan August 15, 2019 23:43

openshift-ci-robot added size/XS needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 15, 2019

gabibeyer mentioned this pull request Aug 16, 2019

rootless: add rootless to kata kata-containers/runtime#1875

Merged

openshift-ci-robot added ok-to-test and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 16, 2019

gabibeyer force-pushed the mountSysfs branch from 3f2bafb to b4271c6 Compare August 16, 2019 21:03

openshift-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XS labels Aug 16, 2019

gabibeyer force-pushed the mountSysfs branch from b4271c6 to b73f7e6 Compare August 16, 2019 21:04

openshift-ci-robot added size/XS and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Aug 16, 2019

giuseppe reviewed Aug 19, 2019

View reviewed changes

gabibeyer force-pushed the mountSysfs branch from b73f7e6 to 05d474f Compare August 19, 2019 21:25

openshift-ci-robot added size/L and removed size/XS labels Aug 19, 2019

gabibeyer force-pushed the mountSysfs branch from 05d474f to a2d50c9 Compare August 22, 2019 17:28

openshift-ci-robot added size/XS and removed size/L labels Aug 22, 2019

gabibeyer force-pushed the mountSysfs branch 2 times, most recently from 940a0b7 to 0beed21 Compare August 29, 2019 17:53

gabibeyer force-pushed the mountSysfs branch from 0beed21 to 412f14b Compare August 29, 2019 19:36

mount sysfs in namespace

a6cae4c

mount sysfs so the process running in the container can see the correct /sys/class/net entries Signed-off-by: Gabi Beyer <[email protected]>

gabibeyer force-pushed the mountSysfs branch from 412f14b to a6cae4c Compare August 29, 2019 21:42

amshinde mentioned this pull request Nov 19, 2019

Add podman docs kata-containers/documentation#565

Closed

giuseppe closed this Jan 20, 2020

github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 25, 2023

github-actions bot locked as resolved and limited conversation to collaborators Sep 25, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

mount sysfs in namespace #3831

mount sysfs in namespace #3831

gabibeyer commented Aug 15, 2019

openshift-ci-robot commented Aug 15, 2019

openshift-ci-robot commented Aug 15, 2019

rhatdan commented Aug 16, 2019

mheon commented Aug 16, 2019 via email

amshinde commented Aug 16, 2019

giuseppe Aug 19, 2019

gabibeyer Aug 19, 2019

gabibeyer Aug 19, 2019

amshinde Aug 20, 2019

rh-atomic-bot commented Aug 21, 2019

giuseppe commented Oct 16, 2019

gabibeyer commented Oct 16, 2019

giuseppe commented Oct 16, 2019

marcov commented Oct 17, 2019

amshinde commented Oct 17, 2019

marcov commented Oct 17, 2019

giuseppe commented Oct 21, 2019

amshinde commented Nov 4, 2019

amshinde commented Nov 19, 2019

giuseppe commented Nov 20, 2019 •

edited

Loading

giuseppe commented Jan 8, 2020

giuseppe commented Jan 20, 2020

mount sysfs in namespace #3831

mount sysfs in namespace #3831

Conversation

gabibeyer commented Aug 15, 2019

openshift-ci-robot commented Aug 15, 2019

openshift-ci-robot commented Aug 15, 2019

rhatdan commented Aug 16, 2019

mheon commented Aug 16, 2019 via email

amshinde commented Aug 16, 2019

giuseppe Aug 19, 2019

Choose a reason for hiding this comment

gabibeyer Aug 19, 2019

Choose a reason for hiding this comment

gabibeyer Aug 19, 2019

Choose a reason for hiding this comment

amshinde Aug 20, 2019

Choose a reason for hiding this comment

rh-atomic-bot commented Aug 21, 2019

giuseppe commented Oct 16, 2019

gabibeyer commented Oct 16, 2019

giuseppe commented Oct 16, 2019

marcov commented Oct 17, 2019

amshinde commented Oct 17, 2019

marcov commented Oct 17, 2019

giuseppe commented Oct 21, 2019

amshinde commented Nov 4, 2019

amshinde commented Nov 19, 2019

giuseppe commented Nov 20, 2019 • edited Loading

giuseppe commented Jan 8, 2020

giuseppe commented Jan 20, 2020

giuseppe commented Nov 20, 2019 •

edited

Loading