Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mount sysfs in namespace #3831

Closed
wants to merge 1 commit into from
Closed

Conversation

gabibeyer
Copy link

mount sysfs so the process running in the container
can see the correct /sys/class/net entries

Signed-off-by: Gabi Beyer [email protected]

@openshift-ci-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: gabibeyer
To complete the pull request process, please assign jwhonce
You can assign the PR to them by writing /assign @jwhonce in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added size/XS needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 15, 2019
@openshift-ci-robot
Copy link
Collaborator

Hi @gabibeyer. Thanks for your PR.

I'm waiting for a containers or openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@rhatdan
Copy link
Member

rhatdan commented Aug 16, 2019

/ok-to-test
Thanks @gabibeyer
@mheon @giuseppe PTAL

@openshift-ci-robot openshift-ci-robot added ok-to-test and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 16, 2019
@mheon
Copy link
Member

mheon commented Aug 16, 2019 via email

@amshinde
Copy link

@mheon This is something that may be required for the OCI runtime itself, not just the container.
We see that this is currently handled with rootlesskit for rootless docker, would be nice if this is behaviour is implemented in podman as well.

@openshift-ci-robot openshift-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XS labels Aug 16, 2019
@openshift-ci-robot openshift-ci-robot added size/XS and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Aug 16, 2019

// mount sysfs so the process running in the container can see the
// correct /sys/class/net entries
err = unix.Mount("sysfs", "/sys", "sysfs", unix.MS_RDONLY, "")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goroutine is running in a new network namespace but it is sharing the mount namespace with the host. How is the new mount visible from the container?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be very off in my understanding of namespaces, but by default doesn't podman create a user namespace that all containers share (unless explicitly stated). The network namespace is being created inside that user namespace, and since mount namespaces require an owner user namespace all that information would be accessible by the container if it enters the network namespace?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blah disregard that. I think what's happening is that it is being mounted correctly, but it needs to do a mount sooner, right after the network namespace creation in order to the /sys/class/net to be generated correctly for the oci runtime to use.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gabibeyer Perhaps, the commit message needs to be rephrased a bit, to specify that the OCI runtime needs to access the /sys.
@giuseppe We need this behaviour for the runtime. In case of Kata, we need to inspect the network namespace to get the type of network interfaces created in order to connect them to the virtual machine.
Most of this information can be obtained from netlink API.
But we may need to get this from sysfs in some cases. In case of tap interfaces, we do not get the information whether a tuntap device is actually "tap" or "tun" on older kernels from netlink. In that case we need to fallback to the sysfs, and expect the sysfs to be remounted to reflect the correct /sys/class/net information corresponding to the new network namespace created by podman.

I hope that makes things a bit clearer. A related PR submitted to netlink:
vishvananda/netlink@db99c04

@rh-atomic-bot
Copy link
Collaborator

☔ The latest upstream changes (presumably #2940) made this pull request unmergeable. Please resolve the merge conflicts.

mount sysfs so the process running in the container
can see the correct /sys/class/net entries

Signed-off-by: Gabi Beyer <[email protected]>
@giuseppe
Copy link
Member

@gabibeyer is it still needed?

@gabibeyer
Copy link
Author

@giuseppe I believe so. @amshinde @marcov would you happen to know more?

@giuseppe
Copy link
Member

do we have an example of issue we are trying to solve with this PR? Still it is not clear to me how mounting /sys again can make any difference. Don't we also leak a mount each time the code runs?

@marcov
Copy link
Collaborator

marcov commented Oct 17, 2019

@amshinde explained here why access to sysfs is needed, #3831 (comment)

@gabibeyer: is this something specific for rootless?

@amshinde
Copy link

@giuseppe As @marcov mentioned, I have added an explanation in an earlier comment.
@marcov This isnt specific to rootless as such, its generally a good idea to have the sysfs remounted for a new network namespace, but the tap interfaces created in case of rootless makes this a requirement.

@marcov
Copy link
Collaborator

marcov commented Oct 17, 2019

Got it @amshinde, then maybe you could consider only bind mounting the path necessary to look up the tap interfaces, and not the whole /sys

@giuseppe
Copy link
Member

@giuseppe As @marcov mentioned, I have added an explanation in an earlier comment.

@amshinde thanks, I've read that comment but it is still not clear for me, isn't the mount namespace at this point the same as on the host? Isn't /sys already mounted there?

Is it needed for the rootless case?

@amshinde
Copy link

amshinde commented Nov 4, 2019

Let me see if I can explain this better.

I've read that comment but it is still not clear for me, isn't the mount namespace at this point the same as on the host? Isn't /sys already mounted there?

@giuseppe Yes, the mount namespace is the same on the host, /sys is already mounted there, but /sys is not tied to mount namespaces. The /sys on entering the new network namespace still points to the host network namespace.

A little background on the issue: https://lwn.net/Articles/295587/
Any sysfs directory can be associated with (at most) one type of tag, where that type identifies which type of namespace controls how that directory is viewed. Thus, for example, /sys/class/net would have a tag type identifying the network namespace subsystem as the one which is in control there.
When podman creates the network namespace, once Kata runtime enters the network namespace to inspect it, the sysfs entries in /sys/class/net still refer to the network interfaces in the host network namespace.

An issue that describes this in some more detail:
https://unix.stackexchange.com/questions/457025/switching-into-a-network-namespace-does-not-change-sys-class-net

I see that a similar project rootless-kit addressing this here:
https://github.com/rootless-containers/rootlesskit/blob/2cb5ec5819fddecb93aeea843ecdf29d6ecaf9af/pkg/child/child.go#L131

Let me know if you have any further questions, maybe we can discuss this on the crio call this week if you are on it.

@amshinde
Copy link

@giuseppe @mheon Any updates on this PR?

@giuseppe
Copy link
Member

giuseppe commented Nov 20, 2019

@giuseppe @mheon Any updates on this PR?

I've verified it locally and it creates a new RDONLY /sys mount on the host . I don't think that should happen.

If /sys is not already mounted, we should probably raise an error and ask the users to fix it.

@giuseppe
Copy link
Member

giuseppe commented Jan 8, 2020

@gabibeyer @amshinde is this patch still needed? If you'd like I can help here, can we discuss the issue you are having?

@giuseppe
Copy link
Member

as there is no progress, I am closing the PR. Please re-open if it is still needed

@giuseppe giuseppe closed this Jan 20, 2020
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 25, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 25, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. ok-to-test
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants