Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement the rootless-cni-infra container imageless #8910

Closed
wants to merge 1 commit into from

Conversation

Luap99
Copy link
Member

@Luap99 Luap99 commented Jan 7, 2021

As proposed by Akihiro Suda make the rootless-cni-infra container use
the host rootfs instead of an image. This works by mounting the host
rootfs in the user namespace to $runroot/rootless-cni-infra
and use this as rootfs for the container.

Second, rewrite the rootless-cni-infra shell script in go to remove the
extra cnitool dependency which is not packaged anywhere. With that we
only need the same dependencies as rootful podman which should be
already installed.

Advantages:

  • Works for all architectures podman supports.
  • Works without internet connection.
  • No extra maintainence of an extra image.

Disadvantages:

  • Requires the dependencies to be available on the host (e.g. dnsname
    plugin). The user may not have control over those.

Problems:

  • It doesn't unmount the rootfs if the the rootless-cni-infra container
    is stopped directly.

Also the image version did not respect the --cni-config-dir option
properly. It mounted the cni config dir only at container create time
but this option can be used on podman run commands which did not
worked if the rootless-cni-infra container was already running.
This is only possible with the rootfs version.

Live upgrading is possible. If the old infra container is still
running podman talks via the old api to the script. Once the
old infra container is deleted the new imageless infra container
will be created and podman can talk via the new api. A version
label is added to the container to distinguish between old and new.

Fixes #8709

Signed-off-by: Paul Holzinger [email protected]

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 7, 2021
@openshift-ci-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Luap99

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 7, 2021
@Luap99 Luap99 force-pushed the rootless-cni-infra branch 3 times, most recently from 8bde21a to e26ffb8 Compare January 7, 2021 22:00
@Luap99
Copy link
Member Author

Luap99 commented Jan 8, 2021

@rhatdan I need your selinux knowledge here. On fedora the rootless-cni-infra script is failing with the following error:

failed to list chains: running [/usr/sbin/iptables -t nat -S --wait]: exit status 4: Fatal: can't open lock file /run/xtables.lock: Permission denied

This access is blocked by selinux. I see this AVC:

type=AVC msg=audit(1610099754.089:998): avc:  denied  { write } for  pid=6975 comm="iptables" name="/" dev="tmpfs" ino=1 scontext=unconfined_u:system_r:iptables_t:s0 tcontext=system_u:object_r:container_file_t:s0:c664,c817 tclass=dir permissive=0

The rooless-cni-infra container is created with selinux disabled so I don't understand why selinux is blocking access. /run is mounted as tmpfs into the container.

On the host the file has the following label:

$ sudo ls -lZ /run | grep xtables
-rw-------.  1 root    root    system_u:object_r:iptables_var_run_t:s0          0  8. Jan 10:51 xtables.lock

Interestingly enough it works on my test VM which has the fedora server spin installed and not workstation.
They are shipped with different iptables versions. Fedora workstation has iptables v1.8.5 (legacy) and fedora server has iptables v1.8.5 (nf_tables) both are fedora 33.

@rhatdan
Copy link
Member

rhatdan commented Jan 9, 2021

The issue is the iptables command is attempting to create content inside of a directory labeled container_file_t. Iptables command is confined, and not allowed to write container content.

@rhatdan
Copy link
Member

rhatdan commented Jan 9, 2021

Is the /run within the container? IE This is not the /run on the host?

Comment on lines +274 to +359
run := spec.Mount{
Destination: "/run",
Type: "tmpfs",
Source: "none",
Options: []string{"rw", "nosuid", "nodev"},
}
g.AddMount(run)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/run is mounted as tmpfs into the container

WithCtrNamespace(rootlessCNIInfraContainerNamespace),
WithName(rootlessCNIInfraContainerName),
WithPrivileged(true),
WithSecLabels([]string{"disable"}),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

selinux should be disabled for this container so I am not sure why selinux is blocking the access to this file

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is the iptables command is confined. The question is who launched it? And why did it transition.

@rhatdan
Copy link
Member

rhatdan commented Jan 10, 2021

Hopefully I will have time early next week to attempt this and figure out what is going on.

return err
}
// bind mount the rootfs in the userns read only
if err := mount.Mount("/", rootfs, "bind", "rbind,rprivate,ro"); err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be noted in comment lines that "rbind,ro" is just a "best effort" to make the tree read-only, as "ro" is not recursive.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Is it possible to mount everything as readonly?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably possible with fuse-overlayfs. We could also parse /proc/self/mountinfo and bind-mount each of the entries with ro, but that seems too complicated.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kernel patch for recursive read-only is here (unmerged): https://lore.kernel.org/linux-fsdevel/[email protected]/T/#u

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be great to get that in.

@Luap99 Luap99 force-pushed the rootless-cni-infra branch 2 times, most recently from 16ed86f to 468f86c Compare January 13, 2021 12:37
@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 18, 2021
@Luap99 Luap99 force-pushed the rootless-cni-infra branch from 468f86c to bc1f8ce Compare February 4, 2021 14:25
@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 4, 2021
@Luap99 Luap99 force-pushed the rootless-cni-infra branch from bc1f8ce to 5fbe9b6 Compare February 4, 2021 14:37
@Luap99
Copy link
Member Author

Luap99 commented Feb 4, 2021

I reworked this to support live migration from a previous version. If the old infra container is still running podman talks via the old api to the script. Once the old infra container is deleted the new imageless infra container will be created and podman can talk via the new api. A version label is added to the container to distinguish between old and new.

@Luap99
Copy link
Member Author

Luap99 commented Feb 4, 2021

@rhatdan Have you looked at the selinux issue?

For now sudo alternatives --set iptables /usr/sbin/iptables-nft would work for fedora but this is a breaking change for users.

@Luap99
Copy link
Member Author

Luap99 commented Feb 14, 2021

@rhatdan
I tried to look at the selinux issue and found the problem. Look at the following:

# selinux with container image
$ podman run alpine cat /proc/self/attr/current 
system_u:system_r:container_t:s0:c280,c796
$ podman run --security-opt label=disable alpine cat /proc/self/attr/current 
unconfined_u:system_r:spc_t:s0

# selinux with rootfs
$ mkdir -p ~/rootfs
$ podman unshare mount --rbind -r / ~/rootfs/
$ podman unshare mount -t tmpfs none ~/rootfs/run
$ podman run --rootfs ~/rootfs cat /proc/self/attr/current 
system_u:system_r:container_t:s0:c22,c289
$ podman run --security-opt label=disable --rootfs ~/rootfs cat /proc/self/attr/current 
unconfined_u:system_r:container_runtime_t:s0
# why container_runtime_t and not spc_t?

# set selinux labels manually
# spc_t fails
$ podman run --security-opt "label=user:unconfined_u" --security-opt "label=role:system_r" --security-opt "label=type:spc_t" --security-opt "label=level:s0" --rootfs ~/rootfs  cat /proc/self/attr/current 
{"msg":"exec container process `/usr/bin/cat`: Permission denied","level":"error","time":"2021-02-14T16:08:22.000844579Z"}
# unconfined_t works
$ podman run --security-opt "label=user:unconfined_u" --security-opt "label=role:system_r" --security-opt "label=type:unconfined_t" --security-opt "label=level:s0" --rootfs ~/rootfs  cat /proc/self/attr/current 
unconfined_u:system_r:unconfined_t:s0

I don't know if this is a bug or expected. Either way when I set unconfined_u:system_r:unconfined_t for the cni container it works.

@Luap99 Luap99 force-pushed the rootless-cni-infra branch 2 times, most recently from 9f105e2 to 19d65ff Compare February 14, 2021 16:25
@rhatdan
Copy link
Member

rhatdan commented Feb 15, 2021

Do you have AVC Messages?

@rhatdan
Copy link
Member

rhatdan commented Feb 15, 2021

ls -lZd ~/rootfs/

There are several issues here. I am thinking that we could grab the "level" label off of the rootfs and set that to run the container with so that we run as container_t, that might be the best solution for this. Check to make sure that rootfs is labeled container_file_t:MCS and then use the MCS.

@rhatdan
Copy link
Member

rhatdan commented Feb 15, 2021

The reason you are seeing the difference on spc_t versus container_runtime_t is based on the label of ~/rootfs.

The reason spc_t is failing, is that there is an entrypoint fule on spc_t, So only certain labels are allowed to "transition" to spc_t.

The label of the cat command is probably not allowed as an entrypoint to spc_t.

Similar we have transition rules that say when podman running as container_runtime_t runs certain executables it will transition to spc_t, The label of cat within the rootfs does not have a transition rule, so the transition does not happen, and the process continues to run as container_runtime_t.

As proposed by Akihiro Suda make the rootless-cni-infra container use
the host rootfs instead of an image. This works by mounting the host
rootfs in the user namespace to `$runroot/rootless-cni-infra`
and use this as rootfs for the container.

Second, rewrite the rootless-cni-infra shell script in go to remove the
extra cnitool dependency which is not packaged anywhere. With that we
only need the same dependencies as rootful podman which should be
already installed.

Advantages:
- Works for all architectures podman supports.
- Works without internet connection.
- No extra maintainence of an extra image.

Disadvantages:
- Requires the dependencies to be available on the host (e.g. dnsname
plugin). The user may not have control over those.

Problems:
- It doesn't unmount the rootfs if the the rootless-cni-infra container
is stopped directly.

Also the image version did not respect the `--cni-config-dir` option
properly. It mounted the cni config dir only at container create time
but this option can be used on podman run commands which did not
worked if the rootless-cni-infra container was already running.
This is only possible with the rootfs version.

Live upgrading is possible. If the old infra container is still
running podman talks via the old api to the script. Once the
old infra container is deleted the new imageless infra container
will be created and podman can talk via the new api. A version
label is added to the container to distinguish between old and new.

Signed-off-by: Paul Holzinger <[email protected]>
@Luap99 Luap99 force-pushed the rootless-cni-infra branch from ecebe3f to 73393fc Compare February 16, 2021 22:16
@Luap99
Copy link
Member Author

Luap99 commented Feb 18, 2021

OK I don't think I can get this to work. There is a locking issue somewhere. While thinking about it I thought about not using a container at all and just create a netns in the podman user namespace. I have opened #9423 to implement this instead. I think this is better in the long run.

@Luap99 Luap99 closed this Feb 18, 2021
@baude
Copy link
Member

baude commented Feb 18, 2021

agree

@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 23, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 23, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Proposal: imageless rootless-cni-infra
5 participants