-
Notifications
You must be signed in to change notification settings - Fork 305
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: transient /etc #2972
WIP: transient /etc #2972
Conversation
Hi @raballew. Thanks for your PR. I'm waiting for a ostreedev member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
err (EXIT_FAILURE, "failed to make writable /etc bind-mount at /sysroot.tmp/etc"); | ||
if (etc_transient) | ||
{ | ||
/* Do we have a persistent overlayfs for /usr? If so, mount it now. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment seems out of place, we're not handling /usr atm. nor is it persitent.
I don't quite understand the handling of In the typical, current, case, an ostree commit will contain Given the code in this MR, if transient etc is enabled, it always prefers the Isn't the point of this all that with transient etc, we only ever make |
Actually, why are we using |
Booted this and got:
|
Fixing that I got:
|
I think the answer to this partially is given by #2970 (comment) which implies that there can be an empty /etc which mean find_etc will return /usr/etc (or NULL on error). |
I think that quote is just confused. We always create an /etc in the composefs image (as per comment #2958 (comment) ), so there is no need for that commit at all from what i can see. |
If thats the case, let me drop the find_etc function and use TMP_SYSROOT/etc instead. |
db9fd18
to
905bf97
Compare
With this version (squashed back to one commit): I get it to boot and start most services, including with selinux enabled.
Not sure what |
b8293a1
to
2222aa9
Compare
@alexlarsson I am trying to reproduce your results and I am only able to do so with additional kargs set to enforce=0 otherwise the following units fail: dbus-broker.service |
@raballew Sorry, I can't get it to work, must have forgotten i was in permissive mode. I've done some research here, and here is what I think happens:
A manual run of I've tried to make the upper dir a separate mount to stop the var_run_t relabeling across devices, but that just results in getting tmpfs_t instead. I also tried to mount the overlayfs with I'm not sure there is anything else we can do other than expressing the expected labels for |
I see we have a service (ostree-remount) that runs early after switching to the sysroot before /etc is used. I think we can probably fix up /etc in that. |
Unfortunately several AVCs we hit in /etc happen before this, so it won't work. I guess we really need an ostree selinux module. |
Or, we could perhaps just mount the transient /etc after selinux policy is loaded... |
So, I put this in
After boot (permissive), the upper dir has the right label:
Unfortunately
I guess the initial data is stored in the dcache? |
@alexlarsson So, dropping dcache with |
@raballew I dunno if that is even enough. Some inodes are kept alive even over that if they are references elsewhere. In fact, looking at this in more detail, it is even more complex. Directly after a permissive boot we can see that even if we relabeled the upper dir after policy load, some files in the upper are still owned by var_run_t:
I think this means they were created by overlayfs due to writes to /sysroot/etc before the policy was loaded, and thus the default /run label was applied when labeling /run. Thats not good, for example shadow should be shadow_t. This is probably some ignition thing fixing up the /etc labels, because the /usr/etc/shadow file is not shadow_t. Additionally I see ld.so.cache and machine-id are even tmpfs_t instead of var_run_t. Where these not relabeled as var_run_t? To make things even worse, various files are cached in the overlayfs mount before the relabel happen, which mean that even if the upper dir files get relabled, they are still tmpfs_t when viewed via the overlayfs /etc:
I guess what happened is that since these files were created before selinux policy was added they were in dcache, and the overlayfs inodes that correspond to the upper files were created before the /run relabel, so they aren't aware of the label change. (Really, we shouldn't change the files in the upper dir behind the back of overlayfs like this). And indeed, after
I guess this is what happened to e.g. ld.so.cache. It was created after the relabel, but the root overlayfs had the pre-relabel tmpfs_t label in cache, so the newly created ld.so.cache file inherited that label. We could avoid having the files created in etc be var_run_t by having the selinux policy specify etc_t also for files inside the upper dir. Then the files created before policy load would get etc_t. But we can't really fix the fact that the policy load "loses" the labels added on tmpfs file pre-policy-load (such as shadow_t), and we can't (easily) fix the fact that the relabel of upper doesn't affect the /etc overlayfs files. |
Indeed, if I boot the system with this file_contexts.local:
I get all etc_t for the files in upper that were var_run_t before:
But they are still tmpfs_t in overlayfs (while still cached):
|
Ok, I finally figured out why this is happening: ext4 (etc) uses xattrs (SECURITY_FS_USE_XATTR) to manage the selinux context. On initialization of an inode it reads the security.selinux xattr, parses it according to the policy, and then stores this (broken down) in the eg:
tmpfs uses a different mechanism, eg:
|
Hmm... So, there is another issue too. When ostree deploy creates the merged etc directory after it copies /usr/etc to /etc it does a relabel of etc, because the selinux labels are different in /usr/etc and /etc. This means that a plain overlayfs of /usr/etc will not work as /etc without doing a full relabel. |
Ugh, even more issues. When anonymous temporary files are created with overlayfs they get created in the work dir and that has a different selinux context, so i'm getting avcs like:
Where the unlabeled_t is the work dir (i think). |
Also, even with this all worked around, I keep getting AVCs like this far after switchroot:
At this point things should not still be running as I think the problem here is that the overlayfs mount happened before policy was loaded, and it saves the context from that point and uses it when writing things to the upper dir. |
Ok, so to summarize the problems I've found with using a transient etc
Almost all these problems are related to things happening before So, the summary is that I don't think ostree can implement transient Issues I forsee with this approach:
|
Ok, I tried |
@alexlarsson I have followed your instruction in #2972 (comment) and am able to reproduce the error. Then I used the following unit
And #!/bin/bash
set -euo pipefail
lower_var=/sysroot/var
upper_var=/run/systemd/overlay-sysroot/var
fatal() {
echo "$@" >&2
exit 1
}
if [ $# -ne 1 ] || { [[ $1 != mount ]] && [[ $1 != umount ]]; }; then
fatal "Usage: $0 <mount|umount>"
fi
do_mount() {
echo "dummy mounting"
}
do_umount() {
echo "dummy unmounting"
}
"do_$1" In variant: fcos
version: 1.5.0
systemd:
units:
- name: mount-inplace-var.service
enabled: true
contents_local: mount-inplace-var.service
storage:
files:
- path: /usr/sbin/mount-inplace-var
contents:
local: mount-inplace-var.sh
mode: 0755 After converting the butan to ignition I run
Which is valid because the file does not exist. ls -lisa /sysroot//ostree/boot.1/fedora-coreos/c0aa9551920e67be6bc1d55f459e8dce9f26379dc2dbfdbd3a6beee30b6b51f459e8dce9f26379dc2dbfdbd3a6beee30b6b51ed/0
ls: cannot access '/sysroot//ostree/boot.1/fedora-coreos/c0aa9551920e67be6bc1d55f459e8dce9f26379dc2dbfdbd3a6beee30b6b51f459e8dce9f26379dc2dbfdbd3a6beee30b6b51ed/0' What I dont understand though is why the file
{
"ignition": {
"version": "3.4.0"
},
"passwd": {
"users": [
{
"name": "core",
"passwordHash": "$y$j9T$BQOELCJiutwQDykkREezY0$N31zasZ15aVmYTah/YjBhpLrrUOZY7LQA99HS61yAVC"
}
]
},
"storage": {
"files": [
{
"path": "/usr/sbin/mount-inplace-var",
"contents": {
"compression": "gzip",
"source": "data:;base64,H4sIAAAAAAAC/4RUXWvbMBR91684dUybjDpuurd1LmODwWDspfQplKDYN7aILBlJTue1+e9DtuPUbdaZEMT1ueec+yFPzuK1UPGa24JZcoio1qhERRsuJGNSP5JZ7bhJYttYo7WLd9ywuqoOYVMr/8pRmcV6R0byJnoJZRvuuJzO8MQAgNJCIwi/BLg9v+4iv4XDgu0ZExssEU4QKcICD3h+xhOWS4QLnCUoda0cHh5wfn4M1ofoDfY3cAWplrPVRHBveU6fEF7hc4t77uC3AdsIxnJyK22dIVpxkw8OJ2jIXsIVwiLTZNWFA0/TVmejDWzFU7J4FK4QCtzk9hI0z+coG27yJCgb7LgMLrGuXU8oHKwTUuJRm61tWXRtUNWm0pZsi5q2/z++3yXhBS76RFuQlGlB6RaZsHwtKbn7dn21+NhVqY3Xh1AIpyl3iCuj0zgtMykUzW6Q6RbnH9/bJUIPTxJ0ZScf2sYNXTs8/YyeuMknPXIfDIiNaI+ZVtQeZn50mV61rR262OeF03GXZ+xgBtEfr9G92gd4baQf4S99oMKWjCLpK65LUs6XPSq4s+gn2/qjSurGA1cVd8VxfwfNo5UzRD+9mVc5/3b1FgphobQDh21KKdR27MY67sjL+0vTGQqnhrj0x1N88XzufztugtnIaObRb+ne83oCfbSbCUOp06YZGx7JDV+B91SOoP+Sdwv2Vaisu9VC5QjfuoTTCAfajqK771G09snBiaTgRUq/mfV4NTv1e3WUfqXRf1TGPEGmV+EiYH8DAAD//8hb8b8xBQAA"
},
"mode": 493
}
]
},
"systemd": {
"units": [
{
"contents": "[Unit]\nDescription=Ensure inplace mount of /var\nDefaultDependencies=no\nConflicts=shutdown.target\nBefore=shutdown.target\nAssertPathExists=/etc/initrd-release\nConditionKernelCommandLine=|systemd.volatile=overlay\n\nOnFailure=emergency.target\nOnFailureJobMode=isolate\n\nRequires=systemd-volatile-root.service\nAfter=systemd-volatile-root.service\n\n[Service]\nType=oneshot\nRemainAfterExit=yes\nExecStart=/usr/sbin/mount-inplace-var mount\n\n[Install]\nWantedBy=initrd-root-fs.target\n",
"enabled": true,
"name": "mount-inplace-var.service"
}
]
}
} The unit still fails when running I also tried adding it as a dedicated unit without relying on Ignition to do the heavy lifting with a custom This is what I dont fully understand:
|
Right, but they don't have to be. At build time (e.g. rpm-ostree) we could force /usr/etc to have labels as if it's I am honestly not sure I can think of real downsides to that...suddenly we don't need to relabel I think doing this is going to be way better than trying to do relabeling at runtime. |
2222aa9
to
2cc6b53
Compare
@raballew why did you close this? |
@cgwalters I am not sure. I did a force push to the fork and that seems to have auto-closed the PR. It was not my intention but I can not reopen the PR either. This is weird. The reopen pull request button is disabled so I can only comment. |
That's very strange...I don't have permission to reopen it either, and I'm a repository administrator. I didn't think that was possible. |
Reopening |
Wait, the "reopen and comment" button did turn green for that last comment, but did nothing? I'm pretty sure this sort of Github glitch. Anyways, not a fatal problem - can you just open a new one? |
Ahh I see, if you hover over it, the Github UI is showing "There are no new commits on the raballew:prepare-root-transient-etc branch" - but I didn't think that was required to reopen a PR? Wonder if it'd work if you force pushed again though? |
@cgwalters It seems pushing to the branch resolved the issue. Anyhow, as mentioned in #2972 (comment) a transient etc is not possible at the moment.
For the systemd volatile path I have opened #2986 which has its own challenges. |
Closing in favor of #3062 now... |
Working copy of #2970