systemd.mount integration #7329
Conversation
(force-pushed from 4ed3050 to 8d27bc6)
concept LGTM at a first glance - not yet tested though ;) hopefully I'll find some time soon. I wonder whether we want some kind of integration for re-generating the cached dataset list? it does sound a bit cumbersome to have to remember to re-run it after creating or renaming a dataset.. OTOH I am not sure how we'd be able to automate that easily.. maybe some kind of (optional) zfs-zed integration?
(force-pushed from 8d27bc6 to 0eff96e)
So, I took a look at `cmd/zed/zed.d/README`, and decided that I wasn't going to try to implement a new zed event (at least right now). However, I did fix up the script a bit (it works with bash and dash, instead of requiring bash). All of `shellcheck.net`'s valid complaints have been addressed.

Also, I changed the semantics around a little bit:

1. The dependencies of `local-fs.target` are reduced to `Wants`. This reduces the chance of unpleasant surprises if a mount fails. Maybe it gets raised to `Requires` after it gets a little wider testing.
2. Instead of clobbering existing `.mount`s, abort instead.
3. `zfs list` is run unconditionally, and its output is always preferred to `/etc/zfs/zfs-list.cache`.

This should further reduce the need to populate `zfs-list.cache` (in fact, just running `systemctl daemon-reload` will re-run generators, producing the desired mount units).

It would be nice if someone running zfs on root could give this script a try (just toss it in `/etc/systemd/system-generators` after replacing the `@…@` substitution variables with the correct values).
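For readers unfamiliar with systemd generators: the script writes ordinary `.mount` units into the generator output directory. A minimal sketch of the kind of unit it might emit, assuming a hypothetical dataset `tank/home` mounted at `/home` (the dataset name, mountpoint, and exact option set are illustrative, not the PR's literal output):

```
#!/bin/sh
# Sketch only: write one .mount unit the way a generator would.
# systemd passes the output directory as the first argument.
dest="${1:-/tmp}"

cat > "${dest}/home.mount" <<'EOF'
[Unit]
Before=local-fs.target

[Mount]
What=tank/home
Where=/home
Type=zfs
Options=zfsutil
EOF

# A Wants= dependency on local-fs.target is expressed by symlinking the unit
# into local-fs.target.wants/ rather than editing the target itself.
mkdir -p "${dest}/local-fs.target.wants"
ln -s ../home.mount "${dest}/local-fs.target.wants/home.mount"
```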
@aerusso Thanks for that, works like a charm! I'm running root on ZFS with homegrown boot environments and a complex fs layout (see below). This is a current Arch Linux with the zfs-git kmod. Up to now all mount points were set to legacy and got mounted via /etc/fstab. I'm still seeing a […]

I don't know much about […]. Thanks again.
@AttilaFueloep I'm glad it's working! Thanks for testing it. My setup is very similar, except root is not ZFS (yet). I also cannot get […]. As for the unclean mounting, I'd guess you have some residual files/directories under […]. Then carefully remove/move anything inside there. The original motivation for this patch was the contamination of these mountpoints when things didn't get set up correctly and some services got started.
On Thu, Mar 22, 2018 at 10:22:50PM +0000, Antonio Russo wrote:
So, I took a look at `cmd/zed/zed.d/README`, and decided that I wasn't going to try to implement a new zed event (at least right now). However, I did fix up the script a bit (it works with bash and dash, instead of requiring bash). All of `shellcheck.net`'s valid complaints have been addressed.
see below
Also, I changed the semantics around a little bit:
1. The dependencies of `local-fs.target` are reduced to `Wants`. This reduces the chance of unpleasant surprises if a mount fails. Maybe it gets raised to `Requires` after it gets a little wider testing.
2. Instead of clobbering existing `.mount`s, abort instead.
should we maybe log this? after all, this means there is a potential
conflict between a manually set up .mount unit and a generated one
(previously generated ones are cleared before the generator is called)
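One way a generator can surface such a conflict, given that syslog is unavailable at generator time, is to write directly to `/dev/kmsg` — the same approach the later revision of this PR takes in its `do_fail` helper. A sketch, with an illustrative variable name (`mountfile`):

```
# Sketch: refuse to overwrite a manually created .mount unit and leave a
# note in the kernel log, since syslog is not available to generators.
if [ -e "/etc/systemd/system/${mountfile}" ] ; then
    printf 'zfs-mount-generator: %s already exists, skipping\n' \
        "${mountfile}" > /dev/kmsg
    exit 0
fi
```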
3. `zfs list` is run unconditionally, and its output is always preferred to `/etc/zfs/zfs-list.cache`.
the last one is problematic IMHO. `zfs list` can take quite a while on a
system with lots of datasets and under load, or might (in case of bugs /
...) even hang altogether.
see `man systemd.generator`:
```
· Generators are run very early at boot and cannot rely on any external services. They may not talk to any other process. That includes simple things such as logging to syslog(3), or systemd itself (this means: no systemctl(1))! Non-essential file systems like /var and /home are mounted after generators have run. Generators can however rely on the most basic kernel functionality to be available, including a mounted /sys, /proc, /dev, /usr.

[...]

· If you are careful, you can implement generators in shell scripts. We do recommend C code however, since generators are executed synchronously and hence delay the entire boot if they are slow.
```
so I think any calls to external binaries, especially ones which could
block in kernel space are a no-go in generators.
I wonder whether it would not be better to drop the call to `zfs list`
altogether and invest some energy into ZED integration to keep the cache
file current? we'd basically need to hook:
- pool creation
- pool import
- pool export (or not?)
- filesystem creation (including receive and clone)
- filesystem destruction (or not?)
- filesystem rename
- filesystem mountpoint property changes
- filesystem canmount property changes
not sure which of those are already available in ZED? if you don't want
to do this I can try to whip something up..
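For context, a rough sketch of what such a ZED hook might look like: ZED exports the event details in environment variables (the review below uses `ZEVENT_POOL` and `ZEVENT_HISTORY_INTERNAL_STR`; `ZEVENT_HISTORY_INTERNAL_NAME` carries the operation name), so a zedlet can dispatch on the operation and rewrite the per-pool cache file. The dispatch list and paths here are illustrative, not the final zedlet:

```
#!/bin/sh
# Sketch of a history-event zedlet that refreshes the per-pool cache file.
. "${ZED_ZEDLET_DIR}/zed-functions.sh"

FSLIST="/etc/zfs/zfs-list.cache"

case "${ZEVENT_HISTORY_INTERNAL_NAME}" in
    create|destroy|rename|clone|receive|set|inherit)
        # Regenerate the whole cache for the affected pool instead of
        # trying to patch individual lines.
        "${ZFS}" list -H -t filesystem -o name,mountpoint,canmount \
            -r "${ZEVENT_POOL}" > "${FSLIST}/${ZEVENT_POOL}"
        ;;
    *)
        exit 0
        ;;
esac
```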
This should further reduce the need to populate `zfs-list.cache` (in fact, just running `systemctl daemon-reload` will re-run generators, producing the desired mount units).
see above for why the fact that systemd will re-run generators on every
reload makes matters worse, not better ;)
e.g. a single upgrade on a Debian testing or unstable might run
`systemctl daemon-reload` tens or hundreds of times (depending on the
number of upgraded packages which contain unit files of some kind)!
I am not sure how the RPM world handles upgraded unit files..
on my system with ~256 datasets, running the generator already takes a
measurable fraction of a second, but I guess most of that is actually
writing the unit files (haven't checked yet)
It would be nice if someone running zfs on root could give this script a try (just toss it in `/etc/systemd/system-generators` after replacing the `@…@` substitution variables with the correct values).
FWIW, I did on Stretch and Sid. it works fine, and allows me to drop a
`Requires` on zfs-mount.service from a bind mount :) so I'd very much
like to see this integrated in some form!
the testing instructions lack a `chmod +x
/etc/systemd/system-generators/zfs-mount-generator` though ;)
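Putting the testing notes together, the manual installation for trying this out amounts to roughly the following (assuming the build-time `@…@` values have already been substituted in the script):

```
# Install the generator and let systemd re-run all generators.
install -d /etc/systemd/system-generators
install -m 0755 zfs-mount-generator /etc/systemd/system-generators/
systemctl daemon-reload
# Inspect the generated units:
systemctl list-units --type=mount
```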
(force-pushed from 600dc6f to 8778d62)
I've reworked this again using @Fabian-Gruenbichler's suggestions:
- […] Done.
- No call to `zfs list` is made from the generator anymore.
- The new patch implements a ZEDLET (`history_event-zfs-list-cacher.sh`) that keeps the per-pool cache files up to date.
I also wonder whether we want to deal with blacklisting datasets (besides `canmount=off`), since there can only be one unit for each mountpoint in systemd, but more than one dataset with the same mountpoint value in ZFS.
. "${ZED_ZEDLET_DIR}/zed-functions.sh" | ||
|
||
zed_exit_if_ignoring_this_event | ||
|
add a `zed_check_cmd "${ZFS}"` with appropriate exit here (and maybe for wc, sort, diff, xargs, grep, ..? all of them are essential on Debian, not sure about other distros/ecosystems?)
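A sketch of what that could look like, assuming `zed-functions.sh` provides `zed_check_cmd` as implied above (the exit code and exact tool list are illustrative):

```
# Bail out early if the tools the zedlet relies on are missing.
zed_check_cmd "${ZFS}" sort diff grep || exit 4
```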
```
# because the history event only reports the command issued, rather than
# the affected ZFS, a rename or recursive destroy can affect a child ZFS.
# Instead of trying to figure out if we're affected, we instead just
# regenerate the cache *for that pool* on every modification.
```
given this limitation, wouldn't it make more sense to regenerate also when new datasets are added (e.g., `zfs create` or even `zpool import`)? e.g., right now when a dataset gets renamed, it gets dropped from the cache file but not added again under the new name.

I'd rather have the distinction between "cache file is curated manually, no ZED interference" and "cache file is managed by ZED and regenerated on every potential change event". it might make sense to introduce a zed.rc variable to distinguish these modes, depending on which one gets made the default.
```
# only act if the mountpoint or canmount setting is altered
printf '%s' "${ZEVENT_HISTORY_INTERNAL_STR}" |
    grep -q '^\(mountpoint\|canmount\)=' || exit 0
;;
```
this seems very roundabout - is there no bash/dash compatible way of checking a substring from the start of the string? (I wish we could just write the whole thing in perl to be honest :-P)
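One bash/dash-compatible way to do that prefix check without spawning `grep` is a `case` pattern match (sketch, reusing the variable from the hunk above):

```
# Only continue if a mountpoint= or canmount= property is being set.
case "${ZEVENT_HISTORY_INTERNAL_STR}" in
    mountpoint=*|canmount=*) ;;  # fall through and update the cache
    *) exit 0 ;;
esac
```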
```
# Get the information for the affected pool
grep "^${ZEVENT_POOL}[/$(printf '\t')]" "${FSLIST}" |
    grep -o '[^'"$(printf '\t')"'*' |
```
This produces `grep: Unmatched [ or [^`. I think the intended semantics was that of `cut -f 1`?
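If so, a minimal replacement for the broken `grep -o` would keep only the first tab-separated field (the dataset name), e.g.:

```
# Extract the dataset names for the affected pool from the cache file.
grep "^${ZEVENT_POOL}[/$(printf '\t')]" "${FSLIST}" | cut -f1
```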
```
# Get the information for the affected pool
grep "^${ZEVENT_POOL}[/$(printf '\t')]" "${FSLIST}" |
    grep -o '[^'"$(printf '\t')"'*' |
    xargs @sbindir@/zfs list -H -oname,mountpoint,canmount \
```
`@sbindir@/zfs` should be `${ZFS}`.
grep "^${ZEVENT_POOL}[/$(printf '\t')]" "${FSLIST}" | | ||
grep -o '[^'"$(printf '\t')"'*' | | ||
xargs @sbindir@/zfs list -H -oname,mountpoint,canmount \ | ||
>"${FSLIST_TMP}" 2>/dev/null || true |
depending on whether we want to react to any changes (see bigger comment above), the whole preceding section could be replaced with

```
${ZFS} list -H -t filesystem -oname,mountpoint,canmount -r ${ZEVENT_POOL} > ${FSLIST_TMP}
```
(force-pushed from ae79617 to 2a71bed)
I'm possibly over-engineering this patch, but I've included the ability to specify regular expressions for datasets that should be automatically added. I.e., if you want everything you import, create, or receive to be tracked with this, you can put a line […]. If you just want everything in some pool, say […].

Also, renamed datasets should now be correctly followed, and […]
Codecov Report
```
@@            Coverage Diff             @@
##            master    #7329     +/-  ##
==========================================
+ Coverage    76.35%   76.39%   +0.03%
==========================================
  Files          329      329
  Lines       104214   104191      -23
==========================================
+ Hits         79576    79599      +23
+ Misses       24638    24592      -46
```

Continue to review the full report at Codecov.
ha, yes, that looks a bit over-engineered to me as well ;) how about the following: […]

that would end up being a lot less code for almost the same result? I wonder whether moving the expressions to two variables in zed.rc (whitelist and blacklist) and implying a whitelist of […]

minor unrelated nit: most other zedlets use 4 spaces for indentation, it probably makes sense to follow that style. the case statements are currently also pretty weirdly indented in parts..
if we want to keep the current "try to update changed parts of existing cache file" approach, I would highly recommend moving to something other than shell (then we could just map the existing cache file to some hash/dict structure, update that as needed, and write it back out). not sure whether that is acceptable given that all the current zedlets are shell scripts? maybe @behlendorf can chime in on that point?
The ZEDLET infrastructure was originally designed so they could be any manner of executable. However, the hope was that the majority of them would be simple enough that they could be written in shell so they'd be trivial to inspect and customize. These are getting a little on the long side but I don't think they're too unwieldy yet!
(force-pushed from f02608c to 59de246)
Here's another stab.
Also, I've tested it out with a broken entry in the cache. It, as expected, doesn't interfere with the boot more than just showing that a unit failed (because the dependency is at `Wants` rather than `Requires`).
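For anyone repeating that experiment, the failed-but-non-fatal unit can be inspected after boot with something like the following (the unit name is illustrative):

```
# Show failed mount units and the reason for the failure.
systemctl --failed --type=mount
journalctl -b -u home.mount
```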
I haven't had a chance to test this, I've only read through the PR but the approach looks reasonable. @Fabian-Gruenbichler do you have any additional concerns?
@aerusso how would you like to proceed? Since the zedlet isn't enabled by default, merging this is low risk. That would make it a little easier to get some additional testing on other distributions.
@behlendorf I'm happy with it, and the biggest weakness is the lack of testing, though I have it running on several machines, and the systemd generator seems robust on them. We can always enable the zedlet later, after wider testing.
On Tue, Apr 03, 2018 at 11:51:12PM +0000, Brian Behlendorf wrote:
behlendorf commented on this pull request.
I haven't had a chance to test this, I've only read through the PR but the approach looks reasonable. @Fabian-Gruenbichler do you have any additional concerns?
@aerusso how would you like to proceed? Since the zedlet isn't enabled by default, merging this is low risk. That would make it a little easier to get some additional testing on other distributions.
I still think some way to disable generation of the mount unit for
certain datasets would be good to have for the zedlet. think of the
following scenario:
```
tank/baz/bar    /bar    noauto
tank/foo/bar    /bar    on
```
depending on the order, the noauto one will be generated, but the auto
one will not! arguably we could say that you need to set canmount to off
in such a case (then this should be documented as a limitation of the
zedlet?) - but having two optional variables for whitelisting and
blacklisting the datasets (which just get mapped to two grep calls)
would IMHO be a nice followup to handle received or cloned datasets more
easily.
no blocker for me though - if you want I could handle that as a
separate PR?
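To make the whitelist/blacklist idea concrete, a hypothetical sketch of the two grep calls it would map to — the zed.rc variable names are invented for illustration and are not part of this PR:

```
# Hypothetical zed.rc knobs: by default accept every dataset, reject none.
: "${ZED_LIST_CACHER_WHITELIST:=.}"
: "${ZED_LIST_CACHER_BLACKLIST:=^$}"

# Filter the freshly generated dataset list before it replaces the cache.
grep -E "${ZED_LIST_CACHER_WHITELIST}" "${FSLIST_TMP}" |
    grep -vE "${ZED_LIST_CACHER_BLACKLIST}" > "${FSLIST_TMP}.filtered" || true
```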
zfs-mount-generator implements the "systemd generator" protocol, producing systemd.mount units from the cached output of zfs list during early boot, integrating with systemd. Each pool has an independent cache of the command `zfs list -H -oname,mountpoint,canmount -tfilesystem -r $pool`, which is kept synchronized by the ZEDLET history_event-zfs-list-cacher.sh. Datasets not in the cache will be loaded later in the boot process by zfs-mount.service, including pools without a cache. Among other things, this allows for complex mount hierarchies. Signed-off-by: Antonio Russo <[email protected]>
(force-pushed from 59de246 to 84bc27c)
@aerusso Thanks for your explanations. I really feel bad for not reading the whole thread thoroughly before posting; everything was there. To my excuse, maybe my broken system made me a bit too nervous. I've already added a toggle of the canmount property of a test filesystem on the root pool to my boot environment script. If I understand correctly, this should trigger the zed script to update the cache file. On a side note, while this works for updating the active BE, it won't for updating a non-active BE, and I've no easy solution here. What would you think of making […]?
Have you considered moving the cache file to /boot (which is presumably shared across BEs) and symlinking it? That way, when you update it, the non-active BE also sees the changes.
If /boot is not shared across BEs in your setup, then I don’t have a great idea either. Though this may not matter, e.g. if the initramfs is doing the mounting anyway.
In my setup /boot is on the zfs root filesystem and therefore not shared. I have separate filesystems for e.g. /home and /var, and the initramfs (initcpio) does not mount them. Before this PR I had all mountpoints set to legacy and handled mounting via /etc/fstab, which is quite inconvenient. So this PR is a substantial improvement. If I could call […]
Should this work on Ubuntu 16.04? I've copied this zedlet into /etc/zfs/zed.d (chown+chmod to root:root 755), but it doesn't fire when I run zpool import/export. Verbose output from zed: […]
You have to `touch` the per-pool cache file first. Notice also that #7453 will change these formats relatively soon.
Ah, I missed the "touch" step. I generated the cache file by running the zfs list command by hand and then the generator started working. Thanks for the heads up, I'll subscribe to that PR. Edit: I copied the latest versions of the generator and zedlet from master today, so I think I'm good.
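For other readers hitting the same issue, the setup described in this exchange amounts to roughly the following (the pool name `tank` is illustrative):

```
# Create the per-pool cache file so the zedlet starts maintaining it...
mkdir -p /etc/zfs/zfs-list.cache
touch /etc/zfs/zfs-list.cache/tank
# ...or seed it by hand, as done above:
zfs list -H -t filesystem -o name,mountpoint,canmount -r tank \
    > /etc/zfs/zfs-list.cache/tank
```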
[ -d "${FSLIST}" ] || exit 0 | ||
|
||
do_fail() { | ||
printf 'zfs-mount-generator.sh: %s\n' "$*" > /dev/kmsg |
@aerusso Shouldn't that read `zfs-mount-generator` instead of `zfs-mount-generator.sh`?
This, of course, would also apply to all occurrences of `zfs-mount-generator.sh` inside this file.
Good catch. I've got a queue of documentation typos, I'll add these to it.
Yeah, it's just cosmetic. I now tend to use `myname=$(basename "$0")` in my scripts after tripping over it as well.
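A sketch of that pattern applied to the `do_fail` helper discussed above (purely illustrative; the PR itself only needs the string fixed):

```
# Derive the script's own name at runtime instead of hard-coding it.
myname=$(basename "$0")

do_fail() {
    printf '%s: %s\n' "${myname}" "$*" > /dev/kmsg
}
```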
May I suggest a minor edit to the man page? The generator code doesn't provide any hint in the message when this fails. I suggest replacing […]
@rugubara How did a zvol wind up in the zfs-list cache? That's a bug in the ZEDLET.
@aerusso, I couldn't get the zedlet working for me yet. I wrote /etc/zfs/zfs-list.cache/pool myself using the zfs list command from the man page.
Just a heads-up, GitHub is still having problems from last night; you might not see new posts, but I believe they are still going through.
Thank you for confirming that I'm not the only one who cannot see all the replies! The man page has a little, tricky caveat: […]

though the real bug is that the zedlet isn't working for you. What happened there?
@aerusso, I forgot to symlink history_event-zfs-list-cacher.sh to /etc/zfs/zed.d. Once symlinked, everything works ok.
Is this something that could go live with 0.7.12 or do we have to wait for 0.8.0?
In case you still have the unmounting problem, the solution is described in #8060 (comment). That fixed it for me.
Implements the "systemd generator" protocol in
zfs-mount-generator
Description
zfs-mount-generator implements the "systemd generator" protocol, producing systemd.mount units from the (possibly cached) output of `zfs list` during early boot, giving full systemd integration. The most visible benefit of this is that /etc/fstab can safely refer to ZFS mount points, because systemd will take care to mount filesystems in the correct order.
Motivation and Context
This PR takes a different approach from #6974, which modified `/etc/fstab` to reflect ZFS mountpoints. Here, instead, ZFS mounts are tracked by directly creating native systemd.mount units at early boot, from the output of `zfs list -H -t filesystem -oname,mountpoint,canmount`. Because pools may not be imported, the output of this command can be saved in `/etc/zfs/zfs-list.cache`. If the pools are for some reason mounted at early boot (e.g., zfs on root), this file can be omitted and the command will be run.

This generator is not required; it does not interfere with `zfs-mount.service`, so anything missing from the cache file (or from an unimported pool) will be mounted as before.

As mentioned before, this allows for complex mount hierarchies (e.g., bind mounts that must happen after zfs mounts are made; any other filesystem mounted on top of any ZFS). Notice that ZFS on root users are most likely to want such features, and will not have to create the `zfs-list.cache` file.
file.How Has This Been Tested?
I've been using several incarnations of this generator for several months, allowing for some maturity in the patches. E.g., a dependency has been reduced from `Requires` to `Wants` to prevent filesystems from being unmounted when zfs systemd units are shuffled around during upgrades.

Types of changes
Checklist:

- […] Commits contain a `Signed-off-by`.