Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Workaround to avoid a race when /var/lib is a persistent dataset #9360

Merged
merged 1 commit into from
Oct 2, 2019

Conversation

didrocks
Copy link
Contributor

Motivation and Context

There is a race when /var/lib is a separate dataset not under /ROOT/<root_dataset> between systemd-random-seed which creates/refresh a stamp file and zfs mount -a which will try to mount on /var/lib which isn't empty.

Description

If /var/lib is a dataset not under /ROOT/<root_dataset>, as proposed in the ubuntu root on zfs upstream guide (https://github.com/zfsonlinux/zfs/wiki/Ubuntu-18.04-Root-on-ZFS), we end up with a race where some services, like systemd-random-seed are writing under /var/lib, while zfs-mount is called. zfs mount will then potentially fail because of /var/lib isn't empty and so, can't be mounted.
Order those 2 units for now (more may be needed) as we can't declare virtually a provide mount point to match "RequiresMountsFor=/var/lib/systemd/random-seed" from systemd-random-seed.service.
The optional generator for zfs 0.8 fixes it, but it's not enabled by default nor necessarily required.

Example:

  • rpool/ROOT/ubuntu (mountpoint = /)
  • rpool/var/ (mountpoint = /var)
  • rpool/var/lib (mountpoint = /var/lib)

Both zfs-mount.service and systemd-random-seed.service are starting After=systemd-remount-fs.service. zfs-mount.service should be done before local-fs.target while systemd-random-seed.service should finish before sysinit.target (which is a later target).
Ideally, we would have a way for zfs mount -a unit to declare all paths or move systemd-random-seed after local-fs.target.

The fix changes the unit ordering. I'm opened to any better option though if you can think of one.

How Has This Been Tested?

This has been tested on ubuntu eoan (incoming 19.10) with the previous layout as descibed. The change is minimal and now the unit ordering is reproducible.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

  • My code follows the ZFS on Linux code style requirements.
  • [] I have updated the documentation accordingly.
    (No documentation to update AFAIK)
  • I have read the contributing document.
  • I have added tests to cover my changes.
    (No tests on that part)
  • All new and existing tests passed.
  • All commit messages are properly formatted and contain Signed-off-by.

If /var/lib is a dataset not under <pool>/ROOT/<root_dataset>, as
proposed in the ubuntu root on zfs upstream guide
(https://github.com/zfsonlinux/zfs/wiki/Ubuntu-18.04-Root-on-ZFS),
we end up with a race where some services, like systemd-random-seed
are writing under /var/lib, while zfs-mount is called. zfs mount will
then potentially fail because of /var/lib isn't empty and so, can't be
mounted.
Order those 2 units for now (more may be needed) as we can't declare
virtually a provide mount point to match
"RequiresMountsFor=/var/lib/systemd/random-seed" from
systemd-random-seed.service.
The optional generator for zfs 0.8 fixes it, but it's not enabled
by default nor necessarily required.

Example:
- rpool/ROOT/ubuntu (mountpoint = /)
- rpool/var/ (mountpoint = /var)
- rpool/var/lib  (mountpoint = /var/lib)

Both zfs-mount.service and systemd-random-seed.service are starting
After=systemd-remount-fs.service. zfs-mount.service should be done
before local-fs.target while systemd-random-seed.service should finish
before sysinit.target (which is a later target).
Ideally, we would have a way for zfs mount -a unit to declare all paths
or move systemd-random-seed after local-fs.target.

Signed-off-by: Didier Roche <[email protected]>
@ikozhukhov
Copy link
Contributor

ikozhukhov commented Sep 25, 2019

it is bad idea move /var/lib to separate dataset - it contain many important data like dpkg info(just for example DPKG+APT based distributions) and it should be in sync with distribution version.
for example: on DilOS(use dpkg+apt) we are using BE (Boot Environment) where upgrade system to separate dataset and for safe logic in packages versions. every BE contain correct packages versions on boot dataset(rpool/ROOT/) in /var/lib/dpkg/ and we are able manage updates/upgrades and boot to different BE with safe packages.
also, no need move /var to separate dataset - you should try to understand how to safely update/upgrade your system and probably try to move to separate datasets some data, like:
/var/tmp
/var/spool
/var/log
if you have plans to boot to different system versions - try to identify what content can be permanent, but some data should be versioned

@didrocks
Copy link
Contributor Author

@ikozhukhov: you are right that moving all /var/lib in a separate dataset is a bad idea on its own.
If you look at the zfs on root ubuntu guide (and the incoming default implementation, especially for servers), the /var/lib is a seaprate dataset, but some system-related state datasets are under /ROOT//var/lib/apt for instance (same for dpkg, aptitude, network-manager… and others).
The idea is to keep persistent version independant paths that sysadmin will expect to be persistent, like docker, snap, mysql and others… However, this is an ever growing list compared to the system related paths which should be kept in sync with the dataset, hence default is separate /var/lib and selected paths under /ROOT//… (ofc /ROOT//var/lib is canmount=off thus).

I hope this helps shading some lights :)

@rlaager
Copy link
Member

rlaager commented Sep 25, 2019

Why can't you do RequiresMountsFor=/var/lib/systemd/random-seed? Doesn't that work if you use the zfs mount generator that is part of 0.8?

Edit:

I see that you said, "The optional generator for zfs 0.8 fixes it, but it's not enabled by default nor necessarily required." While the mount generator is not required in general, if you're doing root-on-ZFS, you really should enable it (once on ZFS 0.8), and e.g. the Buster HOWTO has steps for that. Hopefully, you're enabling that in the installer for 19.10. For 18.04, that's why I have the work-around of mounting certain things in /etc/fstab. I guess /var/lib is another one of those.

The change from this PR isn't particularly harmful. We wouldn't want to try to add too many of those orderings, though, or it would get to be a mess.

@didrocks
Copy link
Contributor Author

@rlaager: RequiresMountsFor=/var/lib/systemd/random-seed is the other way around. It's systemd-randome-seed which declares needing /var/lib/systemd to be mounted. What I meant in the description is that there is no way (that I know of at least) to declare that it will mount /var/lib/ (or /var/lib/systemd).

Some of our users won't use the generator as it's optional, and so, will potentially ends up with a broken on boot system.

@rlaager
Copy link
Member

rlaager commented Sep 25, 2019

My comment edit may have crossed with your comment (and I'm not sure if edits send emails). systemd-random-seed.service has RequiresMountsFor=/var/lib/systemd/random-seed. So if /var/lib exists as a mount unit (ZFS or otherwise), systemd should order things correctly, right? Which is why the ZFS generator fixes the problem. So people who are doing root-on-ZFS should use the mount generator in 0.8, and things Just Work, right?

@didrocks
Copy link
Contributor Author

didrocks commented Sep 25, 2019

Right, if they are using the mount generator, this would work as the .mount generated unit will have it.
However, it's still an optional (non enabled by default) feature on 0.8. Also, it does have some limitations (like rollback some datasets will still keep trying to mount wrong-older datasets that were cached on /etc/zfs-list-cache/ (I guess it needs cache-invalidation)), which may not be a perfect fit for everyone yet.
We already got some cases where people weren't using it (relying on zfs-mount.service), which then, triggers this issue.

@ikozhukhov
Copy link
Contributor

@didrocks if you can see this point on recommended page - they are have mistake and not using different BE(Boot Environment) for ability to boot to different system versions. it is issue of product design and should be fixed - but it is up to distribution - how they wants to manage it.
i'm owner of DilOS and know how it works well with BE :)

@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Sep 25, 2019
@codecov
Copy link

codecov bot commented Sep 27, 2019

Codecov Report

Merging #9360 into master will increase coverage by 0.24%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #9360      +/-   ##
==========================================
+ Coverage   79.06%    79.3%   +0.24%     
==========================================
  Files         401      401              
  Lines      122495   122495              
==========================================
+ Hits        96845    97144     +299     
+ Misses      25650    25351     -299
Flag Coverage Δ
#kernel 79.82% <ø> (+0.04%) ⬆️
#user 67.17% <ø> (+0.6%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d359e99...d5cd18f. Read the comment docs.

@behlendorf
Copy link
Contributor

behlendorf commented Oct 1, 2019

This seems like a pretty straight forward way to work around the issue. Hopefully, we won't need to introduce to many more of these. @aerusso @rlaager if you're OK with this can you please go ahead and approve the PR.

@aerusso
Copy link
Contributor

aerusso commented Oct 2, 2019

In the long run, doing it this way will certainly become a nightmare (and we should try to eventually move to the new generator framework). But we're not there yet, and this looks like the right mitigation.

@didrocks didrocks I'm curious about this comment:

Also, it does have some limitations (like rollback some datasets will still keep trying to mount wrong-older datasets that were cached on /etc/zfs-list-cache/ (I guess it needs cache-invalidation))

Can you please open an issue (or point me to it) with more information on this? I'd like to make sure that zfs-mount-generator actually works well.

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Oct 2, 2019
@behlendorf
Copy link
Contributor

But we're not there yet, and this looks like the right mitigation.

It would be great if someone could open an issue detailing what would be required to get there.

@behlendorf behlendorf merged commit 8ae8b2a into openzfs:master Oct 2, 2019
@didrocks
Copy link
Contributor Author

didrocks commented Oct 3, 2019

@behlendorf: thanks for merging!

@aerusso: I'll do a round of testing with the various cases that are currently broken with the current generator and open separate issues for each of them (probably by the end of next week).

tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Dec 26, 2019
If /var/lib is a dataset not under <pool>/ROOT/<root_dataset>, as
proposed in the ubuntu root on zfs upstream guide
(https://github.com/zfsonlinux/zfs/wiki/Ubuntu-18.04-Root-on-ZFS),
we end up with a race where some services, like systemd-random-seed
are writing under /var/lib, while zfs-mount is called. zfs mount will
then potentially fail because of /var/lib isn't empty and so, can't be
mounted.
Order those 2 units for now (more may be needed) as we can't declare
virtually a provide mount point to match
"RequiresMountsFor=/var/lib/systemd/random-seed" from
systemd-random-seed.service.
The optional generator for zfs 0.8 fixes it, but it's not enabled
by default nor necessarily required.

Example:
- rpool/ROOT/ubuntu (mountpoint = /)
- rpool/var/ (mountpoint = /var)
- rpool/var/lib  (mountpoint = /var/lib)

Both zfs-mount.service and systemd-random-seed.service are starting
After=systemd-remount-fs.service. zfs-mount.service should be done
before local-fs.target while systemd-random-seed.service should finish
before sysinit.target (which is a later target).
Ideally, we would have a way for zfs mount -a unit to declare all paths
or move systemd-random-seed after local-fs.target.

Reviewed-by: Antonio Russo <[email protected]>
Reviewed-by: Richard Laager <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Didier Roche <[email protected]>
Closes openzfs#9360
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Dec 27, 2019
If /var/lib is a dataset not under <pool>/ROOT/<root_dataset>, as
proposed in the ubuntu root on zfs upstream guide
(https://github.com/zfsonlinux/zfs/wiki/Ubuntu-18.04-Root-on-ZFS),
we end up with a race where some services, like systemd-random-seed
are writing under /var/lib, while zfs-mount is called. zfs mount will
then potentially fail because of /var/lib isn't empty and so, can't be
mounted.
Order those 2 units for now (more may be needed) as we can't declare
virtually a provide mount point to match
"RequiresMountsFor=/var/lib/systemd/random-seed" from
systemd-random-seed.service.
The optional generator for zfs 0.8 fixes it, but it's not enabled
by default nor necessarily required.

Example:
- rpool/ROOT/ubuntu (mountpoint = /)
- rpool/var/ (mountpoint = /var)
- rpool/var/lib  (mountpoint = /var/lib)

Both zfs-mount.service and systemd-random-seed.service are starting
After=systemd-remount-fs.service. zfs-mount.service should be done
before local-fs.target while systemd-random-seed.service should finish
before sysinit.target (which is a later target).
Ideally, we would have a way for zfs mount -a unit to declare all paths
or move systemd-random-seed after local-fs.target.

Reviewed-by: Antonio Russo <[email protected]>
Reviewed-by: Richard Laager <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Didier Roche <[email protected]>
Closes openzfs#9360
tonyhutter pushed a commit that referenced this pull request Jan 23, 2020
If /var/lib is a dataset not under <pool>/ROOT/<root_dataset>, as
proposed in the ubuntu root on zfs upstream guide
(https://github.com/zfsonlinux/zfs/wiki/Ubuntu-18.04-Root-on-ZFS),
we end up with a race where some services, like systemd-random-seed
are writing under /var/lib, while zfs-mount is called. zfs mount will
then potentially fail because of /var/lib isn't empty and so, can't be
mounted.
Order those 2 units for now (more may be needed) as we can't declare
virtually a provide mount point to match
"RequiresMountsFor=/var/lib/systemd/random-seed" from
systemd-random-seed.service.
The optional generator for zfs 0.8 fixes it, but it's not enabled
by default nor necessarily required.

Example:
- rpool/ROOT/ubuntu (mountpoint = /)
- rpool/var/ (mountpoint = /var)
- rpool/var/lib  (mountpoint = /var/lib)

Both zfs-mount.service and systemd-random-seed.service are starting
After=systemd-remount-fs.service. zfs-mount.service should be done
before local-fs.target while systemd-random-seed.service should finish
before sysinit.target (which is a later target).
Ideally, we would have a way for zfs mount -a unit to declare all paths
or move systemd-random-seed after local-fs.target.

Reviewed-by: Antonio Russo <[email protected]>
Reviewed-by: Richard Laager <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Didier Roche <[email protected]>
Closes #9360
@rlaager rlaager mentioned this pull request May 30, 2020
12 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants