Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add bootfs.snapshot and bootfs.rollback kernel parameters #10198

Merged
merged 4 commits into from
May 30, 2020
Merged

Add bootfs.snapshot and bootfs.rollback kernel parameters #10198

merged 4 commits into from
May 30, 2020

Conversation

gregory-lee-bartholomew
Copy link
Contributor

@gregory-lee-bartholomew gregory-lee-bartholomew commented Apr 13, 2020

Unlike other filesystems, snapshots and rollbacks of bootfs need to be
done from a rescue environment. This patch makes it possible to snap-
shot or rollback the bootfs simply by specifying bootfs.snapshot or
bootfs.rollback on the kernel command line. The operation will be
performed by dracut just before bootfs is mounted.

Signed-off-by: Gregory Bartholomew [email protected]

Motivation and Context

This is a feature enhancement to make creating snapshots of the bootfs and rolling them back easier.

Description

This patch adds two simple systemd services to be included in the user's initramfs if they are using systemd and dracut. The services are conditioned on the bootfs.snapshot and bootfs.rollback kernel parameters being present. So they will not do anything if neither options are set. If the bootfs.snapshot kernel parameter is present, a snapshot of the bootfs will be created just before it is mounted. If the bootfs.rollback parameter is present, an attempt will be made to rollback the bootfs just before it is mounted. The default name for the snapshot is the current kernel version string. A name can be specified by assigning the desired value to the kernel parameter (e.g. bootfs.snapshot=temp). One advantage of leaving the default value is that a new snapshot of your bootfs will be automatically created every time you upgrade your kernel if you leave the bootfs.snapshot parameter set in /etc/kernel/cmdline (or where ever your boot loader sets the kernel command line).

How Has This Been Tested?

I have only done minimal testing of this patch on my personal computer (Fedora 31). The patch is pretty simple though, so I don't anticipate many problems.

I am submitting this as a draft pull request because it is my first (ever) attempt at submitting a pull request. I've probably missed many things/steps. Please let me know what I need to do to move this along if you are interested.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

  • My code follows the ZFS on Linux code style requirements.
  • I have updated the documentation accordingly.
  • I have read the contributing document.
  • I have added tests to cover my changes.
  • I have run the ZFS Test Suite with this change applied.
  • All commit messages are properly formatted and contain Signed-off-by.

Unlike other filesystems, snapshots and rollbacks of bootfs need to be
done from a rescue environment. This patch makes it possible to snap-
shot or rollback the bootfs simply by specifying bootfs.snapshot or
bootfs.rollback on the kernel command line. The operation will be
performed by dracut just before bootfs is mounted.

Signed-off-by: Gregory Bartholomew <[email protected]>
@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Apr 13, 2020
@behlendorf
Copy link
Contributor

Thanks for opening this PR. Providing a way to rollback your bootfs was definitely a missing bit of functionality. @rlaager @aerusso would you mind having a look at this.

@gregory-lee-bartholomew
Copy link
Contributor Author

Thanks for opening this PR. Providing a way to rollback your bootfs was definitely a missing bit of functionality. @rlaager @aerusso would you mind having a look at this.

You're welcome. This patch is a very rough draft at this point. Help and feedback on how to improve it are very welcome. One thing in particular that is missing is the -r option to rollback in the zfs-rollback-bootfs. So as it currently is, it will only rollback the most recent snapshot.

@rlaager
Copy link
Member

rlaager commented Apr 24, 2020

I gave this a really quick glance. I don't have much feedback on the specifics here. I think there's a bigger design question here: Do we want to do things like this or outsource this to zsys, which Ubuntu has put a lot of work into? I haven't yet used zsys, so I can't advocate one way or the other at this time. But now that 20.04 is out, I definitely need to look at this.

@gregory-lee-bartholomew
Copy link
Contributor Author

I gave this a really quick glance. I don't have much feedback on the specifics here. I think there's a bigger design question here: Do we want to do things like this or outsource this to zsys, which Ubuntu has put a lot of work into? I haven't yet used zsys, so I can't advocate one way or the other at this time. But now that 20.04 is out, I definitely need to look at this.

I had not heard of zsys until you mentioned it. I just took a quick look at its readme on github. I see commands like:

zsysctl boot update-menu

Does that mean that its interface runs from within the currently-booted system? If so, that might not be of much help if a person has corrupted their bootfs somehow and needs to roll it back.

Also, the "update-menu" command suggests to me that this is integrated with some specific boot loaders. It may not work for other distributions if they are using different/unsupported boot loaders.

zsysctl does look pretty fancy, but for purposes of rescuing a potentially failed system maybe a more kiss design would be better?

@rlaager
Copy link
Member

rlaager commented Apr 24, 2020

Does that mean that its interface runs from within the currently-booted system? If so, that might not be of much help if a person has corrupted their bootfs somehow and needs to roll it back.

Also, the "update-menu" command suggests to me that this is integrated with some specific boot loaders.

The answer to both of these is: it integrates with GRUB. For people who don't want to use GRUB, it may not be a good fit, which might leave room for other solutions.

@gregory-lee-bartholomew
Copy link
Contributor Author

The answer to both of these is: [zfs] integrates with GRUB. For people who don't want to use GRUB, it may not be a good fit, which might leave room for other solutions.

Yeah, I specifically don't use GRUB because it rolls its own ZFS driver which (last time I messed with it) didn't support the latest and greatest ZFS features.

Personally, I don't think I would be comfortable at all using a tool to manage my ZFS pools that might be running an out-of-sync ZFS driver. By using the zpool/zfs commands and zfs.ko that are pushed into the initramfs on every kernel update, you can be sure that you are using the ZFS driver that is up-to-date with all your enabled zpool/zfs features. From what you've said, I don't think I would want to use zsys even if it was available.

@aerusso
Copy link
Contributor

aerusso commented Apr 24, 2020

  1. I definitely think we should provide the rollback function, but why the snapshot? Booting a machine usually doesn't do a whole lot to the root filesystem, and zfs snapshot can definitely be called during regular operation.

  2. I don't think this should mark_hostonly. But, I also don't think that zfs-env-bootfs.service and zfs-import.target should mark_hostonly, either. I'm to blame for zfs-import.target, as I punted this question down the road. We should probably resolve it here and now.

  3. Should we have the ability to pass -F to rollback? Or---even better---create a clone and mount that one instead?

@gregory-lee-bartholomew
Copy link
Contributor Author

  1. I definitely think we should provide the rollback function, but why the snapshot? Booting a machine usually doesn't do a whole lot to the root filesystem, and zfs snapshot can definitely be called during regular operation.

In my thinking, the reason for bootfs.snapshot is similar to the reason for bootfs.rollback -- to provide an opportunity to change the state of all the file system data at a point in time when one can be assured that all processes have synchronized/flushed their copies of that data to the file system. Since this runs before bootfs is mounted, no process can have a private copy of any of that data at this point in time.

  1. I don't think this should mark_hostonly. But, I also don't think that zfs-env-bootfs.service and zfs-import.target should mark_hostonly, either. I'm to blame for zfs-import.target, as I punted this question down the road. We should probably resolve it here and now.

I have no opinion either way on that. I guess I can see where bootfs might be considered a host-specific feature though.

  1. Should we have the ability to pass -F to rollback? Or---even better---create a clone and mount that one instead?

Changing what is mounted sounds like an interesting possibility. I guess we'd want to swap the names of the original file system and the clone so that there would be no confusion with other tools/scripts that might be expecting the original name?

@gregory-lee-bartholomew
Copy link
Contributor Author

gregory-lee-bartholomew commented Apr 24, 2020

Changing what is mounted sounds like an interesting possibility. I guess we'd want to swap the names of the original file system and the clone so that there would be no confusion with other tools/scripts that might be expecting the original name?

I just thought of a potential use case where creating additional snapshots with every rollback might be undesirable. What if someone left bootfs.rollback set on the command line persistently with the intent that the OS should revert to the original state on every reboot? I can see where, e.g., a school lab might want to do something like that. I don't think they would want the file system to create an additional clone on every boot in that case though.

@terem42
Copy link

terem42 commented Apr 27, 2020

FYI: For Debian-based systems, using initramfs, rescue feature already present: https://github.com/openzfs/zfs/blob/master/contrib/initramfs/README.initramfs.markdown
it's enough to reference desired snapshot and set rollback=1 in kernel command line argument, inside GRUB shell. No other tools are required.

@gregory-lee-bartholomew
Copy link
Contributor Author

FYI: For Debian-based systems, using initramfs, rescue feature already present: https://github.com/openzfs/zfs/blob/master/contrib/initramfs/README.initramfs.markdown
it's enough to reference desired snapshot and set rollback=1 in kernel command line argument, inside GRUB shell. No other tools are required.

Thanks Andrey. I think 90% of the stuff in that script is already implemented in dracut (parsing the commandline, loading the module, importing the pool, mounting the file system, etc.). About the only things missing from dracut so far is are the rollback, snapshot, and clone functions.

I do think dracut should attempt to be as consistent as possible with what already exists. I see that the rollback function essentially amounts to the following command:

ZFS_CMD="${ZFS} rollback -Rf $snap"

Unless someone has a compelling argument why it should be done differently, I will change zfs-rollback-bootfs.service so that it runs the rollback command with the same options.

I see that cloning the snapshot is an entirely different function. That seems like a good design decision to me -- that is really a different operation than "rollback" and it should exist as a distinct command/parameter/systemd service.

For this patch, I would like to focus just on basic snapshot/rollback functionality for the bootfs.

@aerusso
Copy link
Contributor

aerusso commented Apr 28, 2020

For this patch, I would like to focus just on basic snapshot/rollback functionality for the bootfs.

I've thought about this, and I agree. More advanced work can be done within the dracut pre-mount environment. We don't need to go re-implementing the zfs command line.

I do, however, think we should settle the whole mark_hostonly thing now.

@gregory-lee-bartholomew
Copy link
Contributor Author

For this patch, I would like to focus just on basic snapshot/rollback functionality for the bootfs.

I've thought about this, and I agree. More advanced work can be done within the dracut pre-mount environment. We don't need to go re-implementing the zfs command line.

I do, however, think we should settle the whole mark_hostonly thing now.

Great. Sounds like we are on the same page then. I will remove the mark_hostonly flag. Is there any potential for problems with zfs-env-bootfs.service being hostonly while zfs-{rollback,snapshot}-bootfs.service is not?

@gregory-lee-bartholomew
Copy link
Contributor Author

Should I include changes to contrib/dracut/README.dracut.markdown in this pull request to document these new kernel options? How is documentation handled?

@behlendorf
Copy link
Contributor

@gregory-lee-bartholomew yes, please go ahead and update the contrib/dracut/README.dracut.markdown then it sounds like this should be ready to merge.

@gregory-lee-bartholomew gregory-lee-bartholomew marked this pull request as ready for review May 29, 2020 19:24
@gregory-lee-bartholomew
Copy link
Contributor Author

gregory-lee-bartholomew commented May 29, 2020

@gregory-lee-bartholomew yes, please go ahead and update the contrib/dracut/README.dracut.markdown then it sounds like this should be ready to merge.

Done. Unless someone sees something I've missed, I think this is good to go.

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels May 29, 2020
@codecov-commenter
Copy link

codecov-commenter commented May 30, 2020

Codecov Report

Merging #10198 into master will decrease coverage by 0.31%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #10198      +/-   ##
==========================================
- Coverage   79.48%   79.17%   -0.32%     
==========================================
  Files         387      387              
  Lines      123388   123388              
==========================================
- Hits        98079    97696     -383     
- Misses      25309    25692     +383     
Flag Coverage Δ
#kernel 79.95% <ø> (-0.02%) ⬇️
#user 64.52% <ø> (-1.59%) ⬇️
Impacted Files Coverage Δ
module/zfs/vdev_indirect.c 73.33% <0.00%> (-10.67%) ⬇️
cmd/zvol_id/zvol_id_main.c 76.31% <0.00%> (-5.27%) ⬇️
cmd/ztest/ztest.c 75.78% <0.00%> (-5.00%) ⬇️
module/zfs/space_map.c 93.31% <0.00%> (-4.96%) ⬇️
module/zfs/spa_checkpoint.c 93.78% <0.00%> (-4.35%) ⬇️
module/lua/lmem.c 83.33% <0.00%> (-4.17%) ⬇️
module/zfs/bpobj.c 86.86% <0.00%> (-3.76%) ⬇️
module/icp/api/kcf_mac.c 36.57% <0.00%> (-3.43%) ⬇️
module/zfs/vdev_removal.c 93.90% <0.00%> (-2.77%) ⬇️
module/zfs/zio_compress.c 89.74% <0.00%> (-2.57%) ⬇️
... and 56 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 20f2878...1be3068. Read the comment docs.

@behlendorf behlendorf merged commit 9052e3d into openzfs:master May 30, 2020
@gregory-lee-bartholomew
Copy link
Contributor Author

What does it mean: "Failed shell_5 ..." on build #11580?

I'm listed under "Responsible Users". Am I expected to do something further with this PR?

@behlendorf
Copy link
Contributor

@gregory-lee-bartholomew thanks for asking. That was an unrelated test failure, there's nothing else you need to do here.

jsai20 pushed a commit to jsai20/zfs that referenced this pull request Mar 30, 2021
Unlike other filesystems, snapshots and rollbacks of bootfs need to be
done from a rescue environment. This patch makes it possible to snap-
shot or rollback the bootfs simply by specifying bootfs.snapshot or
bootfs.rollback on the kernel command line. The operation will be
performed by dracut just before bootfs is mounted.

Reviewed-by: Antonio Russo <[email protected]> 
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Gregory Bartholomew <[email protected]>
Closes openzfs#10198
@gregory-lee-bartholomew gregory-lee-bartholomew deleted the bootfs_snapshot branch June 23, 2022 03:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants