Support freeze/thaw #260
This is a MUST if I ever want to be able to use ZFS on my laptop. If not rootfs, I would still want to use ZFS for my other filesystems on the laptop. It is possible that when I ran ZFS as rootfs on my laptop, I faced issues because of this. |
Brian, this is a much-needed feature. Any idea how much work it is? |
I haven't carefully scoped it but my gut tells me it's probably not that much work. Why is this feature so critical? It's really only important for laptops, correct? (Which I agree is important if you're using a laptop.) What needs to be done here is to tie the Linux freeze/unfreeze hooks to the zfs_suspend_fs()/zfs_resume_fs() functions in zfs_vfsops.c. That should be just a couple lines of code, but then we need to review that change and make sure it's working as expected. Plus there will be the needed compatibility code for older kernels. I'm not going to be able to get to this anytime soon, but if you want to dig into it I'm happy to review changes and comment. I just don't have time for the actual legwork on this right now. |
It is important because 1. I absolutely need it on the laptop, 2. I need it on my desktop, which has been suspending to RAM/disk every night for the last 7 years. It is the best of both worlds: I save energy, I don't heat up my room in summer, and I get to restore my desktop workspaces just like they were the previous day. Native ZFS has broken that tradition for me. And I would never want to blame ZFS for anything...;-) I will dig into it, though, to see if I can come up with a patch for you. |
This feature can be very important for home NAS environments, too. These boxes are kept idling most of the time anyways, and S2R/hibernation can save a significant amount of power (about 15W with my setup). I encourage implementing this, maybe this is the missing link to get suspend-to-RAM fully working on my Zotac Fusion NAS :) |
To easily test freeze/thaw, we could use xfs_freeze (from xfsprogs). It is documented to work on other FSes, too. Currently, of course, for ZFS, it reports that it is unable to do so. |
So, would this do? Tying the kernel's freeze/thaw hooks into zfs_suspend_fs()/zfs_resume_fs(), as suggested above?
What about returned error codes? Are they compatible? |
That's going to be the gist of it. However, the devil's in the details, and that's why this isn't a trivial change. The questions you're asking are the right ones, but someone needs to sit down and read through the code to get the right answers. A few things to be careful of.
|
Some updates on this feature in my branch: https://github.com/kohlschuetter/zfs/commits/freeze (see kohlschuetter@f9e8ae5 ). Freeze/unfreeze seems to work with Linux >= 2.6.35 and silently fails with 2.6.32/RHEL6. I haven't tried it with earlier kernels, though. Before 2.6.35, freeze requires a block device set in the superblock, which ZFS does not provide. The RHEL6 kernel can be patched easily by back-porting a few changes. Given a compatible kernel, freezing/unfreezing seems to work. I am not sure about the expected behavior here. Changes to freeze behavior are in fact outside the scope of this patch; they should probably be performed at the ZFS suspend/resume level. |
I actually think freeze/thaw is more important for backup scenarios, and when the underlying storage has its own snapshot/cloning mechanisms (like iSCSI, or LVM on a local or remote machine, or snapshots of a zvol over iSCSI, etc.). Freeze will make sure the underlying devices are in a consistent state, that all direct/synchronized data have in fact been pushed to the devices, and will block all further writes to the whole filesystem (this constraint could be relaxed: as long as we have enough memory and no fsync/fdatasync/create/close/unlink/rename etc. is performed, writes should block only if actual write I/O would need to be performed). After successfully freezing the filesystem, one can safely create a snapshot/clone on the storage (LVM snapshot, zvol snapshot, NetApp snapshot), then unfreeze ZFS and use the snapshot for something (like dumping it to a tape streamer or another machine). |
@baryluk: Yes, after more carefully reviewing the kernel code you're exactly right. In fact, supporting freeze/thaw appears to be useful only if you want to support md/lvm-style snapshots under the ZFS vdev layer. That's something we probably don't care about. This support isn't needed for proper suspend/resume behavior. In fact, upon further inspection the filesystem doesn't really need to do anything to support this. So my question is... in practice, why doesn't it work today? |
So we actually have two problems now:
|
Has there been any progress on this issue in the last 3 years? |
@paulhandy this hasn't been a priority for any of the developers. In part this has been because the interfaces provided for freeze/thaw by the Linux kernel have been horrible until fairly recently. I don't think this would be a ton of work if people were interested in working on it. |
Since I started using ZFS I made a custom script for the pm-sleep, which does an export of the pool before the system enters sleep mode. So I guess there is no better way to do it? |
@cyberius0 so you basically log out of X and run the script to initiate suspend? I've been wondering how to do this if /home is on a zpool; there's probably only this way |
Sorry, I didn't read the whole thread, my /home isn't on a zpool. The zpool is mounted in /RAID. |
@kohlschuetter and @behlendorf , the above implementation has two issues:
|
@ccic thanks for taking the time to investigate how this functionality could be implemented. The changes proposed here should be considered an initial prototype/WIP to help us investigate the various issues. Unfortunately, adding this functionality hasn't been a priority for the team. Regarding your specific concerns:
Good point. So one possible avenue worth exploring might be to have a freeze take a snapshot and then use the existing rollback code to effectively pivot onto that snapshot. That would allow us to use the existing model of suspend/rollback/resume, except that you'd be resuming on an immutable snapshot.
Using a snapshot would provide a solid guarantee of immutability. As for allowing the pool or other non-frozen filesystems to be manipulated, it's not clear that's a problem. The VFS is only requesting that a specific super block be frozen. If freezing an entire pool is needed, then alternate interfaces will be required. |
@behlendorf thanks for sharing your thoughts. I know this feature is not a priority. I just want to get some clues about how to design and implement it. |
@ccic yes, it should have that effect. |
+1. For example, on AWS one can perform a point-in-time snapshot of an attached EBS drive. Some backup tools rely on FS flushing and freezing so that the snapshot data is consistent. For example, with xfs_freeze we are able to snapshot a RAID array with no consistency issues. An example of this is the mysql-ebs-backup script that's currently tailored for XFS on EBS: https://github.com/graphiq/mysql-ebs-snapshot. If anyone knows of a workaround (the sync command, perhaps?), please do share. |
A quick workaround for one ZFS FS may be: For a complete feature to freeze a zpool, we have to (1) flush dirty pages and (2) suspend writes to the pool. As it stands, it prevents data from being written to disk, but unfortunately it excludes synchronous writes. It still needs more investigation. |
Does /bin/sync flush dirty ZFS pages to disk? |
ZFS has no control over this. |
also, even if the distribution disables hibernate on ZFS, people just go out of their way to avoid all warnings and do whatever they want: https://askubuntu.com/questions/1266599/how-to-increase-swap-with-zfs-on-ubuntu-20-04-to-enable-hibernation |
@bghira It doesn't seem believable to me that ZFS has no control over this. Surely there are callbacks that can be registered with the kernel to be invoked when hibernate is invoked. At the very least these could be implemented to hang or kernel panic instead of allowing hibernate to proceed and silently corrupt the zpool. |
as i understand, they're GPL-only symbols. |
After digging into the kernel hibernation (suspend-to-disk) process with @problame, we figured out that ZFS (even with hibernate) did not cause the zpool corruption. Further, during our (rough) analysis, we did not find a reason why ZFS wouldn't work with hibernation, iff the swap is outside of ZFS. TL;DR: The problem was in the initramfs scripts. Thanks @problame for your support. |
Well, I can add to that. I've been hibernating regularly for a couple of years without any problem so far (knock on wood). I have root on ZFS, but boot and swap on LUKS on top of mdadm mirrors. I'm using mkinitcpio, if that matters. |
Maybe ZFS should refuse to mount read-write without the user forcing it if it believes it was mounted at hibernation? (I'm assuming a read-only mount won't change anything on disk...) |
If you can help point out which docs might need updating to include these hints, then we might make some progress on it. I know there are a few install guides hosted by the OpenZFS project that could be enhanced. Each one could link to a page of caveats, which would mean just one spot to maintain them. I would suggest adding it to the man pages, but ever since they were 'modernised' by breaking them out into separate pages, I have found them less useful and rarely search them for information that's now easier to find on Google. |
@danielmorlock @problame wow, thanks for that hint on the initramfs scripts issue! I use Arch Linux and think that I'm in the same situation with the scripts importing the zpool and then decrypting swap on resume. I store the swap partition's key in the zpool itself so on resume it does the following:
I'm assuming that step 2 is the issue since the state of the zpool on disk could then differ from the in-memory state in swap that we resume from. Would this work if the pool is imported read-only in step 2? |
@eblau yeah, that sounds unhealthy.
Regardless, I think -f shouldn't be sufficient to import a pool that was hibernated. Hibernation workflow:
Resume workflow:
To prevent accidental imports, we extend Thoughts on this design, @behlendorf ? |
My initcpio script is not using
|
I'm pretty sure that
My kernel command line was:
In the initrd phase, the zpool import is attempted before resuming from suspend-to-disk. So this is equal to importing an already-imported zpool from a different kernel, isn't it? Does ZFS track that it is already imported (from another system)? |
@behlendorf would be best to answer, but from past discussions I recall him saying this would need to be very carefully handled (and might not be possible to, since many symbols surrounding the hibernation code are GPL-only, possibly why nvidia's implementation sucks as well), as it could lead to undefined behaviour and crashes as the ZFS code is currently written. |
That design sounds reasonable to me.
Yes, definitely. I once wrecked a pool beyond repair by accidentally resuming: I wrote to it between hibernation and resume from a different system, and after the resume things went south. But even an import/export cycle alone will change the pool, so it won't match the state stored in the hibernation image.
IIRC there was an issue with read only import modifying some state of the pool. Not sure what the current situation is.
Yes.
Not by default, MMP (Multi-modifier protection aka multihost) handles that but I can't tell if it would work in this case. |
@eblau If so, why not use something like
Also - in a hibernation sense - wouldn't it make more sense to decrypt the swap first before the root? |
@Greek64 I do it this way due to ignorance. :) I researched LUKS on Arch Linux wiki pages and implemented that using the 2-stage unlock approach and then added hibernate/resume later without recognizing the bad interaction between the two. Definitely it makes more sense to decrypt the swap before the root. That's why when I saw the explanation from @danielmorlock and @problame, I immediately realized the error of my ways. Sorry for troubling the ZFS folks with this issue. The subject of this issue made me think that it was some ZFS support missing. Crazy thing is that I hibernated for like 2 years every day using this approach and only hit zpool corruption like 3 times. Luckily I take backups religiously and never lost much due to the magic of zfs send/receive. |
I've opened a bug ticket for genkernel: https://bugs.gentoo.org/827281 |
We should close this issue since @behlendorf 's original motivation is misleading people into believing freeze & thaw are related to or required for supporting hibernation. Barring some uncertainty from my side about in-flight IOs, my current understanding is that it's safe to hibernate a system with an imported pool if and only if the swap space into which the hibernation image is written resides outside of the zpool. A few words on Note that there are also the super block ops Note also: btrfs. It's the mainline filesystem most similar to ZFS with regard to pooled storage (multiple blockdevs!) and multiple super blocks on top. Btrfs implements |
Maybe one more remark regarding just how brittle hibernation currently is: What does this mean?
|
@problame your general design makes good sense to me. It's been a while since I looked at this (years), but I agree it should largely be a matter of suspending the txg engine when hibernating and then resuming it after the system state has been restored.
One avenue you may want to explore here is leveraging the existing Opening a new issue to track this work sounds like a good idea to me. I'd just suggest that we somehow reference this old issue from the new one to make it easy to find. I don't mind closing this one at all once we have a replacement to continue the discussion. |
@behlendorf I have created two follow-up issues:
I think you can close this issue now. |
Possibly related to #13879 |
ZFS has hooks for suspending the filesystem but they have not yet been integrated with their Linux freeze/thaw counterparts. This must be done before you can safely hibernate a system running ZFS.