
umount, snapshot ZFS processes stuck in kernel forever causing high load #13327

Open
c0xc opened this issue Apr 13, 2022 · 16 comments

Comments

@c0xc

c0xc commented Apr 13, 2022

I'm observing a situation where ZFS processes are stuck, causing the load average to grow into five digits. They are stuck in the kernel and therefore not killable. I'm wondering what could cause this and whether it can be fixed without rebooting the server.

root      279580  279503  0 Feb04 ?        00:00:00 bash /root/bin/zfs-snapshot z-bod/DUMP hourly 72
root      279599  279580  0 Feb04 ?        00:00:00 /sbin/zfs destroy -r z-bod/[email protected]
root     3117486 3115126  0 Feb04 ?        00:00:00 umount -t zfs -n /z-main/Share/.zfs/snapshot/weekly.4
root     3115126       2  0 Feb04 ?        00:00:00 [kworker/u113:4+events_unbound]

zfs-snapshot is a snapshot rotation script. There are tens of thousands of zfs processes like this but only 55 "umount" processes. Other processes like CROND are also accumulating (10k).

Could this be an issue with ZFS? Assuming some of those ZFS processes are causing the others to get stuck, how can they be terminated?

This is ZFS 2.1.0-1, currently running on Fedora 32, kernel 5.11.2.

At first glance, issue #10100 appears to be similar, but in this case it's not causing soft lockup errors. It seems to be somehow related to cifs and/or nfs exports (there are smbd processes from the same day). Now, running ls, lsof or even bash auto-complete on (some older) snapshots will get stuck as well.
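
A quick way to confirm which processes are in uninterruptible sleep and which kernel function each one is blocked in (a sketch; exact column widths and output will differ):

$ ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /^D/'   # D-state processes and their wait channel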

@rincebrain
Contributor

I would suggest, initially, trying 2.1.4 and seeing if the issue persists - a number of bugs have been fixed since 2.1.0 was released, and while I can't think offhand of any that would have caused this, it's always unfortunate to spend a long time figuring out your problem only to realize someone already resolved it.

More generally, if you're not seeing any "task blocked for more than 120 seconds" messages in dmesg (unless someone turned them off), that implies things are making progress, just so slowly that it doesn't really look like it. It'd be interesting to know where your ZFS kernel threads are spending their time - e.g. if you look at /proc/[one of the stuck processes]/stack for the different types of stuck process (zfs commands, zpool commands, ls on a dir, etc.), what does it say? What does, say, perf top say you're spending time on, assuming there is CPU load and not solely state D as far as the eye can see.
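
For example, with the PIDs from the ps output above (a sketch; adjust the PIDs to whatever is currently stuck):

$ cat /proc/279599/stack     # the stuck `zfs destroy`
$ cat /proc/3117486/stack    # the stuck `umount`
$ perf top                   # only informative if there is actual CPU load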

Something like klockstat or offcputime from the BCC toolkit would probably be informative.
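
For instance (these ship with the bcc package; the path below is the Fedora default and may differ elsewhere):

$ /usr/share/bcc/tools/offcputime -K 30    # kernel stacks where threads spend off-CPU time, sampled for 30s
$ /usr/share/bcc/tools/klockstat           # kernel lock hold/contention statistics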

@c0xc
Author

c0xc commented Apr 14, 2022

Thanks for your comment. I'm planning on upgrading (reluctantly, as past Fedora release upgrades broke the ZFS installation every time).

I cannot provide any more debugging info because the situation got worse: more services became unresponsive and user sessions were no longer usable. A reboot was required (as expected, those processes prevented the system from shutting down, so a cold power cycle was necessary). I do not believe any progress was being made; why else would there still be processes that had been stuck for over a month? I think one of the many zfs processes was stuck in some zpl snapdir function, but I don't remember the exact name.

I wish I could go back and find out what happened.

@szubersk
Contributor

A couple of wild shots:

$ sysctl kernel.hung_task_timeout_secs=10 # this should make `dmesg` report threads that have been hung for more than 10 seconds
$ iostat -x # to confirm whether there is ongoing I/O

umount -t zfs -n /z-main/Share/.zfs/snapshot/weekly.4 looks interesting. Maybe it messes with the invisible .zfs directory causing irrational OpenZFS behavior?
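
(Whether .zfs is hidden or visible is controlled per dataset by the snapdir property; checking it is cheap, using the dataset name from the ps output above as an example:)

$ zfs get snapdir z-main/Share    # 'hidden' (the default) or 'visible'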

@c0xc
Author

c0xc commented Apr 15, 2022

I'm thinking about configuring kernel.hung_task_timeout_secs=10. It would have filled dmesg a long time ago... :)

As for iostat: I had actually checked that before rebooting. It looked normal. There was a bit of activity every now and then, but not a single drive was showing high activity, none of them was stuck, and they all showed 0 activity at some point.

Yes, these umount processes are indeed very suspicious. They were obviously initiated automatically by ZFS, but why... The start time of those processes correlated with a kernel message about the NFS server and with the start time of the Samba server, but I can't say whether those were restarted automatically or not. The other strange aspect, again, is that those processes had been stuck for more than a month before I saw symptoms of things getting stuck (though I have to admit it could be that I simply hadn't accessed those ~55 to-be-unmounted snapshots). Before I had to make an emergency reboot yesterday, pretty much all kinds of other ZFS-related things got stuck, things that used to work fine until recently.

The only special thing that happened recently was a zfs send test (of a very small dataset). Now, a wild theory might be that there were two unrelated problems: one, those ~55 frozen umount processes (which would have eventually exhausted process and fd limits but did not immediately cause anything to get stuck), and two, the zfs send command causing things to freeze within a matter of days. Are there any known zfs send bugs that could cause something like this?

At first glance, issue #4716 sounds a bit similar, as it also mentions accessing snapshots and a subsequent freeze, but it's old and probably unrelated.

@szubersk
Contributor

szubersk commented Apr 15, 2022

I'd still suggest kernel.hung_task_timeout_secs=10, at least for a couple of minutes, to gather debug info from the kernel ring buffer. It would help us nail down the root cause by showing what exactly hung.
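
Something along these lines, for example (120 is the usual default for the timeout, restored at the end):

$ sysctl kernel.hung_task_timeout_secs=10
$ sleep 60                                        # give the hung-task detector time to fire
$ dmesg | grep -B 2 -A 20 'blocked for more than'
$ sysctl kernel.hung_task_timeout_secs=120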

The closest I can think of is the use of the umount user-space tool right before the snapshot is destroyed. It hangs for a reason that is, as of now, unknown.

/*
 * Attempt to unmount a snapshot by making a call to user space.
 * There is no assurance that this can or will succeed, is just a
 * best effort. In the case where it does fail, perhaps because
 * it's in use, the unmount will fail harmlessly.
 */
int
zfsctl_snapshot_unmount(const char *snapname, int flags)
{
	char *argv[] = { "/usr/bin/env", "umount", "-t", "zfs", "-n", NULL,
	    NULL };

If you feel like experimenting, you could try injecting the -l option into the umount(8) invocation in the code above.

       -l, --lazy
           Lazy unmount. Detach the filesystem from the file hierarchy now, and clean up all references to this filesystem as
           soon as it is not busy anymore.

           A system reboot would be expected in near future if you’re going to use this option for network filesystem or local
           filesystem with submounts. The recommended use-case for umount -l is to prevent hangs on shutdown due to an
           unreachable network share where a normal umount will hang due to a downed server or a network partition. Remounts of
           the share will not be possible.
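
(If you want to see the effect by hand before touching the code, a lazy unmount of one of the stuck snapshot mounts should behave the same way - using the path from the ps output above as an example:)

$ umount -l -t zfs -n /z-main/Share/.zfs/snapshot/weekly.4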

Additionally, investigating the content of /root/bin/zfs-snapshot would give us more data to work with.

Please have backups!

@c0xc
Author

c0xc commented Apr 17, 2022

Thanks for your ideas.
I could certainly change the snapshot rotation script to check whether the snapshot is mounted and, if so, unmount it before deleting it. But I think that would merely be a workaround for an issue whose trigger and timing I still don't understand. The system is not new; it had been working fine for a long time before this happened.

The snapshot rotation script is very simple and you can find it here:
https://github.com/c0xc/zfs-snapshot/blob/master/zfs-snapshot.sh
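
For reference, a minimal sketch of the "unmount before destroy" check mentioned above (hypothetical, not taken from the script; DATASET and SNAP stand in for the script's own variables, and the dataset is assumed to have a regular mountpoint):

    # unmount the snapshot's automount, if present, before destroying it
    mnt="$(zfs get -H -o value mountpoint "$DATASET")/.zfs/snapshot/$SNAP"
    if mountpoint -q "$mnt"; then
        umount "$mnt" || echo "warning: could not unmount $mnt" >&2
    fi
    zfs destroy -r "$DATASET@$SNAP"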

I don't want to lazy-unmount snapshots because I assume it would freeze in the same way, but then I wouldn't even see the umount process, so (if it happens again) I wouldn't have any clue that some snapshots might have something to do with whatever is going on.

@rincebrain
Contributor

You may find #13131 (comment) and my prior reply interesting - they describe how I revised my original patch to help with unmounting snapshots sometimes tripping an assertion failure on debug builds. Though, as I comment there, these are just patches I'm experimenting with to fix the problem on my own systems; I make no promises that they don't burn things down, other than that if they do, I'll likely be burning down too.

@stale

stale bot commented Jun 17, 2023

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: Stale No recent activity for issue label Jun 17, 2023
@devZer0

devZer0 commented Aug 14, 2023

activity

@stale stale bot removed the Status: Stale No recent activity for issue label Aug 14, 2023
@Tabiskabis

same problem

@rincebrain
Contributor

On what version?

The eventual revisions of that patch got merged into 2.1, so on 2.1.12, if you're still having this issue, that'd be exciting.

@Tabiskabis

it's zfs-2.1.11

@rincebrain
Contributor

Huh, did I never get #14462 cherrypicked into 2.1? Ruh-roh.

@Tabiskabis

Uh, it's some sort of custom kernel. It will probably be a long time until the next update. I'll try to keep this in mind, though, and post an update if the issue occurs again in the next version.

Probably irrelevant, but an interesting coincidence: it happened during ZFS scrubbing. And the un/mounts were almost certainly not initiated by the sanoid snapshot timer, but by some systemd auto-action that I know too little about.

@rincebrain
Contributor

The unmounts usually come from a periodic timer in ZFS itself, which is what those patches fix - there were cases where it would fail and just give up, essentially.
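
(The interval behind that timer is exposed as a module parameter and can be checked at runtime:)

$ cat /sys/module/zfs/parameters/zfs_expire_snapshot    # seconds; 0 disables the expiry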

@yohaya

yohaya commented Dec 1, 2024

A workaround to avoid this is the built-in zfs_expire_snapshot module parameter:

echo 0 > /sys/module/zfs/parameters/zfs_expire_snapshot

Setting it to 0 disables the automatic unmounting of snapshots.
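
To keep that setting across reboots, the usual module-options route should work (the file name is just a convention; any *.conf under /etc/modprobe.d/ is read):

    # /etc/modprobe.d/zfs.conf
    options zfs zfs_expire_snapshot=0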
