
umount, snapshot ZFS processes stuck in kernel forever causing high load #13327

Open
c0xc opened this issue Apr 13, 2022 · 16 comments

Comments

@c0xc

c0xc commented Apr 13, 2022

I'm observing a situation where ZFS processes are stuck, causing the load average to grow into five digits. They are stuck in the kernel and therefore not killable. I'm wondering what could cause this and whether it can be fixed without rebooting the server.

root      279580  279503  0 Feb04 ?        00:00:00 bash /root/bin/zfs-snapshot z-bod/DUMP hourly 72
root      279599  279580  0 Feb04 ?        00:00:00 /sbin/zfs destroy -r z-bod/[email protected]
root     3117486 3115126  0 Feb04 ?        00:00:00 umount -t zfs -n /z-main/Share/.zfs/snapshot/weekly.4
root     3115126       2  0 Feb04 ?        00:00:00 [kworker/u113:4+events_unbound]

zfs-snapshot is a snapshot rotation script. There are tens of thousands of zfs processes like this but only 55 "umount" processes. Other processes like CROND are also accumulating (10k).

Could this be an issue with ZFS? Assuming some of those ZFS processes are causing the others to get stuck, how can they be terminated?

This is ZFS 2.1.0-1, currently running on Fedora 32, kernel 5.11.2.

At first glance, issue #10100 appears to be similar, but in this case it's not causing soft lockup errors. It seems to be somehow related to cifs and/or nfs exports (there are smbd processes from the same day). Now, running ls, lsof or even bash auto-complete on (some older) snapshots will get stuck as well.
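
A quick way to confirm which processes are in uninterruptible sleep and which kernel function each one is blocked in (a sketch; exact column widths and output will differ):

$ ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /^D/'   # D-state processes and their wait channel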

@rincebrain
Contributor

I would suggest, initially, trying 2.1.4 and seeing if the issue persists - a number of bugs have been fixed since 2.1.0 was released, and while I can't think offhand of any that would have caused this, it's always unfortunate to spend a long time figuring out your problem only to realize someone already resolved it.

More generally, if you're not seeing any "task blocked for more than 120 seconds" messages in dmesg (unless someone turned them off), that implies things are making progress, just so slowly that it doesn't really look like it. It'd be interesting to know where your ZFS kernel threads are spending their time - e.g. if you look at /proc/[one of the stuck processes]/stack for the different types of stuck process (zfs commands, zpool commands, ls on a dir, etc.), what does it say? What does, say, perf top say you're spending time on, assuming there is CPU load and not solely state D as far as the eye can see.
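
For example, with the PIDs from the ps output above (a sketch; adjust the PIDs to whatever is currently stuck):

$ cat /proc/279599/stack     # the stuck `zfs destroy`
$ cat /proc/3117486/stack    # the stuck `umount`
$ perf top                   # only informative if there is actual CPU load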

Something like klockstat or offcputime from the BCC toolkit would probably be informative.
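
For instance (these ship with the bcc package; the path below is the Fedora default and may differ elsewhere):

$ /usr/share/bcc/tools/offcputime -K 30    # kernel stacks where threads spend off-CPU time, sampled for 30s
$ /usr/share/bcc/tools/klockstat           # kernel lock hold/contention statistics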

@c0xc
Author

c0xc commented Apr 14, 2022

Thanks for your comment. I'm planning on upgrading (reluctantly, as past Fedora release upgrades broke the ZFS installation every time).

I cannot provide any more debugging info because the situation got worse: more services became unresponsive and user sessions were no longer usable. A reboot was required (as expected, those processes prevented the system from shutting down, so a cold power cycle was necessary). I do not believe any progress was being made; why else would there still be processes that had been stuck for over a month? I think one of the many zfs processes was stuck in some zpl snapdir function, but I don't remember the exact name.

I wish I could go back and find out what happened.

@szubersk
Contributor

A couple of wild shots:

$ sysctl kernel.hung_task_timeout_secs=10 # this should make `dmesg` report threads that have been hung for more than 10 seconds
$ iostat -x # to confirm whether there is ongoing I/O

umount -t zfs -n /z-main/Share/.zfs/snapshot/weekly.4 looks interesting. Maybe it messes with the invisible .zfs directory causing irrational OpenZFS behavior?
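
(Whether .zfs is hidden or visible is controlled per dataset by the snapdir property; checking it is cheap, using the dataset name from the ps output above as an example:)

$ zfs get snapdir z-main/Share    # 'hidden' (the default) or 'visible'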

@c0xc
Author

c0xc commented Apr 15, 2022

I'm thinking about configuring kernel.hung_task_timeout_secs=10. It would have filled dmesg a long time ago... :)

As for iostat: I had actually checked that before rebooting. It looked normal. There was a bit of activity every now and then, but not a single drive was showing high activity, none of them was stuck, and they all showed 0 activity at some point.

Yes, these umount processes are indeed very suspicious. They were obviously initiated automatically by ZFS, but why... The start time of those processes correlated with a kernel message about the NFS server and with the start time of the Samba server, but I can't say whether those were restarted automatically or not. The other strange aspect, again, is that those processes had been stuck for more than a month before I saw symptoms of things getting stuck (though I have to admit it could be that I simply hadn't accessed those ~55 to-be-unmounted snapshots). Before I had to make an emergency reboot yesterday, pretty much all kinds of other ZFS-related things got stuck, things that used to work fine until recently.

The only special thing that happened recently was a zfs send test (of a very small dataset). Now, a wild theory might be that there were two unrelated problems: one, those ~55 frozen umount processes (which would have eventually exhausted process and fd limits but did not immediately cause anything to get stuck), and two, the zfs send command causing things to freeze within a matter of days. Are there any known zfs send bugs that could cause something like this?

At first glance, issue #4716 sounds a bit similar, as it also mentions accessing snapshots and a subsequent freeze, but it's old and probably unrelated.

@szubersk
Contributor

szubersk commented Apr 15, 2022

I'd still suggest kernel.hung_task_timeout_secs=10, at least for a couple of minutes, to gather debug info from the kernel ring buffer. It would help us nail down the root cause by showing what exactly hung.
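
Something along these lines, for example (120 is the usual default for the timeout, restored at the end):

$ sysctl kernel.hung_task_timeout_secs=10
$ sleep 60                                        # give the hung-task detector time to fire
$ dmesg | grep -B 2 -A 20 'blocked for more than'
$ sysctl kernel.hung_task_timeout_secs=120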

The closest I can think of is the use of the umount user-space tool right before the snapshot is destroyed. It hangs for a reason that is, as of now, unknown.

/*
 * Attempt to unmount a snapshot by making a call to user space.
 * There is no assurance that this can or will succeed, is just a
 * best effort. In the case where it does fail, perhaps because
 * it's in use, the unmount will fail harmlessly.
 */
int
zfsctl_snapshot_unmount(const char *snapname, int flags)
{
	char *argv[] = { "/usr/bin/env", "umount", "-t", "zfs", "-n", NULL,
	    NULL };

If you feel like experimenting, you could try injecting the -l option into the umount(8) invocation in the code above.

       -l, --lazy
           Lazy unmount. Detach the filesystem from the file hierarchy now, and clean up all references to this filesystem as
           soon as it is not busy anymore.

           A system reboot would be expected in near future if you’re going to use this option for network filesystem or local
           filesystem with submounts. The recommended use-case for umount -l is to prevent hangs on shutdown due to an
           unreachable network share where a normal umount will hang due to a downed server or a network partition. Remounts of
           the share will not be possible.
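
(If you want to see the effect by hand before touching the code, a lazy unmount of one of the stuck snapshot mounts should behave the same way - using the path from the ps output above as an example:)

$ umount -l -t zfs -n /z-main/Share/.zfs/snapshot/weekly.4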

Additionally, investigating the content of /root/bin/zfs-snapshot would give us more data to work with.

Please have backups!

@c0xc
Author

c0xc commented Apr 17, 2022

Thanks for your ideas.
I could certainly change the snapshot rotation script to check whether the snapshot is mounted and, if so, unmount it before deleting it. But I think that would merely be a workaround for an issue whose trigger and timing I still don't understand. The system is not new; it had been working fine for a long time before this happened.

The snapshot rotation script is very simple and you can find it here:
https://github.com/c0xc/zfs-snapshot/blob/master/zfs-snapshot.sh
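
For reference, a minimal sketch of the "unmount before destroy" check mentioned above (hypothetical, not taken from the script; DATASET and SNAP stand in for the script's own variables, and the dataset is assumed to have a regular mountpoint):

    # unmount the snapshot's automount, if present, before destroying it
    mnt="$(zfs get -H -o value mountpoint "$DATASET")/.zfs/snapshot/$SNAP"
    if mountpoint -q "$mnt"; then
        umount "$mnt" || echo "warning: could not unmount $mnt" >&2
    fi
    zfs destroy -r "$DATASET@$SNAP"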

I don't want to lazy-unmount snapshots because I assume it would freeze in the same way, but then I wouldn't even see the umount process, so (if it happens again) I wouldn't have any clue that some snapshots might have something to do with whatever is going on.

@rincebrain
Contributor

You may find #13131 (comment) and my prior reply interesting - they describe how I revised my original patch to help with unmounting snapshots sometimes tripping an assertion failure on debug builds. Though, as I comment there, these are just patches I'm experimenting with to fix the problem on my own systems; I make no promises that they don't burn things down, other than that if they do, I'll likely be burning down too.

@stale

stale bot commented Jun 17, 2023

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: Stale No recent activity for issue label Jun 17, 2023
@devZer0

devZer0 commented Aug 14, 2023

activity

@stale stale bot removed the Status: Stale No recent activity for issue label Aug 14, 2023
@Tabiskabis

same problem

@rincebrain
Contributor

On what version?

The eventual revisions of that patch got merged into 2.1, so on 2.1.12, if you're still having this issue, that'd be exciting.

@Tabiskabis

it's zfs-2.1.11

@rincebrain
Contributor

Huh, did I never get #14462 cherrypicked into 2.1? Ruh-roh.

@Tabiskabis

Uh, it's some sort of custom kernel. It will probably be a long time until the next update. I'll try to keep this in mind, though, and post an update if the issue occurs again in the next version.

Probably irrelevant, but an interesting coincidence: it happened during ZFS scrubbing. And the un/mounts were almost certainly not initiated by the sanoid snapshot timer, but by some systemd auto-action that I know too little about.

@rincebrain
Contributor

The unmounts usually come from a periodic timer in ZFS itself, which is what those patches fix - there were cases where it would fail and just give up, essentially.
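
(The interval behind that timer is exposed as a module parameter and can be checked at runtime:)

$ cat /sys/module/zfs/parameters/zfs_expire_snapshot    # seconds; 0 disables the expiry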

@yohaya

yohaya commented Dec 1, 2024

A workaround to avoid this is the built-in zfs_expire_snapshot module parameter:

echo 0 > /sys/module/zfs/parameters/zfs_expire_snapshot

Setting it to 0 disables the automatic unmounting of snapshots.
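
To keep that setting across reboots, the usual module-options route should work (the file name is just a convention; any *.conf under /etc/modprobe.d/ is read):

    # /etc/modprobe.d/zfs.conf
    options zfs zfs_expire_snapshot=0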
