NFS Kernel Freeze BUG: Dentry still in use ... (1) [unmount of zfs zfs] #5810

Closed
dbakken opened this issue Feb 21, 2017 · 12 comments

Comments


dbakken commented Feb 21, 2017

System information

Distribution Name | Debian
Distribution Version | Jessie 8.7
Linux Kernel | 3.16.0-4
Architecture | amd64
ZFS Version | 0.6.5.8
SPL Version | 0.6.5.8

Server: Supermicro SSG-2028R-ACR24L/X10DRH-iT, BIOS 2.0a
CPU: 2x Xeon E5-2695 v4 2.1GHz (18cores/36threads)
Memory: 512GB Registered ECC DDR4 2133MHz
Disks: 24x Samsung Enterprise SSD MZ7LM3T8 (4TB)
HBA: 3x LSI SAS3008 FWVersion(12.00.02.00), ChipRevision(0x02), BiosVersion(08.29.01.00)

Describe the problem you're observing

Our ZFS-backed NFS server has been experiencing kernel lockups under heavy load since we upgraded to zfs-dkms 0.6.5.8 from Debian backports. The kernel freezes and the server must be power-cycled.

The server was previously stable for months running zfs-dkms 0.6.5.7 from archive.zfsonlinux.org. We have reverted to 0.6.5.7 until this bug is resolved, since 0.6.5.8 is too unstable and crashes frequently.

Describe how to reproduce the problem

Export a ZFS filesystem with the options "rw,async,no_subtree_check,no_root_squash,crossmnt" in /etc/exports, then access the filesystem and its snapshot directory from multiple NFS clients. This behavior is similar to #3794 and #4716.
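
For illustration, an /etc/exports entry with these options might be built as in the sketch below. Only the option string comes from this report; the dataset path /tank/data and the client spec * are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch only: the path and client spec used here are hypothetical placeholders.
# The option string is the one quoted in the report.
EXPORT_OPTS="rw,async,no_subtree_check,no_root_squash,crossmnt"

# Print one /etc/exports line for the given path ($1) and client spec ($2).
exports_line() {
    printf '%s %s(%s)\n' "$1" "$2" "$EXPORT_OPTS"
}

exports_line /tank/data '*'
# After adding such a line to /etc/exports, `exportfs -ra` re-reads the file.
```

The function only prints the line; it does not modify /etc/exports.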

Include any warning/errors/backtraces from the system logs

Syslog excerpt:

Feb 17 20:55:49 basan kernel: [193983.962465] BUG: Dentry ffff881bef497318{i=28d05,n=dim_county.sqlite3}  still in use (1) [unmount of zfs zfs]
Feb 17 20:55:49 basan kernel: [193983.962540] ------------[ cut here ]------------
Feb 17 20:55:49 basan kernel: [193983.962551] WARNING: CPU: 59 PID: 1970725 at /build/linux-GU1w8g/linux-3.16.39/fs/dcache.c:1347 umount_check+0x74/0x80()
Feb 17 20:55:49 basan kernel: [193983.962554] Modules linked in: nls_utf8 ufs qnx4 hfsplus hfs minix ntfs vfat msdos fat jfs xfs libcrc32c crc32c_generic dm_mod veth xt_nat xt_tcpudp xt_addrtype xt_conntrack ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 iptable_filter ip_tables x_tables nf_nat nf_conntrack bridge stp llc aufs(C) binfmt_misc nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bonding iTCO_wdt iTCO_vendor_support mxm_wmi x86_pkg_temp_thermal coretemp kvm_intel zfs(PO) kvm crc32_pclmul zunicode(PO) zcommon(PO) aesni_intel znvpair(PO) aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd spl(O) zavl(PO) pcspkr evdev joydev ast ttm drm_kms_helper drm i2c_i801 i2c_algo_bit i2c_core lpc_ich mei_me mfd_core shpchp mei tpm_tis tpm loop ipmi_watchdog wmi acpi_power_meter acpi_pad processor thermal_sys button ipmi_si ipmi_poweroff ipmi_devintf ipmi_msghandler fuse ecryptfs autofs4 ext4 crc16 mbcache jbd2 btrfs xor raid6_pq raid1 md_mod hid_generic usbhid hid sg sd_mod crc_t10dif crct10dif_generic crct10dif_pclmul crct10dif_common crc32c_intel ahci mpt3sas libahci raid_class scsi_transport_sas ehci_pci libata xhci_hcd ehci_hcd ixgbe dca ptp usbcore scsi_mod pps_core usb_common mdio
Feb 17 20:55:49 basan kernel: [193983.962680] CPU: 59 PID: 1970725 Comm: umount Tainted: P         C O  3.16.0-4-amd64 #1 Debian 3.16.39-1
Feb 17 20:55:49 basan kernel: [193983.962682] Hardware name: Supermicro SSG-2028R-ACR24L/X10DRH-iT, BIOS 2.0a 06/30/2016
Feb 17 20:55:49 basan kernel: [193983.962684]  0000000000000000 ffffffff81514c11 0000000000000000 0000000000000009
Feb 17 20:55:49 basan kernel: [193983.962688]  ffffffff81068867 ffff881bef497318 ffff88028e91aeb0 ffff881bef4973a8
Feb 17 20:55:49 basan kernel: [193983.962692]  ffff881bef497370 ffff88028e91aef8 ffffffff811bfcd4 ffff887b1eebaad8
Feb 17 20:55:49 basan kernel: [193983.962695] Call Trace:
Feb 17 20:55:49 basan kernel: [193983.962703]  [<ffffffff81514c11>] ? dump_stack+0x5d/0x78
Feb 17 20:55:49 basan kernel: [193983.962713]  [<ffffffff81068867>] ? warn_slowpath_common+0x77/0x90
Feb 17 20:55:49 basan kernel: [193983.962717]  [<ffffffff811bfcd4>] ? umount_check+0x74/0x80
Feb 17 20:55:49 basan kernel: [193983.962721]  [<ffffffff811c0e29>] ? d_walk+0xf9/0x2d0
Feb 17 20:55:49 basan kernel: [193983.962725]  [<ffffffff811bfc60>] ? d_lru_del+0xa0/0xa0
Feb 17 20:55:49 basan kernel: [193983.962731]  [<ffffffff811c1162>] ? do_one_tree+0x22/0x40
Feb 17 20:55:49 basan kernel: [193983.962735]  [<ffffffff811c2198>] ? shrink_dcache_for_umount+0x28/0x80
Feb 17 20:55:49 basan kernel: [193983.962743]  [<ffffffff811ac83c>] ? generic_shutdown_super+0x1c/0xf0
Feb 17 20:55:49 basan kernel: [193983.962748]  [<ffffffff811acb6e>] ? kill_anon_super+0xe/0x20
Feb 17 20:55:49 basan kernel: [193983.962751]  [<ffffffff811ace9a>] ? deactivate_locked_super+0x3a/0x50
Feb 17 20:55:49 basan kernel: [193983.962760]  [<ffffffff811c8c85>] ? mntput_no_expire+0xc5/0x150
Feb 17 20:55:49 basan kernel: [193983.962764]  [<ffffffff811c9e9a>] ? SyS_umount+0x8a/0x120
Feb 17 20:55:49 basan kernel: [193983.962772]  [<ffffffff8151adcd>] ? system_call_fast_compare_end+0x10/0x15
Feb 17 20:55:49 basan kernel: [193983.962775] ---[ end trace 60152c0418085aab ]---

Full syslog:
zfs_20170217.txt

tuxoko mentioned this issue Feb 25, 2017

Contributor

tuxoko commented Mar 1, 2017

@dbakken
Can you try the patch in #5833?
Thanks

Author

dbakken commented Mar 2, 2017

I'm willing to test if you can provide instructions on how to compile and install the patched kernel modules. After the crash I downgraded zfs-dkms to v0.6.5.7, and the server hasn't crashed in the 8 days since. However, NFS load has not been very high. It's a difficult bug to reproduce, but I'm happy to compile and install your patched code on a less critical server for functionality testing, and later schedule a reboot of our primary NFS server for testing.

Thanks for the quick response. The patch looks very promising!

Contributor

tuxoko commented Mar 2, 2017

If you're using dkms, you should be able to find the zfs source in /usr/src/zfs-0.6.5.9.
Download the patch and apply it with: cd /usr/src/zfs-0.6.5.9 && patch -p1 < patchfile
After that, remove the old modules with: dkms remove -m zfs/0.6.5.9 -k kernel_version
Then, build and install the new modules with: dkms install -m zfs/0.6.5.9 -k kernel_version
Finally, run rmmod zfs and then modprobe zfs to load the new version.
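
The steps above could be collected into a small script along these lines. This is a sketch, not a tested procedure: the patch location /tmp/5833.patch and the use of the running kernel (uname -r) are assumptions, and rmmod will most likely fail while pools are still imported, so it assumes they have been exported first. By default it only prints the commands; set RUN=1 to execute them.

```shell
#!/bin/sh
# Sketch of the rebuild steps. Prints the commands by default; RUN=1 executes them.
# Assumptions (not from the thread): the patch was saved as /tmp/5833.patch
# and the target kernel is the currently running one.
ZFS_VER="${ZFS_VER:-0.6.5.9}"
KVER="${KVER:-$(uname -r)}"
PATCHFILE="${PATCHFILE:-/tmp/5833.patch}"

# Execute the command when RUN=1, otherwise just print it.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Patch the dkms source tree, rebuild the modules, then reload zfs.
# Each step runs only if the previous one succeeded.
rebuild_zfs() {
    run patch -d "/usr/src/zfs-$ZFS_VER" -p1 -i "$PATCHFILE" &&
    run dkms remove -m "zfs/$ZFS_VER" -k "$KVER" &&
    run dkms install -m "zfs/$ZFS_VER" -k "$KVER" &&
    run rmmod zfs &&
    run modprobe zfs
}

rebuild_zfs
```

Here patch -d and -i stand in for the cd and input redirection in the original commands, so that each step can be printed or executed through the same wrapper.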

Author

dbakken commented Mar 2, 2017

Thanks. I compiled and installed the patched zfs module on another Debian server (which doesn't run NFS) and have not hit any problems yet. I have to schedule downtime to test it on our main NFS server. I will update here after that is done.

Author

dbakken commented Mar 3, 2017

Our main NFS server is now running zfsonlinux v0.6.5.9 with patch #5833. I'll update this issue after a couple of weeks, or earlier if our server crashes.

Author

dbakken commented Mar 6, 2017

The patched ZFS doesn't allow mounting snapshots over NFS. From an NFS client I can cd into the .zfs/snapshot dir, but the snapshot isn't auto-mounted and ls reports 0 files. NFS export options are rw,async,no_subtree_check,no_root_squash,crossmnt.

Contributor

tuxoko commented Mar 6, 2017

@dbakken
Did the snapshot get mounted on the server side?
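
One way to check this on the server, as a rough sketch: list the zfs entries in /proc/mounts whose source name contains an @, which is how snapshot datasets are named. The function name snapshot_mounts is made up for this example.

```shell
#!/bin/sh
# Sketch: list mounted ZFS snapshots from a mounts table (default /proc/mounts).
# A snapshot mount has filesystem type "zfs" and a source like pool/fs@snapname.
snapshot_mounts() {
    awk '$3 == "zfs" && $1 ~ /@/ { print $1, $2 }' "${1:-/proc/mounts}"
}

snapshot_mounts
```

If this prints nothing after an NFS client has entered a snapshot directory, the snapshot was presumably not auto-mounted on the server.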

Author

dbakken commented Mar 6, 2017

I don't know. We restarted the server and are now running zfs-dkms v0.6.5.9 without the patch. Snapshot directories are auto-mounting over NFS again.

Contributor

tuxoko commented Mar 6, 2017

Can you help test it in a similar setting?

Author

dbakken commented Mar 6, 2017

I tested with another server, and was able to auto-mount snapshots. Maybe I should have restarted the NFS server or client the first time? Unfortunately I'm unable to test the patch on our primary server as it needs to be online now.

Author

dbakken commented Oct 5, 2017

Which versions of ZFS on Linux include this fix? Was it ever released in the 0.6.5.x branch?

@behlendorf
Contributor

This fix was only applied to the 0.7 releases since it was a significant change in behavior and would have been disruptive.
