Nasty deadlock on zpool import #4006

Closed
mihaivint opened this issue Nov 11, 2015 · 9 comments

@mihaivint

I have a new issue today. One of the nodes locked up after an update. Not sure what triggered it, as the other nodes didn't have the same issue. Anyway, this node was on 0.6.5.1; I've now updated to 0.6.5.3, but the behavior is the same: ZFS is not able to import the pool and everything hangs.
The environment is a VMware VM:
free -m
total used free shared buff/cache available
Mem: 7823 474 7050 8 298 5877
Swap: 4095 0 4095
cat /proc/cpuinfo |grep -ic process
4
uname -a
Linux f34bd24f63.es.private.redlight.hubgets.com 3.10.0-229.20.1.el7.x86_64 #1 SMP Tue Nov 3 19:10:07 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

[  145.183846] CPU: 2 PID: 0 Comm: swapper/2 Tainted: PF          O--------------   3.10.0-229.20.1.el7.x86_64 #1
[  145.183847] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2014
[  145.183848] task: ffff8802341b5b00 ti: ffff880234200000 task.ti: ffff880234200000
[  145.183850] RIP: 0010:[<ffffffff81052caa>]  [<ffffffff81052caa>] native_write_msr_safe+0xa/0x10
[  145.183856] RSP: 0018:ffff88023fd03d80  EFLAGS: 00000046
[  145.183857] RAX: 0000000000000400 RBX: 0000000000000002 RCX: 0000000000000830
[  145.183858] RDX: 0000000000000002 RSI: 0000000000000400 RDI: 0000000000000830
[  145.183859] RBP: ffff88023fd03d80 R08: ffffffff81a21700 R09: 000000000000059c
[  145.183860] R10: 61206f7420494d4e R11: 3a73555043206c6c R12: ffffffff81a21700
[  145.183861] R13: 0000000000000002 R14: 000000000000a022 R15: 0000000000000002
[  145.183863] FS:  0000000000000000(0000) GS:ffff88023fd00000(0000) knlGS:0000000000000000
[  145.183864] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  145.183865] CR2: 00007f39bf015f90 CR3: 00000002327b2000 CR4: 00000000001407e0
[  145.183893] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  145.183907] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  145.183907] Stack:
[  145.183908]  ffff88023fd03dd0 ffffffff8104a322 0000000000000086 0000000000000002
[  145.183911]  000800003fd03db0 0000000000002710 ffffffff819626c0 ffffffff819626c0
[  145.183913]  ffff88023fd0de00 ffffffff819627c0 ffff88023fd03de0 ffffffff8104a36c
[  145.183915] Call Trace:
[  145.183916]  <IRQ>

[  145.183921]  [<ffffffff8104a322>] __x2apic_send_IPI_mask+0xb2/0xe0
[  145.183924]  [<ffffffff8104a36c>] x2apic_send_IPI_all+0x1c/0x20
[  145.183928]  [<ffffffff81045e85>] arch_trigger_all_cpu_backtrace+0x65/0xa0
[  145.183931]  [<ffffffff81115d28>] rcu_check_callbacks+0x5a8/0x5f0
[  145.183934]  [<ffffffff81080f47>] update_process_times+0x47/0x80
[  145.183938]  [<ffffffff810d0525>] tick_sched_handle.isra.16+0x25/0x60
[  145.183939]  [<ffffffff810d05a1>] tick_sched_timer+0x41/0x60
[  145.183943]  [<ffffffff8109b1b7>] __run_hrtimer+0x77/0x1d0
[  145.183945]  [<ffffffff810d0560>] ? tick_sched_handle.isra.16+0x60/0x60
[  145.183948]  [<ffffffff8109b9f7>] hrtimer_interrupt+0xf7/0x240
[  145.183950]  [<ffffffff810441c7>] local_apic_timer_interrupt+0x37/0x60
[  145.183953]  [<ffffffff8161698f>] smp_apic_timer_interrupt+0x3f/0x60
[  145.183957]  [<ffffffff8161505d>] apic_timer_interrupt+0x6d/0x80
[  145.183958]  <EOI>

[  145.183960]  [<ffffffff8109b838>] ? hrtimer_start+0x18/0x20
[  145.183963]  [<ffffffff81052de6>] ? native_safe_halt+0x6/0x10
[  145.183966]  [<ffffffff8101c85f>] default_idle+0x1f/0xc0
[  145.183969]  [<ffffffff8101d166>] arch_cpu_idle+0x26/0x30
[  145.183972]  [<ffffffff810c6921>] cpu_startup_entry+0xf1/0x290
[  145.183974]  [<ffffffff8104228a>] start_secondary+0x1ba/0x230
[  145.183975] Code: 00 55 89 f9 48 89 e5 0f 32 31 c9 89 c0 48 c1 e2 20 89 0e 48 09 c2 48 89 d0 5d c3 66 0f 1f 44 00 00 55 89 f0 89 f9 48 89 e5 0f 30 <31> c0 5d c3 66 90 55 89 f9 48 89 e5 0f 33 89 c0 48 c1 e2 20 48
[  145.183994] NMI backtrace for cpu 1
[  145.183999] CPU: 1 PID: 0 Comm: swapper/1 Tainted: PF          O--------------   3.10.0-229.20.1.el7.x86_64 #1
[  145.184001] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2014
[  145.184002] task: ffff8802341b4fa0 ti: ffff8802341fc000 task.ti: ffff8802341fc000
[  145.184004] RIP: 0010:[<ffffffff81052de6>]  [<ffffffff81052de6>] native_safe_halt+0x6/0x10
[  145.184008] RSP: 0018:ffff8802341ffe90  EFLAGS: 00000286
[  145.184010] RAX: 00000000ffffffed RBX: ffff8802341fffd8 RCX: 0100000000000000
[  145.184011] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000046
[  145.184012] RBP: ffff8802341ffe90 R08: 0000000000000000 R09: 0000000000000000
[  145.184013] R10: 0000000000000000 R11: ffff8800b8cabf30 R12: 0000000000000001
[  145.184013] R13: ffffffff81a21700 R14: 0000000000000000 R15: ffff8802341fffd8
[  145.184015] FS:  0000000000000000(0000) GS:ffff88023fc80000(0000) knlGS:0000000000000000
[  145.184016] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  145.184017] CR2: 00007f39befa81c4 CR3: 00000002327b2000 CR4: 00000000001407e0
[  145.184036] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  145.184050] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  145.184051] Stack:
[  145.184052]  ffff8802341ffeb0 ffffffff8101c85f ffff8802341fffd8 0000000000000000
[  145.184054]  ffff8802341ffec0 ffffffff8101d166 ffff8802341fff20 ffffffff810c6921
[  145.184056]  ffff8802341fffd8 ffff8802341fffd8 ffff8802341fffd8 f1fa897016c5058b
[  145.184058] Call Trace:
[  145.184062]  [<ffffffff8101c85f>] default_idle+0x1f/0xc0
[  145.184064]  [<ffffffff8101d166>] arch_cpu_idle+0x26/0x30
[  145.184067]  [<ffffffff810c6921>] cpu_startup_entry+0xf1/0x290
[  145.184070]  [<ffffffff8104228a>] start_secondary+0x1ba/0x230
[  145.184071] Code: 00 00 00 00 00 55 48 89 e5 fa 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb f4 <5d> c3 0f 1f 84 00 00 00 00 00 55 48 89 e5 f4 5d c3 66 0f 1f 84
[  145.184090] NMI backtrace for cpu 0
[  145.184095] CPU: 0 PID: 4241 Comm: mount.zfs Tainted: PF          O--------------   3.10.0-229.20.1.el7.x86_64 #1
[  145.184096] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2014
[  145.184098] task: ffff88022db3cfa0 ti: ffff88021a394000 task.ti: ffff88021a394000
[  145.184099] RIP: 0010:[<ffffffff8160b8ca>]  [<ffffffff8160b8ca>] _raw_spin_lock_irq+0x3a/0x60
[  145.184106] RSP: 0018:ffff88021a3975a8  EFLAGS: 00000097
[  145.184107] RAX: 00000000000060f2 RBX: ffff88022db3cfa0 RCX: 00000000000004dd
[  145.184108] RDX: 0000000000000004 RSI: 0000000000000004 RDI: ffff880231e7d0b8
[  145.184109] RBP: ffff88021a3975a8 R08: 0000000000000202 R09: 0000000000000000
[  145.184110] R10: ffffea0008c69000 R11: ffffffffa000a75a R12: ffff880231e7d0b0
[  145.184112] R13: ffffffffffffffff R14: ffff880231e7d0b8 R15: ffff880231e7d000
[  145.184114] FS:  00007f0561644780(0000) GS:ffff88023fc00000(0000) knlGS:0000000000000000
[  145.184115] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  145.184116] CR2: 00007fa6ede870a0 CR3: 0000000232113000 CR4: 00000000001407f0
[  145.184134] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  145.184148] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  145.184149] Stack:
[  145.184150]  ffff88021a397618 ffffffff8160b313 ffff8802321a9ea8 ffffffffa0181b40
[  145.184152]  ffff8802191ebbc0 ffff88022db3cfa0 ffff880200000001 ffff8802321a9ea8
[  145.184154]  000000003ffbca70 ffff88021a3976a8 ffff880231e7d0b0 0000000000000002
[  145.184156] Call Trace:
[  145.184160]  [<ffffffff8160b313>] rwsem_down_read_failed+0x53/0x165
[  145.184179]  [<ffffffff812e2e54>] call_rwsem_down_read_failed+0x14/0x30
[  145.184190]  [<ffffffffa000a75a>] ? spl_kmem_free+0x2a/0x40 [spl]
[  145.184192]  [<ffffffff81608c90>] ? down_read+0x20/0x30
[  145.184227]  [<ffffffffa013234c>] zap_get_leaf_byblk+0xec/0x2c0 [zfs]
[  145.184230]  [<ffffffff81607c72>] ? mutex_lock+0x12/0x2f
[  145.184253]  [<ffffffffa013259a>] zap_deref_leaf+0x7a/0xa0 [zfs]
[  145.184275]  [<ffffffffa01342fd>] fzap_cursor_retrieve+0x13d/0x2d0 [zfs]
[  145.184298]  [<ffffffffa0137ca4>] zap_cursor_retrieve+0x64/0x320 [zfs]
[  145.184302]  [<ffffffff8126df2a>] ? selinux_inode_alloc_security+0x3a/0xa0
[  145.184324]  [<ffffffffa0142323>] zfs_purgedir+0x93/0x220 [zfs]
[  145.184346]  [<ffffffffa010a7fc>] ? sa_lookup+0x9c/0xc0 [zfs]
[  145.184370]  [<ffffffffa01608f4>] ? zfs_inode_update+0x1a4/0x1b0 [zfs]
[  145.184373]  [<ffffffff81098235>] ? wake_up_bit+0x25/0x30
[  145.184377]  [<ffffffff811e1290>] ? unlock_new_inode+0x50/0x70
[  145.184398]  [<ffffffffa0160d37>] ? zfs_znode_alloc+0x437/0x550 [zfs]
[  145.184420]  [<ffffffffa0142722>] zfs_rmnode+0x272/0x350 [zfs]
[  145.184423]  [<ffffffff81607c72>] ? mutex_lock+0x12/0x2f
[  145.184444]  [<ffffffffa0163d68>] zfs_zinactive+0x168/0x180 [zfs]
[  145.184465]  [<ffffffffa015d4a7>] zfs_inactive+0x67/0x240 [zfs]
[  145.184469]  [<ffffffff81166979>] ? truncate_pagecache+0x59/0x60
[  145.184490]  [<ffffffffa0174a43>] zpl_evict_inode+0x43/0x60 [zfs]
[  145.184494]  [<ffffffff811e2157>] evict+0xa7/0x170
[  145.184497]  [<ffffffff811e2995>] iput+0xf5/0x180
[  145.184517]  [<ffffffffa01417f5>] zfs_unlinked_drain+0xb5/0xf0 [zfs]
[  145.184521]  [<ffffffff810980f6>] ? finish_wait+0x56/0x70
[  145.184526]  [<ffffffffa000e2a8>] ? taskq_create+0x228/0x370 [spl]
[  145.184529]  [<ffffffff81098240>] ? wake_up_bit+0x30/0x30
[  145.184550]  [<ffffffffa015ebf0>] ? zfs_get_done+0x70/0x70 [zfs]
[  145.184570]  [<ffffffffa0165e72>] ? zil_open+0x42/0x60 [zfs]
[  145.184591]  [<ffffffffa0155778>] zfs_sb_setup+0x138/0x170 [zfs]
[  145.184612]  [<ffffffffa0156523>] zfs_domount+0x2d3/0x360 [zfs]
[  145.184633]  [<ffffffffa0174b60>] ? zpl_kill_sb+0x20/0x20 [zfs]
[  145.184652]  [<ffffffffa0174b8c>] zpl_fill_super+0x2c/0x40 [zfs]
[  145.184656]  [<ffffffff811c9e9d>] mount_nodev+0x4d/0xb0
[  145.184675]  [<ffffffffa0175062>] zpl_mount+0x52/0x80 [zfs]
[  145.184679]  [<ffffffff811ca9e9>] mount_fs+0x39/0x1b0
[  145.184682]  [<ffffffff811e604f>] vfs_kern_mount+0x5f/0xf0
[  145.184684]  [<ffffffff811e859e>] do_mount+0x24e/0xa40
[  145.184688]  [<ffffffff8115b53e>] ? __get_free_pages+0xe/0x50
[  145.184690]  [<ffffffff811e8e26>] SyS_mount+0x96/0xf0
[  145.184693]  [<ffffffff81614409>] system_call_fastpath+0x16/0x1b
[  145.184694] Code: 00 b8 00 00 02 00 f0 0f c1 07 89 c2 c1 ea 10 66 39 c2 75 02 5d c3 83 e2 fe 0f b7 f2 b8 00 80 00 00 eb 0d 66 0f 1f 44 00 00 f3 90 <83> e8 01 74 0a 0f b7 0f 66 39 ca 75 f1 5d c3 0f 1f 80 00 00 00
[  145.184714] NMI backtrace for cpu 3
[  145.184719] CPU: 3 PID: 0 Comm: swapper/3 Tainted: PF          O--------------   3.10.0-229.20.1.el7.x86_64 #1
[  145.184720] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2014
[  145.184723] task: ffff8802341b6660 ti: ffff880234208000 task.ti: ffff880234208000
[  145.184724] RIP: 0010:[<ffffffff81052de6>]  [<ffffffff81052de6>] native_safe_halt+0x6/0x10
[  145.184728] RSP: 0018:ffff88023420be90  EFLAGS: 00000286
[  145.184730] RAX: 00000000ffffffed RBX: ffff88023420bfd8 RCX: 0100000000000000
[  145.184731] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000046
[  145.184732] RBP: ffff88023420be90 R08: 0000000000000000 R09: 0000000000000000
[  145.184732] R10: 0000000000000000 R11: ffff8800bbbc7f30 R12: 0000000000000003
[  145.184733] R13: ffffffff81a21700 R14: 0000000000000000 R15: ffff88023420bfd8
[  145.184736] FS:  0000000000000000(0000) GS:ffff88023fd80000(0000) knlGS:0000000000000000
[  145.184737] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  145.184738] CR2: 00007f39bf9a3000 CR3: 00000002327b2000 CR4: 00000000001407e0
[  145.184757] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  145.184771] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  145.184772] Stack:
[  145.184773]  ffff88023420beb0 ffffffff8101c85f ffff88023420bfd8 0000000000000000
[  145.184775]  ffff88023420bec0 ffffffff8101d166 ffff88023420bf20 ffffffff810c6921
[  145.184777]  ffff88023420bfd8 ffff88023420bfd8 ffff88023420bfd8 e8892c2df8f49441
[  145.184779] Call Trace:
[  145.184782]  [<ffffffff8101c85f>] default_idle+0x1f/0xc0
[  145.184785]  [<ffffffff8101d166>] arch_cpu_idle+0x26/0x30
[  145.184787]  [<ffffffff810c6921>] cpu_startup_entry+0xf1/0x290
[  145.184791]  [<ffffffff8104228a>] start_secondary+0x1ba/0x230
[  145.184792] Code: 00 00 00 00 00 55 48 89 e5 fa 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb f4 <5d> c3 0f 1f 84 00 00 00 00 00 55 48 89 e5 f4 5d c3 66 0f 1f 84
[  172.637158] ------------[ cut here ]------------
[  172.637172] WARNING: at net/sched/sch_generic.c:259 dev_watchdog+0x270/0x280()
[  172.637174] NETDEV WATCHDOG: eth0 (vmxnet3): transmit queue 0 timed out
[  172.637176] Modules linked in: binfmt_misc rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd sunrpc fscache coretemp crct10dif_pclmul crc32_pclmul crc32c_intel ppdev ghash_clmulni_intel cryptd vmw_balloon serio_raw pcspkr parport_pc shpchp parport i2c_piix4 vmw_vmci xfs libcrc32c sd_mod sr_mod cdrom crc_t10dif ata_generic crct10dif_common pata_acpi vmwgfx drm_kms_helper ttm vmxnet3 drm ata_piix vmw_pvscsi libata i2c_core floppy dm_mirror dm_region_hash dm_log dm_mod zfs(POF) zunicode(POF) zavl(POF) zcommon(POF) znvpair(POF) spl(OF) zlib_deflate
[  172.637208] CPU: 3 PID: 0 Comm: swapper/3 Tainted: PF          O--------------   3.10.0-229.20.1.el7.x86_64 #1
[  172.637211] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2014
[  172.637213]  ffff88023fd83d88 e8892c2df8f49441 ffff88023fd83d40 ffffffff816045b6
[  172.637216]  ffff88023fd83d78 ffffffff8106e29b 0000000000000000 ffff880232800000
[  172.637219]  ffff88022f662440 0000000000000004 0000000000000003 ffff88023fd83de0
[  172.637222] Call Trace:
[  172.637224]  <IRQ>  [<ffffffff816045b6>] dump_stack+0x19/0x1b
[  172.637235]  [<ffffffff8106e29b>] warn_slowpath_common+0x6b/0xb0
[  172.637238]  [<ffffffff8106e33c>] warn_slowpath_fmt+0x5c/0x80
[  172.637242]  [<ffffffff81099de1>] ? run_posix_cpu_timers+0x51/0x840
[  172.637246]  [<ffffffff8151d830>] dev_watchdog+0x270/0x280
[  172.637249]  [<ffffffff8151d5c0>] ? dev_graft_qdisc+0x80/0x80
[  172.637254]  [<ffffffff8107df66>] call_timer_fn+0x36/0x110
[  172.637257]  [<ffffffff8151d5c0>] ? dev_graft_qdisc+0x80/0x80
[  172.637260]  [<ffffffff8107fddf>] run_timer_softirq+0x21f/0x320
[  172.637263]  [<ffffffff81077b3f>] __do_softirq+0xef/0x280
[  172.637266]  [<ffffffff81615d1c>] call_softirq+0x1c/0x30
[  172.637272]  [<ffffffff81015d95>] do_softirq+0x65/0xa0
[  172.637275]  [<ffffffff81077ed5>] irq_exit+0x115/0x120
[  172.637278]  [<ffffffff81616995>] smp_apic_timer_interrupt+0x45/0x60
[  172.637282]  [<ffffffff8161505d>] apic_timer_interrupt+0x6d/0x80
[  172.637283]  <EOI>  [<ffffffff8109b838>] ? hrtimer_start+0x18/0x20
[  172.637290]  [<ffffffff81052de6>] ? native_safe_halt+0x6/0x10
[  172.637293]  [<ffffffff8101c85f>] default_idle+0x1f/0xc0
[  172.637296]  [<ffffffff8101d166>] arch_cpu_idle+0x26/0x30
[  172.637301]  [<ffffffff810c6921>] cpu_startup_entry+0xf1/0x290
[  172.637305]  [<ffffffff8104228a>] start_secondary+0x1ba/0x230
[  172.637308] ---[ end trace 93cb2e24d4d03d76 ]---
[  172.637312] vmxnet3 0000:0b:00.0 eth0: tx hang
[  172.644382] vmxnet3 0000:0b:00.0 eth0: resetting
[  172.651711] vmxnet3 0000:0b:00.0 eth0: intr type 3, mode 0, 5 vectors allocated
[  172.652817] vmxnet3 0000:0b:00.0 eth0: NIC Link is Up 10000 Mbps
[  257.579673] vmxnet3 0000:0b:00.0 eth0: tx hang
[  257.580025] vmxnet3 0000:0b:00.0 eth0: resetting
[  257.587729] vmxnet3 0000:0b:00.0 eth0: intr type 3, mode 0, 5 vectors allocated
[  257.588981] vmxnet3 0000:0b:00.0 eth0: NIC Link is Up 10000 Mbps
@mihaivint
Author

I had to boot without ZFS and then perform the zpool import manually; this is how I obtained the output:
zpool import
pool: data
id: 14971036982025492324
state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:

    data        ONLINE
      sdb       ONLINE

zpool import -f data

@mihaivint
Author

And here is one without SELinux enabled, so that is not at fault:

[  214.227459] NMI backtrace for cpu 3
[  214.227463] CPU: 3 PID: 4432 Comm: mount.zfs Tainted: PF          O--------------   3.10.0-229.20.1.el7.x86_64 #1
[  214.227465] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2014
[  214.227466] task: ffff8802322f6660 ti: ffff88021ace0000 task.ti: ffff88021ace0000
[  214.227468] RIP: 0010:[<ffffffff8160b8ca>]  [<ffffffff8160b8ca>] _raw_spin_lock_irq+0x3a/0x60
[  214.227474] RSP: 0018:ffff88021ace35a8  EFLAGS: 00000097
[  214.227475] RAX: 0000000000001087 RBX: ffff8802322f6660 RCX: 00000000000004dd
[  214.227476] RDX: 0000000000000004 RSI: 0000000000000004 RDI: ffff88023005d8b8
[  214.227477] RBP: ffff88021ace35a8 R08: 0000000000000202 R09: 0000000000000000
[  214.227478] R10: ffffea0008cb8000 R11: ffffffffa000a75a R12: ffff88023005d8b0
[  214.227479] R13: ffffffffffffffff R14: ffff88023005d8b8 R15: ffff88023005d800
[  214.227481] FS:  00007f9692f5e780(0000) GS:ffff88023fd80000(0000) knlGS:0000000000000000
[  214.227483] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  214.227484] CR2: 00007f969225bc10 CR3: 00000002326d3000 CR4: 00000000001407e0
[  214.227502] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  214.227516] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  214.227517] Stack:
[  214.227518]  ffff88021ace3618 ffffffff8160b313 ffff8802327c6c48 ffffffffa0181b40
[  214.227521]  ffff8802196c6000 ffff8802322f6660 ffff880200000001 ffff8802327c6c48
[  214.227523]  0000000005000266 ffff88021ace36a8 ffff88023005d8b0 0000000000000002
[  214.227525] Call Trace:
[  214.227529]  [<ffffffff8160b313>] rwsem_down_read_failed+0x53/0x165
[  214.227558]  [<ffffffff812e2e54>] call_rwsem_down_read_failed+0x14/0x30
[  214.227565]  [<ffffffffa000a75a>] ? spl_kmem_free+0x2a/0x40 [spl]
[  214.227569]  [<ffffffff81608c90>] ? down_read+0x20/0x30
[  214.227602]  [<ffffffffa013234c>] zap_get_leaf_byblk+0xec/0x2c0 [zfs]
[  214.227606]  [<ffffffff81607c72>] ? mutex_lock+0x12/0x2f
[  214.227633]  [<ffffffffa013259a>] zap_deref_leaf+0x7a/0xa0 [zfs]
[  214.227660]  [<ffffffffa01342fd>] fzap_cursor_retrieve+0x13d/0x2d0 [zfs]
[  214.227688]  [<ffffffffa0137ca4>] zap_cursor_retrieve+0x64/0x320 [zfs]
[  214.227714]  [<ffffffffa0142323>] zfs_purgedir+0x93/0x220 [zfs]
[  214.227740]  [<ffffffffa010a7fc>] ? sa_lookup+0x9c/0xc0 [zfs]
[  214.227766]  [<ffffffffa01608f4>] ? zfs_inode_update+0x1a4/0x1b0 [zfs]
[  214.227770]  [<ffffffff81098235>] ? wake_up_bit+0x25/0x30
[  214.227773]  [<ffffffff811e1290>] ? unlock_new_inode+0x50/0x70
[  214.227798]  [<ffffffffa0160d37>] ? zfs_znode_alloc+0x437/0x550 [zfs]
[  214.227825]  [<ffffffffa0142722>] zfs_rmnode+0x272/0x350 [zfs]
[  214.227828]  [<ffffffff81607c72>] ? mutex_lock+0x12/0x2f
[  214.227853]  [<ffffffffa0163d68>] zfs_zinactive+0x168/0x180 [zfs]
[  214.227879]  [<ffffffffa015d4a7>] zfs_inactive+0x67/0x240 [zfs]
[  214.227882]  [<ffffffff81166979>] ? truncate_pagecache+0x59/0x60
[  214.227907]  [<ffffffffa0174a43>] zpl_evict_inode+0x43/0x60 [zfs]
[  214.227912]  [<ffffffff811e2157>] evict+0xa7/0x170
[  214.227914]  [<ffffffff811e2995>] iput+0xf5/0x180
[  214.227940]  [<ffffffffa01417f5>] zfs_unlinked_drain+0xb5/0xf0 [zfs]
[  214.227943]  [<ffffffff810980f6>] ? finish_wait+0x56/0x70
[  214.227949]  [<ffffffffa000e2a8>] ? taskq_create+0x228/0x370 [spl]
[  214.227952]  [<ffffffff81098240>] ? wake_up_bit+0x30/0x30
[  214.227977]  [<ffffffffa015ebf0>] ? zfs_get_done+0x70/0x70 [zfs]
[  214.228002]  [<ffffffffa0165e72>] ? zil_open+0x42/0x60 [zfs]
[  214.228028]  [<ffffffffa0155778>] zfs_sb_setup+0x138/0x170 [zfs]
[  214.228053]  [<ffffffffa0156523>] zfs_domount+0x2d3/0x360 [zfs]
[  214.228078]  [<ffffffffa0174b60>] ? zpl_kill_sb+0x20/0x20 [zfs]
[  214.228104]  [<ffffffffa0174b8c>] zpl_fill_super+0x2c/0x40 [zfs]
[  214.228107]  [<ffffffff811c9e9d>] mount_nodev+0x4d/0xb0
[  214.228132]  [<ffffffffa0175062>] zpl_mount+0x52/0x80 [zfs]
[  214.228136]  [<ffffffff811ca9e9>] mount_fs+0x39/0x1b0
[  214.228139]  [<ffffffff811e604f>] vfs_kern_mount+0x5f/0xf0
[  214.228141]  [<ffffffff811e859e>] do_mount+0x24e/0xa40
[  214.228145]  [<ffffffff8115b53e>] ? __get_free_pages+0xe/0x50
[  214.228148]  [<ffffffff811e8e26>] SyS_mount+0x96/0xf0
[  214.228151]  [<ffffffff81614409>] system_call_fastpath+0x16/0x1b

@tuxoko
Contributor

tuxoko commented Nov 13, 2015

@mihaivint
What ZFS version was this pool created with?

We seem to have had similar issues in the past, but I'm not quite sure whether they have been fixed.

@mihaivint
Author

I'm not exactly sure whether it was created directly on 0.6.5.1, or on 0.6.4.1 and then upgraded.

@neingeist

Same here with my pool: #3814
My pool was created with 0.6.4 or earlier.

@mihaivint
Author

At least there is a way to recover the data. In this case there was no need, as I had a replica of the data, but the read-only mount is good to know about.

@behlendorf
Contributor

@mihaivint can you apply the patches in #4123? They are safe and should resolve the problem. Definitely let us know if that doesn't fix things.

@mihaivint
Author

Unfortunately I don't have that drive to test this, so I can't provide additional info.

behlendorf pushed a commit to behlendorf/zfs that referenced this issue Dec 28, 2015
We need truncate and remove to be in the same tx when doing zfs_rmnode on an
xattr dir. Otherwise, if we truncate and then crash, we'll end up with an
inconsistent zap object on the delete queue. We do this by skipping
dmu_free_long_range and letting zfs_znode_delete do the work.

Signed-off-by: Chunwei Chen <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#4114
Issue openzfs#4052
Issue openzfs#4006
Issue openzfs#3018
Issue openzfs#2861
nedbass pushed a commit that referenced this issue Dec 31, 2015
During zfs_rmnode on an xattr dir, if the system crashes just after
dmu_free_long_range, we would get an empty xattr dir in the delete queue. This
would cause blkid=0 to be passed into zap_get_leaf_byblk when doing
zfs_purgedir during mount, which would then do rw_enter on the wrong structure
and cause a system lockup.

We fix this by returning ENOENT when blkid is zero in zap_get_leaf_byblk.

Signed-off-by: Chunwei Chen <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #4114
Closes #4052
Closes #4006
Closes #3018
Closes #2861
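
To make the guard in the commit above concrete, here is a small, self-contained C sketch of the idea. It compiles and runs on its own, but the function is a simplified stand-in: per the commit message, the real change returns ENOENT from zap_get_leaf_byblk() when blkid is zero, inside the ZFS kernel module rather than a userland program.

```c
/*
 * Illustration of the blkid == 0 guard described in the commit above.
 * The helper name here is hypothetical; the real check lives in
 * zap_get_leaf_byblk() and simply returns ENOENT.  Plain errno values
 * are used so the sketch compiles standalone.
 */
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for looking up a fat-zap leaf by block id. */
static int
get_leaf_byblk(uint64_t blkid)
{
	/*
	 * blkid == 0 is what an empty (truncated, then crashed) xattr dir
	 * on the delete queue hands us when zfs_purgedir() runs at mount
	 * time.  Returning ENOENT lets the caller skip the entry instead
	 * of doing rw_enter() on something that is not a leaf and hanging.
	 */
	if (blkid == 0)
		return (ENOENT);

	/* normal leaf lookup would follow here */
	return (0);
}

int
main(void)
{
	printf("blkid=0 -> %s\n", get_leaf_byblk(0) == ENOENT ? "ENOENT" : "ok");
	printf("blkid=7 -> %s\n", get_leaf_byblk(7) == ENOENT ? "ENOENT" : "ok");
	return (0);
}
```
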
nedbass pushed a commit that referenced this issue Dec 31, 2015
We need truncate and remove to be in the same tx when doing zfs_rmnode on an
xattr dir. Otherwise, if we truncate and then crash, we'll end up with an
inconsistent zap object on the delete queue. We do this by skipping
dmu_free_long_range and letting zfs_znode_delete do the work.

Signed-off-by: Chunwei Chen <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue #4114
Issue #4052
Issue #4006
Issue #3018
Issue #2861
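
The other commit (truncate and remove in one tx) can be sketched the same way. The names below are hypothetical stand-ins; in ZFS the decision is made inside zfs_rmnode(), with the directory case truncated together with its removal by zfs_znode_delete(), so treat this purely as an illustration of the ordering argument, not the actual patch.

```c
/*
 * Illustration of the "skip the early dmu_free_long_range() for xattr
 * dirs" decision described in the commit above.  All names are
 * stand-ins for the real zfs_rmnode()/zfs_znode_delete() logic.
 */
#include <stdbool.h>
#include <stdio.h>

enum obj_type { OBJ_REGULAR_FILE, OBJ_XATTR_DIR };

/*
 * Pre-truncating outside the removal tx is only harmless for regular
 * files: a crash in between just leaves a shorter file of user data.
 * For an xattr dir it leaves an empty zap object still sitting on the
 * delete queue, which is exactly what zfs_purgedir() trips over at the
 * next mount.
 */
static bool
pretruncate_outside_tx(enum obj_type t)
{
	return (t == OBJ_REGULAR_FILE);
}

static void
rmnode(enum obj_type t)
{
	if (pretruncate_outside_tx(t))
		printf("free file data now, remove the object in a later tx\n");
	else
		printf("skip the pre-truncate; truncate + remove in one tx\n");
}

int
main(void)
{
	rmnode(OBJ_REGULAR_FILE);
	rmnode(OBJ_XATTR_DIR);
	return (0);
}
```
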
ryao pushed both fixes to ryao/zfs on Jan 4, 2016 (same two commit messages as above).
@neingeist

The fixes also worked for me.

kernelOfTruth pushed both fixes to kernelOfTruth/zfs on Jan 8, 2016 (same two commit messages as above).
goulvenriou pushed both fixes to Alyseo/zfs on Jan 17, 2016 (same two commit messages as above).
goulvenriou pushed both fixes to Alyseo/zfs again on Feb 4, 2016 (same two commit messages as above).