zfs_iput_taskq spinning again #2128

DeHackEd · 2014-02-14T18:07:47Z

Hardware

CPU: Intel(R) Core(TM) i5-3330 CPU @ 3.00GHz
RAM: 4 GB
OS: CentOS 6
Kernel: Linux H264 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Jan 3 21:39:27 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
SPL: spl-0.6.2-23-g4c99541
ZFS: zfs-0.6.2-171-g2e7b765

ZFS stats

# zpool status
  pool: netlab
 state: ONLINE
  scan: none requested
config:

    NAME           STATE     READ WRITE CKSUM
    netlab         ONLINE       0     0     0
      VolGroup00-zfs  ONLINE       0     0     0

errors: No known data errors
# zfs list -o name,refer,used,mounted
NAME                 REFER   USED  MOUNTED
netlab               32.5K   974M      yes
netlab/lxc             30K   259M       no
netlab/lxc/clone1     692M  4.43M      yes
netlab/lxc/clone10    695M  7.39M      yes
netlab/lxc/clone100   688M    34K      yes
netlab/lxc/clone101   688M   809K      yes
netlab/lxc/clone102   688M   809K      yes
netlab/lxc/clone103   688M   804K      yes
netlab/lxc/clone104   688M   804K      yes
netlab/lxc/clone105   688M   802K      yes
...
netlab/lxc/clone2     695M  8.15M      yes
...
netlab/lxc/clone3     696M  8.47M      yes
netlab/lxc/clone30    688M  1.02M      yes
netlab/lxc/clone300   688M    34K      yes
netlab/lxc/clone31    688M   830K      yes
netlab/lxc/clone32    688M   830K      yes
netlab/lxc/clone33    688M   830K      yes
netlab/lxc/clone34    688M   829K      yes
netlab/lxc/clone35    688M   832K      yes
netlab/lxc/clone36    688M   834K      yes
netlab/lxc/clone37    688M   830K      yes
netlab/lxc/clone38    688M   833K      yes
netlab/lxc/clone39    688M   833K      yes
netlab/lxc/clone4     694M  6.30M      yes
netlab/lxc/clone40    688M   834K      yes
netlab/lxc/clone41    688M   833K      yes
netlab/lxc/clone42    688M   832K      yes
netlab/lxc/clone43    688M   833K      yes
netlab/lxc/clone44    688M   832K      yes
netlab/lxc/clone45    688M   833K      yes
netlab/lxc/clone46    688M   833K      yes
netlab/lxc/clone47    688M   834K      yes
netlab/lxc/clone48    688M   834K      yes
netlab/lxc/clone49    688M   834K      yes
netlab/lxc/clone5     695M  7.21M      yes
netlab/lxc/clone50    688M   831K      yes
netlab/lxc/clone51    688M   804K      yes
netlab/lxc/clone52    688M   806K      yes
netlab/lxc/clone53    688M   805K      yes
netlab/lxc/clone54    688M   803K      yes
netlab/lxc/clone55    688M   804K      yes
netlab/lxc/clone56    688M   814K      yes
netlab/lxc/clone57    688M   818K      yes
netlab/lxc/clone58    688M   817K      yes
netlab/lxc/clone59    688M   806K      yes
netlab/lxc/clone6     695M  7.55M      yes
netlab/lxc/clone60    688M   816K      yes
netlab/lxc/clone61    688M   818K      yes
netlab/lxc/clone62    688M   817K      yes
netlab/lxc/clone63    688M   814K      yes
netlab/lxc/clone64    688M   815K      yes
netlab/lxc/clone65    688M   817K      yes
netlab/lxc/clone66    688M   818K      yes
netlab/lxc/clone67    688M   802K      yes
netlab/lxc/clone68    688M   818K      yes
netlab/lxc/clone69    688M   816K      yes
netlab/lxc/clone7     695M  7.74M      yes
netlab/lxc/clone70    688M   813K      yes
netlab/lxc/clone71    688M   816K      yes
netlab/lxc/clone72    688M   813K      yes
netlab/lxc/clone73    688M   813K      yes
netlab/lxc/clone74    688M   817K      yes
netlab/lxc/clone75    688M   814K      yes
netlab/lxc/clone76    688M   813K      yes
netlab/lxc/clone77    688M   816K      yes
netlab/lxc/clone78    688M   816K      yes
netlab/lxc/clone79    688M   819K      yes
netlab/lxc/clone8     754M  66.4M      yes
netlab/lxc/clone80    688M   813K      yes
netlab/lxc/clone81    688M   813K      yes
netlab/lxc/clone82    688M   814K      yes
netlab/lxc/clone83    688M   816K      yes
netlab/lxc/clone84    688M   816K      yes
netlab/lxc/clone85    688M   814K      yes
netlab/lxc/clone86    688M   814K      yes
netlab/lxc/clone87    688M   814K      yes
netlab/lxc/clone88    688M   816K      yes
netlab/lxc/clone89    688M   816K      yes
netlab/lxc/clone9     697M  9.99M      yes
netlab/lxc/clone90    688M   814K      yes
netlab/lxc/clone91    688M   820K      yes
netlab/lxc/clone92    688M   813K      yes
netlab/lxc/clone93    688M   813K      yes
netlab/lxc/clone94    688M   816K      yes
netlab/lxc/clone95    688M   814K      yes
netlab/lxc/clone96    688M   820K      yes
netlab/lxc/clone97    688M   813K      yes
netlab/lxc/clone98    688M   814K      yes
netlab/lxc/clone99    688M   813K      yes
netlab/template       688M   688M      yes

The pool sits on an LVM logical volume and holds 294 filesystems in total (I removed most of them above for brevity), all mounted. Only about 10 of them are in use by LXC; the rest sit idle. Each container runs a very minimal CentOS install: sshd, rsyslog, 6 ttys, and a number of open network sessions.

Task info

top - 12:51:54 up 9 days, 23:25,  2 users,  load average: 1.02, 1.13, 1.54
Tasks: 1947 total,   2 running, 1945 sleeping,   0 stopped,   0 zombie
Cpu(s):  3.8%us, 34.6%sy,  0.0%ni, 61.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   3833148k total,  2986820k used,   846328k free,    39148k buffers
Swap:  2097144k total,    51868k used,  2045276k free,   506324k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                
11845 root       0 -20     0    0    0 R 94.0  0.0   1308:01 zfs_iput_taskq/        
 4927 root      20   0 16520 2712  948 R 62.6  0.1   0:00.19 top                    
    1 root      20   0 19364  528  316 S  0.0  0.0   0:01.36 init
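To track a single kernel thread such as zfs_iput_taskq without keeping top open, one option is to read utime and stime from /proc/<pid>/stat twice and divide the tick delta by the clock-tick rate. Below is a minimal Python sketch of that idea. The function names are my own, and the field positions follow proc(5). Field 2 (the command name) is parenthesized and may contain spaces, so the parser splits on the last ')'.

```python
import os
import time


def parse_proc_stat(line):
    """Return (comm, state, utime, stime) from one /proc/<pid>/stat line.

    Field 2 (comm) is wrapped in parentheses and may contain spaces or ')',
    so split on the LAST ')' instead of plain whitespace.
    """
    comm = line[line.index("(") + 1:line.rindex(")")]
    rest = line[line.rindex(")") + 2:].split()
    # rest[0] is field 3 (state); field N lives at rest[N - 3],
    # so utime (field 14) and stime (field 15) are rest[11] and rest[12].
    return comm, rest[0], int(rest[11]), int(rest[12])


def cpu_percent(ticks_before, ticks_after, interval_s, hz):
    """CPU share of one thread over an interval, as top reports it (100% = one core)."""
    return 100.0 * (ticks_after - ticks_before) / (hz * interval_s)


def sample_cpu(pid, interval_s=2.0):
    """Sample /proc/<pid>/stat twice and return (comm, state, cpu_percent)."""
    hz = os.sysconf("SC_CLK_TCK")  # clock ticks per second, usually 100
    path = "/proc/%d/stat" % pid
    with open(path) as f:
        _, _, u0, s0 = parse_proc_stat(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        comm, state, u1, s1 = parse_proc_stat(f.read())
    return comm, state, cpu_percent(u0 + s0, u1 + s1, interval_s, hz)
```

For example, sample_cpu(11845) would report the thread from this trace. A value near 100% across repeated samples matches what top shows above.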

Stack traces

NMI backtrace for cpu 0
CPU 0 
Modules linked in: bridge stp llc ppp_async crc_ccitt ppp_generic slhc macvlan zfs(P)(U) zcommon(P)(U) znvpair(P)(U) zavl(P)(U) zunicode(P)(U) spl(U) zlib_deflate autofs4 cpufreq_ondemand acpi_cpufreq freq_table mperf ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 dm_snapshot ppdev iTCO_wdt iTCO_vendor_support parport_pc parport microcode rfkill 8139too 8139cp mii serio_raw sg i2c_i801 lpc_ich mfd_core e1000e ptp pps_core snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc ext4 jbd2 mbcache sr_mod cdrom sd_mod crc_t10dif ahci xhci_hcd wmi i915 drm_kms_helper drm i2c_algo_bit i2c_core video output dm_mirror dm_region_hash dm_log dm_mod [last unloaded: zfs]

Pid: 11845, comm: zfs_iput_taskq/ Tainted: P           ---------------    2.6.32-431.3.1.el6.x86_64 #1 Acer Veriton M6620G/Veriton M6620G
RIP: 0010:[<ffffffff8152a14e>]  [<ffffffff8152a14e>] _spin_trylock+0x1e/0x30
RSP: 0018:ffff8801156658f0  EFLAGS: 00000046
RAX: 0000000094109410 RBX: ffff88007522a9a8 RCX: 00000000000006aa
RDX: 0000000094119410 RSI: 0000000000000770 RDI: ffff88007522a9b0
RBP: ffff8801156658f0 R08: 000000000000000e R09: ffff880115665920
R10: ffff880115665a78 R11: 0000000032366664 R12: ffff88007522a9e0
R13: ffff88007522a9b0 R14: 0000000000000001 R15: 0000000000000206
FS:  0000000000000000(0000) GS:ffff88002c200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00007fd99afed980 CR3: 000000007cdab000 CR4: 00000000001407f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process zfs_iput_taskq/ (pid: 11845, threadinfo ffff880115664000, task ffff88011567f540)
Stack:
 ffff8801156659c0 ffffffffa0657139 ffff8800a22eaf40 0000000000000001
<d> ffff8800adc04bb0 ffff880000000000 00000000000006fb 0000000000000001
<d> 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
 [<ffffffffa0657139>] dmu_zfetch+0x329/0xe40 [zfs]
 [<ffffffffa063ff91>] dbuf_read+0x6a1/0x740 [zfs]
 [<ffffffffa065ac09>] dnode_hold_impl+0x129/0x560 [zfs]
 [<ffffffffa063fb27>] ? dbuf_read+0x237/0x740 [zfs]
 [<ffffffffa065b059>] dnode_hold+0x19/0x20 [zfs]
 [<ffffffffa0647e74>] dmu_bonus_hold+0x34/0x290 [zfs]
 [<ffffffff811a4e36>] ? __iget+0x66/0x70
 [<ffffffff811a672e>] ? ifind_fast+0x5e/0xb0
 [<ffffffffa067960e>] sa_buf_hold+0xe/0x10 [zfs]
 [<ffffffffa06d07ba>] zfs_zget+0xca/0x1d0 [zfs]
 [<ffffffff81528f1e>] ? mutex_lock+0x1e/0x50
 [<ffffffffa0658294>] ? dnode_rele+0x54/0x90 [zfs]
 [<ffffffffa06aff74>] zfs_unlinked_drain+0xa4/0x130 [zfs]
 [<ffffffffa0443d0b>] ? kmem_free_debug+0x4b/0x150 [spl]
 [<ffffffffa0448568>] taskq_thread+0x218/0x4b0 [spl]
 [<ffffffff81527920>] ? thread_return+0x4e/0x76e
 [<ffffffff81065df0>] ? default_wake_function+0x0/0x20
 [<ffffffffa0448350>] ? taskq_thread+0x0/0x4b0 [spl]
 [<ffffffff8109af06>] kthread+0x96/0xa0
 [<ffffffff8100c20a>] child_rip+0xa/0x20
 [<ffffffff8109ae70>] ? kthread+0x0/0xa0
 [<ffffffff8100c200>] ? child_rip+0x0/0x20


...
SysRq : Show backtrace of all active CPUs
sending NMI to all CPUs:
NMI backtrace for cpu 3
CPU 3 
Pid: 11845, comm: zfs_iput_taskq/ Tainted: P           ---------------    2.6.32-431.3.1.el6.x86_64 #1 Acer Veriton M6620G/Veriton M6620G
RIP: 0010:[<ffffffffa02660f0>]  [<ffffffffa02660f0>] avl_nearest+0x0/0x40 [zavl]
RSP: 0018:ffff880115665858  EFLAGS: 00000246
RAX: 0000000000000000 RBX: ffff88007522a660 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88007522a848
RBP: ffff8801156658d0 R08: 0000000000000000 R09: 0000000000000002
R10: ffff880115665930 R11: 0000003963633031 R12: 0000000000000002
R13: 000000000000075f R14: 0000000000000002 R15: ffff88007522a848
FS:  0000000000000000(0000) GS:ffff88002c380000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00007f80ca5c3ba0 CR3: 000000005e946000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process zfs_iput_taskq/ (pid: 11845, threadinfo ffff880115664000, task ffff88011567f540)
Stack:
 ffffffffa0658386 ffff880115665880 ffff88007522a758 0000000000000000
<d> ffff88007cc2b000 ffff8801156658d0 000000000000075f ffff880085da56c0
<d> 0000000000000000 ffff8800b21656c0 ffff88007522a660 000000000000075f
Call Trace:
 [<ffffffffa0658386>] ? dnode_block_freed+0xb6/0x160 [zfs]
 [<ffffffffa064180a>] dbuf_prefetch+0x3a/0x270 [zfs]
 [<ffffffffa0657166>] ? dmu_zfetch+0x356/0xe40 [zfs]
 [<ffffffffa0656a1e>] dmu_zfetch_dofetch+0xfe/0x170 [zfs]
 [<ffffffffa0657628>] dmu_zfetch+0x818/0xe40 [zfs]
 [<ffffffffa063ff91>] dbuf_read+0x6a1/0x740 [zfs]
 [<ffffffffa065ac09>] dnode_hold_impl+0x129/0x560 [zfs]
 [<ffffffffa06a79ce>] ? zap_cursor_retrieve+0x14e/0x2f0 [zfs]
 [<ffffffff81528f1e>] ? mutex_lock+0x1e/0x50
 [<ffffffffa065b059>] dnode_hold+0x19/0x20 [zfs]
 [<ffffffffa0647c98>] dmu_object_info+0x28/0x60 [zfs]
 [<ffffffffa06aff5e>] zfs_unlinked_drain+0x8e/0x130 [zfs]
 [<ffffffff81060b13>] ? perf_event_task_sched_out+0x33/0x70
 [<ffffffffa0443d0b>] ? kmem_free_debug+0x4b/0x150 [spl]
 [<ffffffffa0448568>] taskq_thread+0x218/0x4b0 [spl]
 [<ffffffff81527920>] ? thread_return+0x4e/0x76e
 [<ffffffff81065df0>] ? default_wake_function+0x0/0x20
 [<ffffffffa0448350>] ? taskq_thread+0x0/0x4b0 [spl]
 [<ffffffff8109af06>] kthread+0x96/0xa0
 [<ffffffff8100c20a>] child_rip+0xa/0x20
 [<ffffffff8109ae70>] ? kthread+0x0/0xa0
 [<ffffffff8100c200>] ? child_rip+0x0/0x20
Code: eb 1f 66 0f 1f 84 00 00 00 00 00 48 89 d0 48 8b 50 08 48 85 d2 75 f4 48 85 c0 74 05 48 29 c8 c9 c3 31 c0 c9 c3 66 0f 1f 44 00 00 <55> 48 89 e5 0f 1f 44 00 00 31 c0 48 8b 4f 10 49 89 f0 49 83 e0 



...
Pid: 11845, comm: zfs_iput_taskq/ Tainted: P           ---------------    2.6.32-431.3.1.el6.x86_64 #1 Acer Veriton M6620G/Veriton M6620G
RIP: 0010:[<ffffffffa063d894>]  [<ffffffffa063d894>] dbuf_hash+0x74/0xd0 [zfs]
RSP: 0018:ffff880115665880  EFLAGS: 00000206
RAX: 04ec3e83e5dfdb74 RBX: 0000000000000000 RCX: 0000000000000f87
RDX: 0004ec3e83e5dfdb RSI: 0000000000000000 RDI: 0003fffe2001f30a
RBP: ffff880115665880 R08: 0000000000000074 R09: ffff880115665a00
R10: ffff880115665b58 R11: 0000003564373931 R12: ffff88007cc2b000
R13: 0000000000000000 R14: 0000000000000f87 R15: 0000000000000f77
FS:  0000000000000000(0000) GS:ffff88002c200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 000000000040dd90 CR3: 0000000071fac000 CR4: 00000000001407f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process zfs_iput_taskq/ (pid: 11845, threadinfo ffff880115664000, task ffff88011567f540)
Stack:
 ffff8801156658d0 ffffffffa063db16 ffff8800b9ccba80 0000000000000000
<d> ffff8800b21656c0 ffff88007522a660 0000000000000f87 0000000000000002
<d> 0000000000000011 0000000000000f77 ffff880115665970 ffffffffa0641835
Call Trace:
 [<ffffffffa063db16>] dbuf_find+0x36/0x100 [zfs]
 [<ffffffffa0641835>] dbuf_prefetch+0x65/0x270 [zfs]
 [<ffffffffa0657166>] ? dmu_zfetch+0x356/0xe40 [zfs]
 [<ffffffffa0656a1e>] dmu_zfetch_dofetch+0xfe/0x170 [zfs]
 [<ffffffffa0657628>] dmu_zfetch+0x818/0xe40 [zfs]
 [<ffffffffa063ff91>] dbuf_read+0x6a1/0x740 [zfs]
 [<ffffffffa065ac09>] dnode_hold_impl+0x129/0x560 [zfs]
 [<ffffffffa06a79ce>] ? zap_cursor_retrieve+0x14e/0x2f0 [zfs]
 [<ffffffff81528f1e>] ? mutex_lock+0x1e/0x50
 [<ffffffffa065b059>] dnode_hold+0x19/0x20 [zfs]
 [<ffffffffa0647c98>] dmu_object_info+0x28/0x60 [zfs]
 [<ffffffffa06aff5e>] zfs_unlinked_drain+0x8e/0x130 [zfs]
 [<ffffffff81060b13>] ? perf_event_task_sched_out+0x33/0x70
 [<ffffffffa0443d0b>] ? kmem_free_debug+0x4b/0x150 [spl]
 [<ffffffffa0448568>] taskq_thread+0x218/0x4b0 [spl]
 [<ffffffff81527920>] ? thread_return+0x4e/0x76e
 [<ffffffff81065df0>] ? default_wake_function+0x0/0x20
 [<ffffffffa0448350>] ? taskq_thread+0x0/0x4b0 [spl]
 [<ffffffff8109af06>] kthread+0x96/0xa0
 [<ffffffff8100c20a>] child_rip+0xa/0x20
 [<ffffffff8109ae70>] ? kthread+0x0/0xa0
 [<ffffffff8100c200>] ? child_rip+0x0/0x20
Code: e8 08 48 89 d0 48 31 f2 81 e2 ff 00 00 00 48 c1 e8 08 48 33 04 d5 e0 40 71 a0 49 31 c0 48 89 c2 41 81 e0 ff 00 00 00 48 c1 ea 08 <4a> 33 14 c5 e0 40 71 a0 48 89 d0 48 31 ca 81 e2 ff 00 00 00 48

Interesting supplemental information

# arcstat.py 2
    time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz     c  
12:22:15  1.1K     0      0     0    0     0    0     0    0   309M  309M  
12:22:17  276K     0      0     0    0     0    0     0    0   309M  309M  
12:22:19  276K     0      0     0    0     0    0     0    0   309M  309M  
12:22:21  277K     0      0     0    0     0    0     0    0   309M  309M  
12:22:23  275K     0      0     0    0     0    0     0    0   309M  309M  
12:22:25  277K     0      0     0    0     0    0     0    0   309M  309M  
12:22:27  276K     0      0     0    0     0    0     0    0   309M  309M  
12:22:29  277K     0      0     0    0     0    0     0    0   309M  309M
...
12:50:44  277K     0      0     0    0     0    0     0    0   309M  309M  
12:50:46  281K     0      0     0    0     0    0     0    0   309M  309M  
12:50:48  280K     0      0     0    0     0    0     0    0   309M  309M  
12:50:50  281K     0      0     0    0     0    0     0    0   309M  309M  
12:50:52  282K     0      0     0    0     0    0     0    0   309M  309M  
12:50:54  281K     0      0     0    0     0    0     0    0   309M  309M  
12:50:56  281K     0      0     0    0     0    0     0    0   309M  309M
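The read column stays around 277K per interval with zero misses. This suggests the thread is repeatedly walking metadata that is already in the ARC, rather than waiting on disk. Below is a minimal Python sketch for summarizing a capture like this. It assumes arcstat.py's convention of 1000-based K/M suffixes for counts and 1024-based suffixes for sizes such as arcsz.

```python
# Assumption: arcstat.py scales counts by 1000 and byte sizes by 1024.
SUFFIXES = "KMGTPEZ"


def parse_pretty(value, base=1000):
    """Convert an arcstat-style number such as '277K' or '309M' to a float."""
    value = value.strip()
    if value and value[-1] in SUFFIXES:
        return float(value[:-1]) * base ** (SUFFIXES.index(value[-1]) + 1)
    return float(value)


def summarize(lines):
    """Return (average reads per row, overall miss ratio) for arcstat output."""
    header = None
    reads, misses = [], []
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("..."):
            continue  # skip blank lines and elision markers
        if fields[0] == "time":
            header = fields  # column names for the rows that follow
            continue
        if header is None:
            continue
        row = dict(zip(header, fields))
        reads.append(parse_pretty(row["read"]))
        misses.append(parse_pretty(row["miss"]))
    if not reads:
        return 0.0, 0.0
    total = sum(reads)
    return total / len(reads), (sum(misses) / total if total else 0.0)
```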

# ps aux|wc -l
1949

... which is a lot of processes, but I/O on the system has pretty much settled down.

While I was collecting the above information, the thread finally went back to sleep. Access to the pool did not appear to be affected, but this system only uses its disk lightly.

behlendorf · 2014-02-14T23:09:20Z

Interesting. Thanks for filing such detailed debugging information!

BerndAmend · 2014-02-18T00:50:00Z

Hardware

CPU: AMD E-350 Processor
RAM: 4GB
OS: Archlinux
Linux: 3.12.9-2-ARCH #1 SMP PREEMPT x86_64
SPL: 0.6.2_3.12.9-2
ZFS: 0.6.2_3.12.9-2

I think my issue is related to, or the same as, this one.
I occasionally experience very poor read and write performance (<1 MB/s).
I started a scrub on the pool; it began at 100 KB/s and after 2 hours was stable at 8 MB/s. A scrub run about 20 hours earlier had reached ~233 MB/s.

The pool itself is only ~3 days old. Snapshots are enabled using zfs-auto-snapshot.
The only noticeable difference from before is that the kernel thread zfs_iput_taskq is very busy with something.

top - 00:51:26 up  1:48,  2 users,  load average: 4,13, 5,13, 5,76
Tasks: 260 total,   2 running, 258 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0,2 us, 50,2 sy,  0,0 ni, 49,6 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
KiB Mem:   3644808 total,  1528160 used,  2116648 free,    33700 buffers
KiB Swap:        0 total,        0 used,        0 free.   386536 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                 
  437 root       0 -20       0      0      0 R  99,5  0,0 106:47.83 zfs_iput_taskq/

zpool iostat

               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
data        3,38T  7,50T     66      8  5,98M  26,3K

I also saw the following kernel log message once:

[  481.506065] INFO: task zpool:1184 blocked for more than 120 seconds.
[  481.506223]       Tainted: P           O 3.12.9-2-ARCH #1
[  481.506335] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  481.506492] zpool           D 0000000000000002     0  1184   1183 0x00000000
[  481.506505]  ffff880076863cb0 0000000000000086 00000000000144c0 ffff880076863fd8
[  481.506518]  ffff880076863fd8 00000000000144c0 ffff880089b02b70 ffff88009b7bce3e
[  481.506528]  ffff880076863c50 ffffffff8129b0c4 ffffffff810b0010 ffff88007686ffff
[  481.506539] Call Trace:
[  481.506561]  [<ffffffff8129b0c4>] ? vsnprintf+0x214/0x680
[  481.506573]  [<ffffffff810b0010>] ? save_image_lzo+0x40/0x950
[  481.506624]  [<ffffffffa032b7f6>] ? trace_put_tcd+0x16/0x50 [spl]
[  481.506642]  [<ffffffffa032bfd1>] ? spl_debug_msg+0x411/0x840 [spl]
[  481.506653]  [<ffffffff814f26a9>] schedule+0x29/0x70
[  481.506673]  [<ffffffffa033b43d>] cv_wait_common+0x10d/0x1c0 [spl]
[  481.506683]  [<ffffffff81085ce0>] ? wake_up_atomic_t+0x30/0x30
[  481.506702]  [<ffffffffa033b505>] __cv_wait+0x15/0x20 [spl]
[  481.506765]  [<ffffffffa04a3f0b>] txg_wait_synced+0xcb/0x1b0 [zfs]
[  481.506824]  [<ffffffffa04848ea>] dsl_sync_task_group_wait+0x16a/0x280 [zfs]
[  481.506880]  [<ffffffffa0480690>] ? dsl_prop_unset_hasrecvd+0x20/0x20 [zfs]
[  481.506934]  [<ffffffffa04812a0>] ? dsl_scan_cancel_sync+0x30/0x30 [zfs]
[  481.506989]  [<ffffffffa0484bce>] dsl_sync_task_do+0x4e/0x70 [zfs]
[  481.507044]  [<ffffffffa048465e>] dsl_scan+0x6e/0x80 [zfs]
[  481.507101]  [<ffffffffa0494727>] spa_scan+0x37/0x70 [zfs]
[  481.507151]  [<ffffffffa04c8dfa>] zfs_ioc_pool_scan+0x3a/0x70 [zfs]
[  481.507201]  [<ffffffffa04ca19a>] zfsdev_ioctl+0xfa/0x1a0 [zfs]
[  481.507213]  [<ffffffff811b7375>] do_vfs_ioctl+0x2e5/0x4d0
[  481.507222]  [<ffffffff81164c8b>] ? remove_vma+0x5b/0x70
[  481.507231]  [<ffffffff81166ff8>] ? do_munmap+0x298/0x380
[  481.507241]  [<ffffffff811b75e1>] SyS_ioctl+0x81/0xa0
[  481.507251]  [<ffffffff814f7d1e>] ? do_page_fault+0xe/0x10
[  481.507261]  [<ffffffff814fbbed>] system_call_fastpath+0x1a/0x1f

Since the scrub was so slow, I wanted to stop it and export/import the pool.
Canceling the scrub took a while but eventually succeeded.
After a while the zpool export also succeeded, and everything was back to normal after I imported the pool again (read/write: ~160 MB/s).
During the export I saw the following kernel log messages:

[ 5521.295839] INFO: task zfs:3610 blocked for more than 120 seconds.
[ 5521.295993]       Tainted: P           O 3.12.9-2-ARCH #1
[ 5521.296105] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 5521.296262] zfs             D 0000000000000002     0  3610   3558 0x00000000
[ 5521.296275]  ffff88005e6bfd40 0000000000000086 00000000000144c0 ffff88005e6bffd8
[ 5521.296288]  ffff88005e6bffd8 00000000000144c0 ffff8800a344bcd0 ffff880092f939f0
[ 5521.296298]  0000000000000246 ffff880092f93800 0000000000000246 ffff88009db2e040
[ 5521.296309] Call Trace:
[ 5521.296333]  [<ffffffff814f0e1e>] ? mutex_unlock+0xe/0x10
[ 5521.296412]  [<ffffffffa044f3c4>] ? dbuf_rele_and_unlock+0x184/0x260 [zfs]
[ 5521.296458]  [<ffffffffa044f5fd>] ? dmu_buf_rele+0x3d/0x50 [zfs]
[ 5521.296514]  [<ffffffffa047869b>] ? dsl_dir_close+0x2b/0x30 [zfs]
[ 5521.296525]  [<ffffffff814f26a9>] schedule+0x29/0x70
[ 5521.296550]  [<ffffffffa033b43d>] cv_wait_common+0x10d/0x1c0 [spl]
[ 5521.296591]  [<ffffffffa044f5fd>] ? dmu_buf_rele+0x3d/0x50 [zfs]
[ 5521.296603]  [<ffffffff81085ce0>] ? wake_up_atomic_t+0x30/0x30
[ 5521.296625]  [<ffffffffa033b505>] __cv_wait+0x15/0x20 [spl]
[ 5521.296681]  [<ffffffffa048b8bb>] rrw_enter+0xcb/0x1f0 [zfs]
[ 5521.296731]  [<ffffffffa04c6045>] zfs_sb_hold+0x45/0xa0 [zfs]
[ 5521.296780]  [<ffffffffa04c98d8>] zfs_unmount_snap+0x68/0x120 [zfs]
[ 5521.296830]  [<ffffffffa04c9a1f>] zfs_ioc_destroy_snaps_nvl+0x8f/0x130 [zfs]
[ 5521.296880]  [<ffffffffa04ca19a>] zfsdev_ioctl+0xfa/0x1a0 [zfs]
[ 5521.296891]  [<ffffffff811b7375>] do_vfs_ioctl+0x2e5/0x4d0
[ 5521.296901]  [<ffffffff81164c8b>] ? remove_vma+0x5b/0x70
[ 5521.296910]  [<ffffffff81166ff8>] ? do_munmap+0x298/0x380
[ 5521.296919]  [<ffffffff811b75e1>] SyS_ioctl+0x81/0xa0
[ 5521.296930]  [<ffffffff814f7d1e>] ? do_page_fault+0xe/0x10
[ 5521.296940]  [<ffffffff814fbbed>] system_call_fastpath+0x1a/0x1f
[ 5641.291366] INFO: task zfs:3610 blocked for more than 120 seconds.
[ 5641.291520]       Tainted: P           O 3.12.9-2-ARCH #1
[ 5641.291632] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 5641.291789] zfs             D 0000000000000002     0  3610   3558 0x00000000
[ 5641.291803]  ffff88005e6bfd40 0000000000000086 00000000000144c0 ffff88005e6bffd8
[ 5641.291815]  ffff88005e6bffd8 00000000000144c0 ffff8800a344bcd0 ffff880092f939f0
[ 5641.291826]  0000000000000246 ffff880092f93800 0000000000000246 ffff88009db2e040
[ 5641.291836] Call Trace:
[ 5641.291860]  [<ffffffff814f0e1e>] ? mutex_unlock+0xe/0x10
[ 5641.291940]  [<ffffffffa044f3c4>] ? dbuf_rele_and_unlock+0x184/0x260 [zfs]
...
[ 5761.286763] INFO: task zfs:3610 blocked for more than 120 seconds.
[ 5761.286847]       Tainted: P           O 3.12.9-2-ARCH #1
[ 5761.286904] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 5761.286983] zfs             D 0000000000000002     0  3610   3558 0x00000000
[ 5761.286992]  ffff88005e6bfd40 0000000000000086 00000000000144c0 ffff88005e6bffd8
[ 5761.286999]  ffff88005e6bffd8 00000000000144c0 ffff8800a344bcd0 ffff880092f939f0
[ 5761.287005]  0000000000000246 ffff880092f93800 0000000000000246 ffff88009db2e040
[ 5761.287011] Call Trace:
[ 5761.287027]  [<ffffffff814f0e1e>] ? mutex_unlock+0xe/0x10
[ 5761.287082]  [<ffffffffa044f3c4>] ? dbuf_rele_and_unlock+0x184/0x260 [zfs]
...
[ 5881.282296] INFO: task zfs:3610 blocked for more than 120 seconds.
[ 5881.282452]       Tainted: P           O 3.12.9-2-ARCH #1
[ 5881.282563] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 5881.282720] zfs             D 0000000000000002     0  3610   3558 0x00000000
[ 5881.282734]  ffff88005e6bfd40 0000000000000086 00000000000144c0 ffff88005e6bffd8
[ 5881.282746]  ffff88005e6bffd8 00000000000144c0 ffff8800a344bcd0 ffff880092f939f0
[ 5881.282756]  0000000000000246 ffff880092f93800 0000000000000246 ffff88009db2e040
[ 5881.282767] Call Trace:
[ 5881.282790]  [<ffffffff814f0e1e>] ? mutex_unlock+0xe/0x10
[ 5881.282870]  [<ffffffffa044f3c4>] ? dbuf_rele_and_unlock+0x184/0x260 [zfs]
...
[ 6001.277715] INFO: task zfs:3610 blocked for more than 120 seconds.
[ 6001.277871]       Tainted: P           O 3.12.9-2-ARCH #1
[ 6001.277982] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 6001.278139] zfs             D 0000000000000002     0  3610   3558 0x00000000
[ 6001.278152]  ffff88005e6bfd40 0000000000000086 00000000000144c0 ffff88005e6bffd8
[ 6001.278165]  ffff88005e6bffd8 00000000000144c0 ffff8800a344bcd0 ffff880092f939f0
[ 6001.278175]  0000000000000246 ffff880092f93800 0000000000000246 ffff88009db2e040
[ 6001.278186] Call Trace:
[ 6001.278210]  [<ffffffff814f0e1e>] ? mutex_unlock+0xe/0x10
[ 6001.278289]  [<ffffffffa044f3c4>] ? dbuf_rele_and_unlock+0x184/0x260 [zfs]
...
[ 6121.273626] INFO: task zfs:3610 blocked for more than 120 seconds.
[ 6121.273780]       Tainted: P           O 3.12.9-2-ARCH #1
[ 6121.273891] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 6121.274048] zfs             D 0000000000000002     0  3610   3558 0x00000000
[ 6121.274061]  ffff88005e6bfd40 0000000000000086 00000000000144c0 ffff88005e6bffd8
[ 6121.274074]  ffff88005e6bffd8 00000000000144c0 ffff8800a344bcd0 ffff880092f939f0
[ 6121.274084]  0000000000000246 ffff880092f93800 0000000000000246 ffff88009db2e040
[ 6121.274095] Call Trace:
[ 6121.274119]  [<ffffffff814f0e1e>] ? mutex_unlock+0xe/0x10
[ 6121.274198]  [<ffffffffa044f3c4>] ? dbuf_rele_and_unlock+0x184/0x260 [zfs]
...

Most of the time I see the following stack for the zfs_iput_taskq process:

cat /proc/437/stack
[<ffffffff814f47a6>] retint_kernel+0x26/0x30
[<ffffffffa032f70e>] kmem_alloc_debug+0x20e/0x500 [spl]
[<ffffffffa044fdab>] dbuf_hold_impl+0x7b/0xa0 [zfs]
[<ffffffffa044fea3>] dbuf_prefetch+0xd3/0x280 [zfs]
[<ffffffffa0464baf>] dmu_zfetch_dofetch.isra.5+0x10f/0x180 [zfs]
[<ffffffffa04654b7>] dmu_zfetch+0x5f7/0x10e0 [zfs]
[<ffffffffa044e4de>] dbuf_read+0x71e/0x8f0 [zfs]
[<ffffffffa04671be>] dnode_hold_impl+0x1ee/0x620 [zfs]
[<ffffffffa0467609>] dnode_hold+0x19/0x20 [zfs]
[<ffffffffa0456071>] dmu_object_info+0x21/0x50 [zfs]
[<ffffffffa04c29ed>] zfs_unlinked_drain+0x7d/0x120 [zfs]
[<ffffffffa0335ba7>] taskq_thread+0x237/0x4b0 [spl]
[<ffffffff81084ec0>] kthread+0xc0/0xd0
[<ffffffff814fbb3c>] ret_from_fork+0x7c/0xb0
[<ffffffffffffffff>] 0xffffffffffffffff
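Repeatedly reading /proc/<pid>/stack, as done here, amounts to a crude sampling profiler. Below is a hypothetical Python helper (names are my own) that tallies the innermost function per sample. Frames that kernel traces mark with '?' (unreliable guesses) do not match the pattern, and reading the file generally requires root.

```python
import re
import time
from collections import Counter

# Matches "[<ffffffffa044fdab>] dbuf_hold_impl+0x7b/0xa0 [zfs]".  Frames that
# the kernel prints as "] ? name+..." (unreliable guesses) do not match.
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def frames(stack_text):
    """Function names in one stack snapshot, innermost frame first."""
    names = []
    for line in stack_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            names.append(match.group(1))
    return names


def profile(snapshots, skip=()):
    """Count the innermost frame of each snapshot, ignoring names in skip."""
    counts = Counter()
    for snap in snapshots:
        for name in frames(snap):
            if name not in skip:
                counts[name] += 1
                break
    return counts


def sample(pid, n=50, interval_s=0.1):
    """Read /proc/<pid>/stack n times; usually needs root."""
    snapshots = []
    for _ in range(n):
        with open("/proc/%d/stack" % pid) as f:
            snapshots.append(f.read())
        time.sleep(interval_s)
    return snapshots
```

For instance, profile(sample(437), skip={"retint_kernel"}).most_common(5) would skip the interrupt-return frame and show where the thread actually spends its time.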

zpool status

  pool: data
 state: ONLINE
  scan: scrub canceled on Tue Feb 18 00:43:04 2014
config:

        NAME                                            STATE     READ WRITE CKSUM
        data                                            ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            ata-WDC_WD30EZRX-00MMMB0_WD-WCAWZ    ONLINE       0     0     0
            ata-Hitachi_HDS723030ALA640_MK  ONLINE       0     0     0
            ata-WDC_WD30EZRX-00MMMB0_WD-WCAWZ    ONLINE       0     0     0
            ata-WDC_WD30EZRX-00MMMB0_WD-WCAWZ    ONLINE       0     0     0

# arcstat.py 2
    time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz     c 
01:12:01     0     0      0     0    0     0    0     0    0   206M  798M 
01:12:03     2     0      0     0    0     0    0     0    0   206M  798M 
01:12:05     0     0      0     0    0     0    0     0    0   206M  798M 
01:12:07     0     0      0     0    0     0    0     0    0   206M  798M 
01:12:09     2     0      0     0    0     0    0     0    0   206M  798M 
01:12:11     0     0      0     0    0     0    0     0    0   206M  798M 
01:12:13     2     0      0     0    0     0    0     0    0   206M  798M

So far I have seen the issue twice.

gnubioIt · 2014-03-31T19:50:49Z

Hi,

Similar problem here, while running rdiff-backup of another host (perhaps 100 GB total).

The machine is a Supermicro server with a single dual-core Opteron CPU (details below) and 8 GB of RAM, with 16 2 TB drives in two ZFS pools.

Ubuntu 12.04.4 LTS (64bit)

OUTPUT OF TOP:
top - 15:22:43 up 4 days, 30 min, 2 users, load average: 4.12, 4.23, 4.15
Tasks: 317 total, 3 running, 314 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.2%us, 50.3%sy, 0.0%ni, 0.0%id, 49.5%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 8175232k total, 7550748k used, 624484k free, 902200k buffers
Swap: 8388604k total, 5656k used, 8382948k free, 78476k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
25585 root       0 -20     0    0    0 R  100  0.0 501:12.54 zfs_iput_taskq/
 3575 root      20   0 17476 1432  916 S    0  0.0   0:02.74 top
 3587 root      20   0     0    0    0 S    0  0.0   0:00.08 kworker/0:1

Strangely, htop does not show zfs_iput_taskq.

VERSIONS OF ZFS ET AL., INSTALLED VIA PACKAGE MANAGER:
root@backupsvr:/aoife/backups/svn/rdiff-backup-data# dpkg -l | grep zfs
ii  libzfs1     0.6.2-1precise  Native ZFS filesystem library for Linux
ii  mountall    2.36.4-zfs2     filesystem mounting tool
ii  ubuntu-zfs  7precise        Native ZFS filesystem metapackage for Ubuntu.
ii  zfs-dkms    0.6.2-1precise  Native ZFS filesystem kernel modules for Linux
ii  zfsutils    0.6.2-1precise  Native ZFS management utilities for Linux

CPU INFO FROM PROC
root@backupsvr:/aoife/backups/svn/rdiff-backup-data# cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 65
model name : Dual-Core AMD Opteron(tm) Processor 2218
stepping : 2
microcode : 0x62
cpu MHz : 2613.242
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow rep_good nopl extd_apicid pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy
bogomips : 5226.48
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc

The entry for the second core repeats the above.

ZPOOL STATUS
root@backupsvr:~# zpool status
  pool: aoife
 state: ONLINE
  scan: scrub repaired 0 in 7h42m with 0 errors on Mon Mar 24 18:51:26 2014
config:

    NAME                                 STATE     READ WRITE CKSUM
    aoife                                ONLINE       0     0     0
      raidz2-0                           ONLINE       0     0     0
        scsi-1AMCC_Z1Y0NWWY29E4F9000FE0  ONLINE       0     0     0
        scsi-1AMCC_Z1Y0M03P29E4F900040F  ONLINE       0     0     0
        scsi-1AMCC_Z1X12WTQ29E4F900025F  ONLINE       0     0     0
        scsi-1AMCC_Z1X12W4W29E4FE0004E2  ONLINE       0     0     0
        scsi-1AMCC_Z1Y0NMLS29E4FE001595  ONLINE       0     0     0
        scsi-1AMCC_Z1Y0PAK529E4FE000EED  ONLINE       0     0     0
        scsi-1AMCC_Z1X12VCK29E4FE000F0C  ONLINE       0     0     0
        scsi-1AMCC_Z1X12VEB29E4FE000220  ONLINE       0     0     0

errors: No known data errors

  pool: fergus
 state: ONLINE
  scan: none requested
config:

    NAME                                 STATE     READ WRITE CKSUM
    fergus                               ONLINE       0     0     0
      raidz2-0                           ONLINE       0     0     0
        scsi-1AMCC_Z1Y0P1NK29E4F9000A8A  ONLINE       0     0     0
        scsi-1AMCC_Z1Y0NC4J29E4F900211B  ONLINE       0     0     0
        scsi-1AMCC_Z1Y0N6Z629E4FE00010C  ONLINE       0     0     0
        scsi-1AMCC_Z1Y0MTQD29E4FE000F07  ONLINE       0     0     0
        scsi-1AMCC_Z1X12X1X29E4FE0023B2  ONLINE       0     0     0
        scsi-1AMCC_Z1X12VKN29E4FE001EE3  ONLINE       0     0     0
        scsi-1AMCC_Z1X12VLM29E4FE00159F  ONLINE       0     0     0
        scsi-1AMCC_Z1X12X9G29E4FE001B6F  ONLINE       0     0     0

errors: No known data errors

Is there any other information I should send you?

Thanks for all your hard work.

behlendorf · 2014-04-04T21:54:08Z

This looks like a duplicate of #1469. It's a known issue that hasn't been resolved yet; using SA-based xattrs should minimize it.

zfs set xattr=sa tank/fs
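Before switching to SA-based xattrs, it can help to list which datasets are not yet using them. Below is a minimal Python sketch wrapping `zfs get`. It relies on the standard -r, -t, -H and -o options and treats any value other than 'sa' as a candidate. The helper names are my own. Changing the property only affects xattrs written afterwards; existing directory-based xattrs stay as they are.

```python
import subprocess


def parse_xattr_listing(text):
    """Map dataset -> xattr value from `zfs get -H -o name,value xattr` output."""
    result = {}
    for line in text.splitlines():
        if line.strip():
            name, value = line.split("\t", 1)  # -H output is tab-separated
            result[name] = value
    return result


def not_sa(mapping):
    """Sorted dataset names whose xattr property is anything other than 'sa'."""
    return sorted(name for name, value in mapping.items() if value != "sa")


def datasets_not_sa(pool):
    """Query every filesystem under pool (snapshots excluded) via the zfs CLI."""
    out = subprocess.run(
        ["zfs", "get", "-r", "-t", "filesystem", "-H", "-o", "name,value", "xattr", pool],
        check=True, capture_output=True, text=True,
    ).stdout
    return not_sa(parse_xattr_listing(out))
```

Because properties are inherited, setting xattr=sa on a parent such as the pool's top-level dataset covers every child that does not override it locally.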

DeHackEd · 2014-04-04T22:42:45Z

Interesting. I'll have to wait until Monday to check my system. I thought xattrs were not in use, but I don't recall explicitly disabling them either.

behlendorf · 2014-04-04T23:36:55Z

They're enabled by default. If you know they aren't needed, explicitly disabling them should prevent the issue as well.

DeHackEd · 2014-04-09T15:22:58Z

# zpool history -i
2014-02-04.13:29:50 [txg:5] create pool version 5000; software version 5000/5; uts H264 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Jan 3 21:39:27 UTC 2014 x86_64
...
(receive snapshot remotely)
...
(make clones)
...
2014-02-04.13:37:00 [txg:385] set netlab (21) xattr=0
...

While the filesystems may contain xattrs from the receive operation, they do have the xattr mount option disabled. Since setting xattr=off was the last thing to happen, maybe that makes sense? (Any xattrs that already exist would NOT be 'sa' based.)

(History requires -i due to the pool being affected by an earlier bug)
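To confirm which xattr setting was applied last to each dataset, you can filter the internal history. This is a minimal sketch built on two assumptions that should be checked against your ZFS version. First, it assumes the `zpool history -i` line format shown above (`... set <dataset> (<id>) xattr=<n>`), with dataset names that contain no spaces. Second, it assumes the numeric values follow the ZFS xattr enum: 0 = off, 1 = on (dir), 2 = sa. The function name `last_xattr_setting` is hypothetical.

```shell
# Hypothetical helper: read `zpool history -i` text on stdin and report the
# last xattr value set on each dataset. Assumes internal lines such as
#   2014-02-04.13:37:00 [txg:385] set netlab (21) xattr=0
# and ASSUMES the numeric mapping 0=off, 1=on (dir), 2=sa.
last_xattr_setting() {
  awk '
    $3 == "set" && $NF ~ /^xattr=/ {
      split($NF, kv, "=")
      last[$4] = kv[2]            # later lines overwrite earlier ones
    }
    END {
      name["0"] = "off"; name["1"] = "on"; name["2"] = "sa"
      for (ds in last) {          # output order is unspecified
        v = last[ds]
        print ds, ((v in name) ? name[v] : v)
      }
    }'
}

# Usage (not run here):
#   zpool history -i netlab | last_xattr_setting
```

Lines from user-issued commands (e.g. `zfs set xattr=off netlab`) end with the dataset name rather than `xattr=...`, so the pattern skips them. Only the internal `[txg:...]` records are counted.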

This reverts commit 7973e46. That had been intended to workaround a deadlock issue involving zfs_zget(), which was fixed by 6f9548c. The workaround had the side effect of causing zfs_zinactive() to cause excessive cpu utilization in zfs_iput_taskq by queuing an iteration of all objects in a dataset on every unlink on a directory that had extended attributes. That resulted in many issue reports about iput_taskq spinning. Since the original rationale for the change is no longer valid, we can safely revert it to resolve all of those issue reports. Conflicts: module/zfs/zfs_dir.c Closes: openzfs#457 openzfs#2058 openzfs#2128 openzfs#2240

This reverts commit 7973e46 which brings the basic flow of the code back inline with the other ZFS implementations. This was possible due to the work done in these in previous commits. e89260a Directory xattr znodes hold a reference on their parent 26cb948 Avoid 128K kmem allocations in mzap_upgrade() 4acaaf7 Add zfs_iput_async() interface Signed-off-by: Brian Behlendorf <[email protected]> Issue openzfs#2408 Issue openzfs#457 Issue openzfs#2058 Issue openzfs#2128 Issue openzfs#2240

This reverts commit 7973e46 which brings the basic flow of the code back in line with the other ZFS implementations. This was possible due to the following related changes. e89260a Directory xattr znodes hold a reference on their parent 6f9548c Fix deadlock in zfs_zget() 26cb948 Avoid 128K kmem allocations in mzap_upgrade() 4acaaf7 Add zfs_iput_async() interface Signed-off-by: Brian Behlendorf <[email protected]> Issue openzfs#457 Issue openzfs#2058 Issue openzfs#2128 Issue openzfs#2240

This reverts commit 7973e46 which brings the basic flow of the code back in line with the other ZFS implementations. This was possible due to the following related changes. e89260a Directory xattr znodes hold a reference on their parent 6f9548c Fix deadlock in zfs_zget() ca043ca Add zfs_iput_async() interface dd2a794 Avoid 128K kmem allocations in mzap_upgrade() Signed-off-by: Brian Behlendorf <[email protected]> Issue openzfs#457 Issue openzfs#2058 Issue openzfs#2128 Issue openzfs#2240

This reverts commit 7973e46 which brings the basic flow of the code back in line with the other ZFS implementations. This was possible due to the following related changes. e89260a Directory xattr znodes hold a reference on their parent 6f9548c Fix deadlock in zfs_zget() 0a50679 Add zfs_iput_async() interface 4dd1893 Avoid 128K kmem allocations in mzap_upgrade() Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes openzfs#457 Closes openzfs#2058 Closes openzfs#2128 Closes openzfs#2240

This reverts commit 7973e46 which brings the basic flow of the code back in line with the other ZFS implementations. This was possible due to the following related changes. e89260a Directory xattr znodes hold a reference on their parent 6f9548c Fix deadlock in zfs_zget() ca043ca Add zfs_iput_async() interface dd2a794 Avoid 128K kmem allocations in mzap_upgrade() Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes openzfs#457 Closes openzfs#2058 Closes openzfs#2128 Closes openzfs#2240

behlendorf added this to the 0.6.4 milestone Feb 14, 2014

behlendorf added the Bug label Feb 14, 2014

ryao mentioned this issue Jun 20, 2014

Revert "Revert "Revert "Fix unlink/xattr deadlock""" #2408

Closed

behlendorf closed this as completed in 0d5c500 Aug 11, 2014
