Deadlock with hung tasks in kmalloc #446

Closed
dechamps opened this issue Nov 10, 2011 · 12 comments

@dechamps
Contributor

I'm running the latest SPL/ZFS from master.

ZFS just deadlocked my server. This happened while it was doing quite a lot of things at the same time (most notably, rtorrent and a ZVOL-backed VirtualBox).

I have no idea what triggered it exactly. I noticed that my processes were getting deadlocked one after the other (hung and SIGKILL-proof). The box, however, stayed up and running (no panic), although most ZFS operations wouldn't complete. Basically I was still able to read the pool but it was impossible to write anything. I observed the phenomenon for a few minutes, then I rebooted the box. Needless to say, this will probably be difficult to reproduce.

What's much more interesting, however, is the kernel log:

[373562.701571] INFO: task kswapd0:45 blocked for more than 120 seconds.
[373562.703914] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[373562.706260] kswapd0         D ffff880419e31850     0    45      2 0x00000000
[373562.708618]  ffff880419e31850 0000000000000046 0000000000000020 ffffffffa0109b25
[373562.710952]  ffff88041f2628b0 0000000000012800 ffff880419e33fd8 ffff880419e33fd8
[373562.713286]  0000000000012800 ffff880419e31850 0000000000012800 0000000000012800
[373562.715572] Call Trace:
[373562.717970]  [<ffffffffa0109b25>] ? kmalloc_nofail+0x18/0x26 [spl]
[373562.720182]  [<ffffffff81336f5c>] ? _raw_spin_lock_irqsave+0x9/0x25
[373562.722395]  [<ffffffffa010e002>] ? cv_wait_common+0x75/0xbc [spl]
[373562.724939]  [<ffffffff8106023f>] ? wake_up_bit+0x23/0x23
[373562.727084]  [<ffffffff81335c55>] ? _cond_resched+0x9/0x20
[373562.729235]  [<ffffffffa01dd52d>] ? txg_wait_open+0x57/0x8d [zfs]
[373562.731415]  [<ffffffff8103840a>] ? should_resched+0x5/0x24
[373562.733506]  [<ffffffffa01afaa1>] ? dmu_tx_assign+0xf1/0x329 [zfs]
[373562.735695]  [<ffffffffa01bac0b>] ? dsl_dataset_block_freeable+0x34/0x40 [zfs]
[373562.737766]  [<ffffffffa0201ed7>] ? zfs_inactive+0xa2/0x145 [zfs]
[373562.739766]  [<ffffffff8103840a>] ? should_resched+0x5/0x24
[373562.741710]  [<ffffffff8110ede2>] ? evict+0x7b/0x119
[373562.743641]  [<ffffffff8110eeb2>] ? dispose_list+0x32/0x3c
[373562.745499]  [<ffffffff8110f324>] ? shrink_icache_memory+0x2a4/0x2d4
[373562.747484]  [<ffffffff810c2deb>] ? shrink_slab+0xe3/0x155
[373562.749661]  [<ffffffff810c571f>] ? balance_pgdat+0x2d1/0x57a
[373562.751656]  [<ffffffff810c5c7c>] ? kswapd+0x2b4/0x2cd
[373562.753428]  [<ffffffff8106023f>] ? wake_up_bit+0x23/0x23
[373562.755174]  [<ffffffff810c59c8>] ? balance_pgdat+0x57a/0x57a
[373562.756912]  [<ffffffff810c59c8>] ? balance_pgdat+0x57a/0x57a
[373562.758544]  [<ffffffff8105fdc7>] ? kthread+0x7a/0x82
[373562.760175]  [<ffffffff8133d324>] ? kernel_thread_helper+0x4/0x10
[373562.761876]  [<ffffffff8105fd4d>] ? kthread_worker_fn+0x149/0x149
[373562.763475]  [<ffffffff8133d320>] ? gs_change+0x13/0x13


[373562.764983] INFO: task zvol/2:344 blocked for more than 120 seconds.
[373562.766497] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[373562.767990] zvol/2          D ffff880419858080     0   344      2 0x00000000
[373562.769476]  ffff880419858080 0000000000000046 ffffffff8103fd52 ffff880418fb61c0
[373562.770958]  ffff88041f243610 0000000000012800 ffff880412183fd8 ffff880412183fd8
[373562.772392]  0000000000012800 ffff880419858080 0000000000012800 0000000000012800
[373562.773831] Call Trace:
[373562.775535]  [<ffffffff8103fd52>] ? mutex_spin_on_owner+0x23/0x40
[373562.776877]  [<ffffffff81336f5c>] ? _raw_spin_lock_irqsave+0x9/0x25
[373562.778336]  [<ffffffffa010e002>] ? cv_wait_common+0x75/0xbc [spl]
[373562.779682]  [<ffffffff8106023f>] ? wake_up_bit+0x23/0x23
[373562.780972]  [<ffffffffa01dd52d>] ? txg_wait_open+0x57/0x8d [zfs]
[373562.782231]  [<ffffffff8103840a>] ? should_resched+0x5/0x24
[373562.783525]  [<ffffffffa01afaa1>] ? dmu_tx_assign+0xf1/0x329 [zfs]
[373562.784744]  [<ffffffffa01bac0b>] ? dsl_dataset_block_freeable+0x34/0x40 [zfs]
[373562.785964]  [<ffffffffa0201ed7>] ? zfs_inactive+0xa2/0x145 [zfs]
[373562.787151]  [<ffffffff8103840a>] ? should_resched+0x5/0x24
[373562.788287]  [<ffffffff8110ede2>] ? evict+0x7b/0x119
[373562.789394]  [<ffffffff8110eeb2>] ? dispose_list+0x32/0x3c
[373562.790516]  [<ffffffff8110f324>] ? shrink_icache_memory+0x2a4/0x2d4
[373562.791657]  [<ffffffff810c2deb>] ? shrink_slab+0xe3/0x155
[373562.792733]  [<ffffffff810c4eeb>] ? do_try_to_free_pages+0x1b7/0x315
[373562.793878]  [<ffffffff810c5318>] ? try_to_free_pages+0xb6/0xf6
[373562.794977]  [<ffffffff810bcd73>] ? __alloc_pages_nodemask+0x50b/0x7c1
[373562.796028]  [<ffffffff810e5e60>] ? alloc_pages_current+0xa5/0xbf
[373562.797072]  [<ffffffff810b96c7>] ? __get_free_pages+0x9/0x49
[373562.798149]  [<ffffffffa010a9d5>] ? spl_kmem_cache_alloc+0x13a/0x4d3 [spl]
[373562.799275]  [<ffffffffa020be03>] ? zil_alloc_lwb+0x24/0x106 [zfs]
[373562.800487]  [<ffffffffa020d20f>] ? zil_lwb_write_start+0x20d/0x2b2 [zfs]
[373562.801757]  [<ffffffffa020d9c5>] ? zil_commit+0x247/0x4f3 [zfs]
[373562.802873]  [<ffffffffa02163f1>] ? zvol_write+0x349/0x369 [zfs]
[373562.803942]  [<ffffffffa010b982>] ? taskq_thread+0x1d2/0x32c [spl]
[373562.804985]  [<ffffffff8103f0a4>] ? try_to_wake_up+0x199/0x199
[373562.806052]  [<ffffffffa010b7b0>] ? __taskq_create+0x373/0x373 [spl]
[373562.807144]  [<ffffffffa010b7b0>] ? __taskq_create+0x373/0x373 [spl]
[373562.808181]  [<ffffffff8105fdc7>] ? kthread+0x7a/0x82
[373562.809328]  [<ffffffff8133d324>] ? kernel_thread_helper+0x4/0x10
[373562.810382]  [<ffffffff8105fd4d>] ? kthread_worker_fn+0x149/0x149
[373562.811460]  [<ffffffff8133d320>] ? gs_change+0x13/0x13


[373562.812513] INFO: task txg_quiesce:895 blocked for more than 120 seconds.
[373562.813598] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[373562.814710] txg_quiesce     D ffff880412259060     0   895      2 0x00000000
[373562.815804]  ffff880412259060 0000000000000046 0000000000000082 ffff880400000000
[373562.816942]  ffff88041f2d7750 0000000000012800 ffff880408033fd8 ffff880408033fd8
[373562.818094]  0000000000012800 ffff880412259060 0000000000012800 0000000000012800
[373562.819304] Call Trace:
[373562.820444]  [<ffffffffa010e002>] ? cv_wait_common+0x75/0xbc [spl]
[373562.821627]  [<ffffffff8106023f>] ? wake_up_bit+0x23/0x23
[373562.822812]  [<ffffffff81335c55>] ? _cond_resched+0x9/0x20
[373562.823977]  [<ffffffffa01de0db>] ? txg_quiesce_thread+0x18b/0x1e9 [zfs]
[373562.825224]  [<ffffffffa010b095>] ? __thread_create+0x129/0x129 [spl]
[373562.826728]  [<ffffffffa01ddf50>] ? txg_sync_thread+0x331/0x331 [zfs]
[373562.828325]  [<ffffffffa010b095>] ? __thread_create+0x129/0x129 [spl]
[373562.830050]  [<ffffffffa010b0ff>] ? thread_generic_wrapper+0x6a/0x75 [spl]
[373562.831372]  [<ffffffff8105fdc7>] ? kthread+0x7a/0x82
[373562.832448]  [<ffffffff8133d324>] ? kernel_thread_helper+0x4/0x10
[373562.833549]  [<ffffffff8105fd4d>] ? kthread_worker_fn+0x149/0x149
[373562.834627]  [<ffffffff8133d320>] ? gs_change+0x13/0x13


[373562.835706] INFO: task rtorrent:3380 blocked for more than 120 seconds.
[373562.836788] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[373562.837911] rtorrent        D ffff880418cbb5d0     0  3380   3371 0x00000000
[373562.839037]  ffff880418cbb5d0 0000000000000086 0000000000000020 ffffffffa0109b25
[373562.840243]  ffff880410c00a70 0000000000012800 ffff8803e1229fd8 ffff8803e1229fd8
[373562.841380]  0000000000012800 ffff880418cbb5d0 0000000000012800 0000000000012800
[373562.842543] Call Trace:
[373562.843716]  [<ffffffffa0109b25>] ? kmalloc_nofail+0x18/0x26 [spl]
[373562.844871]  [<ffffffff81336f5c>] ? _raw_spin_lock_irqsave+0x9/0x25
[373562.846076]  [<ffffffffa010e002>] ? cv_wait_common+0x75/0xbc [spl]
[373562.847233]  [<ffffffff8106023f>] ? wake_up_bit+0x23/0x23
[373562.848386]  [<ffffffff81335c55>] ? _cond_resched+0x9/0x20
[373562.849670]  [<ffffffffa01dd52d>] ? txg_wait_open+0x57/0x8d [zfs]
[373562.850823]  [<ffffffff8103840a>] ? should_resched+0x5/0x24
[373562.852037]  [<ffffffffa01afaa1>] ? dmu_tx_assign+0xf1/0x329 [zfs]
[373562.853233]  [<ffffffffa01bac0b>] ? dsl_dataset_block_freeable+0x34/0x40 [zfs]
[373562.854429]  [<ffffffffa0201ed7>] ? zfs_inactive+0xa2/0x145 [zfs]
[373562.855587]  [<ffffffff8103840a>] ? should_resched+0x5/0x24
[373562.856829]  [<ffffffff8110ede2>] ? evict+0x7b/0x119
[373562.857987]  [<ffffffff8103840a>] ? should_resched+0x5/0x24
[373562.859137]  [<ffffffff8110eeb2>] ? dispose_list+0x32/0x3c
[373562.860283]  [<ffffffff8110f324>] ? shrink_icache_memory+0x2a4/0x2d4
[373562.861435]  [<ffffffff810c2deb>] ? shrink_slab+0xe3/0x155
[373562.862606]  [<ffffffff810c4eeb>] ? do_try_to_free_pages+0x1b7/0x315
[373562.863789]  [<ffffffff810c5318>] ? try_to_free_pages+0xb6/0xf6
[373562.864954]  [<ffffffff810bcd73>] ? __alloc_pages_nodemask+0x50b/0x7c1
[373562.866135]  [<ffffffff810e5e60>] ? alloc_pages_current+0xa5/0xbf
[373562.867302]  [<ffffffff8102fac1>] ? pte_alloc_one+0x11/0x39
[373562.868464]  [<ffffffff811a6c22>] ? prio_tree_insert+0x27/0x239
[373562.869625]  [<ffffffff810d0ed0>] ? __pte_alloc+0x19/0x11c
[373562.870758]  [<ffffffff810d311e>] ? handle_mm_fault+0x1cf/0x22c
[373562.871984]  [<ffffffff81339fd0>] ? do_page_fault+0x2e9/0x30e
[373562.873124]  [<ffffffff810d7d76>] ? mmap_region+0x336/0x431
[373562.874238]  [<ffffffff810d5d56>] ? get_unmapped_area+0xe4/0x13e
[373562.875322]  [<ffffffff810fcf48>] ? fput+0x1a/0x1a2
[373562.876387]  [<ffffffff813375d5>] ? page_fault+0x25/0x30


[373562.877469] INFO: task nxagent:12517 blocked for more than 120 seconds.
[373562.878539] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[373562.879630] nxagent         D ffff8803f53468b0     0 12517      1 0x00000004
[373562.880741]  ffff8803f53468b0 0000000000000086 0000000000000020 ffffffffa0109b25
[373562.881885]  ffff880410c22040 0000000000012800 ffff88020b501fd8 ffff88020b501fd8
[373562.883035]  0000000000012800 ffff8803f53468b0 0000000000012800 0000000000012800
[373562.884224] Call Trace:
[373562.885357]  [<ffffffffa0109b25>] ? kmalloc_nofail+0x18/0x26 [spl]
[373562.886526]  [<ffffffff81336f5c>] ? _raw_spin_lock_irqsave+0x9/0x25
[373562.887872]  [<ffffffffa010e002>] ? cv_wait_common+0x75/0xbc [spl]
[373562.889048]  [<ffffffff8106023f>] ? wake_up_bit+0x23/0x23
[373562.890659]  [<ffffffff81335c55>] ? _cond_resched+0x9/0x20
[373562.891823]  [<ffffffffa01dd52d>] ? txg_wait_open+0x57/0x8d [zfs]
[373562.892944]  [<ffffffff8103840a>] ? should_resched+0x5/0x24
[373562.894094]  [<ffffffffa01afaa1>] ? dmu_tx_assign+0xf1/0x329 [zfs]
[373562.895213]  [<ffffffffa01bac0b>] ? dsl_dataset_block_freeable+0x34/0x40 [zfs]
[373562.896336]  [<ffffffffa0201ed7>] ? zfs_inactive+0xa2/0x145 [zfs]
[373562.897452]  [<ffffffff8103840a>] ? should_resched+0x5/0x24
[373562.898557]  [<ffffffff8110ede2>] ? evict+0x7b/0x119
[373562.899618]  [<ffffffff8103840a>] ? should_resched+0x5/0x24
[373562.900686]  [<ffffffff8110eeb2>] ? dispose_list+0x32/0x3c
[373562.901751]  [<ffffffff8110f324>] ? shrink_icache_memory+0x2a4/0x2d4
[373562.902893]  [<ffffffff810c2deb>] ? shrink_slab+0xe3/0x155
[373562.903965]  [<ffffffff810c4eeb>] ? do_try_to_free_pages+0x1b7/0x315
[373562.905030]  [<ffffffff810c5318>] ? try_to_free_pages+0xb6/0xf6
[373562.906329]  [<ffffffff810bcd73>] ? __alloc_pages_nodemask+0x50b/0x7c1
[373562.907527]  [<ffffffff810e5e60>] ? alloc_pages_current+0xa5/0xbf
[373562.908681]  [<ffffffff812b1718>] ? tcp_sendmsg+0x37f/0x733
[373562.909765]  [<ffffffff81109d87>] ? __pollwait+0xd6/0xd6
[373562.910812]  [<ffffffff8126ce8e>] ? sock_aio_write+0xde/0xed
[373562.912208]  [<ffffffff81109a75>] ? set_fd_set+0x31/0x38
[373562.913283]  [<ffffffff810fbb9f>] ? do_sync_write+0xb1/0xea
[373562.914358]  [<ffffffff8103840a>] ? should_resched+0x5/0x24
[373562.915405]  [<ffffffff810383fc>] ? need_resched+0x1a/0x23
[373562.916444]  [<ffffffff8103b06a>] ? finish_task_switch+0x4c/0xaf
[373562.917502]  [<ffffffff810383fc>] ? need_resched+0x1a/0x23
[373562.918614]  [<ffffffff81335bf4>] ? __schedule+0x598/0x5af
[373562.919656]  [<ffffffff81165901>] ? security_file_permission+0x18/0x33
[373562.920702]  [<ffffffff810fc1c9>] ? vfs_write+0xb9/0xf9
[373562.921767]  [<ffffffff810fc3ab>] ? sys_write+0x45/0x6b
[373562.922806]  [<ffffffff8133c212>] ? system_call_fastpath+0x16/0x1b

What's interesting is that the rtorrent process was doing direct memory reclaim (see its stack trace), while most other tasks appear to be stuck in kmalloc_nofail. It seems to me that the fix for the infamous #287 bug may be exposing a new kind of issue.
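To check which hung tasks follow this pattern (memory reclaim re-entering ZFS through inode eviction), the traces can be classified mechanically. The following is a minimal sketch, not a ZFS or SPL tool: it assumes the two log formats pasted in this issue, treats a task as re-entering ZFS when zfs_inactive or zpl_evict_inode appears above shrink_slab, try_to_free_pages or balance_pgdat in its trace, and does not distinguish frames marked "?" (which the kernel unwinder could not confirm are part of the call chain). The names `parse` and `reenters_zfs_from_reclaim` are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Task-state line printed by the hung-task detector, e.g.
# "kswapd0         D ffff880419e31850     0    45      2 0x00000000"
TASK = re.compile(
    r"(?P<name>\S+)\s+D\s+[0-9a-f]{16}\s+\d+\s+(?P<pid>\d+)\s+\d+\s+0x[0-9a-f]+"
)
# Call-trace frame, e.g. "[<ffffffffa0201ed7>] ? zfs_inactive+0xa2/0x145 [zfs]"
# or, with addresses stripped, "[] zfs_inactive+0x59/0x1a0 [zfs]"
FRAME = re.compile(r"\]\s+(?:\?\s+)?(?P<func>[A-Za-z_]\w*)\+0x")

RECLAIM = {"shrink_slab", "try_to_free_pages", "balance_pgdat"}
ZFS_EVICT = {"zfs_inactive", "zpl_evict_inode"}


@dataclass
class HungTask:
    name: str
    pid: int
    frames: List[str] = field(default_factory=list)  # innermost frame first


def parse(log: str) -> List[HungTask]:
    """Split a kernel log into hung tasks, each with its call-trace functions."""
    tasks: List[HungTask] = []
    for line in log.splitlines():
        m = TASK.search(line)
        if m:
            tasks.append(HungTask(m["name"], int(m["pid"])))
            continue
        f = FRAME.search(line)
        if f and tasks:
            tasks[-1].frames.append(f["func"])
    return tasks


def reenters_zfs_from_reclaim(task: HungTask) -> bool:
    """True if a ZFS eviction frame sits above (was called from) a reclaim frame."""
    zfs = [i for i, fn in enumerate(task.frames) if fn in ZFS_EVICT]
    rec = [i for i, fn in enumerate(task.frames) if fn in RECLAIM]
    return bool(zfs and rec and min(zfs) < max(rec))
```

Run over the log in the first comment, this flags kswapd0, zvol/2, rtorrent and nxagent, but not txg_quiesce: the zvol and nxagent traces also pass through try_to_free_pages, so rtorrent is not the only task that was in direct reclaim.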

@behlendorf
Contributor

Interesting: all the threads that entered direct reclaim are stuck waiting for the next txg to open. It would be interesting to see why the txg_sync thread isn't able to move things along. I've very rarely seen similar issues when the ARC is half full of dirty data: threads attempting to manipulate the txg keep getting ERESTART back, and for some reason txg_sync never makes forward progress.
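One way to reason about this kind of hang is as a wait-for graph: the system is deadlocked if following "X waits for Y" edges leads back to X. The sketch below is a generic depth-first cycle search applied to a hypothetical graph. The reclaim, txg_open and txg_sync edges follow the explanation above; the txg_sync to memory edge is purely an assumption for illustration, since none of the traces in this issue show where txg_sync itself is blocked.

```python
from typing import Dict, List, Optional


def find_cycle(waits_for: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle in a wait-for graph as a list of nodes, or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current DFS path, finished
    color = {n: WHITE for n in waits_for}
    for targets in waits_for.values():
        for t in targets:
            color.setdefault(t, WHITE)
    stack: List[str] = []

    def dfs(node: str) -> Optional[List[str]]:
        color[node] = GREY
        stack.append(node)
        for nxt in waits_for.get(node, []):
            if color[nxt] == GREY:  # back edge: nxt is already on the path
                return stack[stack.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = dfs(nxt)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for n in list(color):
        if color[n] == WHITE:
            found = dfs(n)
            if found:
                return found
    return None


# Hypothetical graph; the "txg_sync" -> "memory" edge is an assumption.
HYPOTHESIS = {
    "reclaim": ["txg_open"],   # zfs_inactive waits in txg_wait_open
    "txg_open": ["txg_sync"],  # next txg opens only once txg_sync progresses
    "txg_sync": ["memory"],    # assumption: txg_sync needs to allocate memory
    "memory": ["reclaim"],     # memory is freed only if reclaim completes
}
```

Here `find_cycle(HYPOTHESIS)` returns the path reclaim, txg_open, txg_sync, memory and back to reclaim; dropping the assumed edge removes the cycle, which is why seeing the txg_sync thread's own stack would help confirm or rule out this explanation.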

@greg-fischer

This may not help, but I wanted to comment anyway. I have had the same symptoms while running VirtualBox on ZFS. For a while, I couldn't even have a ZFS pool mounted while VirtualBox was running a VM. I ended up keeping the VM on ext4 and turning off all ZFS pools until I needed them, then shutting the VM down before enabling ZFS. It would deadlock the server just as etienne-dechamps-o described. Now, however, still running rc5 on Ubuntu, I have all my ZFS pools mounted (with nothing accessing them) and the VM running, and I've had no lockups in over a week. I'll probably upgrade to rc6 this weekend and try running the VM on ZFS again. If there's anything I can do to help resolve this bug, let me know.

For the record... Big THANK YOU to all working on this project!! I love ZFS! I love it even more on Linux!

@Bacto

Bacto commented Nov 12, 2011

Hi,

I'm new to ZFS on Linux. First of all, I'd like to say a big thank you for your work, Brian!
It's really great work.

I use a backup server to test ZFS. It's running Ubuntu 10.04.3 LTS with a 3.1.1 kernel and the latest ZFS/SPL code (pulled from Git yesterday). I use raidz on 4 HDDs of 3 TB each.
I ran a lot of operations in parallel to stress the server and got hung tasks similar to Etienne's, I think.

Here is the latest one:

Nov 11 23:22:22 nsXX kernel: kswapd0 D 0000000000000000 0 748 2 0x00000000
Nov 11 23:22:22 nsXX kernel: ffff8801377dfa40 0000000000000046 ffff880100000000 ffff8801377d8980
Nov 11 23:22:22 nsXX kernel: ffff8801377dffd8 ffff8801377de010 0000000000004000 ffff8801377dffd8
Nov 11 23:22:22 nsXX kernel: ffff880138f027c0 ffff8801377d8980 0000000000000000 0000000000000005
Nov 11 23:22:22 nsXX kernel: Call Trace:
Nov 11 23:22:22 nsXX kernel: [] ? free_page_list+0xfc/0x110
Nov 11 23:22:22 nsXX kernel: [] ? cpumask_any_but+0x2a/0x40
Nov 11 23:22:22 nsXX kernel: [] ? flush_tlb_page+0x43/0x90
Nov 11 23:22:22 nsXX kernel: [] ? dbuf_rele_and_unlock+0x141/0x200 [zfs]
Nov 11 23:22:22 nsXX kernel: [] schedule+0x3a/0x60
Nov 11 23:22:22 nsXX kernel: [] __mutex_lock_slowpath+0xec/0x160
Nov 11 23:22:22 nsXX kernel: [] mutex_lock+0x1e/0x40
Nov 11 23:22:22 nsXX kernel: [] zfs_zinactive+0x6f/0x100 [zfs]
Nov 11 23:22:22 nsXX kernel: [] zfs_inactive+0x59/0x1a0 [zfs]
Nov 11 23:22:22 nsXX kernel: [] ? truncate_pagecache+0x58/0x70
Nov 11 23:22:22 nsXX kernel: [] zpl_evict_inode+0x23/0x30 [zfs]
Nov 11 23:22:22 nsXX kernel: [] evict+0xa1/0x1a0
Nov 11 23:22:22 nsXX kernel: [] dispose_list+0x47/0x60
Nov 11 23:22:22 nsXX kernel: [] prune_icache_sb+0x18d/0x360
Nov 11 23:22:22 nsXX kernel: [] prune_super+0x14b/0x1a0
Nov 11 23:22:22 nsXX kernel: [] shrink_slab+0x15c/0x1d0
Nov 11 23:22:22 nsXX kernel: [] balance_pgdat+0x4f2/0x810
Nov 11 23:22:22 nsXX kernel: [] kswapd+0x1ff/0x310
Nov 11 23:22:22 nsXX kernel: [] ? wake_up_bit+0x40/0x40
Nov 11 23:22:22 nsXX kernel: [] ? balance_pgdat+0x810/0x810
Nov 11 23:22:22 nsXX kernel: [] kthread+0x96/0xa0
Nov 11 23:22:22 nsXX kernel: [] kernel_thread_helper+0x4/0x10
Nov 11 23:22:22 nsXX kernel: [] ? kthread_worker_fn+0x180/0x180
Nov 11 23:22:22 nsXX kernel: [] ? gs_change+0xb/0xb
Nov 11 23:22:22 nsXX kernel: cron D 0000000000000000 0 2981 1 0x00000000
Nov 11 23:22:22 nsXX kernel: ffff8801306e9910 0000000000000082 ffff880000000000 ffff880135772940
Nov 11 23:22:22 nsXX kernel: ffff8801306e9fd8 ffff8801306e8010 0000000000004000 ffff8801306e9fd8
Nov 11 23:22:22 nsXX kernel: ffffffff81e0d020 ffff880135772940 ffff88013b058f00 0000000000000003
Nov 11 23:22:22 nsXX kernel: Call Trace:
Nov 11 23:22:22 nsXX kernel: [] ? free_page_list+0xfc/0x110
Nov 11 23:22:22 nsXX kernel: [] ? __mem_cgroup_uncharge_common+0xcc/0x1e0
Nov 11 23:22:22 nsXX kernel: [] ? dbuf_rele_and_unlock+0x141/0x200 [zfs]
Nov 11 23:22:22 nsXX kernel: [] schedule+0x3a/0x60
Nov 11 23:22:22 nsXX kernel: [] __mutex_lock_slowpath+0xec/0x160
Nov 11 23:22:22 nsXX kernel: [] mutex_lock+0x1e/0x40
Nov 11 23:22:22 nsXX kernel: [] zfs_zinactive+0x6f/0x100 [zfs]
Nov 11 23:22:22 nsXX kernel: [] zfs_inactive+0x59/0x1a0 [zfs]
Nov 11 23:22:22 nsXX kernel: [] ? truncate_pagecache+0x58/0x70
Nov 11 23:22:22 nsXX kernel: [] zpl_evict_inode+0x23/0x30 [zfs]
Nov 11 23:22:22 nsXX kernel: [] evict+0xa1/0x1a0
Nov 11 23:22:22 nsXX kernel: [] dispose_list+0x47/0x60
Nov 11 23:22:22 nsXX kernel: [] prune_icache_sb+0x18d/0x360
Nov 11 23:22:22 nsXX kernel: [] prune_super+0x14b/0x1a0
Nov 11 23:22:22 nsXX kernel: [] shrink_slab+0x15c/0x1d0
Nov 11 23:22:22 nsXX kernel: [] do_try_to_free_pages+0x381/0x460
Nov 11 23:22:22 nsXX kernel: [] try_to_free_pages+0x71/0x80
Nov 11 23:22:22 nsXX kernel: [] __alloc_pages_nodemask+0x43c/0x750
Nov 11 23:22:22 nsXX kernel: [] copy_process+0xf9/0x1290
Nov 11 23:22:22 nsXX kernel: [] do_fork+0x9c/0x2a0
Nov 11 23:22:22 nsXX kernel: [] ? sys_newstat+0x31/0x50
Nov 11 23:22:22 nsXX kernel: [] sys_clone+0x23/0x30
Nov 11 23:22:22 nsXX kernel: [] stub_clone+0x13/0x20
Nov 11 23:22:22 nsXX kernel: [] ? system_call_fastpath+0x16/0x1b
Nov 11 23:22:22 nsXX kernel: txg_quiesce D 0000000000000006 0 3353 2 0x00000000
Nov 11 23:22:22 nsXX kernel: ffff880115c93d80 0000000000000046 ffff880115c93fd8 ffff88013095b2c0
Nov 11 23:22:22 nsXX kernel: ffff880115c93fd8 ffff880115c92010 0000000000004000 ffff880115c93fd8
Nov 11 23:22:22 nsXX kernel: ffff880138f54880 ffff88013095b2c0 0000000000000000 ffffffff00000005
Nov 11 23:22:22 nsXX kernel: Call Trace:
Nov 11 23:22:22 nsXX kernel: [] ? sched_clock_cpu+0xc5/0x100
Nov 11 23:22:22 nsXX kernel: [] ? try_to_wake_up+0xc7/0x270
Nov 11 23:22:22 nsXX kernel: [] schedule+0x3a/0x60
Nov 11 23:22:22 nsXX kernel: [] cv_wait_common+0x73/0xd0 [spl]
Nov 11 23:22:22 nsXX kernel: [] ? wake_up_bit+0x40/0x40
Nov 11 23:22:22 nsXX kernel: [] __cv_wait+0xe/0x10 [spl]
Nov 11 23:22:22 nsXX kernel: [] txg_quiesce_thread+0x18b/0x220 [zfs]
Nov 11 23:22:22 nsXX kernel: [] ? txg_sync_thread+0x3b0/0x3b0 [zfs]
Nov 11 23:22:22 nsXX kernel: [] ? txg_sync_thread+0x3b0/0x3b0 [zfs]
Nov 11 23:22:22 nsXX kernel: [] ? __thread_create+0x160/0x160 [spl]
Nov 11 23:22:22 nsXX kernel: [] thread_generic_wrapper+0x73/0x90 [spl]
Nov 11 23:22:22 nsXX kernel: [] ? __thread_create+0x160/0x160 [spl]
Nov 11 23:22:22 nsXX kernel: [] kthread+0x96/0xa0
Nov 11 23:22:22 nsXX kernel: [] kernel_thread_helper+0x4/0x10
Nov 11 23:22:22 nsXX kernel: [] ? kthread_worker_fn+0x180/0x180
Nov 11 23:22:22 nsXX kernel: [] ? gs_change+0xb/0xb
Nov 11 23:22:22 nsXX kernel: bash D 0000000000000000 0 3857 3836 0x00000000
Nov 11 23:22:22 nsXX kernel: ffff8801223b1af8 0000000000000086 0000000000000000 ffff8801340d8380
Nov 11 23:22:22 nsXX kernel: ffff8801223b1fd8 ffff8801223b0010 0000000000004000 ffff8801223b1fd8
Nov 11 23:22:22 nsXX kernel: ffff880138ec8740 ffff8801340d8380 ffff880058851800 00000000a00d0b19
Nov 11 23:22:22 nsXX kernel: Call Trace:
Nov 11 23:22:22 nsXX kernel: [] ? dbuf_rele_and_unlock+0x141/0x200 [zfs]
Nov 11 23:22:22 nsXX kernel: [] ? dmu_buf_rele+0x2b/0x40 [zfs]
Nov 11 23:22:22 nsXX kernel: [] ? dnode_hold_impl+0x2af/0x560 [zfs]
Nov 11 23:22:22 nsXX kernel: [] schedule+0x3a/0x60
Nov 11 23:22:22 nsXX kernel: [] cv_wait_common+0x73/0xd0 [spl]
Nov 11 23:22:22 nsXX kernel: [] ? wake_up_bit+0x40/0x40
Nov 11 23:22:22 nsXX kernel: [] __cv_wait+0xe/0x10 [spl]
Nov 11 23:22:22 nsXX kernel: [] dmu_tx_wait+0x83/0xf0 [zfs]
Nov 11 23:22:22 nsXX kernel: [] zfs_create+0x287/0x700 [zfs]
Nov 11 23:22:22 nsXX kernel: [] ? tsd_exit+0x3c/0x1a0 [spl]
Nov 11 23:22:22 nsXX kernel: [] zpl_create+0xa2/0xe0 [zfs]
Nov 11 23:22:22 nsXX kernel: [] vfs_create+0x8c/0xc0
Nov 11 23:22:22 nsXX kernel: [] do_last+0x602/0x810
Nov 11 23:22:22 nsXX kernel: [] path_openat+0xd0/0x400
Nov 11 23:22:22 nsXX kernel: [] do_filp_open+0x44/0xa0
Nov 11 23:22:22 nsXX kernel: [] ? alloc_fd+0x4b/0x140
Nov 11 23:22:22 nsXX kernel: [] do_sys_open+0x102/0x1e0
Nov 11 23:22:22 nsXX kernel: [] sys_open+0x1b/0x20
Nov 11 23:22:22 nsXX kernel: [] system_call_fastpath+0x16/0x1b
Nov 11 23:22:22 nsXX kernel: cp D 0000000000000000 0 4101 1 0x00000004
Nov 11 23:22:22 nsXX kernel: ffff88002f6fdcb8 0000000000000082 ffff880000000000 ffff880138867840
Nov 11 23:22:22 nsXX kernel: ffff88002f6fdfd8 ffff88002f6fc010 0000000000004000 ffff88002f6fdfd8
Nov 11 23:22:22 nsXX kernel: ffffffff81e0d020 ffff880138867840 0000000000000000 ffffffff00000006
Nov 11 23:22:22 nsXX kernel: Call Trace:
Nov 11 23:22:22 nsXX kernel: [] ? try_to_wake_up+0xc7/0x270
Nov 11 23:22:22 nsXX kernel: [] ? default_wake_function+0xd/0x10
Nov 11 23:22:22 nsXX kernel: [] schedule+0x3a/0x60
Nov 11 23:22:22 nsXX kernel: [] cv_wait_common+0x73/0xd0 [spl]
Nov 11 23:22:22 nsXX kernel: [] ? wake_up_bit+0x40/0x40
Nov 11 23:22:22 nsXX kernel: [] __cv_wait+0xe/0x10 [spl]
Nov 11 23:22:22 nsXX kernel: [] txg_wait_open+0x73/0xa0 [zfs]
Nov 11 23:22:22 nsXX kernel: [] dmu_tx_wait+0xed/0xf0 [zfs]
Nov 11 23:22:22 nsXX kernel: [] zfs_mkdir+0x1a3/0x580 [zfs]
Nov 11 23:22:22 nsXX kernel: [] ? d_lookup+0x30/0x50
Nov 11 23:22:22 nsXX kernel: [] zpl_mkdir+0x99/0xd0 [zfs]
Nov 11 23:22:22 nsXX kernel: [] vfs_mkdir+0x82/0xb0
Nov 11 23:22:22 nsXX kernel: [] sys_mkdirat+0xbb/0xd0
Nov 11 23:22:22 nsXX kernel: [] sys_mkdir+0x13/0x20
Nov 11 23:22:22 nsXX kernel: [] system_call_fastpath+0x16/0x1b
Nov 11 23:22:22 nsXX kernel: cp D 0000000000000006 0 4104 1 0x00000004
Nov 11 23:22:22 nsXX kernel: ffff88002f703b78 0000000000000082 ffff88002f703a88 ffff8801377e29c0
Nov 11 23:22:22 nsXX kernel: ffff88002f703fd8 ffff88002f702010 0000000000004000 ffff88002f703fd8
Nov 11 23:22:22 nsXX kernel: ffff880138f54880 ffff8801377e29c0 0000000000000000 ffff88003367da80
Nov 11 23:22:22 nsXX kernel: Call Trace:
Nov 11 23:22:22 nsXX kernel: [] ? kmem_free_debug+0x11/0x20 [spl]
Nov 11 23:22:22 nsXX kernel: [] ? dsl_dir_tempreserve_clear+0xe5/0x110 [zfs]
Nov 11 23:22:22 nsXX kernel: [] schedule+0x3a/0x60
Nov 11 23:22:22 nsXX kernel: [] cv_wait_common+0x73/0xd0 [spl]
Nov 11 23:22:22 nsXX kernel: [] ? wake_up_bit+0x40/0x40
Nov 11 23:22:22 nsXX kernel: [] __cv_wait+0xe/0x10 [spl]
Nov 11 23:22:22 nsXX kernel: [] txg_wait_open+0x73/0xa0 [zfs]
Nov 11 23:22:22 nsXX kernel: [] dmu_tx_wait+0xed/0xf0 [zfs]
Nov 11 23:22:22 nsXX kernel: [] zfs_write+0x3ae/0xc80 [zfs]
Nov 11 23:22:22 nsXX kernel: [] ? do_sync_read+0xd2/0x110
Nov 11 23:22:22 nsXX kernel: [] ? tsd_hash_search+0x76/0xd0 [spl]
Nov 11 23:22:22 nsXX kernel: [] zpl_write_common+0x4d/0x70 [zfs]
Nov 11 23:22:22 nsXX kernel: [] zpl_write+0x64/0xa0 [zfs]
Nov 11 23:22:22 nsXX kernel: [] vfs_write+0xc8/0x190
Nov 11 23:22:22 nsXX kernel: [] sys_write+0x4c/0x90
Nov 11 23:22:22 nsXX kernel: [] system_call_fastpath+0x16/0x1b
Nov 11 23:22:22 nsXX kernel: cp D 0000000000000000 0 4109 1 0x00000004
Nov 11 23:22:22 nsXX kernel: ffff88002f70d348 0000000000000082 ffff88013b00f910 ffff8801350b9840
Nov 11 23:22:22 nsXX kernel: ffff88002f70dfd8 ffff88002f70c010 0000000000004000 ffff88002f70dfd8
Nov 11 23:22:22 nsXX kernel: ffff8801340d9700 ffff8801350b9840 0000000000000000 0000000000000000
Nov 11 23:22:22 nsXX kernel: Call Trace:
Nov 11 23:22:22 nsXX kernel: [] ? submit_bio+0x73/0xf0
Nov 11 23:22:22 nsXX kernel: [] ? test_set_page_writeback+0x115/0x190
Nov 11 23:22:22 nsXX kernel: [] ? dbuf_rele_and_unlock+0x141/0x200 [zfs]
Nov 11 23:22:22 nsXX kernel: [] ? dmu_buf_rele+0x2b/0x40 [zfs]
Nov 11 23:22:22 nsXX kernel: [] schedule+0x3a/0x60
Nov 11 23:22:22 nsXX kernel: [] __mutex_lock_slowpath+0xec/0x160
Nov 11 23:22:22 nsXX kernel: [] mutex_lock+0x1e/0x40
Nov 11 23:22:22 nsXX kernel: [] zfs_zinactive+0x6f/0x100 [zfs]
Nov 11 23:22:22 nsXX kernel: [] zfs_inactive+0x59/0x1a0 [zfs]
Nov 11 23:22:22 nsXX kernel: [] ? truncate_pagecache+0x58/0x70
Nov 11 23:22:22 nsXX kernel: [] zpl_evict_inode+0x23/0x30 [zfs]
Nov 11 23:22:22 nsXX kernel: [] evict+0xa1/0x1a0
Nov 11 23:22:22 nsXX kernel: [] dispose_list+0x47/0x60
Nov 11 23:22:22 nsXX kernel: [] prune_icache_sb+0x18d/0x360
Nov 11 23:22:22 nsXX kernel: [] prune_super+0x14b/0x1a0
Nov 11 23:22:22 nsXX kernel: [] shrink_slab+0x15c/0x1d0
Nov 11 23:22:22 nsXX kernel: [] do_try_to_free_pages+0x381/0x460
Nov 11 23:22:22 nsXX kernel: [] try_to_free_pages+0x71/0x80
Nov 11 23:22:22 nsXX kernel: [] __alloc_pages_nodemask+0x43c/0x750
Nov 11 23:22:22 nsXX kernel: [] alloc_pages_current+0xa0/0x110
Nov 11 23:22:22 nsXX kernel: [] __get_free_pages+0x9/0x50
Nov 11 23:22:22 nsXX kernel: [] kv_alloc+0x36/0xb0 [spl]
Nov 11 23:22:22 nsXX kernel: [] spl_kmem_cache_alloc+0x353/0x6b0 [spl]
Nov 11 23:22:22 nsXX kernel: [] ? kmem_free_debug+0x11/0x20 [spl]
Nov 11 23:22:22 nsXX kernel: [] dbuf_create+0x3e/0x3b0 [zfs]
Nov 11 23:22:22 nsXX kernel: [] dbuf_create_bonus+0x21/0x30 [zfs]
Nov 11 23:22:22 nsXX kernel: [] dmu_bonus_hold+0x188/0x2a0 [zfs]
Nov 11 23:22:22 nsXX kernel: [] sa_buf_hold+0x9/0x10 [zfs]
Nov 11 23:22:22 nsXX kernel: [] zfs_mknode+0x14b/0xc70 [zfs]
Nov 11 23:22:22 nsXX kernel: [] ? txg_rele_to_quiesce+0xc/0x10 [zfs]
Nov 11 23:22:22 nsXX kernel: [] ? dmu_tx_assign+0x351/0x410 [zfs]
Nov 11 23:22:22 nsXX kernel: [] zfs_create+0x59a/0x700 [zfs]
Nov 11 23:22:22 nsXX kernel: [] ? tsd_exit+0x3c/0x1a0 [spl]
Nov 11 23:22:22 nsXX kernel: [] zpl_create+0xa2/0xe0 [zfs]
Nov 11 23:22:22 nsXX kernel: [] vfs_create+0x8c/0xc0
Nov 11 23:22:22 nsXX kernel: [] do_last+0x602/0x810
Nov 11 23:22:22 nsXX kernel: [] path_openat+0xd0/0x400
Nov 11 23:22:22 nsXX kernel: [] ? user_path_at_empty+0x65/0xa0
Nov 11 23:22:22 nsXX kernel: [] do_filp_open+0x44/0xa0
Nov 11 23:22:22 nsXX kernel: [] ? alloc_fd+0x4b/0x140
Nov 11 23:22:22 nsXX kernel: [] do_sys_open+0x102/0x1e0
Nov 11 23:22:22 nsXX kernel: [] sys_open+0x1b/0x20
Nov 11 23:22:22 nsXX kernel: [] system_call_fastpath+0x16/0x1b
Nov 11 23:22:22 nsXX kernel: cp D 0000000000000006 0 4112 1 0x00000004
Nov 11 23:22:22 nsXX kernel: ffff88002f713348 0000000000000086 ffff88013b0eab40 ffff880137400180
Nov 11 23:22:22 nsXX kernel: ffff88002f713fd8 ffff88002f712010 0000000000004000 ffff88002f713fd8
Nov 11 23:22:22 nsXX kernel: ffff8801350b8b40 ffff880137400180 ffffea000414a9e8 0000000000000009
Nov 11 23:22:22 nsXX kernel: Call Trace:
Nov 11 23:22:22 nsXX kernel: [] ? cpumask_any_but+0x2a/0x40
Nov 11 23:22:22 nsXX kernel: [] ? flush_tlb_page+0x43/0x90
Nov 11 23:22:22 nsXX kernel: [] ? dbuf_rele_and_unlock+0x141/0x200 [zfs]
Nov 11 23:22:22 nsXX kernel: [] ? dmu_buf_rele+0x2b/0x40 [zfs]
Nov 11 23:22:22 nsXX kernel: [] schedule+0x3a/0x60
Nov 11 23:22:22 nsXX kernel: [] __mutex_lock_slowpath+0xec/0x160
Nov 11 23:22:22 nsXX kernel: [] mutex_lock+0x1e/0x40
Nov 11 23:22:22 nsXX kernel: [] zfs_zinactive+0x6f/0x100 [zfs]
Nov 11 23:22:22 nsXX kernel: [] zfs_inactive+0x59/0x1a0 [zfs]
Nov 11 23:22:22 nsXX kernel: [] ? truncate_pagecache+0x58/0x70
Nov 11 23:22:22 nsXX kernel: [] zpl_evict_inode+0x23/0x30 [zfs]
Nov 11 23:22:22 nsXX kernel: [] evict+0xa1/0x1a0
Nov 11 23:22:22 nsXX kernel: [] dispose_list+0x47/0x60
Nov 11 23:22:22 nsXX kernel: [] prune_icache_sb+0x18d/0x360
Nov 11 23:22:22 nsXX kernel: [] prune_super+0x14b/0x1a0
Nov 11 23:22:22 nsXX kernel: [] shrink_slab+0x15c/0x1d0
Nov 11 23:22:22 nsXX kernel: [] do_try_to_free_pages+0x381/0x460
Nov 11 23:22:22 nsXX kernel: [] try_to_free_pages+0x71/0x80
Nov 11 23:22:22 nsXX kernel: [] __alloc_pages_nodemask+0x43c/0x750
Nov 11 23:22:22 nsXX kernel: [] alloc_pages_current+0xa0/0x110
Nov 11 23:22:22 nsXX kernel: [] __get_free_pages+0x9/0x50
Nov 11 23:22:22 nsXX kernel: [] kv_alloc+0x36/0xb0 [spl]
Nov 11 23:22:22 nsXX kernel: [] spl_kmem_cache_alloc+0x353/0x6b0 [spl]
Nov 11 23:22:22 nsXX kernel: [] ? avl_insert+0xa7/0xd0 [zavl]
Nov 11 23:22:22 nsXX kernel: [] ? mze_insert+0x68/0x80 [zfs]
Nov 11 23:22:22 nsXX kernel: [] dbuf_create+0x3e/0x3b0 [zfs]
Nov 11 23:22:22 nsXX kernel: [] ? zap_unlockdir+0x3c/0xb0 [zfs]
Nov 11 23:22:22 nsXX kernel: [] __dbuf_hold_impl+0x241/0x480 [zfs]
Nov 11 23:22:22 nsXX kernel: [] dbuf_hold_impl+0x7f/0xb0 [zfs]
Nov 11 23:22:22 nsXX kernel: [] dbuf_hold_level+0x1a/0x30 [zfs]
Nov 11 23:22:22 nsXX kernel: [] dmu_tx_check_ioerr+0x45/0x100 [zfs]
Nov 11 23:22:22 nsXX kernel: [] dmu_tx_count_write+0xa3/0x6c0 [zfs]
Nov 11 23:22:22 nsXX kernel: [] ? dbuf_rele_and_unlock+0x141/0x200 [zfs]
Nov 11 23:22:22 nsXX kernel: [] ? dnode_hold_impl+0x2af/0x560 [zfs]
Nov 11 23:22:22 nsXX kernel: [] ? __kmalloc+0x139/0x140
Nov 11 23:22:22 nsXX kernel: [] ? kmem_alloc_debug+0xab/0x120 [spl]
Nov 11 23:22:22 nsXX kernel: [] dmu_tx_hold_write+0x4a/0x60 [zfs]
Nov 11 23:22:22 nsXX kernel: [] zfs_write+0x37d/0xc80 [zfs]
Nov 11 23:22:22 nsXX kernel: [] ? do_sync_read+0xd2/0x110
Nov 11 23:22:22 nsXX kernel: [] ? tsd_hash_search+0x76/0xd0 [spl]
Nov 11 23:22:22 nsXX kernel: [] zpl_write_common+0x4d/0x70 [zfs]
Nov 11 23:22:22 nsXX kernel: [] zpl_write+0x64/0xa0 [zfs]
Nov 11 23:22:22 nsXX kernel: [] vfs_write+0xc8/0x190
Nov 11 23:22:22 nsXX kernel: [] sys_write+0x4c/0x90
Nov 11 23:22:22 nsXX kernel: [] system_call_fastpath+0x16/0x1b
Nov 11 23:22:22 nsXX kernel: cp D 0000000000000006 0 4116 1 0x00000004
Nov 11 23:22:22 nsXX kernel: ffff88002f72fb78 0000000000000086 ffff88002f72fa88 ffff880133a9f0c0
Nov 11 23:22:22 nsXX kernel: ffff88002f72ffd8 ffff88002f72e010 0000000000004000 ffff88002f72ffd8
Nov 11 23:22:22 nsXX kernel: ffff880138f54880 ffff880133a9f0c0 0000000000000006 ffff880000000002
Nov 11 23:22:22 nsXX kernel: Call Trace:
Nov 11 23:22:22 nsXX kernel: [] ? try_to_wake_up+0xc7/0x270
Nov 11 23:22:22 nsXX kernel: [] ? default_wake_function+0xd/0x10
Nov 11 23:22:22 nsXX kernel: [] schedule+0x3a/0x60
Nov 11 23:22:22 nsXX kernel: [] cv_wait_common+0x73/0xd0 [spl]
Nov 11 23:22:22 nsXX kernel: [] ? wake_up_bit+0x40/0x40
Nov 11 23:22:22 nsXX kernel: [] __cv_wait+0xe/0x10 [spl]
Nov 11 23:22:22 nsXX kernel: [] txg_wait_open+0x73/0xa0 [zfs]
Nov 11 23:22:22 nsXX kernel: [] dmu_tx_wait+0xed/0xf0 [zfs]
Nov 11 23:22:22 nsXX kernel: [] zfs_write+0x3ae/0xc80 [zfs]
Nov 11 23:22:22 nsXX kernel: [] ? do_sync_read+0xd2/0x110
Nov 11 23:22:22 nsXX kernel: [] ? tsd_hash_search+0x76/0xd0 [spl]
Nov 11 23:22:22 nsXX kernel: [] zpl_write_common+0x4d/0x70 [zfs]
Nov 11 23:22:22 nsXX kernel: [] zpl_write+0x64/0xa0 [zfs]
Nov 11 23:22:22 nsXX kernel: [] vfs_write+0xc8/0x190
Nov 11 23:22:22 nsXX kernel: [] sys_write+0x4c/0x90
Nov 11 23:22:22 nsXX kernel: [] system_call_fastpath+0x16/0x1b
Nov 11 23:22:22 nsXX kernel: cp D 0000000000000004 0 4119 1 0x00000004
Nov 11 23:22:22 nsXX kernel: ffff88002f735b78 0000000000000086 ffff88002f735a88 ffff88013090f7c0
Nov 11 23:22:22 nsXX kernel: ffff88002f735fd8 ffff88002f734010 0000000000004000 ffff88002f735fd8
Nov 11 23:22:22 nsXX kernel: ffff880138f14800 ffff88013090f7c0 0000000000000000 ffff880100000006
Nov 11 23:22:22 nsXX kernel: Call Trace:
Nov 11 23:22:22 nsXX kernel: [] ? try_to_wake_up+0xc7/0x270
Nov 11 23:22:22 nsXX kernel: [] ? default_wake_function+0xd/0x10
Nov 11 23:22:22 nsXX kernel: [] schedule+0x3a/0x60
Nov 11 23:22:22 nsXX kernel: [] cv_wait_common+0x73/0xd0 [spl]
Nov 11 23:22:22 nsXX kernel: [] ? wake_up_bit+0x40/0x40
Nov 11 23:22:22 nsXX kernel: [] __cv_wait+0xe/0x10 [spl]
Nov 11 23:22:22 nsXX kernel: [] txg_wait_open+0x73/0xa0 [zfs]
Nov 11 23:22:22 nsXX kernel: [] dmu_tx_wait+0xed/0xf0 [zfs]
Nov 11 23:22:22 nsXX kernel: [] zfs_write+0x3ae/0xc80 [zfs]
Nov 11 23:22:22 nsXX kernel: [] ? do_sync_read+0xd2/0x110
Nov 11 23:22:22 nsXX kernel: [] ? tsd_hash_search+0x76/0xd0 [spl]
Nov 11 23:22:22 nsXX kernel: [] zpl_write_common+0x4d/0x70 [zfs]
Nov 11 23:22:22 nsXX kernel: [] zpl_write+0x64/0xa0 [zfs]
Nov 11 23:22:22 nsXX kernel: [] vfs_write+0xc8/0x190
Nov 11 23:22:22 nsXX kernel: [] sys_write+0x4c/0x90
Nov 11 23:22:22 nsXX kernel: [] system_call_fastpath+0x16/0x1b

I can reproduce it within 3 hours of stressing the server, so if you make a patch, I can test it very quickly.
If you want, I can give you access to the server on Monday.

Hope I can help...

Adrien

@behlendorf
Contributor

Thanks Adrien, hopefully your extra debugging will help us get this issue resolved.

@akorn
Contributor

akorn commented Dec 26, 2011

Adrien,

I'd like to try to reproduce this. Can you tell me what your stress test does?

Thanks.

@Bacto

Bacto commented Jan 12, 2012

Hi Akorn,

Sorry for the delay.

The test ran 4 rsyncs between an SSD and the ZFS pool, plus many parallel copies of kernel sources (lots of small files).

Now I get these logs with only a single rsync from ZFS to another drive (reiserfs), after 657 GB had been copied.
I don't know if it's the same bug or not.

INFO: task txg_sync:3397 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
txg_sync        D 00000001008674a6     0  3397      2 0x00000000
 ffff88021d729bf0 0000000000000046 ffff880200000000 ffff880230224bf0
 ffff88021d729fd8 ffff88021d728010 0000000000004000 ffff88021d729fd8
 ffff880232d287f0 ffff880230224bf0 ffff88021d729b70 ffffffff810811f2
Call Trace:
 [<ffffffff810811f2>] ? try_to_wake_up+0x182/0x2a0
 [<ffffffff8108131d>] ? default_wake_function+0xd/0x10
 [<ffffffff81077e49>] ? __wake_up_common+0x59/0x90
 [<ffffffffa00089c3>] cv_wait_common+0x73/0xd0 [spl]
 [<ffffffff810a2fd0>] ? wake_up_bit+0x40/0x40
 [<ffffffffa0008a3e>] __cv_wait+0xe/0x10 [spl]
 [<ffffffffa01afdfb>] zio_wait+0xdb/0x150 [zfs]
 [<ffffffffa014cf6a>] dsl_pool_sync+0xca/0x460 [zfs]
 [<ffffffffa0121ec7>] ? bpobj_space+0x97/0xa0 [zfs]
 [<ffffffffa015e3b6>] spa_sync+0x396/0x9a0 [zfs]
 [<ffffffff810a2fe1>] ? autoremove_wake_function+0x11/0x40
 [<ffffffff81077e49>] ? __wake_up_common+0x59/0x90
 [<ffffffff81078d0e>] ? __wake_up+0x4e/0x70
 [<ffffffffa016f0b4>] txg_sync_thread+0x224/0x3b0 [zfs]
 [<ffffffffa016ee90>] ? txg_thread_exit+0x40/0x40 [zfs]
 [<ffffffffa016ee90>] ? txg_thread_exit+0x40/0x40 [zfs]
 [<ffffffffa0004ed0>] ? __thread_create+0x160/0x160 [spl]
 [<ffffffffa0004f43>] thread_generic_wrapper+0x73/0x90 [spl]
 [<ffffffff810a2b26>] kthread+0x96/0xa0
 [<ffffffff8183d094>] kernel_thread_helper+0x4/0x10
 [<ffffffff810a2a90>] ? kthread_worker_fn+0x180/0x180
 [<ffffffff8183d090>] ? gs_change+0xb/0xb

Adrien

@akorn
Contributor

akorn commented Jan 13, 2012

Sorry, meanwhile the box I was using for experiments went live; I don't have a sandbox at the moment.

Andras

@ryao
Contributor

ryao commented Apr 19, 2012

It looks like pull request #669 should address the original issue reported here.

@behlendorf
Contributor

I'm not so sure. Certainly it would prevent the reclaim under zvol_write(), but that should be safe under most circumstances (except for a swap device). The real question is why the txg_sync thread isn't making headway.

This issue also feels a bit stale. Has anyone hit this recently, or shall we just close it and open a new one if this is observed again?

@dechamps
Contributor Author

This issue also feels a bit stale. Has anyone hit this recently, or shall we just close it and open a new one if this is observed again?

Well, I only encountered this bug once. After that I stopped using VirtualBox, so I have no idea how to reproduce it or even whether the bug is still there. Feel free to close this until someone stumbles upon the issue again.

@Bacto

Bacto commented Apr 20, 2012

I hit the bug many times on my backup server, about every 12 hours, so I stopped using ZFS :-(

I can try using it again, but I need some time to buy new hard drives.

I'll keep you posted.

@behlendorf
Contributor

Alright. Then I'm going to close this issue for now. I'm sure someone will open a new issue if this remains a problem.

behlendorf pushed a commit to behlendorf/zfs that referenced this issue May 21, 2018
Avoid deadlocks when entering the shrinker from a PF_FSTRANS context.

This patch also reverts commit d0d5dd7 which added MUTEX_FSTRANS.  Its
use has been deprecated within ZFS as it was an ineffective mechanism
to eliminate deadlocks.  Among other things, it introduced the need for
strict ordering of mutex locking and unlocking in order that the
PF_FSTRANS flag wouldn't be set incorrectly.

Signed-off-by: Tim Chase <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#446
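A minimal single-threaded userspace sketch of the idea behind that commit, under stated assumptions: a thread that holds filesystem state and then allocates memory can be pushed into reclaim, and if the shrinker needs the same state the thread deadlocks on itself. Marking the thread (as PF_FSTRANS does in the kernel) lets the shrinker bail out instead. Every name here (`in_fstrans`, `toy_lock`, `toy_shrinker`, `toy_alloc`, `toy_fstrans_mark`, `toy_write`) is hypothetical and is not the SPL or ZFS API; the "lock" is a plain flag that counts would-be self-deadlocks rather than actually hanging.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's per-task PF_FSTRANS bit. */
static _Thread_local bool in_fstrans = false;

/* Toy non-recursive lock modelled as a flag (single-threaded model).
 * Re-acquiring it while held is counted as a self-deadlock; a real
 * kernel thread would block forever at this point instead. */
static bool fs_lock_held = false;
static int would_deadlock = 0;
static int reclaim_ran = 0;
static int reclaim_skipped = 0;

static bool toy_lock(void)
{
    if (fs_lock_held) {
        would_deadlock++;
        return false;
    }
    fs_lock_held = true;
    return true;
}

static void toy_unlock(void) { fs_lock_held = false; }

/* Hypothetical shrinker: freeing cached objects needs the fs lock. */
static void toy_shrinker(void)
{
    if (in_fstrans) {
        /* Caller is inside a filesystem transaction: skip reclaim. */
        reclaim_skipped++;
        return;
    }
    if (!toy_lock())
        return; /* in the kernel this is where the task would hang */
    reclaim_ran++;
    toy_unlock();
}

/* Hypothetical allocator that always enters reclaim first, as if
 * the system were under memory pressure. */
static void *toy_alloc(size_t size)
{
    toy_shrinker();
    return malloc(size);
}

/* Save/restore style marking: the previous value is returned and later
 * restored, so nested marks do not depend on mutex lock/unlock order. */
static bool toy_fstrans_mark(void)
{
    bool old = in_fstrans;
    in_fstrans = true;
    return old;
}

static void toy_fstrans_unmark(bool old) { in_fstrans = old; }

/* A write path that allocates while holding the fs lock. */
static int toy_write(bool use_mark)
{
    bool cookie = false;
    void *buf;

    toy_lock();
    if (use_mark)
        cookie = toy_fstrans_mark();
    buf = toy_alloc(4096);
    if (use_mark)
        toy_fstrans_unmark(cookie);
    toy_unlock();

    if (buf == NULL)
        return -1;
    free(buf);
    return 0;
}
```

Calling `toy_write(false)` records one would-be self-deadlock, while `toy_write(true)` skips reclaim instead; an allocation made outside any transaction still runs reclaim normally. Tying the flag to a saved cookie rather than to a particular mutex is one way to sidestep the strict lock-ordering requirement the commit message attributes to MUTEX_FSTRANS.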
mmaybee pushed a commit to mmaybee/openzfs that referenced this issue Sep 16, 2021
DOSE-400 Zfs_rename
DOSE-405 Zfs_share
DOSE-410 Zfs_unshare
pcd1193182 pushed a commit to pcd1193182/zfs that referenced this issue Jun 27, 2022
Since the `response_type` is the same as the `request_type`, the server
infrastructure can fill it in automatically, so each request handler
doesn't need to match it up manually.

The "get pools" response contained the pools directly in the nvlist, and
asserted that there are no other nvpairs.  There was no `response_type`
in this response message.  This commit changes it so that "get pools"
includes a `response_type`, and the pools are under a different `pools`
nvpair.  Additionally, the code in `PublicConnectionState::get_pools()`
is reorganized to be more clear.

Additionally, the code in `PublicConnectionState::get_destroying_pools()`
is simplified, using the server helper method to convert to an nvlist.