Use UMH_WAIT_EXEC with call_usermodehelper() to reliably set elevator #9422

Closed
wants to merge 1 commit into from

Conversation

tonynguien
Contributor

Motivation and Context

The change reliably sets the device elevator to noop.

Description

On Ubuntu 18.04, the disk elevator is no longer set to noop after the
change to use the UMH_NO_WAIT flag. This change switches to the
UMH_WAIT_EXEC flag (wait for exec), and the disk elevator is now
reliably set on pool creation and reboot.
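
A minimal sketch of the idea, assuming the upcall site looks like the
/bin/sh invocation in vdev_elevator_switch() (module/zfs/vdev_disk.c);
the function name and command string below are illustrative, not the
actual diff:

#include <linux/kmod.h>	/* call_usermodehelper(), UMH_* flags */

static int
set_elevator_sketch(const char *cmd)
{
	char *argv[] = { "/bin/sh", "-c", (char *)cmd, NULL };
	char *envp[] = { NULL };

	/*
	 * Previously UMH_NO_WAIT: queue the helper and return at once.
	 * With UMH_WAIT_EXEC the caller sleeps until /bin/sh has been
	 * exec'd, which is what makes the elevator change take reliably.
	 */
	return (call_usermodehelper(argv[0], argv, envp, UMH_WAIT_EXEC));
}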

How Has This Been Tested?

On Ubuntu 18.04, where the disk elevator was no longer set to noop on pool creation or reboot (100% reproducible), we no longer see the issue with this change.

I also ran zpool scrub several times and didn't hit panic mentioned in #8664.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:


Signed-off-by: Tony Nguyen <[email protected]>
@behlendorf behlendorf requested a review from loli10K October 7, 2019 16:35
@behlendorf behlendorf added the Status: Code Review Needed label Oct 7, 2019
Contributor

@behlendorf behlendorf left a comment


Using UMH_WAIT_EXEC was something which we originally considered. However, in the end we opted for UMH_NO_WAIT since technically we could hit the same deadlock when executing /bin/sh. Though that's probably astonishingly unlikely in practice.

We could probably even use UMH_NO_WAIT just to be safe in case reading /bin/sh from disk triggers the same issue; I haven't been able to reproduce this particular case though.

I'm curious, were you able to determine why UMH_NO_WAIT does not work as expected in your environment? Aside from the wait behavior I thought these would behave the same.

That all said, since this does restore the intended behavior it LGTM. However, be aware that the intention is to retire this functionality in the master branch in the not too distant future. See #9317. Due to issues like this it's preferable to delegate setting the scheduler to some infrastructure in user space.
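
For reference, a sketch of the wait modes being weighed here, with the
semantics paraphrased (include/linux/umh.h is authoritative); argv/envp
stand in for the "/bin/sh -c ..." invocation discussed above:

#include <linux/kmod.h>	/* call_usermodehelper(), UMH_* flags */

static void
umh_wait_modes_sketch(char **argv, char **envp)
{
	/* Queue the helper and return immediately (UMH_NO_WAIT today). */
	(void) call_usermodehelper(argv[0], argv, envp, UMH_NO_WAIT);

	/*
	 * Sleep until /bin/sh has been exec'd, but not until it exits
	 * (this PR).  If /bin/sh must be read back from the pool whose
	 * vdevs are being opened, the caller can block on locks it
	 * already holds -- the deadlock being described.
	 */
	(void) call_usermodehelper(argv[0], argv, envp, UMH_WAIT_EXEC);

	/* Sleep until the helper process has exited. */
	(void) call_usermodehelper(argv[0], argv, envp, UMH_WAIT_PROC);
}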

@loli10K
Contributor

loli10K commented Oct 7, 2019

I managed to reproduce #8664 on my root-on-zfs test VM with this change applied:

root@linux:~# cat /proc/sys/kernel/spl/gitrev
zfs-0.8.0-309-gff214b82a
root@linux:~# sysctl -w kernel.hung_task_timeout_secs=60
kernel.hung_task_timeout_secs = 60
root@linux:~# zfs set primarycache=none rpool
root@linux:~# echo 3 > /proc/sys/vm/drop_caches 
[  323.265982] bash (1453): drop_caches: 3
root@linux:~# 
root@linux:~# 
root@linux:~# time zpool scrub rpool
[  445.680923] INFO: task txg_sync:534 blocked for more than 61 seconds.
[  445.683620]       Tainted: P           OE     5.2.14 #1
[  445.685922] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  445.689250] txg_sync        D    0   534      2 0x80004000
[  445.691255] Call Trace:
[  445.691749]  ? __schedule+0x24a/0x600
[  445.692437]  schedule+0x2f/0xa0
[  445.693076]  cv_wait_common+0x156/0x290 [spl]
[  445.693943]  ? do_wait_intr_irq+0xb0/0xb0
[  445.694802]  spa_config_enter+0x13b/0x1e0 [zfs]
[  445.695729]  spa_sync+0x1e7/0x1850 [zfs]
[  445.696556]  ? zfs_refcount_remove_many+0x1e9/0x270 [zfs]
[  445.697604]  ? _cond_resched+0x15/0x30
[  445.698350]  ? mutex_lock+0xe/0x30
[  445.699074]  ? spa_txg_history_set+0xde/0x130 [zfs]
[  445.700067]  ? spa_txg_history_init_io+0xdc/0x110 [zfs]
[  445.701140]  txg_sync_thread+0x2ce/0x550 [zfs]
[  445.702053]  ? txg_thread_exit.isra.7+0xe0/0xe0 [zfs]
[  445.703014]  ? __thread_exit+0x20/0x20 [spl]
[  445.703831]  thread_generic_wrapper+0x9f/0xe0 [spl]
[  445.704803]  kthread+0x116/0x130
[  445.705446]  ? __kthread_cancel_work+0x80/0x80
[  445.706303]  ret_from_fork+0x35/0x40
[  445.707013] INFO: task systemd-timesyn:887 blocked for more than 61 seconds.
[  445.708382]       Tainted: P           OE     5.2.14 #1
[  445.709450] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  445.710977] systemd-timesyn D    0   887      1 0x00000120
[  445.712074] Call Trace:
[  445.712576]  ? __schedule+0x24a/0x600
[  445.713318]  ? mutex_lock+0xe/0x30
[  445.714012]  schedule+0x2f/0xa0
[  445.714656]  cv_wait_common+0x156/0x290 [spl]
[  445.715539]  ? do_wait_intr_irq+0xb0/0xb0
[  445.716388]  dbuf_read+0x26f/0x600 [zfs]
[  445.717235]  dmu_buf_hold+0x56/0x80 [zfs]
[  445.718113]  zap_lockdir+0x54/0x120 [zfs]
[  445.718927]  ? stack_trace_save+0x4b/0x70
[  445.719762]  zap_lookup_norm+0x5d/0xd0 [zfs]
[  445.720672]  zap_lookup+0x12/0x20 [zfs]
[  445.721508]  zfs_dirent_lock+0x565/0x7d0 [zfs]
[  445.722425]  zfs_dirlook+0x8a/0x2c0 [zfs]
[  445.723263]  zfs_lookup+0x1ef/0x3b0 [zfs]
[  445.724124]  zpl_lookup+0xbd/0x280 [zfs]
[  445.724919]  __lookup_slow+0x97/0x150
[  445.725650]  lookup_slow+0x35/0x50
[  445.726307]  walk_component+0x1bf/0x2f0
[  445.727049]  path_lookupat.isra.37+0x6d/0x210
[  445.727932]  filename_lookup+0xaf/0x180
[  445.728708]  ? kmem_cache_alloc+0x1c0/0x240
[  445.729580]  vfs_statx+0x73/0xe0
[  445.730223]  ? _cond_resched+0x15/0x30
[  445.730997]  ? dentry_kill+0x4d/0x190
[  445.731733]  __do_sys_newstat+0x39/0x70
[  445.732512]  ? syscall_trace_enter+0x192/0x2b0
[  445.733426]  do_syscall_64+0x55/0x130
[  445.734165]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  445.735178] RIP: 0033:0x7f6b42d51aa5
[  445.735893] Code: Bad RIP value.
[  445.736542] RSP: 002b:00007ffe8b033d18 EFLAGS: 00000246 ORIG_RAX: 0000000000000004
[  445.738079] RAX: ffffffffffffffda RBX: 00007ffe8b033d20 RCX: 00007f6b42d51aa5
[  445.739465] RDX: 00007ffe8b033d20 RSI: 00007ffe8b033d20 RDI: 00007f6b42dec54f
[  445.740884] RBP: 00007ffe8b033e80 R08: 000000000000000f R09: 0000556380d3d840
[  445.742274] R10: 0000000000000001 R11: 0000000000000246 R12: 00007f6b42e27900
[  445.743675] R13: 0000000000000002 R14: 0000000000000065 R15: 0000000000000066
[  445.745082] INFO: task cron:1045 blocked for more than 61 seconds.
[  445.746193]       Tainted: P           OE     5.2.14 #1
[  445.746963] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  445.748105] cron            D    0  1045      1 0x00000000
[  445.748924] Call Trace:
[  445.749291]  ? __schedule+0x24a/0x600
[  445.749827]  schedule+0x2f/0xa0
[  445.750295]  cv_wait_common+0x156/0x290 [spl]
[  445.750941]  ? do_wait_intr_irq+0xb0/0xb0
[  445.751574]  spa_config_enter+0x13b/0x1e0 [zfs]
[  445.752286]  zio_vdev_io_start+0x529/0x5a0 [zfs]
[  445.753025]  ? tsd_get_by_thread+0x3b/0x80 [spl]
[  445.753867]  zio_nowait+0x148/0x2f0 [zfs]
[  445.754584]  arc_read+0xb44/0x1a50 [zfs]
[  445.755295]  ? dbuf_rele_and_unlock+0x750/0x750 [zfs]
[  445.756192]  dbuf_read_impl.constprop.16+0x2d4/0xc50 [zfs]
[  445.757178]  dbuf_read+0x107/0x600 [zfs]
[  445.757891]  dmu_buf_hold+0x56/0x80 [zfs]
[  445.758628]  zap_lockdir+0x54/0x120 [zfs]
[  445.759324]  ? stack_trace_save+0x4b/0x70
[  445.760065]  zap_lookup_norm+0x5d/0xd0 [zfs]
[  445.760867]  zap_lookup+0x12/0x20 [zfs]
[  445.761576]  zfs_dirent_lock+0x565/0x7d0 [zfs]
[  445.762392]  zfs_dirlook+0x8a/0x2c0 [zfs]
[  445.763136]  zfs_lookup+0x1ef/0x3b0 [zfs]
[  445.763878]  zpl_lookup+0xbd/0x280 [zfs]
[  445.764535]  __lookup_slow+0x97/0x150
[  445.765094]  lookup_slow+0x35/0x50
[  445.765596]  walk_component+0x1bf/0x2f0
[  445.766161]  path_lookupat.isra.37+0x6d/0x210
[  445.766794]  filename_lookup+0xaf/0x180
[  445.767361]  ? kmem_cache_alloc+0x1c0/0x240
[  445.767980]  vfs_statx+0x73/0xe0
[  445.768457]  __do_sys_newstat+0x39/0x70
[  445.769065]  ? __hrtimer_init+0xb0/0xb0
[  445.769840]  do_syscall_64+0x55/0x130
[  445.770423]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  445.771161] RIP: 0033:0x7fb243a4faa5
[  445.771682] Code: Bad RIP value.
[  445.772163] RSP: 002b:00007ffc3247f608 EFLAGS: 00000246 ORIG_RAX: 0000000000000004
[  445.773283] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fb243a4faa5
[  445.774477] RDX: 00007ffc3247f690 RSI: 00007ffc3247f690 RDI: 00007fb243ae88b1
[  445.775505] RBP: 0000000000000000 R08: 000000000000000f R09: 0000564cd0d1ddb0
[  445.776535] R10: 0000000000000001 R11: 0000000000000246 R12: 00007fb243ae88b1
[  445.777597] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
[  445.778772] INFO: task zpool:1812 blocked for more than 61 seconds.
[  445.779682]       Tainted: P           OE     5.2.14 #1
[  445.780438] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  445.781591] zpool           D    0  1812   1453 0x00004000
[  445.782527] Call Trace:
[  445.782902]  ? __schedule+0x24a/0x600
[  445.783486]  ? vdev_open_children+0x160/0x160 [zfs]
[  445.784202]  schedule+0x2f/0xa0
[  445.784669]  taskq_wait+0x7a/0xc0 [spl]
[  445.785265]  ? do_wait_intr_irq+0xb0/0xb0
[  445.786052]  taskq_destroy+0x50/0x420 [spl]
[  445.786707]  ? vdev_open_children+0x160/0x160 [zfs]
[  445.787466]  vdev_open_children+0x10a/0x160 [zfs]
[  445.788201]  vdev_root_open+0x37/0x120 [zfs]
[  445.788918]  vdev_open+0xd9/0x900 [zfs]
[  445.789687]  vdev_reopen+0x4c/0x100 [zfs]
[  445.790318]  dsl_scan+0x36/0x120 [zfs]
[  445.790916]  zfs_ioc_pool_scan+0x5a/0xd0 [zfs]
[  445.791612]  zfsdev_ioctl_common+0x72e/0x9d0 [zfs]
[  445.792316]  ? __handle_mm_fault+0x10a2/0x1230
[  445.793000]  do_vfs_ioctl+0xa4/0x620
[  445.793716]  ksys_ioctl+0x60/0x90
[  445.794207]  __x64_sys_ioctl+0x16/0x20
[  445.794752]  do_syscall_64+0x55/0x130
[  445.795291]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  445.796023] RIP: 0033:0x7f851ccba427
[  445.796543] Code: Bad RIP value.
[  445.797050] RSP: 002b:00007ffccae900f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  445.798358] RAX: ffffffffffffffda RBX: 00007ffccae90530 RCX: 00007f851ccba427
[  445.799389] RDX: 00007ffccae90530 RSI: 0000000000005a07 RDI: 0000000000000003
[  445.800414] RBP: 00007ffccae93b10 R08: 0000000000000008 R09: 0000000000000001
[  445.801471] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
[  445.802636] R13: 0000000000000000 R14: 000056476e2b87a0 R15: 000056476e2b6430
[  445.803686] INFO: task vdev_open:1813 blocked for more than 61 seconds.
[  445.804731]       Tainted: P           OE     5.2.14 #1
[  445.805580] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  445.806848] vdev_open       D    0  1813      2 0x80004000
[  445.807752] Call Trace:
[  445.808157]  ? __schedule+0x24a/0x600
[  445.808803]  ? vdev_open_children+0x160/0x160 [zfs]
[  445.809604]  schedule+0x2f/0xa0
[  445.810122]  taskq_wait+0x7a/0xc0 [spl]
[  445.810733]  ? do_wait_intr_irq+0xb0/0xb0
[  445.811421]  taskq_destroy+0x50/0x420 [spl]
[  445.812153]  ? vdev_open_children+0x160/0x160 [zfs]
[  445.813024]  vdev_open_children+0x10a/0x160 [zfs]
[  445.813812]  vdev_mirror_open+0x2c/0x140 [zfs]
[  445.814591]  vdev_open+0xd9/0x900 [zfs]
[  445.815216]  ? __schedule+0x252/0x600
[  445.815855]  vdev_open_child+0x1e/0x40 [zfs]
[  445.816541]  taskq_thread+0x336/0x660 [spl]
[  445.817211]  ? __switch_to_asm+0x40/0x70
[  445.817821]  ? wake_up_q+0x70/0x70
[  445.818353]  ? taskq_thread_spawn+0x50/0x50 [spl]
[  445.819071]  kthread+0x116/0x130
[  445.819578]  ? __kthread_cancel_work+0x80/0x80
[  445.820265]  ret_from_fork+0x35/0x40
[  445.820865] INFO: task vdev_open:1815 blocked for more than 61 seconds.
[  445.821957]       Tainted: P           OE     5.2.14 #1
[  445.822774] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  445.824005] vdev_open       D    0  1815      2 0x80004000
[  445.824934] Call Trace:
[  445.825341]  ? __schedule+0x24a/0x600
[  445.825934]  ? __switch_to_asm+0x40/0x70
[  445.826564]  ? __switch_to_asm+0x34/0x70
[  445.827265]  schedule+0x2f/0xa0
[  445.827852]  schedule_timeout+0x20d/0x310
[  445.828582]  ? __schedule+0x252/0x600
[  445.829284]  ? try_to_wake_up+0x54/0x460
[  445.829968]  wait_for_completion+0x11f/0x190
[  445.830637]  ? wake_up_q+0x70/0x70
[  445.831179]  call_usermodehelper_exec+0x12e/0x160
[  445.831979]  vdev_elevator_switch+0x112/0x1a0 [zfs]
[  445.832811]  vdev_disk_open+0x2a0/0x4b0 [zfs]
[  445.833598]  vdev_open+0xd9/0x900 [zfs]
[  445.834220]  ? __schedule+0x252/0x600
[  445.834854]  vdev_open_child+0x1e/0x40 [zfs]
[  445.835536]  taskq_thread+0x336/0x660 [spl]
[  445.836230]  ? __switch_to_asm+0x40/0x70
[  445.836872]  ? wake_up_q+0x70/0x70
[  445.837418]  ? taskq_thread_spawn+0x50/0x50 [spl]
[  445.838171]  kthread+0x116/0x130
[  445.838696]  ? __kthread_cancel_work+0x80/0x80
[  445.839398]  ret_from_fork+0x35/0x40
[  445.840056] INFO: task kworker/u8:5:1816 blocked for more than 61 seconds.
[  445.841283]       Tainted: P           OE     5.2.14 #1
[  445.842190] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  445.843438] kworker/u8:5    D    0  1816      2 0x80004000
[  445.844313] Call Trace:
[  445.844713]  ? __schedule+0x24a/0x600
[  445.845284]  schedule+0x2f/0xa0
[  445.845765]  cv_wait_common+0x156/0x290 [spl]
[  445.846447]  ? do_wait_intr_irq+0xb0/0xb0
[  445.847190]  spa_config_enter+0x13b/0x1e0 [zfs]
[  445.848067]  zio_vdev_io_start+0x529/0x5a0 [zfs]
[  445.848940]  ? tsd_get_by_thread+0x3b/0x80 [spl]
[  445.849725]  zio_nowait+0x148/0x2f0 [zfs]
[  445.850411]  arc_read+0xb44/0x1a50 [zfs]
[  445.851078]  ? dbuf_rele_and_unlock+0x750/0x750 [zfs]
[  445.851927]  dbuf_read_impl.constprop.16+0x2d4/0xc50 [zfs]
[  445.852887]  dbuf_read+0x107/0x600 [zfs]
[  445.853567]  dmu_buf_hold+0x56/0x80 [zfs]
[  445.854253]  zap_lockdir+0x54/0x120 [zfs]
[  445.854902]  ? stack_trace_save+0x4b/0x70
[  445.855574]  zap_lookup_norm+0x5d/0xd0 [zfs]
[  445.856304]  zap_lookup+0x12/0x20 [zfs]
[  445.856970]  zfs_dirent_lock+0x565/0x7d0 [zfs]
[  445.857716]  zfs_dirlook+0x8a/0x2c0 [zfs]
[  445.858401]  zfs_lookup+0x1ef/0x3b0 [zfs]
[  445.859073]  zpl_lookup+0xbd/0x280 [zfs]
[  445.859683]  path_openat+0x89f/0x1420
[  445.860277]  do_filp_open+0x93/0x100
[  445.860910]  ? zpl_fallocate+0x10/0x10 [zfs]
[  445.861582]  ? stack_trace_save+0x4b/0x70
[  445.862222]  ? _raw_write_lock_irqsave+0x2e/0x30
[  445.862975]  do_open_execat+0x83/0x1a0
[  445.863606]  __do_execve_file.isra.34+0x70a/0x9d0
[  445.864495]  ? create_object+0x21d/0x260
[  445.865241]  do_execve+0x21/0x30
[  445.865846]  call_usermodehelper_exec_async+0x189/0x1b0
[  445.866690]  ? recalc_sigpending+0x17/0x50
[  445.867387]  ? proc_cap_handler+0x1b0/0x1b0
[  445.868073]  ret_from_fork+0x35/0x40

Same reproducer running HEAD^ (94bcf6f) does not deadlock:

root@linux:~# cat /proc/sys/kernel/spl/gitrev
zfs-0.8.0-308-g94bcf6f5e
root@linux:~# sysctl -w kernel.hung_task_timeout_secs=60
kernel.hung_task_timeout_secs = 60
root@linux:~# zfs set primarycache=none rpool
root@linux:~# echo 3 > /proc/sys/vm/drop_caches 
[  116.529482] bash (1773): drop_caches: 3
root@linux:~# 
root@linux:~# 
root@linux:~# time zpool scrub rpool

real	0m1.487s
user	0m0.000s
sys	0m0.153s
root@linux:~# 

I'm not sure why UMH_NO_WAIT is not working; we should probably investigate that too.

Incidentally, this change seems to prevent #9417.

@tonynguien
Contributor Author

tonynguien commented Oct 8, 2019 via email

@codecov

codecov bot commented Oct 9, 2019

Codecov Report

Merging #9422 into master will increase coverage by 0.08%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #9422      +/-   ##
==========================================
+ Coverage   78.95%   79.03%   +0.08%     
==========================================
  Files         410      410              
  Lines      122499   122499              
==========================================
+ Hits        96723    96823     +100     
+ Misses      25776    25676     -100
Flag      Coverage Δ
#kernel   79.75% <100%> (-0.01%) ⬇️
#user     66.5% <ø> (+0.28%) ⬆️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 94bcf6f...ff214b8.

@behlendorf
Contributor

@tonynguien I don't believe we can move ahead with this PR since it will reintroduce the original deadlock. There are alternative ways to handle this; a few that come to mind are:

  1. A new udev rule,
  2. Update the zpool utility to set the scheduler, or
  3. Perform the upcall from the system taskq

All of which are a bit clunky. Since automatically setting the I/O scheduler isn't critical functionality, I'd suggest that instead we move forward with the original #9317 proposal and deprecate this functionality.
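
A rough sketch of option 3 above, assuming the SPL taskq API
(taskq_dispatch() on system_taskq); the task function, argument struct,
and buffer size are hypothetical, not existing ZFS code:

#include <sys/taskq.h>	/* SPL: taskq_dispatch(), system_taskq, TQ_SLEEP */
#include <sys/kmem.h>
#include <linux/kmod.h>

/* Hypothetical argument block carrying the command to run. */
typedef struct elv_switch_arg {
	char esa_cmd[128];	/* shell command that writes the scheduler */
} elv_switch_arg_t;

static void
elv_switch_task(void *arg)
{
	elv_switch_arg_t *esa = arg;
	char *argv[] = { "/bin/sh", "-c", esa->esa_cmd, NULL };
	char *envp[] = { NULL };

	/* This thread holds no pool locks, so it is safe to wait here. */
	(void) call_usermodehelper(argv[0], argv, envp, UMH_WAIT_PROC);
	kmem_free(esa, sizeof (*esa));
}

/*
 * The vdev open path would then dispatch and return without waiting:
 *
 *	(void) taskq_dispatch(system_taskq, elv_switch_task, esa, TQ_SLEEP);
 */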

@behlendorf behlendorf self-requested a review October 30, 2019 22:28
@behlendorf behlendorf closed this Nov 1, 2019
@tonynguien
Contributor Author

Thanks @behlendorf. I agree that we should move forward with #9317.

For systems where all devices are used for ZFS, setting the elevator=noop GRUB option is one way to ensure disks use the noop I/O scheduler.

@tonynguien tonynguien deleted the elevator.UMH_WAIT_EXEC branch November 1, 2019 19:18