periodic txg_sync zfs tasks blocked (from ret_from_fork) #9796
Comments
I have exactly the same issue. Stack trace:
|
@jspuij Having the same error (and hang) does not mean it has the same cause... I can't find much in your stack trace, besides ret_from_fork, that matches the OP's... @fermulator Do you see the same thing when running 0.8.*? |
I have the same issue, which originally prompted #9747. For me, I'm using: This seems to occur when performing large writes to the pool.
|
@Ornias1993 See: #9800 Kernel version and zfs version in the issue. But it definitely happens on 0.8.2 too. |
@jspuij thanks for the 0.8.* confirmation. @behlendorf we are slowly getting flooded with more and more of these errors... |
@fixyourcodeplease222 Welcome to the ZFSonLinux repository.
|
@Ornias1993 no, I'm afraid I'm not quite ready to upgrade to |
Confirming a similar issue on Ubuntu 18.04 LTS with ZFS
|
Steps to reproduce in my case: copy big file from host in rpool to vm within raid0 pool using jumbo frames.
May 18 01:09:41 pc kernel: [93504.616531] perf: interrupt took too long (2501 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
May 18 06:14:06 pc kernel: [111769.934928] zvol D 0 351 2 0x80004000
May 18 06:14:06 pc kernel: [111769.934930] Call Trace:
May 18 06:14:06 pc kernel: [111769.934934] __schedule+0x2e6/0x6f0
May 18 06:14:06 pc kernel: [111769.934936] schedule+0x33/0xa0
May 18 06:14:06 pc kernel: [111769.934941] cv_wait_common+0x104/0x130 [spl]
May 18 06:14:06 pc kernel: [111769.934943] ? wait_woken+0x80/0x80
May 18 06:14:06 pc kernel: [111769.934946] __cv_wait+0x15/0x20 [spl]
May 18 06:14:06 pc kernel: [111769.934973] dmu_buf_hold_array_by_dnode+0x1c4/0x480 [zfs]
May 18 06:14:06 pc kernel: [111769.934991] dmu_read_uio_dnode+0x49/0xf0 [zfs]
May 18 06:14:06 pc kernel: [111769.934994] ? generic_start_io_acct+0x101/0x120
May 18 06:14:06 pc kernel: [111769.935020] zvol_read+0x101/0x2d0 [zfs]
May 18 06:14:06 pc kernel: [111769.935024] taskq_thread+0x2ec/0x4d0 [spl]
May 18 06:14:06 pc kernel: [111769.935026] ? wake_up_q+0x80/0x80
May 18 06:14:06 pc kernel: [111769.935029] kthread+0x120/0x140
May 18 06:14:06 pc kernel: [111769.935032] ? task_done+0xb0/0xb0 [spl]
May 18 06:14:06 pc kernel: [111769.935034] ? kthread_park+0x90/0x90
May 18 06:14:06 pc kernel: [111769.935035] ret_from_fork+0x35/0x40
May 18 06:14:06 pc kernel: [111769.935041] l2arc_feed D 0 358 2 0x80004000
May 18 06:14:06 pc kernel: [111769.935043] Call Trace:
May 18 06:14:06 pc kernel: [111769.935045] __schedule+0x2e6/0x6f0
May 18 06:14:06 pc kernel: [111769.935047] schedule+0x33/0xa0
May 18 06:14:06 pc kernel: [111769.935048] schedule_timeout+0x152/0x300
May 18 06:14:06 pc kernel: [111769.935050] ? __next_timer_interrupt+0xd0/0xd0
May 18 06:14:06 pc kernel: [111769.935052] io_schedule_timeout+0x1e/0x50
May 18 06:14:06 pc kernel: [111769.935054] __cv_timedwait_common+0x12f/0x170 [spl]
May 18 06:14:06 pc kernel: [111769.935056] ? wait_woken+0x80/0x80
May 18 06:14:06 pc kernel: [111769.935059] __cv_timedwait_io+0x19/0x20 [spl]
May 18 06:14:06 pc kernel: [111769.935085] zio_wait+0x130/0x270 [zfs]
May 18 06:14:06 pc kernel: [111769.935110] ? zio_issue_async+0x12/0x20 [zfs]
May 18 06:14:06 pc kernel: [111769.935124] l2arc_feed_thread+0x9a8/0x1090 [zfs]
May 18 06:14:06 pc kernel: [111769.935126] ? __switch_to_asm+0x34/0x70
May 18 06:14:06 pc kernel: [111769.935127] ? __switch_to_asm+0x40/0x70
May 18 06:14:06 pc kernel: [111769.935129] ? syscall_return_via_sysret+0x19/0x7f
May 18 06:14:06 pc kernel: [111769.935143] ? l2arc_evict+0x2b0/0x2b0 [zfs]
May 18 06:14:06 pc kernel: [111769.935146] thread_generic_wrapper+0x74/0x90 [spl]
May 18 06:14:06 pc kernel: [111769.935148] kthread+0x120/0x140
May 18 06:14:06 pc kernel: [111769.935151] ? __thread_exit+0x20/0x20 [spl]
May 18 06:14:06 pc kernel: [111769.935153] ? kthread_park+0x90/0x90
May 18 06:14:06 pc kernel: [111769.935155] ret_from_fork+0x35/0x40
May 18 06:14:06 pc kernel: [111769.935172] txg_sync D 0 1714 2 0x80004000
May 18 06:14:06 pc kernel: [111769.935174] Call Trace:
May 18 06:14:06 pc kernel: [111769.935175] __schedule+0x2e6/0x6f0
May 18 06:14:06 pc kernel: [111769.935177] schedule+0x33/0xa0
May 18 06:14:06 pc kernel: [111769.935178] schedule_timeout+0x152/0x300
May 18 06:14:06 pc kernel: [111769.935180] ? __next_timer_interrupt+0xd0/0xd0
May 18 06:14:06 pc kernel: [111769.935182] io_schedule_timeout+0x1e/0x50
May 18 06:14:06 pc kernel: [111769.935184] __cv_timedwait_common+0x12f/0x170 [spl]
May 18 06:14:06 pc kernel: [111769.935186] ? wait_woken+0x80/0x80
May 18 06:14:06 pc kernel: [111769.935189] __cv_timedwait_io+0x19/0x20 [spl]
May 18 06:14:06 pc kernel: [111769.935214] zio_wait+0x130/0x270 [zfs]
May 18 06:14:06 pc kernel: [111769.935229] dbuf_read+0x4e1/0xb80 [zfs]
May 18 06:14:06 pc kernel: [111769.935244] dbuf_hold_impl_arg+0x4ca/0x640 [zfs]
May 18 06:14:06 pc kernel: [111769.935259] dbuf_hold_impl+0x9a/0xc0 [zfs]
May 18 06:14:06 pc kernel: [111769.935273] dbuf_hold+0x33/0x60 [zfs]
May 18 06:14:06 pc kernel: [111769.935289] dmu_buf_hold_array_by_dnode+0xe1/0x480 [zfs]
May 18 06:14:06 pc kernel: [111769.935305] dmu_write.part.10+0x6a/0x100 [zfs]
May 18 06:14:06 pc kernel: [111769.935307] ? mutex_lock+0x12/0x30
May 18 06:14:06 pc kernel: [111769.935322] dmu_write+0x14/0x20 [zfs]
May 18 06:14:06 pc kernel: [111769.935347] space_map_write+0x152/0x600 [zfs]
May 18 06:14:06 pc kernel: [111769.935349] ? ktime_get_raw_ts64+0x38/0xd0
May 18 06:14:06 pc kernel: [111769.935351] ? mutex_lock+0x12/0x30
May 18 06:14:06 pc kernel: [111769.935373] metaslab_sync+0x473/0xc50 [zfs]
May 18 06:14:06 pc kernel: [111769.935375] ? _cond_resched+0x19/0x30
May 18 06:14:06 pc kernel: [111769.935377] ? mutex_lock+0x12/0x30
May 18 06:14:06 pc kernel: [111769.935401] vdev_sync+0x6f/0x1e0 [zfs]
May 18 06:14:06 pc kernel: [111769.935425] spa_sync+0x62f/0xfa0 [zfs]
May 18 06:14:06 pc kernel: [111769.935449] ? spa_txg_history_init_io+0x104/0x110 [zfs]
May 18 06:14:06 pc kernel: [111769.935474] txg_sync_thread+0x2d9/0x4c0 [zfs]
May 18 06:14:06 pc kernel: [111769.935498] ? txg_thread_exit.isra.12+0x60/0x60 [zfs]
May 18 06:14:06 pc kernel: [111769.935502] thread_generic_wrapper+0x74/0x90 [spl]
May 18 06:14:06 pc kernel: [111769.935504] kthread+0x120/0x140
May 18 06:14:06 pc kernel: [111769.935507] ? __thread_exit+0x20/0x20 [spl]
May 18 06:14:06 pc kernel: [111769.935509] ? kthread_park+0x90/0x90
May 18 06:14:06 pc kernel: [111769.935510] ret_from_fork+0x35/0x40
May 18 06:14:06 pc kernel: [111769.935525] zvol D 0 21457 2 0x80004000
May 18 06:14:06 pc kernel: [111769.935527] Call Trace:
May 18 06:14:06 pc kernel: [111769.935529] __schedule+0x2e6/0x6f0
May 18 06:14:06 pc kernel: [111769.935531] schedule+0x33/0xa0
May 18 06:14:06 pc kernel: [111769.935532] schedule_timeout+0x152/0x300
May 18 06:14:06 pc kernel: [111769.935533] ? __next_timer_interrupt+0xd0/0xd0
May 18 06:14:06 pc kernel: [111769.935535] io_schedule_timeout+0x1e/0x50
May 18 06:14:06 pc kernel: [111769.935538] __cv_timedwait_common+0x12f/0x170 [spl]
May 18 06:14:06 pc kernel: [111769.935540] ? wait_woken+0x80/0x80
May 18 06:14:06 pc kernel: [111769.935542] __cv_timedwait_io+0x19/0x20 [spl]
May 18 06:14:06 pc kernel: [111769.935568] zio_wait+0x130/0x270 [zfs]
May 18 06:14:06 pc kernel: [111769.935584] dmu_buf_hold_array_by_dnode+0x15a/0x480 [zfs]
May 18 06:14:06 pc kernel: [111769.935601] dmu_read_uio_dnode+0x49/0xf0 [zfs]
May 18 06:14:06 pc kernel: [111769.935603] ? generic_start_io_acct+0x101/0x120
May 18 06:14:06 pc kernel: [111769.935628] zvol_read+0x101/0x2d0 [zfs]
May 18 06:14:06 pc kernel: [111769.935632] taskq_thread+0x2ec/0x4d0 [spl]
May 18 06:14:06 pc kernel: [111769.935634] ? wake_up_q+0x80/0x80
May 18 06:14:06 pc kernel: [111769.935636] kthread+0x120/0x140
May 18 06:14:06 pc kernel: [111769.935639] ? task_done+0xb0/0xb0 [spl]
May 18 06:14:06 pc kernel: [111769.935641] ? kthread_park+0x90/0x90
May 18 06:14:06 pc kernel: [111769.935642] ret_from_fork+0x35/0x40
May 18 06:14:06 pc kernel: [111769.935647] zvol D 0 21459 2 0x80004000
May 18 06:14:06 pc kernel: [111769.935648] Call Trace:
May 18 06:14:06 pc kernel: [111769.935650] __schedule+0x2e6/0x6f0
May 18 06:14:06 pc kernel: [111769.935652] schedule+0x33/0xa0
May 18 06:14:06 pc kernel: [111769.935654] cv_wait_common+0x104/0x130 [spl]
May 18 06:14:06 pc kernel: [111769.935656] ? wait_woken+0x80/0x80
May 18 06:14:06 pc kernel: [111769.935658] __cv_wait+0x15/0x20 [spl]
May 18 06:14:06 pc kernel: [111769.935674] dmu_buf_hold_array_by_dnode+0x1c4/0x480 [zfs]
May 18 06:14:06 pc kernel: [111769.935690] dmu_read_uio_dnode+0x49/0xf0 [zfs]
May 18 06:14:06 pc kernel: [111769.935692] ? generic_start_io_acct+0x101/0x120
May 18 06:14:06 pc kernel: [111769.935717] zvol_read+0x101/0x2d0 [zfs]
May 18 06:14:06 pc kernel: [111769.935721] taskq_thread+0x2ec/0x4d0 [spl]
May 18 06:14:06 pc kernel: [111769.935723] ? wake_up_q+0x80/0x80
May 18 06:14:06 pc kernel: [111769.935725] kthread+0x120/0x140
May 18 06:14:06 pc kernel: [111769.935728] ? task_done+0xb0/0xb0 [spl]
May 18 06:14:06 pc kernel: [111769.935730] ? kthread_park+0x90/0x90
May 18 06:14:06 pc kernel: [111769.935731] ret_from_fork+0x35/0x40
May 18 06:14:06 pc kernel: [111769.935736] zvol D 0 21460 2 0x80004000
May 18 06:14:06 pc kernel: [111769.935737] Call Trace:
May 18 06:14:06 pc kernel: [111769.935739] __schedule+0x2e6/0x6f0
May 18 06:14:06 pc kernel: [111769.935741] schedule+0x33/0xa0
May 18 06:14:06 pc kernel: [111769.935743] cv_wait_common+0x104/0x130 [spl]
May 18 06:14:06 pc kernel: [111769.935745] ? wait_woken+0x80/0x80
May 18 06:14:06 pc kernel: [111769.935747] __cv_wait+0x15/0x20 [spl]
May 18 06:14:06 pc kernel: [111769.935763] dmu_buf_hold_array_by_dnode+0x1c4/0x480 [zfs]
May 18 06:14:06 pc kernel: [111769.935779] dmu_read_uio_dnode+0x49/0xf0 [zfs]
May 18 06:14:06 pc kernel: [111769.935782] ? generic_start_io_acct+0x101/0x120
May 18 06:14:06 pc kernel: [111769.935806] zvol_read+0x101/0x2d0 [zfs]
May 18 06:14:06 pc kernel: [111769.935810] taskq_thread+0x2ec/0x4d0 [spl]
May 18 06:14:06 pc kernel: [111769.935812] ? wake_up_q+0x80/0x80
May 18 06:14:06 pc kernel: [111769.935814] kthread+0x120/0x140
May 18 06:14:06 pc kernel: [111769.935817] ? task_done+0xb0/0xb0 [spl]
May 18 06:14:06 pc kernel: [111769.935819] ? kthread_park+0x90/0x90
May 18 06:14:06 pc kernel: [111769.935820] ret_from_fork+0x35/0x40
May 18 06:14:06 pc kernel: [111769.935825] zvol D 0 21461 2 0x80004000
May 18 06:14:06 pc kernel: [111769.935826] Call Trace:
May 18 06:14:06 pc kernel: [111769.935828] __schedule+0x2e6/0x6f0
May 18 06:14:06 pc kernel: [111769.935830] schedule+0x33/0xa0
May 18 06:14:06 pc kernel: [111769.935832] cv_wait_common+0x104/0x130 [spl]
May 18 06:14:06 pc kernel: [111769.935834] ? wait_woken+0x80/0x80
May 18 06:14:06 pc kernel: [111769.935837] __cv_wait+0x15/0x20 [spl]
May 18 06:14:06 pc kernel: [111769.935852] dmu_buf_hold_array_by_dnode+0x1c4/0x480 [zfs]
May 18 06:14:06 pc kernel: [111769.935868] dmu_read_uio_dnode+0x49/0xf0 [zfs]
May 18 06:14:06 pc kernel: [111769.935870] ? generic_start_io_acct+0x101/0x120
May 18 06:14:06 pc kernel: [111769.935895] zvol_read+0x101/0x2d0 [zfs]
May 18 06:14:06 pc kernel: [111769.935899] taskq_thread+0x2ec/0x4d0 [spl]
May 18 06:14:06 pc kernel: [111769.935901] ? wake_up_q+0x80/0x80
May 18 06:14:06 pc kernel: [111769.935903] kthread+0x120/0x140
May 18 06:14:06 pc kernel: [111769.935906] ? task_done+0xb0/0xb0 [spl]
May 18 06:14:06 pc kernel: [111769.935908] ? kthread_park+0x90/0x90
May 18 06:14:06 pc kernel: [111769.935909] ret_from_fork+0x35/0x40
May 18 06:14:06 pc kernel: [111769.935914] zvol D 0 21462 2 0x80004000
May 18 06:14:06 pc kernel: [111769.935915] Call Trace:
May 18 06:14:06 pc kernel: [111769.935917] __schedule+0x2e6/0x6f0
May 18 06:14:06 pc kernel: [111769.935919] schedule+0x33/0xa0
May 18 06:14:06 pc kernel: [111769.935920] schedule_timeout+0x152/0x300
May 18 06:14:06 pc kernel: [111769.935922] ? __next_timer_interrupt+0xd0/0xd0
May 18 06:14:06 pc kernel: [111769.935923] io_schedule_timeout+0x1e/0x50
May 18 06:14:06 pc kernel: [111769.935926] __cv_timedwait_common+0x12f/0x170 [spl]
May 18 06:14:06 pc kernel: [111769.935928] ? wait_woken+0x80/0x80
May 18 06:14:06 pc kernel: [111769.935930] __cv_timedwait_io+0x19/0x20 [spl]
May 18 06:14:06 pc kernel: [111769.935955] zio_wait+0x130/0x270 [zfs]
May 18 06:14:06 pc kernel: [111769.935972] dmu_buf_hold_array_by_dnode+0x15a/0x480 [zfs]
May 18 06:14:06 pc kernel: [111769.935988] dmu_read_uio_dnode+0x49/0xf0 [zfs]
May 18 06:14:06 pc kernel: [111769.935990] ? generic_start_io_acct+0x101/0x120
May 18 06:14:06 pc kernel: [111769.936015] zvol_read+0x101/0x2d0 [zfs]
May 18 06:14:06 pc kernel: [111769.936019] taskq_thread+0x2ec/0x4d0 [spl]
May 18 06:14:06 pc kernel: [111769.936021] ? wake_up_q+0x80/0x80
May 18 06:14:06 pc kernel: [111769.936023] kthread+0x120/0x140
May 18 06:14:06 pc kernel: [111769.936026] ? task_done+0xb0/0xb0 [spl]
May 18 06:14:06 pc kernel: [111769.936028] ? kthread_park+0x90/0x90
May 18 06:14:06 pc kernel: [111769.936029] ret_from_fork+0x35/0x40
May 18 06:14:06 pc kernel: [111769.936034] zvol D 0 21471 2 0x80004000
May 18 06:14:06 pc kernel: [111769.936035] Call Trace:
May 18 06:14:06 pc kernel: [111769.936037] __schedule+0x2e6/0x6f0
May 18 06:14:06 pc kernel: [111769.936039] schedule+0x33/0xa0
May 18 06:14:06 pc kernel: [111769.936041] cv_wait_common+0x104/0x130 [spl]
May 18 06:14:06 pc kernel: [111769.936043] ? wait_woken+0x80/0x80
May 18 06:14:06 pc kernel: [111769.936045] __cv_wait+0x15/0x20 [spl]
May 18 06:14:06 pc kernel: [111769.936061] dmu_buf_hold_array_by_dnode+0x1c4/0x480 [zfs]
May 18 06:14:06 pc kernel: [111769.936077] dmu_read_uio_dnode+0x49/0xf0 [zfs]
May 18 06:14:06 pc kernel: [111769.936080] ? spl_kmem_alloc+0xec/0x140 [spl]
May 18 06:14:06 pc kernel: [111769.936082] ? generic_start_io_acct+0x101/0x120
May 18 06:14:06 pc kernel: [111769.936107] zvol_read+0x101/0x2d0 [zfs]
May 18 06:14:06 pc kernel: [111769.936111] taskq_thread+0x2ec/0x4d0 [spl]
May 18 06:14:06 pc kernel: [111769.936112] ? __switch_to_asm+0x40/0x70
May 18 06:14:06 pc kernel: [111769.936114] ? wake_up_q+0x80/0x80
May 18 06:14:06 pc kernel: [111769.936116] kthread+0x120/0x140
May 18 06:14:06 pc kernel: [111769.936119] ? task_done+0xb0/0xb0 [spl]
May 18 06:14:06 pc kernel: [111769.936121] ? kthread_park+0x90/0x90
May 18 06:14:06 pc kernel: [111769.936122] ret_from_fork+0x35/0x40
May 18 06:14:06 pc kernel: [111769.936127] zvol D 0 21472 2 0x80004000
May 18 06:14:06 pc kernel: [111769.936128] Call Trace:
May 18 06:14:06 pc kernel: [111769.936130] __schedule+0x2e6/0x6f0
May 18 06:14:06 pc kernel: [111769.936132] schedule+0x33/0xa0
May 18 06:14:06 pc kernel: [111769.936134] cv_wait_common+0x104/0x130 [spl]
May 18 06:14:06 pc kernel: [111769.936136] ? wait_woken+0x80/0x80
May 18 06:14:06 pc kernel: [111769.936138] __cv_wait+0x15/0x20 [spl]
May 18 06:14:06 pc kernel: [111769.936154] dmu_buf_hold_array_by_dnode+0x1c4/0x480 [zfs]
May 18 06:14:06 pc kernel: [111769.936170] dmu_read_uio_dnode+0x49/0xf0 [zfs]
May 18 06:14:06 pc kernel: [111769.936173] ? spl_kmem_alloc+0xec/0x140 [spl]
May 18 06:14:06 pc kernel: [111769.936175] ? generic_start_io_acct+0x101/0x120
May 18 06:14:06 pc kernel: [111769.936200] zvol_read+0x101/0x2d0 [zfs]
May 18 06:14:06 pc kernel: [111769.936204] taskq_thread+0x2ec/0x4d0 [spl]
May 18 06:14:06 pc kernel: [111769.936205] ? __switch_to_asm+0x40/0x70
May 18 06:14:06 pc kernel: [111769.936207] ? wake_up_q+0x80/0x80
May 18 06:14:06 pc kernel: [111769.936209] kthread+0x120/0x140
May 18 06:14:06 pc kernel: [111769.936212] ? task_done+0xb0/0xb0 [spl]
May 18 06:14:06 pc kernel: [111769.936214] ? kthread_park+0x90/0x90
May 18 06:14:06 pc kernel: [111769.936215] ret_from_fork+0x35/0x40 |
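A trace this long is easier to compare with other reports once it is condensed. The following is only a rough sketch, assuming the syslog layout shown above (a task header such as `zvol D 0 351 2 0x80004000`, followed by `symbol+0xoff/0xlen [module]` frames). The function names `parse_blocked_tasks`, `first_frame_in` and `summarize` are made up for illustration and are not part of any ZFS tooling. It groups the blocked tasks by name and by the first `[zfs]` frame they are waiting in, skipping the unreliable `?` frames.

```python
import re
from collections import Counter

# Header of a task in uninterruptible sleep, e.g. "] zvol D 0 351 2 0x80004000".
TASK_RE = re.compile(
    r"\]\s+(?P<name>\S+)\s+D\s+\d+\s+(?P<pid>\d+)\s+\d+\s+0x[0-9a-f]+\s*$"
)
# Stack frame, e.g. "] zio_wait+0x130/0x270 [zfs]"; a leading "? " marks a
# frame the unwinder is not sure about. A trailing " |" separator is tolerated.
FRAME_RE = re.compile(
    r"\]\s+(?P<unsure>\?\s+)?(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<mod>\w+)\])?\s*\|?\s*$"
)


def parse_blocked_tasks(lines):
    """Return one dict per blocked task with its reliable stack frames."""
    tasks = []
    current = None
    for line in lines:
        header = TASK_RE.search(line)
        if header:
            current = {"name": header["name"], "pid": int(header["pid"]), "frames": []}
            tasks.append(current)
            continue
        frame = FRAME_RE.search(line)
        if frame and current is not None and not frame["unsure"]:
            current["frames"].append((frame["func"], frame["mod"]))
    return tasks


def first_frame_in(task, module):
    """Return the innermost frame that belongs to the given kernel module."""
    for func, mod in task["frames"]:
        if mod == module:
            return func
    return None


def summarize(tasks, module="zfs"):
    """Count blocked tasks per (task name, innermost frame in module)."""
    return Counter((t["name"], first_frame_in(t, module)) for t in tasks)


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python summarize_trace.py < kern.log
    for (name, func), count in summarize(parse_blocked_tasks(sys.stdin)).most_common():
        print(f"{count:3d}  {name:12s} {func}")
```

Fed the log above, this should make it obvious at a glance which tasks are sitting in `zio_wait` and which are waiting in `dmu_buf_hold_array_by_dnode`, which is the kind of comparison asked for earlier in this thread.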
I'll repost this here as well as in my own issue: my problem was caused by my drives being SMR (Shingled Magnetic Recording), which I was unaware of because the manufacturer had not stated it in the disk specs. |
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions. |
System information (SYSTEM A, VM host)
This system is purely a KVM host; it runs no applications itself. It has two mirrors, one for root ZFS and another for storage.
System information (SYSTEM B, storage server)
Describe the problem you're observing
I have occasionally seen "zfs task:##### blocked for more than 120 seconds" messages in the system logs on two different zfsonlinux systems.
It happens infrequently, but in bursts. I cannot determine any "pattern" of events prior to these blocking zfs tasks.
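To look for a pattern in the bursts, one option is to bucket the hung-task messages by hour and by task name. This is a minimal sketch, assuming the usual kernel wording `INFO: task NAME:PID blocked for more than N seconds.` and a syslog-style timestamp prefix such as `May 18 06:14:06`. The function name `hung_task_bursts` is hypothetical, and the example lines in the comments are invented rather than taken from either system.

```python
import re
from collections import Counter

# Kernel hung-task warning, e.g.
# "INFO: task txg_sync:1714 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"task (?P<name>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)
# Syslog prefix, e.g. "May 18 06:14:06 "; only the day and hour are kept.
STAMP_RE = re.compile(r"^(?P<day>\w{3}\s+\d+) (?P<hour>\d{2}):\d{2}:\d{2} ")


def hung_task_bursts(lines):
    """Count hung-task warnings per hour and per task name."""
    per_hour = Counter()
    per_task = Counter()
    for line in lines:
        hung = HUNG_RE.search(line)
        if not hung:
            continue
        per_task[hung["name"]] += 1
        stamp = STAMP_RE.match(line)
        key = f"{stamp['day']} {stamp['hour']}:00" if stamp else "unknown"
        per_hour[key] += 1
    return per_hour, per_task


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python hung_bursts.py < kern.log
    per_hour, per_task = hung_task_bursts(sys.stdin)
    for hour, count in sorted(per_hour.items()):
        print(f"{hour}  {count}")
    for name, count in per_task.most_common():
        print(f"{name}  {count}")
```

Lining the hourly counts up against scrub schedules, backups or VM activity may show whether the bursts coincide with heavy I/O. The 120-second threshold in the message comes from the kernel's `hung_task_timeout_secs` sysctl, so the counts only reflect stalls longer than that value.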
Describe how to reproduce the problem
UNKNOWN
Given it has happened on TWO systems now, with the same software version(s) but different hardware, I wonder whether a common configuration or software component is responsible.
Include any warning/errors/backtraces from the system logs
System A
This repeats itself many times, then stopped.
System B
On another system I have running
Appendix
on system A, the pools and datasets and ZVOLs:
on system B, the pools are a pool of mirrors, nothing fancy really
on system B, we aren't using root boot on zfs (it is MDADM)
it is worth noting I saw #7659 and #7734
On system A (the VM one), when the ZVOLs were on the same pool as the root boot, these locks were FREQUENT, and in fact I may have been hit by #7734 directly (but since I isolated the VM ZVOLs on a different hardware pool, I haven't had the issue as prominently).
I don't think #7659 applies as I'm not using LXD.
Only a few datasets have dedup on fwiw (systemB); systemA doesn't have dedup on at all.