send/recv failed - spl_system_task stuck #2453
Comments
Note that this is with 0.6.3 on Gentoo with kernel 3.12.21-gentoo-r1.
Try sending without dedup (-D); I have seen this issue too. Look at #2210.
@pyavdr Thanks, I just tried again without -D and it worked. I'll run without -D for a while and see if I get any more failures. I recall only one similar failure on 0.6.2, but I had several this morning on 0.6.3.
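The workaround discussed here amounts to dropping -D from the send flags and leaving the rest of the pipeline alone. A minimal sketch, reusing the snapshot names and pipeline from the original report; the helper name build_send_cmd is hypothetical, and it only prints the command so it can be reviewed before being run:

```shell
#!/bin/sh
# Snapshot names taken from the original report.
FROM='alpha@zfs-auto-snap_daily-2014-06-16-0910'
TO='alpha@zfs-auto-snap_daily-2014-07-01-0910'

# Hypothetical helper: prints the send/recv pipeline with -R but without -D.
# $1 = older snapshot, $2 = newer snapshot. Nothing is executed here.
build_send_cmd() {
    printf 'zfs send -R -I %s %s | mbuffer -m 100m -s 64k | zfs recv -Fud backup\n' "$1" "$2"
}

build_send_cmd "$FROM" "$TO"
```

Whether a non-dedup stream avoids the hang in every case is not established in this thread; it is only what worked for the reporter.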
Pull request #2338 might fix this. I plan to include it in the next Gentoo patch set.
Richard, I tried your hint (#2338), but it didn't fix this issue (#2210, #2453). This was with the latest HEAD on kernel 3.11, with #2338 added. strace output added:
linux-ts3r:/tank # strace zfs recv -v tank/zfstest < /tank/zfstest2/testD.zfs
@dscherger Would you reproduce this with the kernel changes that setting USE=debug on the zfs-kmod ebuild requires? It should clean up the stacks.
From the posted strace output it's clear that the kernel is returning EINVAL in
Also, since this appears to be a regression somewhere between 0.6.2 and 0.6.3, it would be useful to do a git bisect to see which patch introduced it.
@ryao Here's another set of stacks with debug enabled on zfs-kmod:
[ 481.807196] INFO: task spl_system_task:1358 blocked for more than 120 seconds.
I'm able to reproduce this with 100% reliability using:
Another stack trace for a stuck system task in the context of a "zfs send" on 0.6.3.
There have been numerous fixes made in this area in the master source. Has anyone verified that this can still be reproduced with the latest code?
Here's a stack from another failed attempt yesterday.
Mar 22 21:01:00 haswell kernel: INFO: task spl_system_task:1384 blocked for more than 120 seconds.
I've since enabled debug on the zfs, zfs-kmod and spl ebuilds, but I can't seem to get another stack trace. send/recv now just hang without reporting anything. I do see the following stack trace on boot up:
Mar 23 08:27:32 haswell kernel: SPL: using hostid 0x00000000
I think that #5689 closes this issue; reopen it if it does not.
running this:
zfs send -RD -I alpha@zfs-auto-snap_daily-2014-06-16-0910 alpha@zfs-auto-snap_daily-2014-07-01-0910 | mbuffer -m 100m -s 64k | zfs recv -Fud backup
got this error:
cannot receive incremental stream: invalid backup stream
dmesg shows this:
[ 1322.490202] INFO: task spl_system_task:1363 blocked for more than 120 seconds.
[ 1322.490204] Tainted: P O 3.12.21-gentoo-r1 #1
[ 1322.490205] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1322.490205] spl_system_task D ffffffff81495600 0 1363 2 0x00000000
[ 1322.490208] ffff88081a5d8b40 0000000000000046 ffff88083b163540 0000000000004000
[ 1322.490209] ffff88081adabfd8 ffff88081adabfd8 0000000000000001 ffffffffa03d797d
[ 1322.490211] ffff880817930000 ffffc900b1423130 0000000000000000 ffffffffa03efe60
[ 1322.490213] Call Trace:
[ 1322.490224] [] ? zil_vdev_offline+0x30d/0x5b0 [zfs]
[ 1322.490229] [] ? zio_buf_free+0x433/0xd80 [zfs]
[ 1322.490233] [] ? zio_nowait+0xa4/0xf20 [zfs]
[ 1322.490237] [] ? arc_read+0x311/0x910 [zfs]
[ 1322.490240] [] ? _raw_spin_lock_irqsave+0x1e/0x50
[ 1322.490241] [] ? _raw_spin_unlock_irqrestore+0x13/0x40
[ 1322.490244] [] ? __cv_timedwait+0x115/0x130 [spl]
[ 1322.490247] [] ? abort_exclusive_wait+0xb0/0xb0
[ 1322.490253] [] ? dmu_objset_is_receiving+0x23b/0xee0 [zfs]
[ 1322.490258] [] ? dmu_objset_is_receiving+0x716/0xee0 [zfs]
[ 1322.490262] [] ? dmu_objset_is_receiving+0x88e/0xee0 [zfs]
[ 1322.490267] [] ? dmu_objset_is_receiving+0x88e/0xee0 [zfs]
[ 1322.490271] [] ? traverse_pool+0x24f/0x400 [zfs]
[ 1322.490275] [] ? dmu_objset_is_receiving+0x9b0/0xee0 [zfs]
[ 1322.490279] [] ? dmu_objset_is_receiving+0x88e/0xee0 [zfs]
[ 1322.490283] [] ? dmu_objset_is_receiving+0x88e/0xee0 [zfs]
[ 1322.490287] [] ? dmu_objset_is_receiving+0x88e/0xee0 [zfs]
[ 1322.490291] [] ? dmu_objset_is_receiving+0x88e/0xee0 [zfs]
[ 1322.490296] [] ? dmu_objset_is_receiving+0x88e/0xee0 [zfs]
[ 1322.490300] [] ? dmu_objset_is_receiving+0x88e/0xee0 [zfs]
[ 1322.490304] [] ? traverse_pool+0x24f/0x400 [zfs]
[ 1322.490308] [] ? dmu_objset_is_receiving+0xa76/0xee0 [zfs]
[ 1322.490312] [] ? traverse_pool+0x3a9/0x400 [zfs]
[ 1322.490315] [] ? dmu_objset_is_receiving+0x170/0xee0 [zfs]
[ 1322.490318] [] ? taskq_dispatch_delay+0x481/0x620 [spl]
[ 1322.490320] [] ? try_to_wake_up+0x2a0/0x2a0
[ 1322.490321] [] ? taskq_dispatch_delay+0x2e0/0x620 [spl]
[ 1322.490323] [] ? kthread+0xb3/0xc0
[ 1322.490324] [] ? up_read+0x10/0x20
[ 1322.490326] [] ? kthread_freezable_should_stop+0x60/0x60
[ 1322.490328] [] ? ret_from_fork+0x7c/0xb0
[ 1322.490329] [] ? kthread_freezable_should_stop+0x60/0x60
[ 1442.596686] INFO: task spl_system_task:1363 blocked for more than 120 seconds.
[ 1442.596688] Tainted: P O 3.12.21-gentoo-r1 #1
[ 1442.596689] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1442.596689] spl_system_task D ffffffff81495600 0 1363 2 0x00000000
[ 1442.596692] ffff88081a5d8b40 0000000000000046 ffff88083b163540 0000000000004000
[ 1442.596693] ffff88081adabfd8 ffff88081adabfd8 0000000000000001 ffffffffa03d797d
[ 1442.596695] ffff880817930000 ffffc900b1423130 0000000000000000 ffffffffa03efe60
[ 1442.596696] Call Trace:
[ 1442.596709] [] ? zil_vdev_offline+0x30d/0x5b0 [zfs]
[ 1442.596713] [] ? zio_buf_free+0x433/0xd80 [zfs]
[ 1442.596718] [] ? zio_nowait+0xa4/0xf20 [zfs]
[ 1442.596721] [] ? arc_read+0x311/0x910 [zfs]
[ 1442.596724] [] ? _raw_spin_lock_irqsave+0x1e/0x50
[ 1442.596726] [] ? _raw_spin_unlock_irqrestore+0x13/0x40
[ 1442.596729] [] ? __cv_timedwait+0x115/0x130 [spl]
[ 1442.596732] [] ? abort_exclusive_wait+0xb0/0xb0
[ 1442.596737] [] ? dmu_objset_is_receiving+0x23b/0xee0 [zfs]
[ 1442.596743] [] ? dmu_objset_is_receiving+0x716/0xee0 [zfs]
[ 1442.596747] [] ? dmu_objset_is_receiving+0x88e/0xee0 [zfs]
[ 1442.596751] [] ? dmu_objset_is_receiving+0x88e/0xee0 [zfs]
[ 1442.596756] [] ? traverse_pool+0x24f/0x400 [zfs]
[ 1442.596760] [] ? dmu_objset_is_receiving+0x9b0/0xee0 [zfs]
[ 1442.596764] [] ? dmu_objset_is_receiving+0x88e/0xee0 [zfs]
[ 1442.596768] [] ? dmu_objset_is_receiving+0x88e/0xee0 [zfs]
[ 1442.596772] [] ? dmu_objset_is_receiving+0x88e/0xee0 [zfs]
[ 1442.596776] [] ? dmu_objset_is_receiving+0x88e/0xee0 [zfs]
[ 1442.596780] [] ? dmu_objset_is_receiving+0x88e/0xee0 [zfs]
[ 1442.596784] [] ? dmu_objset_is_receiving+0x88e/0xee0 [zfs]
[ 1442.596788] [] ? traverse_pool+0x24f/0x400 [zfs]
[ 1442.596792] [] ? dmu_objset_is_receiving+0xa76/0xee0 [zfs]
[ 1442.596796] [] ? traverse_pool+0x3a9/0x400 [zfs]
[ 1442.596799] [] ? dmu_objset_is_receiving+0x170/0xee0 [zfs]
[ 1442.596802] [] ? taskq_dispatch_delay+0x481/0x620 [spl]
[ 1442.596804] [] ? try_to_wake_up+0x2a0/0x2a0
[ 1442.596806] [] ? taskq_dispatch_delay+0x2e0/0x620 [spl]
[ 1442.596807] [] ? kthread+0xb3/0xc0
[ 1442.596808] [] ? up_read+0x10/0x20
[ 1442.596810] [] ? kthread_freezable_should_stop+0x60/0x60
[ 1442.596812] [] ? ret_from_fork+0x7c/0xb0
[ 1442.596813] [] ? kthread_freezable_should_stop+0x60/0x60
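The log above repeats the same hung-task report every 120 seconds, which makes it hard to see at a glance which tasks are stuck. A rough sketch of a filter that reduces such output to one line per task and PID; the function name hung_tasks is hypothetical, and it assumes the "INFO: task NAME:PID blocked for more than" wording shown in the log:

```shell
#!/bin/sh
# Hypothetical helper: reads kernel log text on stdin and prints
# "task_name pid" once for each distinct hung-task report.
hung_tasks() {
    sed -n 's/.*INFO: task \([^:]*\):\([0-9]*\) blocked for more than.*/\1 \2/p' | sort -u
}

# Example use (may require root, depending on kernel settings):
#   dmesg | hung_tasks
```

For the log in this report it would list spl_system_task with PID 1363 a single time, even though the report appears twice.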