
kernel error on 3.16.7 during zfs send #2873

Closed
krichter722 opened this issue Nov 7, 2014 · 2 comments

@krichter722

During a zfs send (zfs send -R rpool2@now12 | pv | zfs receive -F rpool1) I'm getting the following error on Ubuntu 14.10 with Linux 3.16.7:

Nov  7 01:47:33 localhost kernel: [ 1081.074744] INFO: task spl_system_task:283 blocked for more than 120 seconds.
Nov  7 01:47:33 localhost kernel: [ 1081.074749]       Tainted: P        W  OE 3.16.7-031607-generic #201410301735
Nov  7 01:47:33 localhost kernel: [ 1081.074750] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov  7 01:47:33 localhost kernel: [ 1081.074751] spl_system_task D 0000000000000003     0   283      2 0x00000000
Nov  7 01:47:33 localhost kernel: [ 1081.074755]  ffff880420c1b630 0000000000000046 ffff8804235ca880 ffff880420c1bfd8
Nov  7 01:47:33 localhost kernel: [ 1081.074757]  00000000000143c0 00000000000143c0 ffff880428eaa880 ffff8804235ca880
Nov  7 01:47:33 localhost kernel: [ 1081.074759]  ffff880420c1b640 ffff8803049e7da8 ffff8803049e7d80 ffff8803049e7db0
Nov  7 01:47:33 localhost kernel: [ 1081.074761] Call Trace:
Nov  7 01:47:33 localhost kernel: [ 1081.074766]  [<ffffffff82792919>] schedule+0x29/0x70
Nov  7 01:47:33 localhost kernel: [ 1081.074787]  [<ffffffffc030cb95>] cv_wait_common+0x105/0x1a0 [spl]
Nov  7 01:47:33 localhost kernel: [ 1081.074791]  [<ffffffff820bab10>] ? __wake_up_sync+0x20/0x20
Nov  7 01:47:33 localhost kernel: [ 1081.074799]  [<ffffffffc030cc45>] __cv_wait+0x15/0x20 [spl]
Nov  7 01:47:33 localhost kernel: [ 1081.074826]  [<ffffffffc03bf89b>] traverse_prefetcher+0x9b/0x150 [zfs]
Nov  7 01:47:33 localhost kernel: [ 1081.074846]  [<ffffffffc03bfb55>] traverse_visitbp+0xf5/0x770 [zfs]
Nov  7 01:47:33 localhost kernel: [ 1081.074858]  [<ffffffffc03a70c0>] ? arc_buf_remove_ref+0x110/0x110 [zfs]
Nov  7 01:47:33 localhost kernel: [ 1081.074873]  [<ffffffffc03bfea8>] traverse_visitbp+0x448/0x770 [zfs]
Nov  7 01:47:33 localhost kernel: [ 1081.074889]  [<ffffffffc03c0875>] traverse_dnode+0x75/0x110 [zfs]
Nov  7 01:47:33 localhost kernel: [ 1081.074910]  [<ffffffffc03c0003>] traverse_visitbp+0x5a3/0x770 [zfs]
Nov  7 01:47:33 localhost kernel: [ 1081.074926]  [<ffffffffc03bfea8>] traverse_visitbp+0x448/0x770 [zfs]
Nov  7 01:47:33 localhost kernel: [ 1081.074941]  [<ffffffffc03bfea8>] traverse_visitbp+0x448/0x770 [zfs]
Nov  7 01:47:33 localhost kernel: [ 1081.074955]  [<ffffffffc03bfea8>] traverse_visitbp+0x448/0x770 [zfs]
Nov  7 01:47:33 localhost kernel: [ 1081.074968]  [<ffffffffc03bfea8>] traverse_visitbp+0x448/0x770 [zfs]
Nov  7 01:47:33 localhost kernel: [ 1081.074984]  [<ffffffffc03bfea8>] traverse_visitbp+0x448/0x770 [zfs]
Nov  7 01:47:33 localhost kernel: [ 1081.075003]  [<ffffffffc03bfea8>] traverse_visitbp+0x448/0x770 [zfs]
Nov  7 01:47:33 localhost kernel: [ 1081.075022]  [<ffffffffc03c0875>] traverse_dnode+0x75/0x110 [zfs]
Nov  7 01:47:33 localhost kernel: [ 1081.075056]  [<ffffffffc03c00cd>] traverse_visitbp+0x66d/0x770 [zfs]
Nov  7 01:47:33 localhost kernel: [ 1081.075059]  [<ffffffff820135c6>] ? __switch_to+0xf6/0x5c0
Nov  7 01:47:33 localhost kernel: [ 1081.075075]  [<ffffffffc03c07c3>] traverse_prefetch_thread+0x83/0xc0 [zfs]
Nov  7 01:47:33 localhost kernel: [ 1081.075093]  [<ffffffffc03bf800>] ? prefetch_dnode_metadata+0xc0/0xc0 [zfs]
Nov  7 01:47:33 localhost kernel: [ 1081.075101]  [<ffffffffc03070e7>] taskq_thread+0x247/0x4c0 [spl]
Nov  7 01:47:33 localhost kernel: [ 1081.075105]  [<ffffffff820a7d20>] ? try_to_wake_up+0x290/0x290
Nov  7 01:47:33 localhost kernel: [ 1081.075111]  [<ffffffffc0306ea0>] ? taskq_cancel_id+0x1f0/0x1f0 [spl]
Nov  7 01:47:33 localhost kernel: [ 1081.075114]  [<ffffffff82096479>] kthread+0xc9/0xe0
Nov  7 01:47:33 localhost kernel: [ 1081.075116]  [<ffffffff820963b0>] ? flush_kthread_worker+0xb0/0xb0
Nov  7 01:47:33 localhost kernel: [ 1081.075118]  [<ffffffff827966fc>] ret_from_fork+0x7c/0xb0
Nov  7 01:47:33 localhost kernel: [ 1081.075120]  [<ffffffff820963b0>] ? flush_kthread_worker+0xb0/0xb0

I suspended the pipeline with Ctrl+Z (which delivers SIGTSTP). The issue occurred ~30 seconds after I did that. Since then, neither zfs send nor zfs receive makes any progress, as shown by pv.
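For context, suspending and resuming a pipeline from the terminal works via job-control signals. A minimal stand-in (hypothetical: sleep | cat substitutes for the zfs send | pv | zfs receive pipeline, so no ZFS is required) looks like:

```shell
# Stand-in for `zfs send ... | pv | zfs receive ...` using sleep | cat.
sleep 100 | cat &
pid=$!                      # PID of the last pipeline stage (cat)

kill -TSTP "$pid"           # Ctrl+Z at a terminal delivers SIGTSTP
sleep 1
state_stopped=$(ps -o state= -p "$pid" | tr -d ' ' | cut -c1)  # 'T' = stopped

kill -CONT "$pid"           # resume the stopped stage
sleep 1
state_running=$(ps -o state= -p "$pid" | tr -d ' ' | cut -c1)  # back to sleeping/running

kill %1 2>/dev/null         # terminate the whole background pipeline
echo "stopped=$state_stopped resumed=$state_running"
```

After SIGCONT the process should pick up where it left off; the bug report is that the ZFS pipeline never did, and the prefetch thread hung instead.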

# zpool list
NAME     SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
data     390G   177G   213G    45%  1.15x  ONLINE  -
rpool1   195G   931M   194G     0%  1.08x  ONLINE  -
rpool2   146G   126G  19,5G    86%  1.31x  ONLINE  -
syno        -      -      -      -      -  FAULTED  -
wd1     4,97T   918G  4,07T    18%  1.98x  ONLINE  -
wd2     4,97T  1,19T  3,78T    23%  1.15x  ONLINE  -
xy          -      -      -      -      -  FAULTED  -

Both rpool1 and rpool2 are pools on partitions of the same HDD (not an ideal setup, I know). rpool2 is the pool the system is running on; dedup is on and compression is lz4.

@kernelOfTruth
Contributor

Any news on this?

Either way, please try either with the latest master or with

#3132
openzfs/spl#435

If that helps, the issue would have been related to memory load.

Additionally, it could be related to #1948.

Adding #675 for good measure.

@gmelikov
Member

gmelikov commented Feb 4, 2017

Closing as stale; feel free to reopen if it's still relevant.

@gmelikov gmelikov closed this as completed Feb 4, 2017