panic: allocating allocated segment #6315

Closed
bunder2015 opened this issue Jul 5, 2017 · 13 comments
Assignees
Labels
Component: ZVOL ZFS Volumes

Comments

@bunder2015
Contributor

bunder2015 commented Jul 5, 2017

System information

Type                  Version/Name
Distribution Name     Gentoo
Distribution Version  n/a
Linux Kernel          4.9.16
Architecture          amd64
ZFS Version           0.7.0-rc4_49_g82644107
SPL Version           0.7.0-rc4_4_gac48361

qemu-2.9.0-r2
virt-manager-1.4.0-r3
virtualbox-5.0.32

Describe the problem you're observing

My VM setup is on my laptop: a Gentoo host with an OpenIndiana guest residing on a zvol. I tried running OI through VirtualBox (KVM accelerated); as soon as the guest attempts any ZFS operation, ZFS on the host panics. I also tried KVM/QEMU directly (via virt-manager), with the same result.

Scrubbing after the panic reveals that the zvol is now damaged and needs to be destroyed.

errors: Permanent errors have been detected in the following files:
        a-pool/vm-openindiana:<0x1> 

Installing FreeBSD through the same methods succeeded.

Describe how to reproduce the problem

Create a new zvol, assign it to the guest, and attempt to install an OS (or boot a pre-existing install) while watching the host's dmesg.

Edit: the zvol was created with:
zfs create -b 4K -V 30G -o compression=off -o primarycache=none -o secondarycache=none -o logbias=throughput -o sync=always a-pool/vm-openindiana

This looks somewhat similar to #5504.

Include any warning/errors/backtraces from the system logs

[  902.870458] PANIC: zfs: allocating allocated segment(offset=126977171456 size=4096)
 
[  902.870465] Showing stack for process 2406
[  902.870471] CPU: 3 PID: 2406 Comm: txg_sync Tainted: P           O    4.9.16-gentoo #11
[  902.870473] Hardware name: Dell Inc. Latitude E6510/0N5KHN, BIOS A16 12/05/2013
[  902.870477]  ffffc90003747a18 ffffffff81277ef5 0000000000000003 0000000000001000
[  902.870484]  ffffc90003747a28 ffffffffa0005846 ffffc90003747b48 ffffffffa00059ad
[  902.870490]  0000000000015240 6c6c61203a73667a 20676e697461636f 657461636f6c6c61
[  902.870497] Call Trace:
[  902.870509]  [<ffffffff81277ef5>] dump_stack+0x4d/0x63
[  902.870522]  [<ffffffffa0005846>] spl_dumpstack+0x4e/0x50 [spl]
[  902.870531]  [<ffffffffa00059ad>] vcmn_err+0x72/0xd7 [spl]
[  902.870540]  [<ffffffff8108fa11>] ? pick_next_task_fair+0x108/0x6ed
[  902.870546]  [<ffffffff8108fa11>] ? pick_next_task_fair+0x108/0x6ed
[  902.870554]  [<ffffffffa00024d6>] ? spl_kmem_cache_alloc+0x7e/0x621 [spl]
[  902.870612]  [<ffffffffa011bb3d>] zfs_panic_recover+0x61/0x69 [zfs]
[  902.870618]  [<ffffffffa006d139>] ? avl_find+0x4c/0x87 [zavl]
[  902.870670]  [<ffffffffa0107dd5>] range_tree_add+0x95/0x266 [zfs]
[  902.870716]  [<ffffffffa0161d08>] ? zio_add_child+0x10b/0x3d8 [zfs]
[  902.870767]  [<ffffffffa010473a>] metaslab_block_maxsize+0x3c0/0x3e4 [zfs]
[  902.870819]  [<ffffffffa01069c4>] metaslab_free+0x9d/0xbe [zfs]
[  902.870865]  [<ffffffffa015fe53>] zio_interrupt+0xf2/0xccc [zfs]
[  902.870910]  [<ffffffffa01632a9>] zio_nowait+0xf9/0x10d [zfs]
[  902.870963]  [<ffffffffa010ebb7>] ? spa_async_request+0x626/0x727 [zfs]
[  902.871015]  [<ffffffffa010ec08>] spa_async_request+0x677/0x727 [zfs]
[  902.871048]  [<ffffffffa00cc976>] bplist_iterate+0x7b/0x1ce [zfs]
[  902.871102]  [<ffffffffa0110dac>] spa_sync+0x4ab/0xae1 [zfs]
[  902.871156]  [<ffffffffa011f156>] txg_delay+0x39f/0x7c8 [zfs]
[  902.871210]  [<ffffffffa011eedb>] ? txg_delay+0x124/0x7c8 [zfs]
[  902.871219]  [<ffffffffa0002e9c>] __thread_exit+0xa1/0xab [spl]
[  902.871227]  [<ffffffffa0002e20>] ? __thread_exit+0x25/0xab [spl]
[  902.871232]  [<ffffffff8108060f>] kthread+0xce/0xd6
[  902.871237]  [<ffffffff81080541>] ? kthread_worker_fn+0xfe/0xfe
[  902.871241]  [<ffffffff81080541>] ? kthread_worker_fn+0xfe/0xfe
[  902.871246]  [<ffffffff8156f552>] ret_from_fork+0x22/0x30
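For context on the message itself: it is printed by range_tree_add() when the segment being added is already covered by a segment in the tree. Despite the wording, the caller in these traces is the free path (metaslab_free → range_tree_add), so it suggests the same block was queued for freeing twice. Below is a minimal, hypothetical sketch of that check; it assumes a small sorted array in place of the AVL tree ZFS uses, treats any overlap as the error, and does not merge adjacent segments. seg_tree_t and seg_tree_add() are illustrative names, not ZFS APIs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a ZFS range tree: a sorted array of
 * non-overlapping [start, end) segments (ZFS itself uses an AVL tree). */
#define SEG_MAX 64

typedef struct {
    uint64_t start[SEG_MAX];
    uint64_t end[SEG_MAX];
    int n;
} seg_tree_t;

/*
 * Add [start, start + size) to the tree. Returns 0 on success, or -1 if the
 * range overlaps a segment that is already present, which is the situation
 * ZFS reports as "allocating allocated segment".
 */
int seg_tree_add(seg_tree_t *t, uint64_t start, uint64_t size)
{
    uint64_t end = start + size;
    int i = 0;

    /* Skip segments that end at or before the new start. */
    while (i < t->n && t->end[i] <= start)
        i++;

    /* If the next segment begins before the new end, the ranges overlap. */
    if (i < t->n && t->start[i] < end) {
        fprintf(stderr, "allocating allocated segment(offset=%llu size=%llu)\n",
            (unsigned long long)start, (unsigned long long)size);
        return -1;
    }

    assert(t->n < SEG_MAX);
    /* Shift later segments right to keep the array sorted, then insert. */
    for (int j = t->n; j > i; j--) {
        t->start[j] = t->start[j - 1];
        t->end[j] = t->end[j - 1];
    }
    t->start[i] = start;
    t->end[i] = end;
    t->n++;
    return 0;
}
```

With this sketch, adding the 4096-byte segment at offset 126977171456 once succeeds, and adding it a second time returns -1 and prints the same style of message.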
@bunder2015
Contributor Author

bunder2015 commented Jul 28, 2017

I've been chatting with a few people on IRC; unfortunately, we were able to reproduce this with OI as a KVM guest on a Linux ZoL host.

[31546.368710] PANIC: zfs: allocating allocated segment(offset=340760625152 size=4096)

[31546.368715] Showing stack for process 1056
[31546.368717] CPU: 2 PID: 1056 Comm: txg_sync Tainted: P           O    4.9.34-gentoo #1
[31546.368718] Hardware name: To be filled by O.E.M. To be filled by O.E.M./SABERTOOTH 990FX R2.0, BIOS 2901 05/04/2016
[31546.368719]  ffffc90007caf980 ffffffff813013d0 0000000000000003 ffff88042431bc00
[31546.368721]  ffffc90007caf990 ffffffffa04b5952 ffffc90007cafab8 ffffffffa04b5aca
[31546.368723]  ffff8804249de200 6c6c61203a73667a 20676e697461636f 657461636f6c6c61
[31546.368725] Call Trace:
[31546.368731]  [<ffffffff813013d0>] dump_stack+0x63/0x83
[31546.368736]  [<ffffffffa04b5952>] spl_dumpstack+0x42/0x50 [spl]
[31546.368738]  [<ffffffffa04b5aca>] vcmn_err+0x6a/0x100 [spl]
[31546.368741]  [<ffffffff8102d752>] ? __switch_to+0x2d2/0x630
[31546.368743]  [<ffffffffa04b19d2>] ? spl_kmem_cache_alloc+0x72/0x8e0 [spl]
[31546.368745]  [<ffffffffa04b19d2>] ? spl_kmem_cache_alloc+0x72/0x8e0 [spl]
[31546.368746]  [<ffffffff811e6edb>] ? kmem_cache_alloc+0xdb/0x1b0
[31546.368748]  [<ffffffffa04b19d2>] ? spl_kmem_cache_alloc+0x72/0x8e0 [spl]
[31546.368783]  [<ffffffffa06c2b2f>] zfs_panic_recover+0x6f/0x90 [zfs]
[31546.368799]  [<ffffffffa06aa34e>] range_tree_add+0x18e/0x2d0 [zfs]
[31546.368816]  [<ffffffffa06a4188>] metaslab_free_dva+0x148/0x400 [zfs]
[31546.368837]  [<ffffffffa06a7cea>] metaslab_free+0xaa/0xf0 [zfs]
[31546.368856]  [<ffffffffa06b2b80>] ? spa_avz_build+0x130/0x130 [zfs]
[31546.368872]  [<ffffffffa0717d8c>] zio_dva_free+0x1c/0x30 [zfs]
[31546.368887]  [<ffffffffa071c2b6>] zio_nowait+0xb6/0x160 [zfs]
[31546.368906]  [<ffffffffa06b2bc0>] spa_free_sync_cb+0x40/0x50 [zfs]
[31546.368919]  [<ffffffffa066009e>] bplist_iterate+0xbe/0x110 [zfs]
[31546.368938]  [<ffffffffa06b52c8>] spa_sync+0x448/0xd70 [zfs]
[31546.368941]  [<ffffffff810c4664>] ? __wake_up+0x44/0x50
[31546.368959]  [<ffffffffa06c7304>] txg_sync_thread+0x2c4/0x480 [zfs]
[31546.368961]  [<ffffffff810a72c8>] ? finish_task_switch+0x78/0x1f0
[31546.368979]  [<ffffffffa06c7040>] ? txg_delay+0x160/0x160 [zfs]
[31546.368981]  [<ffffffffa04b25f0>] ? __thread_exit+0x20/0x20 [spl]
[31546.368983]  [<ffffffffa04b2662>] thread_generic_wrapper+0x72/0x80 [spl]
[31546.368985]  [<ffffffff8109f266>] kthread+0xe6/0x100
[31546.368986]  [<ffffffff8109f180>] ? kthread_park+0x60/0x60
[31546.368989] [<ffffffff8160c955>] ret_from_fork+0x25/0x30

@ryao has shown some interest in helping me debug this; I will try to post an update on Monday or Tuesday.
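The trace above shows where the free is issued from: spa_sync() drains the list of blocks freed in the syncing txg with bplist_iterate(), calling spa_free_sync_cb() for each block pointer, and that callback issues the free that ends in metaslab_free and range_tree_add(). The following is a simplified, single-threaded sketch of that drain loop, assuming a plain singly linked list; the real bplist_iterate() also drops and retakes the list's mutex around each callback and passes a dmu_tx_t. All *_sketch names and sum_sizes_cb are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for blkptr_t and bplist_t. */
typedef struct {
    uint64_t offset;
    uint64_t size;
} blkptr_sketch_t;

typedef struct bpl_entry_sketch {
    blkptr_sketch_t blk;
    struct bpl_entry_sketch *next;
} bpl_entry_sketch_t;

typedef struct {
    bpl_entry_sketch_t *head;
    bpl_entry_sketch_t *tail;
} bplist_sketch_t;

typedef void (*bplist_itor_sketch_t)(void *arg, const blkptr_sketch_t *bp);

/* Queue a block to be freed later, appending it at the tail. */
void bplist_append_sketch(bplist_sketch_t *bpl, uint64_t offset, uint64_t size)
{
    bpl_entry_sketch_t *e = malloc(sizeof (*e));

    assert(e != NULL);
    e->blk.offset = offset;
    e->blk.size = size;
    e->next = NULL;
    if (bpl->tail != NULL)
        bpl->tail->next = e;
    else
        bpl->head = e;
    bpl->tail = e;
}

/*
 * Drain the list: detach each entry from the head, pass its block pointer
 * to the callback, then release the entry. Returns the number of entries.
 */
int bplist_iterate_sketch(bplist_sketch_t *bpl, bplist_itor_sketch_t func, void *arg)
{
    bpl_entry_sketch_t *e;
    int visited = 0;

    while ((e = bpl->head) != NULL) {
        bpl->head = e->next;
        if (bpl->head == NULL)
            bpl->tail = NULL;
        func(arg, &e->blk);
        free(e);
        visited++;
    }
    return visited;
}

/* Example callback: total the bytes that would be freed. */
void sum_sizes_cb(void *arg, const blkptr_sketch_t *bp)
{
    *(uint64_t *)arg += bp->size;
}
```

Each entry reaches the callback exactly once, so a block that had been appended to the list twice would be freed twice, which is the situation the overlap check in range_tree_add() reports.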

@loli10K loli10K added the Component: ZVOL ZFS Volumes label Jul 28, 2017
@mailinglists35

mailinglists35 commented Jul 29, 2017

Ubuntu 16.04 with kernel 4.10, qemu-kvm 1:2.5+dfsg-5ubuntu10.14 and libvirtd-bin 1.3.1-1ubuntu10
spl 0.7.0-rc4_5_g7a35f2b
zfs 0.7.0-rc4_86_gfe46eeb

[Jul29 04:02] PANIC: zfs: allocating allocated segment(offset=7149943734272 size=4096)

[  +0.000102] Showing stack for process 11730
[  +0.000003] CPU: 0 PID: 11730 Comm: txg_sync Tainted: P           OE   4.10.0-26-generic #30~16.04.1-Ubuntu
[  +0.000001] Hardware name: HP ProLiant MicroServer Gen8, BIOS J06 11/02/2015
[  +0.000000] Call Trace:
[  +0.000007]  dump_stack+0x63/0x90
[  +0.000009]  spl_dumpstack+0x42/0x50 [spl]
[  +0.000006]  vcmn_err+0x6a/0x100 [spl]
[  +0.000003]  ? pick_next_task_fair+0x3d6/0x4d0
[  +0.000003]  ? __schedule+0x23a/0x700
[  +0.000003]  ? update_load_avg+0x6b/0x550
[  +0.000004]  ? kmem_cache_alloc+0xd7/0x1b0
[  +0.000004]  ? spl_kmem_cache_alloc+0x72/0x7d0 [spl]
[  +0.000002]  ? kmem_cache_alloc+0xd7/0x1b0
[  +0.000004]  ? spl_kmem_cache_alloc+0x72/0x7d0 [spl]
[  +0.000051]  zfs_panic_recover+0x6c/0x90 [zfs]
[  +0.000040]  range_tree_add+0x2b0/0x2c0 [zfs]
[  +0.000003]  ? mutex_lock+0x12/0x40
[  +0.000042]  ? zio_add_child+0x15a/0x180 [zfs]
[  +0.000039]  metaslab_free_dva+0x145/0x400 [zfs]
[  +0.000038]  metaslab_free+0x92/0xd0 [zfs]
[  +0.000040]  ? spa_avz_build+0x130/0x130 [zfs]
[  +0.000041]  zio_dva_free+0x1c/0x30 [zfs]
[  +0.000040]  zio_nowait+0xb6/0x150 [zfs]
[  +0.000040]  spa_free_sync_cb+0x40/0x50 [zfs]
[  +0.000028]  bplist_iterate+0xd2/0x130 [zfs]
[  +0.000038]  spa_sync+0x452/0xdb0 [zfs]
[  +0.000041]  txg_sync_thread+0x2e2/0x4b0 [zfs]
[  +0.000040]  ? txg_quiesce_thread+0x3f0/0x3f0 [zfs]
[  +0.000006]  thread_generic_wrapper+0x72/0x80 [spl]
[  +0.000004]  kthread+0x109/0x140
[  +0.000004]  ? __thread_exit+0x20/0x20 [spl]
[  +0.000003]  ? kthread_create_on_node+0x60/0x60
[  +0.000002]  ret_from_fork+0x2c/0x40
[  +7.381034] INFO: task zvol:572 blocked for more than 5 seconds.
[  +0.000077]       Tainted: P           OE   4.10.0-26-generic #30~16.04.1-Ubuntu
[  +0.000087] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  +0.000085] zvol            D    0   572      2 0x00000000
[  +0.000003] Call Trace:
[  +0.000007]  __schedule+0x232/0x700
[  +0.000053]  ? metaslab_alloc+0xd92/0xf20 [zfs]
[  +0.000002]  schedule+0x36/0x80
[  +0.000002]  schedule_timeout+0x235/0x3f0
[  +0.000004]  ? __wake_up+0x44/0x50
[  +0.000007]  ? taskq_dispatch_ent+0x55/0x170 [spl]
[  +0.000043]  ? zio_reexecute+0x390/0x390 [zfs]
[  +0.000002]  io_schedule_timeout+0xa4/0x110
[  +0.000006]  cv_wait_common+0xbc/0x140 [spl]
[  +0.000002]  ? wake_atomic_t_function+0x60/0x60
[  +0.000005]  __cv_wait_io+0x18/0x20 [spl]
[  +0.000042]  zio_wait+0xfd/0x1b0 [zfs]
[  +0.000041]  zil_commit.part.13+0x491/0x8a0 [zfs]
[  +0.000040]  zil_commit+0x17/0x20 [zfs]
[  +0.000040]  zvol_write+0x522/0x570 [zfs]
[  +0.000006]  taskq_thread+0x260/0x460 [spl]
[  +0.000002]  ? wake_up_q+0x70/0x70
[  +0.000003]  kthread+0x109/0x140
[  +0.000005]  ? taskq_cancel_id+0x130/0x130 [spl]
[  +0.000002]  ? kthread_create_on_node+0x60/0x60
[  +0.000003]  ret_from_fork+0x2c/0x40
[  +0.000029] INFO: task z_wr_iss:11573 blocked for more than 5 seconds.
[  +0.000072]       Tainted: P           OE   4.10.0-26-generic #30~16.04.1-Ubuntu
[  +0.000083] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  +0.000084] z_wr_iss        D    0 11573      2 0x00000000
[  +0.000002] Call Trace:
[  +0.000002]  __schedule+0x232/0x700
[  +0.000002]  schedule+0x36/0x80
[  +0.000001]  schedule_preempt_disabled+0xe/0x10
[  +0.000002]  __mutex_lock_slowpath+0x193/0x290
[  +0.000002]  mutex_lock+0x2f/0x40
[  +0.000040]  metaslab_alloc+0x52b/0xf20 [zfs]
[  +0.000041]  zio_dva_allocate+0xb6/0x5d0 [zfs]
[  +0.000002]  ? mutex_lock+0x12/0x40
[  +0.000039]  ? zio_wait_for_children+0x80/0xa0 [zfs]
[  +0.000006]  ? tsd_hash_search.isra.2+0x4a/0xa0 [spl]
[  +0.000004]  ? tsd_get_by_thread+0x2e/0x40 [spl]
[  +0.000005]  ? taskq_member+0x18/0x30 [spl]
[  +0.000039]  zio_execute+0x8a/0xe0 [zfs]
[  +0.000006]  taskq_thread+0x260/0x460 [spl]
[  +0.000001]  ? wake_up_q+0x70/0x70
[  +0.000003]  kthread+0x109/0x140
[  +0.000004]  ? taskq_cancel_id+0x130/0x130 [spl]
[  +0.000003]  ? kthread_create_on_node+0x60/0x60
[  +0.000002]  ret_from_fork+0x2c/0x40
[  +0.000005] INFO: task txg_sync:11730 blocked for more than 5 seconds.
[  +0.000071]       Tainted: P           OE   4.10.0-26-generic #30~16.04.1-Ubuntu
[  +0.000084] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  +0.000084] txg_sync        D    0 11730      2 0x00000000
[  +0.000001] Call Trace:
[  +0.000002]  __schedule+0x232/0x700
[  +0.000001]  schedule+0x36/0x80
[  +0.000006]  vcmn_err+0x9c/0x100 [spl]
[  +0.000001]  ? __schedule+0x23a/0x700
[  +0.000003]  ? update_load_avg+0x6b/0x550
[  +0.000003]  ? kmem_cache_alloc+0xd7/0x1b0
[  +0.000005]  ? spl_kmem_cache_alloc+0x72/0x7d0 [spl]
[  +0.000002]  ? kmem_cache_alloc+0xd7/0x1b0
[  +0.000004]  ? spl_kmem_cache_alloc+0x72/0x7d0 [spl]
[  +0.000042]  zfs_panic_recover+0x6c/0x90 [zfs]
[  +0.000039]  range_tree_add+0x2b0/0x2c0 [zfs]
[  +0.000002]  ? mutex_lock+0x12/0x40
[  +0.000039]  ? zio_add_child+0x15a/0x180 [zfs]
[  +0.000039]  metaslab_free_dva+0x145/0x400 [zfs]
[  +0.000038]  metaslab_free+0x92/0xd0 [zfs]
[  +0.000040]  ? spa_avz_build+0x130/0x130 [zfs]
[  +0.000039]  zio_dva_free+0x1c/0x30 [zfs]
[  +0.000039]  zio_nowait+0xb6/0x150 [zfs]
[  +0.000040]  spa_free_sync_cb+0x40/0x50 [zfs]
[  +0.000029]  bplist_iterate+0xd2/0x130 [zfs]
[  +0.000040]  spa_sync+0x452/0xdb0 [zfs]
[  +0.000042]  txg_sync_thread+0x2e2/0x4b0 [zfs]
[  +0.000041]  ? txg_quiesce_thread+0x3f0/0x3f0 [zfs]
[  +0.000005]  thread_generic_wrapper+0x72/0x80 [spl]
[  +0.000003]  kthread+0x109/0x140
[  +0.000004]  ? __thread_exit+0x20/0x20 [spl]
[  +0.000002]  ? kthread_create_on_node+0x60/0x60
[  +0.000003]  ret_from_fork+0x2c/0x40
[  +5.118330] INFO: task zvol:572 blocked for more than 5 seconds.
[  +0.000077]       Tainted: P           OE   4.10.0-26-generic #30~16.04.1-Ubuntu
[  +0.000083] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  +0.000085] zvol            D    0   572      2 0x00000000
[  +0.000003] Call Trace:
[  +0.000007]  __schedule+0x232/0x700
[  +0.000056]  ? metaslab_alloc+0xd92/0xf20 [zfs]
[  +0.000002]  schedule+0x36/0x80
[  +0.000002]  schedule_timeout+0x235/0x3f0
[  +0.000003]  ? __wake_up+0x44/0x50
[  +0.000007]  ? taskq_dispatch_ent+0x55/0x170 [spl]
[  +0.000046]  ? zio_reexecute+0x390/0x390 [zfs]
[  +0.000002]  io_schedule_timeout+0xa4/0x110
[  +0.000006]  cv_wait_common+0xbc/0x140 [spl]
[  +0.000003]  ? wake_atomic_t_function+0x60/0x60
[  +0.000005]  __cv_wait_io+0x18/0x20 [spl]
[  +0.000043]  zio_wait+0xfd/0x1b0 [zfs]
[  +0.000043]  zil_commit.part.13+0x491/0x8a0 [zfs]
[  +0.000042]  zil_commit+0x17/0x20 [zfs]
[  +0.000042]  zvol_write+0x522/0x570 [zfs]
[  +0.000007]  taskq_thread+0x260/0x460 [spl]
[  +0.000002]  ? wake_up_q+0x70/0x70
[  +0.000004]  kthread+0x109/0x140
[  +0.000004]  ? taskq_cancel_id+0x130/0x130 [spl]
[  +0.000003]  ? kthread_create_on_node+0x60/0x60
[  +0.000002]  ret_from_fork+0x2c/0x40
[  +0.000028] INFO: task z_wr_iss:11573 blocked for more than 5 seconds.
[  +0.000072]       Tainted: P           OE   4.10.0-26-generic #30~16.04.1-Ubuntu
[  +0.000083] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  +0.000084] z_wr_iss        D    0 11573      2 0x00000000
[  +0.000002] Call Trace:
[  +0.000002]  __schedule+0x232/0x700
[  +0.000002]  schedule+0x36/0x80
[  +0.000002]  schedule_preempt_disabled+0xe/0x10
[  +0.000001]  __mutex_lock_slowpath+0x193/0x290
[  +0.000002]  mutex_lock+0x2f/0x40
[  +0.000042]  metaslab_alloc+0x52b/0xf20 [zfs]
[  +0.000042]  zio_dva_allocate+0xb6/0x5d0 [zfs]
[  +0.000002]  ? mutex_lock+0x12/0x40
[  +0.000041]  ? zio_wait_for_children+0x80/0xa0 [zfs]
[  +0.000006]  ? tsd_hash_search.isra.2+0x4a/0xa0 [spl]
[  +0.000005]  ? tsd_get_by_thread+0x2e/0x40 [spl]
[  +0.000005]  ? taskq_member+0x18/0x30 [spl]
[  +0.000041]  zio_execute+0x8a/0xe0 [zfs]
[  +0.000006]  taskq_thread+0x260/0x460 [spl]
[  +0.000002]  ? wake_up_q+0x70/0x70
[  +0.000003]  kthread+0x109/0x140
[  +0.000004]  ? taskq_cancel_id+0x130/0x130 [spl]
[  +0.000003]  ? kthread_create_on_node+0x60/0x60
[  +0.000002]  ret_from_fork+0x2c/0x40
[  +0.000005] INFO: task txg_sync:11730 blocked for more than 5 seconds.
[  +0.000072]       Tainted: P           OE   4.10.0-26-generic #30~16.04.1-Ubuntu
[  +0.000083] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  +0.000084] txg_sync        D    0 11730      2 0x00000000
[  +0.000001] Call Trace:
[  +0.000002]  __schedule+0x232/0x700
[  +0.000002]  schedule+0x36/0x80
[  +0.000006]  vcmn_err+0x9c/0x100 [spl]
[  +0.000001]  ? __schedule+0x23a/0x700
[  +0.000004]  ? update_load_avg+0x6b/0x550
[  +0.000003]  ? kmem_cache_alloc+0xd7/0x1b0
[  +0.000005]  ? spl_kmem_cache_alloc+0x72/0x7d0 [spl]
[  +0.000002]  ? kmem_cache_alloc+0xd7/0x1b0
[  +0.000004]  ? spl_kmem_cache_alloc+0x72/0x7d0 [spl]
[  +0.000044]  zfs_panic_recover+0x6c/0x90 [zfs]
[  +0.000042]  range_tree_add+0x2b0/0x2c0 [zfs]
[  +0.000002]  ? mutex_lock+0x12/0x40
[  +0.000041]  ? zio_add_child+0x15a/0x180 [zfs]
[  +0.000041]  metaslab_free_dva+0x145/0x400 [zfs]
[  +0.000040]  metaslab_free+0x92/0xd0 [zfs]
[  +0.000042]  ? spa_avz_build+0x130/0x130 [zfs]
[  +0.000041]  zio_dva_free+0x1c/0x30 [zfs]
[  +0.000041]  zio_nowait+0xb6/0x150 [zfs]
[  +0.000042]  spa_free_sync_cb+0x40/0x50 [zfs]
[  +0.000031]  bplist_iterate+0xd2/0x130 [zfs]
[  +0.000042]  spa_sync+0x452/0xdb0 [zfs]
[  +0.000045]  txg_sync_thread+0x2e2/0x4b0 [zfs]
[  +0.000043]  ? txg_quiesce_thread+0x3f0/0x3f0 [zfs]
[  +0.000005]  thread_generic_wrapper+0x72/0x80 [spl]
[  +0.000003]  kthread+0x109/0x140
[  +0.000004]  ? __thread_exit+0x20/0x20 [spl]
[  +0.000003]  ? kthread_create_on_node+0x60/0x60
[  +0.000002]  ret_from_fork+0x2c/0x40
[  +5.118246] INFO: task zvol:572 blocked for more than 5 seconds.
[  +0.000076]       Tainted: P           OE   4.10.0-26-generic #30~16.04.1-Ubuntu
[  +0.000081] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  +0.000083] zvol            D    0   572      2 0x00000000
[  +0.000003] Call Trace:
[  +0.000006]  __schedule+0x232/0x700
[  +0.000046]  ? metaslab_alloc+0xd92/0xf20 [zfs]
[  +0.000001]  schedule+0x36/0x80
[  +0.000002]  schedule_timeout+0x235/0x3f0
[  +0.000002]  ? __wake_up+0x44/0x50
[  +0.000006]  ? taskq_dispatch_ent+0x55/0x170 [spl]
[  +0.000037]  ? zio_reexecute+0x390/0x390 [zfs]
[  +0.000002]  io_schedule_timeout+0xa4/0x110
[  +0.000005]  cv_wait_common+0xbc/0x140 [spl]
[  +0.000002]  ? wake_atomic_t_function+0x60/0x60
[  +0.000004]  __cv_wait_io+0x18/0x20 [spl]
[  +0.000034]  zio_wait+0xfd/0x1b0 [zfs]
[  +0.000033]  zil_commit.part.13+0x491/0x8a0 [zfs]
[  +0.000033]  zil_commit+0x17/0x20 [zfs]
[  +0.000033]  zvol_write+0x522/0x570 [zfs]
[  +0.000005]  taskq_thread+0x260/0x460 [spl]
[  +0.000002]  ? wake_up_q+0x70/0x70
[  +0.000004]  kthread+0x109/0x140
[  +0.000003]  ? taskq_cancel_id+0x130/0x130 [spl]
[  +0.000002]  ? kthread_create_on_node+0x60/0x60
[  +0.000002]  ret_from_fork+0x2c/0x40
[  +0.000026] INFO: task z_wr_iss:11573 blocked for more than 5 seconds.
[  +0.000070]       Tainted: P           OE   4.10.0-26-generic #30~16.04.1-Ubuntu
[  +0.000082] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  +0.000082] z_wr_iss        D    0 11573      2 0x00000000
[  +0.000002] Call Trace:
[  +0.000002]  __schedule+0x232/0x700
[  +0.000001]  schedule+0x36/0x80
[  +0.000001]  schedule_preempt_disabled+0xe/0x10
[  +0.000002]  __mutex_lock_slowpath+0x193/0x290
[  +0.000001]  mutex_lock+0x2f/0x40
[  +0.000033]  metaslab_alloc+0x52b/0xf20 [zfs]
[  +0.000033]  zio_dva_allocate+0xb6/0x5d0 [zfs]
[  +0.000002]  ? mutex_lock+0x12/0x40
[  +0.000033]  ? zio_wait_for_children+0x80/0xa0 [zfs]
[  +0.000004]  ? tsd_hash_search.isra.2+0x4a/0xa0 [spl]
[  +0.000004]  ? tsd_get_by_thread+0x2e/0x40 [spl]
[  +0.000004]  ? taskq_member+0x18/0x30 [spl]
[  +0.000032]  zio_execute+0x8a/0xe0 [zfs]
[  +0.000004]  taskq_thread+0x260/0x460 [spl]
[  +0.000002]  ? wake_up_q+0x70/0x70
[  +0.000002]  kthread+0x109/0x140
[  +0.000004]  ? taskq_cancel_id+0x130/0x130 [spl]
[  +0.000002]  ? kthread_create_on_node+0x60/0x60
[  +0.000002]  ret_from_fork+0x2c/0x40
[  +0.000004] INFO: task txg_sync:11730 blocked for more than 5 seconds.
[  +0.000070]       Tainted: P           OE   4.10.0-26-generic #30~16.04.1-Ubuntu
[  +0.000081] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  +0.000082] txg_sync        D    0 11730      2 0x00000000
[  +0.000001] Call Trace:
[  +0.000002]  __schedule+0x232/0x700
[  +0.000001]  schedule+0x36/0x80
[  +0.000004]  vcmn_err+0x9c/0x100 [spl]
[  +0.000002]  ? __schedule+0x23a/0x700
[  +0.000003]  ? update_load_avg+0x6b/0x550
[  +0.000002]  ? kmem_cache_alloc+0xd7/0x1b0
[  +0.000004]  ? spl_kmem_cache_alloc+0x72/0x7d0 [spl]
[  +0.000002]  ? kmem_cache_alloc+0xd7/0x1b0
[  +0.000003]  ? spl_kmem_cache_alloc+0x72/0x7d0 [spl]
[  +0.000035]  zfs_panic_recover+0x6c/0x90 [zfs]
[  +0.000033]  range_tree_add+0x2b0/0x2c0 [zfs]
[  +0.000001]  ? mutex_lock+0x12/0x40
[  +0.000033]  ? zio_add_child+0x15a/0x180 [zfs]
[  +0.000032]  metaslab_free_dva+0x145/0x400 [zfs]
[  +0.000031]  metaslab_free+0x92/0xd0 [zfs]
[  +0.000033]  ? spa_avz_build+0x130/0x130 [zfs]
[  +0.000032]  zio_dva_free+0x1c/0x30 [zfs]
[  +0.000032]  zio_nowait+0xb6/0x150 [zfs]
[  +0.000033]  spa_free_sync_cb+0x40/0x50 [zfs]
[  +0.000024]  bplist_iterate+0xd2/0x130 [zfs]
[  +0.000033]  spa_sync+0x452/0xdb0 [zfs]
[  +0.000035]  txg_sync_thread+0x2e2/0x4b0 [zfs]
[  +0.000034]  ? txg_quiesce_thread+0x3f0/0x3f0 [zfs]
[  +0.000004]  thread_generic_wrapper+0x72/0x80 [spl]
[  +0.000002]  kthread+0x109/0x140
[  +0.000004]  ? __thread_exit+0x20/0x20 [spl]
[  +0.000002]  ? kthread_create_on_node+0x60/0x60
[  +0.000002]  ret_from_fork+0x2c/0x40
[  +5.118537] INFO: task zvol:572 blocked for more than 5 seconds.
[  +0.000086]       Tainted: P           OE   4.10.0-26-generic #30~16.04.1-Ubuntu
[  +0.000084] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  +0.000084] zvol            D    0   572      2 0x00000000
[  +0.000004] Call Trace:
[  +0.000008]  __schedule+0x232/0x700
[  +0.000059]  ? metaslab_alloc+0xd92/0xf20 [zfs]
[  +0.000002]  schedule+0x36/0x80
[  +0.000002]  schedule_timeout+0x235/0x3f0
[  +0.000004]  ? __wake_up+0x44/0x50
[  +0.000008]  ? taskq_dispatch_ent+0x55/0x170 [spl]
[  +0.000049]  ? zio_reexecute+0x390/0x390 [zfs]
[  +0.000002]  io_schedule_timeout+0xa4/0x110
[  +0.000007]  cv_wait_common+0xbc/0x140 [spl]
[  +0.000003]  ? wake_atomic_t_function+0x60/0x60
[  +0.000005]  __cv_wait_io+0x18/0x20 [spl]
[  +0.000046]  zio_wait+0xfd/0x1b0 [zfs]
[  +0.000046]  zil_commit.part.13+0x491/0x8a0 [zfs]
[  +0.000045]  zil_commit+0x17/0x20 [zfs]
[  +0.000045]  zvol_write+0x522/0x570 [zfs]
[  +0.000007]  taskq_thread+0x260/0x460 [spl]
[  +0.000003]  ? wake_up_q+0x70/0x70
[  +0.000004]  kthread+0x109/0x140
[  +0.000005]  ? taskq_cancel_id+0x130/0x130 [spl]
[  +0.000003]  ? kthread_create_on_node+0x60/0x60
[  +0.000003]  ret_from_fork+0x2c/0x40
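The hung-task reports after the panic are most likely a side effect of it. zfs_panic_recover() hands the message to vcmn_err() as a panic unless the zfs_recover module parameter is set, in which case it is only logged as a warning (this changes how the condition is reported, not its cause). On Linux the SPL does not take the whole machine down on such a panic: it dumps the stack and parks the calling thread (txg_sync here) in an uninterruptible sleep, which matches the vcmn_err → schedule frames in its later stack, and threads that need locks or I/O it still holds (z_wr_iss in metaslab_alloc, zvol in zil_commit) then stall behind it. Below is a simplified user-space sketch of that dispatch, assuming a global zfs_recover_sketch in place of the real tunable and returning instead of halting; all *_sketch names are hypothetical.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical model of the two severities relevant here. */
typedef enum { CE_WARN_SKETCH, CE_PANIC_SKETCH } ce_sketch_t;

/* Stand-in for the zfs_recover module parameter (off by default). */
int zfs_recover_sketch = 0;

/* Records the last severity used, so the behaviour can be checked. */
ce_sketch_t last_level_sketch;

static void vcmn_err_sketch(ce_sketch_t ce, const char *fmt, va_list ap)
{
    char msg[256];

    vsnprintf(msg, sizeof (msg), fmt, ap);
    last_level_sketch = ce;
    if (ce == CE_WARN_SKETCH) {
        fprintf(stderr, "WARNING: %s\n", msg);
    } else {
        fprintf(stderr, "PANIC: %s\n", msg);
        /* The SPL would dump the stack here and then sleep forever in an
         * uninterruptible state; this sketch simply returns. */
    }
}

/* Same shape as zfs_panic_recover(): warn if recovery is enabled, else panic. */
void zfs_panic_recover_sketch(const char *fmt, ...)
{
    va_list adx;

    assert(fmt != NULL);
    va_start(adx, fmt);
    vcmn_err_sketch(zfs_recover_sketch ? CE_WARN_SKETCH : CE_PANIC_SKETCH, fmt, adx);
    va_end(adx);
}
```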

@bunder2015
Contributor Author

Just a follow-up: I changed the VM setup to attach the zvol as a virtio device instead of IDE/SATA, and was able to install OI. This might be a KVM configuration problem after all.

@loli10K
Contributor

loli10K commented Aug 3, 2017

@bunder2015 is this a debug build of ZFS? If not, could you please try to reproduce this with debugging enabled? Thanks.

This may just be a duplicate of #6238.

@loli10K loli10K marked this as a duplicate of #6238 Aug 3, 2017
@bunder2015
Contributor Author

bunder2015 commented Aug 3, 2017

I'm afraid the EFI partition on my laptop is too small to enable debugging and use stub booting at the same time. I may be able to try again once we have OpenZFS 7446.

@loli10K
Copy link
Contributor

loli10K commented Aug 3, 2017

@bunder2015 Just to be clear, I was talking about the --enable-debug option of the ./configure script, not custom -OX -g CFLAGS.

@bunder2015
Copy link
Contributor Author

I only have 8MB to fit a kernel into, and with --enable-debug the zfs/spl modules come out to about 4.5MB (before compression). The kernel itself is 5MB compressed; when I switched to stub booting I tried to shave off as much as I could, but still ended up about 300KB too large.

@loli10K
Copy link
Contributor

loli10K commented Aug 4, 2017

@bunder2015 If you're still willing to test this, we can make it happen without a full debug build by changing this single ASSERT to a VERIFY in dbuf_unoverride():

diff --git a/module/zfs/dbuf.c b/module/zfs/dbuf.c
index dc2c00495..ce40364af 100644
--- a/module/zfs/dbuf.c
+++ b/module/zfs/dbuf.c
@@ -1301,7 +1301,7 @@ dbuf_unoverride(dbuf_dirty_record_t *dr)
         * a zilog's get_data while holding a range lock.  This call only
         * comes from dbuf_dirty() callers who must also hold a range lock.
         */
-       ASSERT(dr->dt.dl.dr_override_state != DR_IN_DMU_SYNC);
+       VERIFY(dr->dt.dl.dr_override_state != DR_IN_DMU_SYNC);
        ASSERT(db->db_level == 0);
 
        if (db->db_blkid == DMU_BONUS_BLKID ||
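Why this one-line change helps: in ZFS, ASSERT() is compiled out unless the modules are built with debugging enabled, while VERIFY() is evaluated in every build, so the patch turns on just this one check in an otherwise normal build. A simplified sketch of that distinction follows; SKETCH_ASSERT, SKETCH_VERIFY, the SKETCH_DEBUG switch and the state names are hypothetical stand-ins for the real SPL macros and dbuf states, and unlike the real VERIFY(), which panics, the sketch only counts failures so it can be exercised safely.

```c
/* Standard assert() follows the same pattern: it vanishes when NDEBUG is set. */
#include <assert.h>
#include <stdio.h>

/* Number of failed checks; a real VERIFY() would panic instead. */
int sketch_failures = 0;

static void sketch_fail(const char *expr, const char *file, int line)
{
    fprintf(stderr, "VERIFY(%s) failed at %s:%d\n", expr, file, line);
    sketch_failures++;
}

/* VERIFY-style check: evaluated in every build. */
#define SKETCH_VERIFY(cond) \
    ((cond) ? (void)0 : sketch_fail(#cond, __FILE__, __LINE__))

/* ASSERT-style check: only evaluated when SKETCH_DEBUG is defined.
 * sizeof keeps the expression type-checked without evaluating it. */
#ifdef SKETCH_DEBUG
#define SKETCH_ASSERT(cond) SKETCH_VERIFY(cond)
#else
#define SKETCH_ASSERT(cond) ((void)sizeof (cond))
#endif

/* Hypothetical stand-ins for the dirty record's override states. */
typedef enum { NOT_OVERRIDDEN, IN_DMU_SYNC, OVERRIDDEN } override_state_sketch_t;

void check_with_assert(override_state_sketch_t state)
{
    SKETCH_ASSERT(state != IN_DMU_SYNC);   /* silent in non-debug builds */
}

void check_with_verify(override_state_sketch_t state)
{
    SKETCH_VERIFY(state != IN_DMU_SYNC);   /* always checked */
}
```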

@loli10K loli10K self-assigned this Aug 8, 2017
@bunder2015
Contributor Author

Hi, sorry for the delay; I've been busy off and on lately and haven't had time to do any testing. Do you still want me to test this? I might have some time this afternoon.

@filip-paczynski

Hi, is it possible that this was fixed in 0.7.1 as part of the #6414 work?

@filip-paczynski

filip-paczynski commented Aug 10, 2017

Never mind, it still crashes.

Debian stretch (9.1)
Kernel: Linux 4.9.0-3-amd64 #1 SMP Debian 4.9.30-2+deb9u2 (2017-06-26) x86_64 GNU/Linux
XEN Dom0 : xen-hypervisor-4.8-amd64 4.8.1-1+deb9u1
16GB RAM ARC, 100GB L2ARC, 16GB SLOG

ZFS 0.7.0-18_g5146d802
SPL 0.7.0 - commit 9243b0f

The machine hosts Xen VMs.
The controller is LSISAS2008: FWVersion(20.00.07.00), ChipRevision(0x03), BiosVersion(07.39.02.00)
The error occurred after running apt update in one of the VMs.
This also happens on a second machine (Dell R730XD, different controller, different CPU, also hosting Xen, same OS).

Right now I wonder:

  1. How can I reduce the chance of this happening again?
  2. Do I need to re-create the pool after such a crash?

Pool layout:

        NAME           STATE     READ WRITE CKSUM
        tank           ONLINE       0     0     0
          mirror-0     ONLINE       0     0     0
            slot1      ONLINE       0     0     0
            slot7      ONLINE       0     0     0
          mirror-1     ONLINE       0     0     0
            slot2      ONLINE       0     0     0
            slot8      ONLINE       0     0     0
          mirror-2     ONLINE       0     0     0
            slot3      ONLINE       0     0     0
            slot9      ONLINE       0     0     0
          mirror-3     ONLINE       0     0     0
            slot4      ONLINE       0     0     0
            slot10     ONLINE       0     0     0
          mirror-4     ONLINE       0     0     0
            slot5      ONLINE       0     0     0
            slot11     ONLINE       0     0     0
        logs
          mirror-5     ONLINE       0     0     0
            nvme0n1p1  ONLINE       0     0     0
            nvme1n1p1  ONLINE       0     0     0
        cache
          nvme0n1p2    ONLINE       0     0     0
          nvme1n1p2    ONLINE       0     0     0
        spares
          sda          AVAIL

kern.log - PANIC:

Aug 10 10:53:36 kernel:PANIC: zfs: allocating allocated segment(offset=508713349120 size=12288)
Aug 10 10:53:36 kernel:
Aug 10 10:53:36 kernel:Showing stack for process 2404
Aug 10 10:53:36 kernel:CPU: 1 PID: 2404 Comm: txg_sync Tainted: P           O    4.9.0-3-amd64 #1 Debian 4.9.30-2+deb9u2
Aug 10 10:53:36 kernel:Hardware name: Dell Inc. PowerEdge R720xd/0HJK12, BIOS 2.5.4 01/22/2016
Aug 10 10:53:36 kernel: 0000000000000000 ffffffff81328414 0000000000000003 ffffc9004b2b7b58
Aug 10 10:53:36 kernel: ffffffffc017b422 ffff880543258240 6c6c61203a73667a 20676e697461636f
Aug 10 10:53:36 kernel: 657461636f6c6c61 6e656d6765732064 74657366666f2874 333331373830353d
Aug 10 10:53:36 kernel:Call Trace:
Aug 10 10:53:36 kernel: [<ffffffff81328414>] ? dump_stack+0x5c/0x78
Aug 10 10:53:36 kernel: [<ffffffffc017b422>] ? vcmn_err+0x62/0xf0 [spl]
Aug 10 10:53:36 kernel: [<ffffffff811e012c>] ? kmem_cache_alloc+0x11c/0x520
Aug 10 10:53:36 kernel: [<ffffffff81601cb7>] ? _cond_resched+0x27/0x40
Aug 10 10:53:36 kernel: [<ffffffff811e012c>] ? kmem_cache_alloc+0x11c/0x520
Aug 10 10:53:36 kernel: [<ffffffff811e012c>] ? kmem_cache_alloc+0x11c/0x520
Aug 10 10:53:36 kernel: [<ffffffffc01770d1>] ? spl_kmem_cache_alloc+0x71/0x760 [spl]
Aug 10 10:53:36 kernel: [<ffffffffc08561cf>] ? zfs_panic_recover+0x6f/0x90 [zfs]
Aug 10 10:53:36 kernel: [<ffffffffc0760141>] ? avl_find+0x51/0x90 [zavl]
Aug 10 10:53:36 kernel: [<ffffffffc083da43>] ? range_tree_add+0x183/0x2c0 [zfs]
Aug 10 10:53:36 kernel: [<ffffffffc08acdf1>] ? zio_add_child+0x131/0x140 [zfs]
Aug 10 10:53:36 kernel: [<ffffffffc0837877>] ? metaslab_free_dva+0x147/0x3f0 [zfs]
Aug 10 10:53:36 kernel: [<ffffffffc0846090>] ? spa_avz_build+0x120/0x120 [zfs]
Aug 10 10:53:36 kernel: [<ffffffffc083b440>] ? metaslab_free+0xa0/0xe0 [zfs]
Aug 10 10:53:36 kernel: [<ffffffffc08aa608>] ? zio_dva_free+0x18/0x20 [zfs]
Aug 10 10:53:36 kernel: [<ffffffffc08aeb57>] ? zio_nowait+0xa7/0x140 [zfs]
Aug 10 10:53:36 kernel: [<ffffffffc08460cb>] ? spa_free_sync_cb+0x3b/0x50 [zfs]
Aug 10 10:53:36 kernel: [<ffffffffc07f4112>] ? bplist_iterate+0xa2/0x100 [zfs]
Aug 10 10:53:36 kernel: [<ffffffffc0848863>] ? spa_sync+0x433/0xd80 [zfs]
Aug 10 10:53:36 kernel: [<ffffffffc085ac31>] ? txg_sync_thread+0x2e1/0x4b0 [zfs]
Aug 10 10:53:36 kernel: [<ffffffffc085a950>] ? txg_delay+0x1a0/0x1a0 [zfs]
Aug 10 10:53:36 kernel: [<ffffffffc0178230>] ? __thread_exit+0x20/0x20 [spl]
Aug 10 10:53:36 kernel: [<ffffffffc017829d>] ? thread_generic_wrapper+0x6d/0x80 [spl]
Aug 10 10:53:36 kernel: [<ffffffff810965d7>] ? kthread+0xd7/0xf0
Aug 10 10:53:36 kernel: [<ffffffff81096500>] ? kthread_park+0x60/0x60
Aug 10 10:53:36 kernel: [<ffffffff81096500>] ? kthread_park+0x60/0x60
Aug 10 10:53:36 kernel: [<ffffffff816064f5>] ? ret_from_fork+0x25/0x30

kern.log - subsequent timeouts:

Aug 10 10:55:44 kernel:INFO: task z_wr_iss:1701 blocked for more than 120 seconds.
Aug 10 10:55:44 kernel:      Tainted: P           O    4.9.0-3-amd64 #1 Debian 4.9.30-2+deb9u2
Aug 10 10:55:44 kernel:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 10 10:55:44 kernel:z_wr_iss        D    0  1701      2 0x00000000
Aug 10 10:55:44 kernel: ffff88050d8af800 0000000000000000 ffff8805388d1080 ffff880543458240
Aug 10 10:55:44 kernel: ffff88053bb56080 ffffc90044b47ae8 ffffffff816015d3 53e1dd739d413e16
Aug 10 10:55:44 kernel: 002c39e140a25a47 ffff880543458240 8895ce335bdceeba ffff8805388d1080
Aug 10 10:55:44 kernel:Call Trace:
Aug 10 10:55:44 kernel: [<ffffffff816015d3>] ? __schedule+0x233/0x6d0
Aug 10 10:55:44 kernel: [<ffffffff81601aa2>] ? schedule+0x32/0x80
Aug 10 10:55:44 kernel: [<ffffffff81601d3a>] ? schedule_preempt_disabled+0xa/0x10
Aug 10 10:55:44 kernel: [<ffffffff81603784>] ? __mutex_lock_slowpath+0xb4/0x130
Aug 10 10:55:44 kernel: [<ffffffff8160381b>] ? mutex_lock+0x1b/0x30
Aug 10 10:55:44 kernel: [<ffffffffc083a9ba>] ? metaslab_alloc+0x6ca/0x10b0 [zfs]
Aug 10 10:55:44 kernel: [<ffffffffc08aee0e>] ? zio_dva_allocate+0xbe/0x620 [zfs]
Aug 10 10:55:44 kernel: [<ffffffffc0175f33>] ? spl_kmem_alloc+0x93/0x160 [spl]
Aug 10 10:55:44 kernel: [<ffffffffc08abcb4>] ? zio_push_transform+0x34/0x80 [zfs]
Aug 10 10:55:44 kernel: [<ffffffffc017de5d>] ? tsd_hash_search.isra.0+0x6d/0x90 [spl]
Aug 10 10:55:44 kernel: [<ffffffffc017deea>] ? tsd_get_by_thread+0x2a/0x40 [spl]
Aug 10 10:55:44 kernel: [<ffffffffc08ab961>] ? zio_execute+0x81/0xe0 [zfs]
Aug 10 10:55:44 kernel: [<ffffffffc01792b6>] ? taskq_thread+0x286/0x460 [spl]
Aug 10 10:55:44 kernel: [<ffffffff810a1800>] ? wake_up_q+0x70/0x70
Aug 10 10:55:44 kernel: [<ffffffffc08ab8e0>] ? zio_reexecute+0x390/0x390 [zfs]
Aug 10 10:55:44 kernel: [<ffffffffc0179030>] ? task_done+0x90/0x90 [spl]
Aug 10 10:55:44 kernel: [<ffffffff810965d7>] ? kthread+0xd7/0xf0
Aug 10 10:55:44 kernel: [<ffffffff81096500>] ? kthread_park+0x60/0x60
Aug 10 10:55:44 kernel: [<ffffffff816064f5>] ? ret_from_fork+0x25/0x30
Aug 10 10:55:44 kernel:INFO: task z_wr_iss:1702 blocked for more than 120 seconds.
Aug 10 10:55:44 kernel:      Tainted: P           O    4.9.0-3-amd64 #1 Debian 4.9.30-2+deb9u2
Aug 10 10:55:44 kernel:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 10 10:55:44 kernel:z_wr_iss        D    0  1702      2 0x00000000
Aug 10 10:55:44 kernel: ffff88051d35b000 0000000000000000 ffff88052c50e140 ffff880543418240
Aug 10 10:55:44 kernel: ffff88053bb51040 ffffc90044b4fae8 ffffffff816015d3 923032cd7598d763
Aug 10 10:55:44 kernel: 00c77cd31add3ebe ffff880543418240 62ee7cb3c5aaa748 ffff88052c50e140
Aug 10 10:55:44 kernel:Call Trace:
Aug 10 10:55:44 kernel: [<ffffffff816015d3>] ? __schedule+0x233/0x6d0
Aug 10 10:55:44 kernel: [<ffffffff81601aa2>] ? schedule+0x32/0x80
Aug 10 10:55:44 kernel: [<ffffffff81601d3a>] ? schedule_preempt_disabled+0xa/0x10
Aug 10 10:55:44 kernel: [<ffffffff81603784>] ? __mutex_lock_slowpath+0xb4/0x130
Aug 10 10:55:44 kernel: [<ffffffff8160381b>] ? mutex_lock+0x1b/0x30
Aug 10 10:55:44 kernel: [<ffffffffc083a9ba>] ? metaslab_alloc+0x6ca/0x10b0 [zfs]
Aug 10 10:55:44 kernel: [<ffffffffc08aee0e>] ? zio_dva_allocate+0xbe/0x620 [zfs]
Aug 10 10:55:44 kernel: [<ffffffffc0175f33>] ? spl_kmem_alloc+0x93/0x160 [spl]
Aug 10 10:55:44 kernel: [<ffffffffc08abcb4>] ? zio_push_transform+0x34/0x80 [zfs]
Aug 10 10:55:44 kernel: [<ffffffffc017de5d>] ? tsd_hash_search.isra.0+0x6d/0x90 [spl]
Aug 10 10:55:44 kernel: [<ffffffffc017deea>] ? tsd_get_by_thread+0x2a/0x40 [spl]
Aug 10 10:55:44 kernel: [<ffffffffc08ab961>] ? zio_execute+0x81/0xe0 [zfs]
Aug 10 10:55:44 kernel: [<ffffffffc01792b6>] ? taskq_thread+0x286/0x460 [spl]
Aug 10 10:55:44 kernel: [<ffffffff810a1800>] ? wake_up_q+0x70/0x70
Aug 10 10:55:44 kernel: [<ffffffffc08ab8e0>] ? zio_reexecute+0x390/0x390 [zfs]
Aug 10 10:55:44 kernel: [<ffffffffc0179030>] ? task_done+0x90/0x90 [spl]
Aug 10 10:55:44 kernel: [<ffffffff810965d7>] ? kthread+0xd7/0xf0
Aug 10 10:55:44 kernel: [<ffffffff81096500>] ? kthread_park+0x60/0x60
Aug 10 10:55:44 kernel: [<ffffffff816064f5>] ? ret_from_fork+0x25/0x30
Aug 10 10:55:44 kernel:INFO: task z_wr_iss:1704 blocked for more than 120 seconds.
Aug 10 10:55:44 kernel:      Tainted: P           O    4.9.0-3-amd64 #1 Debian 4.9.30-2+deb9u2
Aug 10 10:55:44 kernel:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 10 10:55:44 kernel:z_wr_iss        D    0  1704      2 0x00000000
Aug 10 10:55:44 kernel: ffff88051a036800 0000000000000000 ffff880528154080 ffff880543358240
Aug 10 10:55:44 kernel: ffff88053bb3c100 ffffc90044b6fae8 ffffffff816015d3 4984a6d912f025d2
Aug 10 10:55:44 kernel: 0020673adc6d7b53 ffff880543358240 20c3679fcb298059 ffff880528154080
Aug 10 10:55:44 kernel:Call Trace:
Aug 10 10:55:44 kernel: [<ffffffff816015d3>] ? __schedule+0x233/0x6d0
Aug 10 10:55:44 kernel: [<ffffffff81601aa2>] ? schedule+0x32/0x80
Aug 10 10:55:44 kernel: [<ffffffff81601d3a>] ? schedule_preempt_disabled+0xa/0x10
Aug 10 10:55:44 kernel: [<ffffffff81603784>] ? __mutex_lock_slowpath+0xb4/0x130
Aug 10 10:55:44 kernel: [<ffffffff8160381b>] ? mutex_lock+0x1b/0x30
Aug 10 10:55:44 kernel: [<ffffffffc083a9ba>] ? metaslab_alloc+0x6ca/0x10b0 [zfs]
Aug 10 10:55:44 kernel: [<ffffffffc08aee0e>] ? zio_dva_allocate+0xbe/0x620 [zfs]
Aug 10 10:55:44 kernel: [<ffffffffc0175f33>] ? spl_kmem_alloc+0x93/0x160 [spl]
Aug 10 10:55:44 kernel: [<ffffffffc08abcb4>] ? zio_push_transform+0x34/0x80 [zfs]
Aug 10 10:55:44 kernel: [<ffffffffc017de5d>] ? tsd_hash_search.isra.0+0x6d/0x90 [spl]
Aug 10 10:55:44 kernel: [<ffffffffc017deea>] ? tsd_get_by_thread+0x2a/0x40 [spl]
Aug 10 10:55:44 kernel: [<ffffffffc08ab961>] ? zio_execute+0x81/0xe0 [zfs]
Aug 10 10:55:44 kernel: [<ffffffffc01792b6>] ? taskq_thread+0x286/0x460 [spl]
Aug 10 10:55:44 kernel: [<ffffffff810a1800>] ? wake_up_q+0x70/0x70
Aug 10 10:55:44 kernel: [<ffffffffc08ab8e0>] ? zio_reexecute+0x390/0x390 [zfs]
Aug 10 10:55:44 kernel: [<ffffffffc0179030>] ? task_done+0x90/0x90 [spl]
Aug 10 10:55:44 kernel: [<ffffffff8107bb0a>] ? do_group_exit+0x3a/0xa0
Aug 10 10:55:44 kernel: [<ffffffff810965d7>] ? kthread+0xd7/0xf0
Aug 10 10:55:44 kernel: [<ffffffff81096500>] ? kthread_park+0x60/0x60
Aug 10 10:55:44 kernel: [<ffffffff816064f5>] ? ret_from_fork+0x25/0x30
Aug 10 10:55:44 kernel:INFO: task z_wr_iss:1706 blocked for more than 120 seconds.
Aug 10 10:55:44 kernel:      Tainted: P           O    4.9.0-3-amd64 #1 Debian 4.9.30-2+deb9u2
Aug 10 10:55:44 kernel:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 10 10:55:44 kernel:z_wr_iss        D    0  1706      2 0x00000000
Aug 10 10:55:44 kernel: ffff88050d8af800 0000000000000000 ffff880534857100 ffff880543258240
Aug 10 10:55:44 kernel: ffff88053bb21000 ffffc90044b7fae8 ffffffff816015d3 216713a57c1ca9f0
Aug 10 10:55:44 kernel: 00156cf5c73063a3 ffff880543258240 3943f35e7ec2c403 ffff880534857100
Aug 10 10:55:44 kernel:Call Trace:
Aug 10 10:55:44 kernel: [<ffffffff816015d3>] ? __schedule+0x233/0x6d0
Aug 10 10:55:44 kernel: [<ffffffff81601aa2>] ? schedule+0x32/0x80
Aug 10 10:55:44 kernel: [<ffffffff81601d3a>] ? schedule_preempt_disabled+0xa/0x10
Aug 10 10:55:44 kernel: [<ffffffff81603784>] ? __mutex_lock_slowpath+0xb4/0x130
Aug 10 10:55:44 kernel: [<ffffffff8160381b>] ? mutex_lock+0x1b/0x30
Aug 10 10:55:44 kernel: [<ffffffffc083a9ba>] ? metaslab_alloc+0x6ca/0x10b0 [zfs]
Aug 10 10:55:44 kernel: [<ffffffffc08aee0e>] ? zio_dva_allocate+0xbe/0x620 [zfs]
Aug 10 10:55:44 kernel: [<ffffffffc0175f33>] ? spl_kmem_alloc+0x93/0x160 [spl]
Aug 10 10:55:44 kernel: [<ffffffffc08abcb4>] ? zio_push_transform+0x34/0x80 [zfs]
Aug 10 10:55:44 kernel: [<ffffffffc017de5d>] ? tsd_hash_search.isra.0+0x6d/0x90 [spl]
Aug 10 10:55:44 kernel: [<ffffffffc017deea>] ? tsd_get_by_thread+0x2a/0x40 [spl]
Aug 10 10:55:44 kernel: [<ffffffffc08ab961>] ? zio_execute+0x81/0xe0 [zfs]
Aug 10 10:55:44 kernel: [<ffffffffc01792b6>] ? taskq_thread+0x286/0x460 [spl]
Aug 10 10:55:44 kernel: [<ffffffff810a1800>] ? wake_up_q+0x70/0x70
Aug 10 10:55:44 kernel: [<ffffffffc08ab8e0>] ? zio_reexecute+0x390/0x390 [zfs]
Aug 10 10:55:44 kernel: [<ffffffffc0179030>] ? task_done+0x90/0x90 [spl]
Aug 10 10:55:44 kernel: [<ffffffffc03a0650>] ? __ioat_issue_pending+0xb0/0xb0 [ioatdma]
Aug 10 10:55:44 kernel: [<ffffffff810965d7>] ? kthread+0xd7/0xf0
Aug 10 10:55:44 kernel: [<ffffffff81096500>] ? kthread_park+0x60/0x60
Aug 10 10:55:44 kernel: [<ffffffff816064f5>] ? ret_from_fork+0x25/0x30
Aug 10 10:55:44 kernel:INFO: task z_wr_iss:1707 blocked for more than 120 seconds.
Aug 10 10:55:44 kernel:      Tainted: P           O    4.9.0-3-amd64 #1 Debian 4.9.30-2+deb9u2
Aug 10 10:55:44 kernel:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 10 10:55:44 kernel:z_wr_iss        D    0  1707      2 0x00000000
Aug 10 10:55:44 kernel: ffff88051480e800 0000000000000000 ffff880535bcc0c0 ffff8805433d8240
Aug 10 10:55:44 kernel: ffff88053bb46000 ffffc90044fd3ae8 ffffffff816015d3 c0a3e1d6d399bec4
Aug 10 10:55:44 kernel: 00c0532ff91982d1 ffff8805433d8240 0515e334463a1de9 ffff880535bcc0c0
Aug 10 10:55:44 kernel:Call Trace:
Aug 10 10:55:44 kernel: [<ffffffff816015d3>] ? __schedule+0x233/0x6d0
Aug 10 10:55:44 kernel: [<ffffffff81601aa2>] ? schedule+0x32/0x80
Aug 10 10:55:44 kernel: [<ffffffff81601d3a>] ? schedule_preempt_disabled+0xa/0x10
Aug 10 10:55:44 kernel: [<ffffffff81603784>] ? __mutex_lock_slowpath+0xb4/0x130
Aug 10 10:55:44 kernel: [<ffffffff8160381b>] ? mutex_lock+0x1b/0x30
Aug 10 10:55:44 kernel: [<ffffffffc083a9ba>] ? metaslab_alloc+0x6ca/0x10b0 [zfs]
Aug 10 10:55:44 kernel: [<ffffffffc08aee0e>] ? zio_dva_allocate+0xbe/0x620 [zfs]
Aug 10 10:55:44 kernel: [<ffffffffc0175f33>] ? spl_kmem_alloc+0x93/0x160 [spl]
Aug 10 10:55:44 kernel: [<ffffffffc08abcb4>] ? zio_push_transform+0x34/0x80 [zfs]
Aug 10 10:55:44 kernel: [<ffffffffc017de5d>] ? tsd_hash_search.isra.0+0x6d/0x90 [spl]
Aug 10 10:55:44 kernel: [<ffffffffc017deea>] ? tsd_get_by_thread+0x2a/0x40 [spl]
Aug 10 10:55:44 kernel: [<ffffffffc08ab961>] ? zio_execute+0x81/0xe0 [zfs]
Aug 10 10:55:44 kernel: [<ffffffffc01792b6>] ? taskq_thread+0x286/0x460 [spl]
Aug 10 10:55:44 kernel: [<ffffffff810a1800>] ? wake_up_q+0x70/0x70
Aug 10 10:55:44 kernel: [<ffffffffc08ab8e0>] ? zio_reexecute+0x390/0x390 [zfs]
Aug 10 10:55:44 kernel: [<ffffffffc0179030>] ? task_done+0x90/0x90 [spl]
Aug 10 10:55:44 kernel: [<ffffffff810965d7>] ? kthread+0xd7/0xf0
Aug 10 10:55:44 kernel: [<ffffffff81096500>] ? kthread_park+0x60/0x60
Aug 10 10:55:44 kernel: [<ffffffff81096500>] ? kthread_park+0x60/0x60
Aug 10 10:55:44 kernel: [<ffffffff816064f5>] ? ret_from_fork+0x25/0x30
Aug 10 10:55:44 kernel:INFO: task txg_sync:2404 blocked for more than 120 seconds.
Aug 10 10:55:44 kernel:      Tainted: P           O    4.9.0-3-amd64 #1 Debian 4.9.30-2+deb9u2
Aug 10 10:55:44 kernel:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 10 10:55:44 kernel:txg_sync        D    0  2404      2 0x00000000
Aug 10 10:55:44 kernel: ffff88051480e800 0000000000000000 ffff88052d0a1100 ffff880543258240
Aug 10 10:55:44 kernel: ffff88053bb21000 ffffc9004b2b79c8 ffffffff816015d3 ffffffff81c06a40
Aug 10 10:55:44 kernel: 0000000000000000 ffff880543258240 000000000000021e ffff88052d0a1100
Aug 10 10:55:44 kernel:Call Trace:
Aug 10 10:55:44 kernel: [<ffffffff816015d3>] ? __schedule+0x233/0x6d0
Aug 10 10:55:44 kernel: [<ffffffff81601aa2>] ? schedule+0x32/0x80
Aug 10 10:55:44 kernel: [<ffffffffc017b448>] ? vcmn_err+0x88/0xf0 [spl]
Aug 10 10:55:44 kernel: [<ffffffff811e012c>] ? kmem_cache_alloc+0x11c/0x520
Aug 10 10:55:44 kernel: [<ffffffff81601cb7>] ? _cond_resched+0x27/0x40
Aug 10 10:55:44 kernel: [<ffffffff811e012c>] ? kmem_cache_alloc+0x11c/0x520
Aug 10 10:55:44 kernel: [<ffffffff811e012c>] ? kmem_cache_alloc+0x11c/0x520
Aug 10 10:55:44 kernel: [<ffffffffc01770d1>] ? spl_kmem_cache_alloc+0x71/0x760 [spl]
Aug 10 10:55:44 kernel: [<ffffffffc08561cf>] ? zfs_panic_recover+0x6f/0x90 [zfs]
Aug 10 10:55:44 kernel: [<ffffffffc0760141>] ? avl_find+0x51/0x90 [zavl]
Aug 10 10:55:44 kernel: [<ffffffffc083da43>] ? range_tree_add+0x183/0x2c0 [zfs]
Aug 10 10:55:44 kernel: [<ffffffffc08acdf1>] ? zio_add_child+0x131/0x140 [zfs]
Aug 10 10:55:44 kernel: [<ffffffffc0837877>] ? metaslab_free_dva+0x147/0x3f0 [zfs]
Aug 10 10:55:44 kernel: [<ffffffffc0846090>] ? spa_avz_build+0x120/0x120 [zfs]
Aug 10 10:55:44 kernel: [<ffffffffc083b440>] ? metaslab_free+0xa0/0xe0 [zfs]
Aug 10 10:55:44 kernel: [<ffffffffc08aa608>] ? zio_dva_free+0x18/0x20 [zfs]
Aug 10 10:55:44 kernel: [<ffffffffc08aeb57>] ? zio_nowait+0xa7/0x140 [zfs]
Aug 10 10:55:44 kernel: [<ffffffffc08460cb>] ? spa_free_sync_cb+0x3b/0x50 [zfs]
Aug 10 10:55:44 kernel: [<ffffffffc07f4112>] ? bplist_iterate+0xa2/0x100 [zfs]
Aug 10 10:55:44 kernel: [<ffffffffc0848863>] ? spa_sync+0x433/0xd80 [zfs]
Aug 10 10:55:44 kernel: [<ffffffffc085ac31>] ? txg_sync_thread+0x2e1/0x4b0 [zfs]
Aug 10 10:55:44 kernel: [<ffffffffc085a950>] ? txg_delay+0x1a0/0x1a0 [zfs]
Aug 10 10:55:44 kernel: [<ffffffffc0178230>] ? __thread_exit+0x20/0x20 [spl]
Aug 10 10:55:44 kernel: [<ffffffffc017829d>] ? thread_generic_wrapper+0x6d/0x80 [spl]
Aug 10 10:55:44 kernel: [<ffffffff810965d7>] ? kthread+0xd7/0xf0
Aug 10 10:55:44 kernel: [<ffffffff81096500>] ? kthread_park+0x60/0x60
Aug 10 10:55:44 kernel: [<ffffffff81096500>] ? kthread_park+0x60/0x60
Aug 10 10:55:44 kernel: [<ffffffff816064f5>] ? ret_from_fork+0x25/0x30
Aug 10 10:55:44 kernel:INFO: task 7.xvda-0:2471 blocked for more than 120 seconds.
Aug 10 10:55:44 kernel:      Tainted: P           O    4.9.0-3-amd64 #1 Debian 4.9.30-2+deb9u2
Aug 10 10:55:44 kernel:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 10 10:55:44 kernel:7.xvda-0        D    0  2471      2 0x00000000
Aug 10 10:55:44 kernel: ffff8805166c4000 0000000000000000 ffff8804ce17d140 ffff8805432d8240
Aug 10 10:55:44 kernel: ffff88053bb32080 ffffc9006846b9a0 ffffffff816015d3 0000000000000000
Aug 10 10:55:44 kernel: 0000000000000003 ffff8805432d8240 ffffffff810b8799 ffff8804ce17d140
Aug 10 10:55:44 kernel:Call Trace:
Aug 10 10:55:44 kernel: [<ffffffff816015d3>] ? __schedule+0x233/0x6d0
Aug 10 10:55:44 kernel: [<ffffffff810b8799>] ? __wake_up_common+0x49/0x80
Aug 10 10:55:44 kernel: [<ffffffff81601aa2>] ? schedule+0x32/0x80
Aug 10 10:55:44 kernel: [<ffffffff81604e73>] ? schedule_timeout+0x243/0x310
Aug 10 10:55:44 kernel: [<ffffffff81605e46>] ? _raw_spin_unlock_irqrestore+0x16/0x20
Aug 10 10:55:44 kernel: [<ffffffffc0178d07>] ? taskq_dispatch_ent+0xf7/0x130 [spl]
Aug 10 10:55:44 kernel: [<ffffffff810ec0ec>] ? ktime_get+0x3c/0xb0
Aug 10 10:55:44 kernel: [<ffffffff8160133d>] ? io_schedule_timeout+0x9d/0x100
Aug 10 10:55:44 kernel: [<ffffffffc017d316>] ? cv_wait_common+0xb6/0x140 [spl]
Aug 10 10:55:44 kernel: [<ffffffff810b8d50>] ? prepare_to_wait_event+0xf0/0xf0
Aug 10 10:55:44 kernel: [<ffffffffc08ae45a>] ? zio_wait+0xea/0x190 [zfs]
Aug 10 10:55:44 kernel: [<ffffffffc08a7df4>] ? zil_commit.part.12+0x4c4/0x870 [zfs]
Aug 10 10:55:44 kernel: [<ffffffffc08bc257>] ? zvol_request+0xb7/0x2e0 [zfs]
Aug 10 10:55:44 kernel: [<ffffffff812fb080>] ? generic_make_request+0x110/0x2d0
Aug 10 10:55:44 kernel: [<ffffffff812fb2b6>] ? submit_bio+0x76/0x140
Aug 10 10:55:44 kernel: [<ffffffffc05349c3>] ? dispatch_rw_block_io+0x7a3/0xac0 [xen_blkback]
Aug 10 10:55:44 kernel: [<ffffffff810151b4>] ? xen_load_sp0+0x84/0x160
Aug 10 10:55:44 kernel: [<ffffffff810e2d3b>] ? lock_timer_base+0x7b/0xa0
Aug 10 10:55:44 kernel: [<ffffffffc053500d>] ? __do_block_io_op+0x32d/0x650 [xen_blkback]
Aug 10 10:55:44 kernel: [<ffffffffc05357ea>] ? xen_blkif_schedule+0x11a/0x7d0 [xen_blkback]
Aug 10 10:55:44 kernel: [<ffffffff816015db>] ? __schedule+0x23b/0x6d0
Aug 10 10:55:44 kernel: [<ffffffff810b8d50>] ? prepare_to_wait_event+0xf0/0xf0
Aug 10 10:55:44 kernel: [<ffffffffc05356d0>] ? xen_blkif_be_int+0x30/0x30 [xen_blkback]
Aug 10 10:55:44 kernel: [<ffffffff810965d7>] ? kthread+0xd7/0xf0
Aug 10 10:55:44 kernel: [<ffffffff81096500>] ? kthread_park+0x60/0x60
Aug 10 10:55:44 kernel: [<ffffffff816064f5>] ? ret_from_fork+0x25/0x30

Aug 10 10:55:44 kernel:INFO: task 15.xvdc-0:11177 blocked for more than 120 seconds.
Aug 10 10:55:44 kernel:      Tainted: P           O    4.9.0-3-amd64 #1 Debian 4.9.30-2+deb9u2
Aug 10 10:55:44 kernel:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 10 10:55:44 kernel:15.xvdc-0       D    0 11177      2 0x00000000
Aug 10 10:55:44 kernel: ffff88051480e800 0000000000000000 ffff880501927140 ffff880543398240
Aug 10 10:55:44 kernel: ffff88053bb41140 ffffc900675e79a0 ffffffff816015d3 0000000000000000
Aug 10 10:55:44 kernel: 0000000000000003 ffff880543398240 ffffffff810b8799 ffff880501927140
Aug 10 10:55:44 kernel:Call Trace:
Aug 10 10:55:44 kernel: [<ffffffff816015d3>] ? __schedule+0x233/0x6d0
Aug 10 10:55:44 kernel: [<ffffffff810b8799>] ? __wake_up_common+0x49/0x80
Aug 10 10:55:44 kernel: [<ffffffff81601aa2>] ? schedule+0x32/0x80
Aug 10 10:55:44 kernel: [<ffffffff81604e73>] ? schedule_timeout+0x243/0x310
Aug 10 10:55:44 kernel: [<ffffffff81605e46>] ? _raw_spin_unlock_irqrestore+0x16/0x20
Aug 10 10:55:44 kernel: [<ffffffffc0178d07>] ? taskq_dispatch_ent+0xf7/0x130 [spl]
Aug 10 10:55:44 kernel: [<ffffffff810ec0ec>] ? ktime_get+0x3c/0xb0
Aug 10 10:55:44 kernel: [<ffffffff8160133d>] ? io_schedule_timeout+0x9d/0x100
Aug 10 10:55:44 kernel: [<ffffffffc017d316>] ? cv_wait_common+0xb6/0x140 [spl]
Aug 10 10:55:44 kernel: [<ffffffff810b8d50>] ? prepare_to_wait_event+0xf0/0xf0
Aug 10 10:55:44 kernel: [<ffffffffc08ae45a>] ? zio_wait+0xea/0x190 [zfs]
Aug 10 10:55:44 kernel: [<ffffffffc08a7df4>] ? zil_commit.part.12+0x4c4/0x870 [zfs]
Aug 10 10:55:44 kernel: [<ffffffffc08bc257>] ? zvol_request+0xb7/0x2e0 [zfs]
Aug 10 10:55:44 kernel: [<ffffffff812fb080>] ? generic_make_request+0x110/0x2d0
Aug 10 10:55:44 kernel: [<ffffffff812fb2b6>] ? submit_bio+0x76/0x140
Aug 10 10:55:44 kernel: [<ffffffffc05349c3>] ? dispatch_rw_block_io+0x7a3/0xac0 [xen_blkback]
Aug 10 10:55:44 kernel: [<ffffffff810151b4>] ? xen_load_sp0+0x84/0x160
Aug 10 10:55:44 kernel: [<ffffffff810e2d3b>] ? lock_timer_base+0x7b/0xa0
Aug 10 10:55:44 kernel: [<ffffffffc053500d>] ? __do_block_io_op+0x32d/0x650 [xen_blkback]
Aug 10 10:55:44 kernel: [<ffffffff81604db6>] ? schedule_timeout+0x186/0x310
Aug 10 10:55:44 kernel: [<ffffffff810e3e50>] ? del_timer_sync+0x50/0x50
Aug 10 10:55:44 kernel: [<ffffffff81605e46>] ? _raw_spin_unlock_irqrestore+0x16/0x20
Aug 10 10:55:44 kernel: [<ffffffffc05357ea>] ? xen_blkif_schedule+0x11a/0x7d0 [xen_blkback]
Aug 10 10:55:44 kernel: [<ffffffff816015db>] ? __schedule+0x23b/0x6d0
Aug 10 10:55:44 kernel: [<ffffffff810b8d50>] ? prepare_to_wait_event+0xf0/0xf0
Aug 10 10:55:44 kernel: [<ffffffffc05356d0>] ? xen_blkif_be_int+0x30/0x30 [xen_blkback]
Aug 10 10:55:44 kernel: [<ffffffff810965d7>] ? kthread+0xd7/0xf0
Aug 10 10:55:44 kernel: [<ffffffff81096500>] ? kthread_park+0x60/0x60
Aug 10 10:55:44 kernel: [<ffffffff816064f5>] ? ret_from_fork+0x25/0x30

@bunder2015
Contributor Author

So I upgraded to git head again and applied the assert/verify toggle patch, and now it's not happening anymore. Maybe this was fixed between 0.7.0 and 0.7.1; I will need to test more to confirm.

@filip-paczynski

filip-paczynski commented Aug 11, 2017

For me, the assert/verify patch just moves the problem to another place:

Message from syslogd@dom0 at Aug 11 12:21:01 ...
 kernel:[ 1992.861466] VERIFY(dr->dt.dl.dr_override_state != DR_IN_DMU_SYNC) failed

Message from syslogd@dom0 at Aug 11 12:21:01 ...
 kernel:[ 1992.862046] PANIC at dbuf.c:1304:dbuf_unoverride()

In addition, it somewhat limits the scope of what has crashed: other zvols remain largely accessible even after this kind of crash. It is 'better' than ASSERT in the sense that it allows the rest of the VMs to stop in an orderly fashion.

Removing the L2ARC doesn't help. I didn't dare to remove the SLOGs.
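The ASSERT/VERIFY distinction discussed in this comment can be illustrated with a minimal C sketch, assuming the common convention that ASSERT-style checks are compiled only into debug builds while VERIFY-style checks are evaluated in every build. The macros `SKETCH_ASSERT`/`SKETCH_VERIFY`, the `SKETCH_DEBUG` switch, `report_failure()` and `unoverride_checks()` are hypothetical: the real SPL macros panic instead of counting failures, and the actual contents of the assert/verify toggle patch are not shown in this thread. The condition checked mirrors the one in the VERIFY message above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical failure hook: counts and reports instead of panicking. */
static int verify_failures = 0;

static void report_failure(const char *expr, const char *file, int line)
{
    verify_failures++;
    fprintf(stderr, "VERIFY(%s) failed at %s:%d\n", expr, file, line);
}

/* VERIFY-style check: evaluated in every build. */
#define SKETCH_VERIFY(cond) \
    ((cond) ? (void)0 : report_failure(#cond, __FILE__, __LINE__))

/* ASSERT-style check: only active when SKETCH_DEBUG is defined. */
#ifdef SKETCH_DEBUG
#define SKETCH_ASSERT(cond) SKETCH_VERIFY(cond)
#else
#define SKETCH_ASSERT(cond) ((void)0)
#endif

/* Modelled on the override states named in the VERIFY message. */
enum override_state { DR_NOT_OVERRIDDEN, DR_IN_DMU_SYNC, DR_OVERRIDDEN };

static void unoverride_checks(enum override_state st)
{
    SKETCH_ASSERT(st != DR_IN_DMU_SYNC); /* compiled out without SKETCH_DEBUG */
    SKETCH_VERIFY(st != DR_IN_DMU_SYNC); /* always checked */
}
```

Built without `SKETCH_DEBUG`, only the VERIFY-style check fires, so `unoverride_checks(DR_IN_DMU_SYNC)` records exactly one failure; with `-DSKETCH_DEBUG` both checks fire.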

tonyhutter pushed a commit that referenced this issue Aug 22, 2017
Since OpenZFS 7578 (1b7c1e5) if we have a ZVOL with logbias=throughput
we will force WR_INDIRECT itxs in zvol_log_write() setting itx->itx_lr
offset and length to the offset and length of the BIO from
zvol_write()->zvol_log_write(): these offset and length are later used
to take a range lock in the zilog->zl_get_data function: zvol_get_data().

Now suppose we have a ZVOL with blocksize=8K and push 4K writes to
offset 0: we will only be range-locking 0-4096. This means the
ASSERTion we make in dbuf_unoverride() is no longer valid because now
dmu_sync() is called from zilog's get_data functions holding a partial
lock on the dbuf.

Fix this by taking a range lock on the whole block in zvol_get_data().

Reviewed-by: Chunwei Chen <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: loli10K <[email protected]>
Closes #6238 
Closes #6315 
Closes #6356 
Closes #6477
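The range-locking change described in this commit message can be illustrated with a small C sketch. It is not the actual `zvol_get_data()` code: the type `lock_range_t` and the helpers `align_down()`, `bio_lock_range()`, `block_lock_range()` and `covers_block()` are hypothetical and only model the arithmetic, using the commit's own example of an 8K volume block and a 4K write at offset 0. `align_down()` follows the same power-of-two rounding idea as ZFS's `P2ALIGN()` macro.

```c
#include <assert.h>
#include <stdint.h>

/* A byte range [off, off + len) to be range-locked. */
typedef struct {
    uint64_t off;
    uint64_t len;
} lock_range_t;

/* Round x down to a multiple of align; align must be a power of two. */
static uint64_t align_down(uint64_t x, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return x & ~(align - 1);
}

/* Before the fix, per the commit: lock exactly the BIO's offset/length. */
static lock_range_t bio_lock_range(uint64_t off, uint64_t len)
{
    lock_range_t r = { off, len };
    return r;
}

/* After the fix, per the commit: lock the whole block containing off. */
static lock_range_t block_lock_range(uint64_t off, uint64_t blocksize)
{
    lock_range_t r = { align_down(off, blocksize), blocksize };
    return r;
}

/* Does r cover the entire block that contains off? */
static int covers_block(lock_range_t r, uint64_t off, uint64_t blocksize)
{
    uint64_t start = align_down(off, blocksize);
    return r.off <= start && r.off + r.len >= start + blocksize;
}
```

With the BIO-sized range, `covers_block(bio_lock_range(0, 4096), 0, 8192)` is false: only bytes 0-4096 of the 8K block are locked, which is the partial lock the commit says invalidates the ASSERTion in `dbuf_unoverride()`. `block_lock_range(0, 8192)` yields 0-8192, and a write at offset 12288 maps to the block locked as 8192-16384.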
SidBB pushed a commit to catalogicsoftware/zfs that referenced this issue Aug 31, 2017
Fabian-Gruenbichler pushed a commit to Fabian-Gruenbichler/zfs that referenced this issue Sep 29, 2017
4 participants