
ARC- and zrlock-related panics during import #8876

Closed. vthriller opened this issue Jun 9, 2019 · 28 comments.

Labels: Status: Stale (no recent activity for issue), Type: Defect (incorrect behavior, e.g. crash, hang)

Comments

@vthriller commented Jun 9, 2019

System information

Type Version/Name
Distribution Name Gentoo
Distribution Version
Linux Kernel 4.18.12-gentoo
Architecture x86_64
ZFS, SPL Version v0.7.13-r0-gentoo or v0.8.0-r1-gentoo (both exhibit the problem)

Describe the problem you're observing

ZFS panics during pool import. The pools in question were not properly exported beforehand due to #7987.

zdb -lu has no issues reading crucial data (labels and uberblocks) from any of the disks.
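
(For reference, the kind of check meant here, run against one of the pool's member disks; the device path is illustrative:)

# zdb -lu /dev/sdc1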

The panics differ between builds with and without USE=debug.

Describe how to reproduce the problem

Unknown so far; I have yet to figure out which of the three pools causes this.

Include any warning/errors/backtraces from the system logs

0.7.13, release:

[ 1987.213972] VERIFY3(((*(volatile typeof((&zrl->zr_mtx)->m_owner) *)&((&zrl->zr_mtx)->m_owner))) == ((void *)0)) failed (00000000642c6e4e ==           (null))
[ 1987.213981] PANIC at zrlock.c:68:zrl_destroy()
[ 1987.213985] Showing stack for process 18180
[ 1987.213991] CPU: 3 PID: 18180 Comm: dbu_evict Tainted: P           O      4.18.12-gentoo #2
[ 1987.213993] Hardware name:  /DX79TO, BIOS SIX7910J.86A.0460.2012.0327.1627 03/27/2012
[ 1987.213995] Call Trace:
[ 1987.214023]  dump_stack+0x85/0xba
[ 1987.214038]  spl_dumpstack+0x5d/0x70 [spl]
[ 1987.214047]  spl_panic+0xe6/0x150 [spl]
[ 1987.214057]  ? kmem_cache_free+0x27f/0x2e0
[ 1987.214090]  zrl_destroy+0x44/0x70 [zfs]
[ 1987.214126]  dmu_zfetch+0x115b/0x1be0 [zfs]
[ 1987.214133]  taskq_dispatch+0x69e/0x960 [spl]
[ 1987.214140]  ? try_to_wake_up+0x730/0x730
[ 1987.214162]  ? dmu_zfetch+0x1080/0x1be0 [zfs]
[ 1987.214168]  ? taskq_dispatch+0x330/0x960 [spl]
[ 1987.214185]  kthread+0x16b/0x1a0
[ 1987.214190]  ? kthread_flush_work+0x180/0x180
[ 1987.214194]  ret_from_fork+0x35/0x40

[ 2211.463200] INFO: task dbu_evict:18180 blocked for more than 120 seconds.
[ 2211.463205]       Tainted: P           O      4.18.12-gentoo #2
[ 2211.463218] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2211.463221] dbu_evict       D    0 18180      2 0x80000000
[ 2211.463228] Call Trace:
[ 2211.463244]  ? __schedule+0x349/0xd90
[ 2211.463252]  schedule+0x48/0xf0
[ 2211.463262]  spl_panic+0x128/0x150 [spl]
[ 2211.463272]  ? kmem_cache_free+0x27f/0x2e0
[ 2211.463319]  zrl_destroy+0x44/0x70 [zfs]
[ 2211.463351]  dmu_zfetch+0x115b/0x1be0 [zfs]
[ 2211.463359]  taskq_dispatch+0x69e/0x960 [spl]
[ 2211.463365]  ? try_to_wake_up+0x730/0x730
[ 2211.463396]  ? dmu_zfetch+0x1080/0x1be0 [zfs]
[ 2211.463403]  ? taskq_dispatch+0x330/0x960 [spl]
[ 2211.463408]  kthread+0x16b/0x1a0
[ 2211.463413]  ? kthread_flush_work+0x180/0x180
[ 2211.463418]  ret_from_fork+0x35/0x40

[ 2211.463425] INFO: task zpool:18200 blocked for more than 120 seconds.
[ 2211.463428]       Tainted: P           O      4.18.12-gentoo #2
[ 2211.463429] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2211.463431] zpool           D    0 18200   2057 0x00000004
[ 2211.463435] Call Trace:
[ 2211.463440]  ? __schedule+0x349/0xd90
[ 2211.463446]  ? spl_kmem_free_impl+0x51/0x60 [spl]
[ 2211.463450]  schedule+0x48/0xf0
[ 2211.463456]  taskq_wait+0x92/0xf0 [spl]
[ 2211.463463]  ? woken_wake_function+0x30/0x30
[ 2211.463486]  dmu_buf_user_evict_wait+0x19/0x30 [zfs]
[ 2211.463520]  dsl_pool_close+0x18f/0x260 [zfs]
[ 2211.463559]  spa_async_suspend+0xd7f/0x1ab0 [zfs]
[ 2211.463597]  spa_tryimport+0x273/0xfa0 [zfs]
[ 2211.463633]  zfs_secpolicy_share+0x48cd/0x9e10 [zfs]
[ 2211.463667]  pool_status_check+0x33a/0xa20 [zfs]
[ 2211.463675]  do_vfs_ioctl+0xad/0xa40
[ 2211.463681]  ? handle_mm_fault+0x15d/0x3b0
[ 2211.463689]  ? __do_page_fault+0x2b3/0x8f0
[ 2211.463694]  ksys_ioctl+0xc1/0xd0
[ 2211.463700]  __x64_sys_ioctl+0x1e/0x30
[ 2211.463706]  do_syscall_64+0x6f/0x1a0
[ 2211.463712]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

0.7.13, debug (zfs_dbgmsg_enable=1 zfs_flags=33):

[  777.376969] ZFS: Loaded module v0.7.13-r0-gentoo (DEBUG mode), ZFS pool version 5000, ZFS filesystem version 5
[  806.520066] (!copied) is equivalent to (hdr->b_l1hdr.b_freeze_cksum == NULL)
[  806.520072] PANIC at arc.c:1742:arc_buf_try_copy_decompressed_data()
[  806.520083] Showing stack for process 1376
[  806.520090] CPU: 0 PID: 1376 Comm: z_rd_int_7 Tainted: P           O      4.18.12-gentoo #2
[  806.520092] Hardware name:  /DX79TO, BIOS SIX7910J.86A.0460.2012.0327.1627 03/27/2012
[  806.520094] Call Trace:
[  806.520110]  dump_stack+0x85/0xba
[  806.520140]  spl_dumpstack+0x5d/0x70 [spl]
[  806.520157]  spl_panic+0xe6/0x150 [spl]
[  806.520174]  ? spl_kmem_cache_alloc+0xc5/0x1170 [spl]
[  806.520179]  ? __slab_alloc+0x26/0x40
[  806.520183]  ? _cond_resched+0x25/0x70
[  806.520187]  ? kmem_cache_alloc+0x115/0x320
[  806.520201]  ? spl_kmem_cache_alloc+0xc5/0x1170 [spl]
[  806.520205]  ? _cond_resched+0x25/0x70
[  806.520209]  ? _raw_spin_unlock+0x12/0x30
[  806.520395]  ? zfs_refcount_add_many+0xcb/0x150 [zfs]
[  806.520523]  arc_buf_fill+0xaca/0xae0 [zfs]
[  806.520690]  ? zio_buf_alloc+0x69/0x80 [zfs]
[  806.520818]  arc_buf_alloc_impl+0x485/0x7b0 [zfs]
[  806.520945]  arc_read_done+0x262/0xb90 [zfs]
[  806.521112]  zio_done+0x97e/0x23d0 [zfs]
[  806.521274]  ? spa_config_exit+0x11b/0x230 [zfs]
[  806.521439]  zio_execute+0x13e/0x390 [zfs]
[  806.521458]  taskq_thread+0x3c1/0x790 [spl]
[  806.521464]  ? try_to_wake_up+0x730/0x730
[  806.521629]  ? zio_taskq_member.isra.0.constprop.1+0x90/0x90 [zfs]
[  806.521646]  ? taskq_thread_spawn+0x70/0x70 [spl]
[  806.521653]  kthread+0x16b/0x1a0
[  806.521658]  ? kthread_flush_work+0x180/0x180
[  806.521663]  ret_from_fork+0x35/0x40

[  982.663355] INFO: task zpool:1357 blocked for more than 120 seconds.
[  982.663361]       Tainted: P           O      4.18.12-gentoo #2
[  982.663362] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  982.663365] zpool           D    0  1357   2068 0x00000000
[  982.663373] Call Trace:
[  982.663393]  ? __schedule+0x349/0xd90
[  982.663399]  schedule+0x48/0xf0
[  982.663406]  io_schedule+0x22/0x50
[  982.663432]  cv_wait_common+0x13e/0x3a0 [spl]
[  982.663439]  ? woken_wake_function+0x30/0x30
[  982.663456]  __cv_wait_io+0x1c/0x30 [spl]
[  982.663657]  zio_wait+0x1fb/0x500 [zfs]
[  982.663792]  dbuf_read+0xadb/0x1580 [zfs]
[  982.663928]  __dbuf_hold_impl+0xcad/0xed0 [zfs]
[  982.663944]  ? spl_kmem_alloc+0x11c/0x290 [spl]
[  982.664076]  dbuf_hold_impl+0xab/0xe0 [zfs]
[  982.664208]  dbuf_hold+0x34/0x70 [zfs]
[  982.664344]  dmu_buf_hold_array_by_dnode+0x124/0x7d0 [zfs]
[  982.664482]  dmu_read_impl+0xb8/0x210 [zfs]
[  982.664619]  dmu_read+0x6e/0xc0 [zfs]
[  982.664782]  space_map_load+0x290/0x7b0 [zfs]
[  982.664946]  vdev_dtl_load+0x160/0x260 [zfs]
[  982.665107]  vdev_load+0x83/0x130 [zfs]
[  982.665267]  vdev_load+0x3c/0x130 [zfs]
[  982.665425]  vdev_load+0x3c/0x130 [zfs]
[  982.665585]  spa_load+0x25a2/0x37e0 [zfs]
[  982.665743]  ? spa_activate+0x323/0x8d0 [zfs]
[  982.665764]  ? nvlist_lookup_common+0xde/0x100 [znvpair]
[  982.665920]  spa_tryimport+0x101/0x600 [zfs]
[  982.666086]  zfs_ioc_pool_tryimport+0x7d/0x110 [zfs]
[  982.666253]  zfsdev_ioctl+0x1f4/0x9f0 [zfs]
[  982.666261]  do_vfs_ioctl+0xad/0xa40
[  982.666268]  ? handle_mm_fault+0x15d/0x3b0
[  982.666275]  ? __do_page_fault+0x2b3/0x8f0
[  982.666279]  ksys_ioctl+0xc1/0xd0
[  982.666284]  __x64_sys_ioctl+0x1e/0x30
[  982.666289]  do_syscall_64+0x6f/0x1a0
[  982.666295]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

0.8.0, debug (zfs_dbgmsg_enable=1 zfs_flags=1):

[16110.699237] ZFS: Loaded module v0.8.0-r1-gentoo (DEBUG mode), ZFS pool version 5000, ZFS filesystem version 5
[16122.317547] (!copied) is equivalent to (hdr->b_l1hdr.b_freeze_cksum == NULL)
[16122.317554] PANIC at arc.c:1875:arc_buf_try_copy_decompressed_data()
[16122.317564] Showing stack for process 32474
[16122.317570] CPU: 2 PID: 32474 Comm: z_rd_int Tainted: P           O      4.18.12-gentoo #2
[16122.317572] Hardware name:  /DX79TO, BIOS SIX7910J.86A.0460.2012.0327.1627 03/27/2012
[16122.317574] Call Trace:
[16122.317591]  dump_stack+0x85/0xba
[16122.317617]  spl_dumpstack+0x35/0x40 [spl]
[16122.317633]  spl_panic+0xe6/0x150 [spl]
[16122.317650]  ? spl_kmem_cache_alloc+0xc4/0x1160 [spl]
[16122.317656]  ? __switch_to_asm+0x40/0x70
[16122.317660]  ? _raw_spin_unlock_irq+0x12/0x30
[16122.317675]  ? spl_kmem_cache_alloc+0xc4/0x1160 [spl]
[16122.317680]  ? __slab_alloc+0x26/0x40
[16122.317684]  ? kmem_cache_alloc+0x27c/0x320
[16122.317877]  arc_buf_fill+0x219b/0x21b0 [zfs]
[16122.318033]  arc_buf_alloc_impl+0x3b9/0x950 [zfs]
[16122.318189]  arc_read_done+0x331/0xff0 [zfs]
[16122.318391]  zio_done+0xbc7/0x2980 [zfs]
[16122.318398]  ? _raw_spin_unlock+0x12/0x30
[16122.318591]  ? spa_config_exit+0x100/0x210 [zfs]
[16122.318793]  zio_execute+0x13e/0x3a0 [zfs]
[16122.318813]  taskq_thread+0x3c1/0x790 [spl]
[16122.318820]  ? try_to_wake_up+0x730/0x730
[16122.319030]  ? zio_taskq_member.isra.1.constprop.2+0x90/0x90 [zfs]
[16122.319049]  ? taskq_thread_spawn+0x70/0x70 [spl]
[16122.319055]  kthread+0x16b/0x1a0
[16122.319060]  ? kthread_flush_work+0x180/0x180
[16122.319065]  ret_from_fork+0x35/0x40

[16342.661823] INFO: task zpool:32463 blocked for more than 120 seconds.
[16342.661829]       Tainted: P           O      4.18.12-gentoo #2
[16342.661831] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[16342.661835] zpool           D    0 32463   2080 0x00000000
[16342.661842] Call Trace:
[16342.661860]  ? __schedule+0x349/0xd90
[16342.661866]  ? _cond_resched+0x25/0x70
[16342.661871]  schedule+0x48/0xf0
[16342.661877]  schedule_timeout+0xa6/0x640
[16342.661885]  ? collect_expired_timers+0x120/0x120
[16342.661890]  io_schedule_timeout+0x29/0x60
[16342.661913]  __cv_timedwait_common+0x20c/0x3f0 [spl]
[16342.661921]  ? woken_wake_function+0x30/0x30
[16342.661936]  __cv_timedwait_io+0x1d/0x30 [spl]
[16342.662173]  zio_wait+0x27f/0x660 [zfs]
[16342.662334]  dbuf_read+0x1079/0x1980 [zfs]
[16342.662491]  ? dbuf_rele+0x58/0xa0 [zfs]
[16342.662648]  dbuf_hold_impl_arg+0xd62/0xf00 [zfs]
[16342.662806]  dbuf_hold_impl+0x2f/0x60 [zfs]
[16342.662961]  dbuf_hold+0x34/0x70 [zfs]
[16342.663124]  dmu_buf_hold_noread+0xaf/0x1d0 [zfs]
[16342.663286]  dmu_buf_hold+0x51/0xd0 [zfs]
[16342.663478]  space_map_iterate+0x104/0x6c0 [zfs]
[16342.663649]  ? dnode_rele_and_unlock+0x7d/0x190 [zfs]
[16342.663839]  ? spa_stats_destroy+0x320/0x320 [zfs]
[16342.664030]  space_map_load_length+0x84/0x130 [zfs]
[16342.664220]  space_map_load+0x29/0x40 [zfs]
[16342.664411]  vdev_dtl_load+0x18c/0x290 [zfs]
[16342.664604]  ? vdev_obsolete_sm_object+0x28/0xe0 [zfs]
[16342.664794]  vdev_load+0x18e/0x6b0 [zfs]      
[16342.664812]  ? spl_kmem_alloc+0x11c/0x210 [spl]
[16342.665002]  vdev_load+0x4b/0x6b0 [zfs]
[16342.665096]  ? dmu_buf_rele+0x12/0x20 [zfs]
[16342.665395]  ? zap_unlockdir+0x98/0xf0 [zfs]  
[16342.665601]  ? zap_lookup_norm+0xba/0xf0 [zfs]
[16342.665800]  vdev_load+0x4b/0x6b0 [zfs]
[16342.666003]  ? zap_lookup+0x1a/0x30 [zfs]
[16342.666201]  ? spa_dir_prop+0x40/0xa0 [zfs]
[16342.666397]  spa_load+0xbc9/0x2210 [zfs]
[16342.666430]  ? nvt_lookup_name_type.isra.17+0xea/0x2a0 [znvpair]
[16342.666626]  spa_tryimport+0x171/0x760 [zfs]
[16342.666651]  ? spl_kmem_free_impl+0x51/0x60 [spl]
[16342.666858]  zfs_ioc_pool_tryimport+0x7d/0x110 [zfs]
[16342.667067]  zfsdev_ioctl+0x5c9/0xdd0 [zfs]
[16342.667087]  do_vfs_ioctl+0xad/0xa40
[16342.667102]  ? handle_mm_fault+0x15d/0x3b0
[16342.667119]  ? __do_page_fault+0x2b3/0x8f0
[16342.667132]  ksys_ioctl+0xc1/0xd0
[16342.667145]  __x64_sys_ioctl+0x1e/0x30
[16342.667151]  do_syscall_64+0x6f/0x1a0
[16342.667156]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
@vthriller (Author)

After some additional poking through the issue tracker I found #6527 and #5721.

@vthriller (Author)

zdb -e -d pile dumps core:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  dsl_dir_rele (dd=0x3, tag=0x23fe820) at ../../module/zfs/dsl_dir.c:305
305		spa_close(dd->dd_pool->dp_spa, tag);
[Current thread is 1 (Thread 0x7f86bcb24780 (LWP 2156))]
(gdb) bt full
#0  dsl_dir_rele (dd=0x3, tag=0x23fe820) at ../../module/zfs/dsl_dir.c:305
No locals.
#1  0x00007f86bc1d9310 in dsl_pool_close (dp=0x23fe820) at ../../module/zfs/dsl_pool.c:383
No locals.
#2  0x00007f86bc1ff3df in spa_unload (spa=spa@entry=0x23b8050) at ../../module/zfs/spa.c:1521
        i = <optimized out>
#3  0x00007f86bc206bb6 in spa_import (pool=pool@entry=0x7ffeb31860ca "pile", config=0x23b6a70, props=props@entry=0x0, flags=flags@entry=36) at ../../module/zfs/spa.c:5509
        spa = 0x23b8050
        altroot = 0x0
        state = <optimized out>
        policy = {zlp_rewind = 2, zlp_maxmeta = 0, zlp_maxdata = 18446744073709551615, zlp_txg = 18446744073709551615}
        mode = <optimized out>
        readonly = 0
        error = 17
        nvroot = 0x23bf0a8
        spares = 0x100000013
        l2cache = 0x23a8e28
        nspares = 1
        nl2cache = 0
        __func__ = "spa_import"
        __FUNCTION__ = "spa_import"
#4  0x0000000000407337 in main (argc=<optimized out>, argv=0x7ffeb3184e40) at zdb.c:6126
        args = {path = 0x0, paths = 0, poolname = 0x0, guid = 0, cachefile = 0x0, can_be_active = B_TRUE, scan = B_FALSE, policy = 0x0}
        c = <optimized out>
        rl = {rlim_cur = 1024, rlim_max = 1024}
        spa = 0x0
        os = 0x0
        dump_all = <optimized out>
        verbose = 0
        error = 0
        searchdirs = 0x0
        nsearch = <optimized out>
        target = <optimized out>
        target_pool = 0x7ffeb31860ca "pile"
        policy = 0x23a9030
        max_txg = 18446744073709551615
        flags = 4
        rewind = <optimized out>
        spa_config_path_env = <optimized out>
        target_is_spa = B_TRUE
        cfg = 0x23b6a70
        checkpoint_pool = <optimized out>
        checkpoint_target = 0x7f86bcb6c4a0 ""
        __FUNCTION__ = "main"
        __func__ = "main"

With -v added, I got this:

(gdb) bt full
#0  0x00007f120e227ab8 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:55
#1  0x00007f120e228e8a in __GI_abort () at abort.c:89
#2  0x00007f120e2672d1 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7f120e35a268 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/posix/libc_fatal.c:175
#3  0x00007f120e26c8fe in malloc_printerr (action=3, str=0x7f120e35a290 "munmap_chunk(): invalid pointer", ptr=<optimized out>) at malloc.c:4958
#4  0x00007f120e7fa51d in umem_free (size=40, ptr=0x1ecd140) at ../../lib/libspl/include/umem.h:131
No locals.
#5  task_free (tq=tq@entry=0x1efc980, t=0x1ecd140) at taskq.c:93
No locals.
#6  0x00007f120e7fac08 in taskq_destroy (tq=0x1efc980) at taskq.c:315
        nthreads = <optimized out>
#7  0x00007f120e84f38f in dsl_pool_close (dp=0x1ef4f10) at ../../module/zfs/dsl_pool.c:400
No locals.
#8  0x00007f120e8753df in spa_unload (spa=spa@entry=0x1e86050) at ../../module/zfs/spa.c:1521
        i = <optimized out>
#9  0x00007f120e87cbb6 in spa_import (pool=pool@entry=0x7ffdb44a60b6 "pile", config=0x1e84a70, props=props@entry=0x0, flags=flags@entry=36) at ../../module/zfs/spa.c:5509
        spa = 0x1e86050
        altroot = 0x0
        state = <optimized out>
        policy = {zlp_rewind = 2, zlp_maxmeta = 0, zlp_maxdata = 18446744073709551615, zlp_txg = 18446744073709551615}
        mode = <optimized out>
        readonly = 0
        error = 17
        nvroot = 0x1e8d0a8
        spares = 0x100000013
        l2cache = 0x1e76e28
        nspares = 1
        nl2cache = 0
        __func__ = "spa_import"
        __FUNCTION__ = "spa_import"
#10 0x0000000000407337 in main (argc=<optimized out>, argv=0x7ffdb44a5970) at zdb.c:6126
        args = {path = 0x0, paths = 0, poolname = 0x0, guid = 0, cachefile = 0x0, can_be_active = B_TRUE, scan = B_FALSE, policy = 0x0}
        c = <optimized out>
        rl = {rlim_cur = 1024, rlim_max = 1024}
        spa = 0x0
        os = 0x0
        dump_all = <optimized out>
        verbose = 1
        error = 0
        searchdirs = 0x0
        nsearch = <optimized out>
        target = <optimized out>
        target_pool = 0x7ffdb44a60b6 "pile"
        policy = 0x1e77030
        max_txg = 18446744073709551615
        flags = 4
        rewind = <optimized out>
        spa_config_path_env = <optimized out>
        target_is_spa = B_TRUE
        cfg = 0x1e84a70
        checkpoint_pool = <optimized out>
        checkpoint_target = 0x7f120f1e24a0 ""
        __FUNCTION__ = "main"
        __func__ = "main"

@vthriller (Author)

Interesting: not all zdb -e -d invocations result in the SPA-related SIGSEGV; sometimes it's:

zdb: ../nptl/pthread_mutex_lock.c:81: __pthread_mutex_lock: Assertion `mutex->__data.__owner == 0' failed.

or even:

zdb: can't open 'pile': File exists
+++ exited with 1 +++

and neither strace -e trace=open nor plain strace gives any clue where that EEXIST comes from.

At the same time it has no problem enumerating datasets from another pool:

# zdb -e -d tank
Dataset mos [META], ID 0, cr_txg 4, 169M, 2566 objects (inconsistent)
Dataset tank/www [ZPL], ID 547, cr_txg 46914409, 5.31G, 26877 objects
...

@vthriller (Author)

mutex assert traceback:

#3  0x00007f5fba032a62 in __GI___assert_fail (assertion=assertion@entry=0x7f5fba3b413c "mutex->__data.__owner == 0", file=file@entry=0x7f5fba3b4108 "../nptl/pthread_mutex_lock.c", line=line@entry=81, function=function@entry=0x7f5fba3b4240 <__PRETTY_FUNCTION__.8889> "__pthread_mutex_lock") at assert.c:101
No locals.
#4  0x00007f5fba3ab968 in __GI___pthread_mutex_lock (mutex=mutex@entry=0x17f5c90) at ../nptl/pthread_mutex_lock.c:81
        __PRETTY_FUNCTION__ = "__pthread_mutex_lock"
        type = <optimized out>
        id = <optimized out>
#5  0x00007f5fba60a701 in mutex_enter (mp=mp@entry=0x17f5c90) at kernel.c:207
        __left = <optimized out>
        __FUNCTION__ = "mutex_enter"
#6  0x00007f5fba69cda1 in txg_list_destroy (tl=tl@entry=0x17f5c90) at ../../module/zfs/txg.c:835
No locals.
#7  0x00007f5fba66135f in dsl_pool_close (dp=0x17f5800) at ../../module/zfs/dsl_pool.c:395
No locals.
#8  0x00007f5fba6873df in spa_unload (spa=spa@entry=0x17af020) at ../../module/zfs/spa.c:1521
        i = <optimized out>
#9  0x00007f5fba68ebb6 in spa_import (pool=pool@entry=0x7ffcc123009e "pile", config=0x17ada40, props=props@entry=0x0, flags=flags@entry=36) at ../../module/zfs/spa.c:5509
        spa = 0x17af020
        altroot = 0x0
        state = <optimized out>
        policy = {zlp_rewind = 2, zlp_maxmeta = 0, zlp_maxdata = 18446744073709551615, zlp_txg = 18446744073709551615}
        mode = <optimized out>
        readonly = 0
        error = 17
        nvroot = 0x17b60a8
        spares = 0x100000013
        l2cache = 0x179fdd8
        nspares = 1
        nl2cache = 0
        __func__ = "spa_import"
        __FUNCTION__ = "spa_import"
#10 0x0000000000407337 in main (argc=<optimized out>, argv=0x7ffcc122e3b0) at zdb.c:6126
        args = {path = 0x0, paths = 0, poolname = 0x0, guid = 0, cachefile = 0x0, can_be_active = B_TRUE, scan = B_FALSE, policy = 0x0}
        c = <optimized out>
        rl = {rlim_cur = 1024, rlim_max = 1024}
        spa = 0x0
        os = 0x0
        dump_all = <optimized out>
        verbose = 0
        error = 0
        searchdirs = 0x0
        nsearch = <optimized out>
        target = <optimized out>
        target_pool = 0x7ffcc123009e "pile"
        policy = 0x17a0030
        max_txg = 18446744073709551615
        flags = 4
        rewind = <optimized out>
        spa_config_path_env = <optimized out>
        target_is_spa = B_TRUE
        cfg = 0x17ada40
        checkpoint_pool = <optimized out>
        checkpoint_target = 0x7f5fbaff44a0 ""
        __FUNCTION__ = "main"
        __func__ = "main"

@vthriller (Author)

Sorry for the noise, but I just remembered to try the userland tools built with USE=debug:

# zdb -e -d pile
idx + slots <= DNODES_PER_BLOCK (0x66 <= 0x20)
ASSERT at ../../module/zfs/dnode.c:1086:dnode_set_slots()Aborted (core dumped)
(gdb) bt
#0  0x00007f7624bc9ab8 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:55
#1  0x00007f7624bcae8a in __GI_abort () at abort.c:89
#2  0x00007f76251ef546 in libspl_assertf (file=0x7f7625377b00 "../../module/zfs/dnode.c", func=func@entry=0x7f7625379090 <__FUNCTION__.17719> "dnode_set_slots", line=line@entry=1086, format=format@entry=0x7f7625367b2a "%s %s %s (0x%llx %s 0x%llx)", file=0x7f7625377b00 "../../module/zfs/dnode.c")
    at ../../lib/libspl/include/assert.h:55
#3  0x00007f76251f0ed3 in dnode_set_slots (children=children@entry=0x2888750, idx=idx@entry=1, slots=slots@entry=101, ptr=ptr@entry=0x3) at ../../module/zfs/dnode.c:1086
#4  0x00007f76251f244d in dnode_hold_impl (os=0x2869700, object=123, flag=flag@entry=1, slots=slots@entry=0, tag=tag@entry=0x7f7625373950 <__func__.18200>, dnp=dnp@entry=0x7ffd0b7bffa8) at ../../module/zfs/dnode.c:1385
#5  0x00007f76251f2b77 in dnode_hold (os=<optimized out>, object=<optimized out>, tag=tag@entry=0x7f7625373950 <__func__.18200>, dnp=dnp@entry=0x7ffd0b7bffa8) at ../../module/zfs/dnode.c:1555
#6  0x00007f76251d4c08 in dmu_buf_hold_noread (os=<optimized out>, object=<optimized out>, offset=0, tag=0x7f762539f820 <__func__.17155>, dbp=0x7ffd0b7c0038) at ../../module/zfs/dmu.c:183
#7  0x00007f76251d4d77 in dmu_buf_hold (os=<optimized out>, object=<optimized out>, offset=offset@entry=0, tag=tag@entry=0x7f762539f820 <__func__.17155>, dbp=dbp@entry=0x7ffd0b7c0038, flags=flags@entry=1) at ../../module/zfs/dmu.c:238
#8  0x00007f76252eae2f in zap_lockdir (os=<optimized out>, obj=<optimized out>, tx=tx@entry=0x0, lti=lti@entry=0, fatreader=fatreader@entry=B_TRUE, adding=adding@entry=B_FALSE, tag=0x7f762539f820 <__func__.17155>, zapp=0x7ffd0b7c00f8) at ../../module/zfs/zap_micro.c:614
#9  0x00007f76252eb78c in zap_lookup_norm (os=<optimized out>, zapobj=<optimized out>, name=name@entry=0x7f7625392648 "org.zfsonlinux:allocation_bias", integer_size=integer_size@entry=1, num_integers=num_integers@entry=64, buf=buf@entry=0x7ffd0b7c0180, mt=(unknown: 0), realname=0x0, rn_len=0, ncp=0x0)
    at ../../module/zfs/zap_micro.c:1015
#10 0x00007f76252eb7f1 in zap_lookup (os=<optimized out>, zapobj=<optimized out>, name=name@entry=0x7f7625392648 "org.zfsonlinux:allocation_bias", integer_size=integer_size@entry=1, num_integers=num_integers@entry=64, buf=buf@entry=0x7ffd0b7c0180) at ../../module/zfs/zap_micro.c:963
#11 0x00007f762526432a in vdev_load (vd=0x287a280) at ../../module/zfs/vdev.c:3031
#12 0x00007f762526411c in vdev_load (vd=vd@entry=0x2876620) at ../../module/zfs/vdev.c:3016
#13 0x00007f7625248724 in spa_ld_load_vdev_metadata (spa=0x27fc500) at ../../module/zfs/spa.c:3728
#14 spa_load_impl (ereport=<synthetic pointer>, type=SPA_IMPORT_EXISTING, spa=0x27fc500) at ../../module/zfs/spa.c:4232
#15 spa_load (spa=spa@entry=0x27fc500, state=state@entry=SPA_LOAD_IMPORT, type=type@entry=SPA_IMPORT_EXISTING) at ../../module/zfs/spa.c:2383
#16 0x00007f7625249752 in spa_load_best (spa=spa@entry=0x27fc500, state=SPA_LOAD_IMPORT, max_request=<optimized out>, rewind_flags=2) at ../../module/zfs/spa.c:4417
#17 0x00007f762524b593 in spa_import (pool=pool@entry=0x7ffd0b7c20a4 "pile", config=0x27faf20, props=props@entry=0x0, flags=flags@entry=36) at ../../module/zfs/spa.c:5475
#18 0x0000000000407337 in main (argc=<optimized out>, argv=0x7ffd0b7c0770) at zdb.c:6126

@h1z1 commented Jun 10, 2019

Is ARC compression enabled?

cat /sys/module/zfs/parameters/zfs_compressed_arc_enabled

@vthriller (Author) commented Jun 10, 2019

$ cat /sys/module/zfs/parameters/zfs_compressed_arc_enabled
1

I'm not sure how ARC might affect zdb -e though…

@behlendorf (Contributor)

@vthriller there is a proposed fix for the (!copied) is equivalent to (hdr->b_l1hdr.b_freeze_cksum == NULL) failure in #8736. If possible, as a first step I'd suggest applying the fix to a debug build. It should improve the situation, although based on the stacks you posted it sounds like there may be an additional issue to investigate.
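
(For anyone applying it: GitHub serves any pull request as a mailbox-format patch, so one hypothetical way to apply the fix to a git checkout of the source tree is:)

$ curl -L https://github.com/zfsonlinux/zfs/pull/8736.patch | git am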

behlendorf added the Type: Defect label on Jun 10, 2019
@vthriller (Author)

After applying #8736, zfs crashes pretty much exactly like zdb -e:

[   68.030541] VERIFY3(idx + slots <= DNODES_PER_BLOCK) failed (102 <= 32)
[   68.031163] PANIC at dnode.c:1086:dnode_set_slots()
[   68.031763] Showing stack for process 2030
[   68.031770] CPU: 2 PID: 2030 Comm: zpool Tainted: P           O      4.18.12-gentoo #2
[   68.031772] Hardware name:  /DX79TO, BIOS SIX7910J.86A.0460.2012.0327.1627 03/27/2012
[   68.031774] Call Trace:
[   68.031790]  dump_stack+0x85/0xba
[   68.031816]  spl_dumpstack+0x35/0x40 [spl]
[   68.031831]  spl_panic+0xe6/0x150 [spl]
[   68.031838]  ? kmem_cache_free+0x27f/0x2e0
[   68.031853]  ? spl_kmem_cache_free+0x217/0x3a0 [spl]
[   68.032094]  ? zio_destroy+0xe0/0xf0 [zfs]
[   68.032296]  ? zio_wait+0x2fd/0x660 [zfs]
[   68.032302]  ? _cond_resched+0x25/0x70
[   68.032473]  dnode_set_slots+0xb7/0xc0 [zfs]
[   68.032646]  dnode_hold_impl+0xe24/0x1620 [zfs]
[   68.032817]  dnode_hold+0x1f/0x30 [zfs]
[   68.032980]  dmu_buf_hold_noread+0x3f/0x1d0 [zfs]
[   68.033174]  ? spa_stats_destroy+0x320/0x320 [zfs]
[   68.033336]  dmu_buf_hold+0x51/0xd0 [zfs]
[   68.033535]  zap_lockdir+0x5c/0x170 [zfs]
[   68.033729]  ? space_map_load_length+0x84/0x130 [zfs]
[   68.033845]  zap_lookup_norm+0x65/0xf0 [zfs]
[   68.033845]  zap_lookup+0x1a/0x30 [zfs]
[   68.033845]  vdev_load+0x351/0x6b0 [zfs]
[   68.033845]  ? dmu_buf_rele+0x12/0x20 [zfs]
[   68.033845]  ? zap_unlockdir+0x98/0xf0 [zfs]
[   68.033845]  ? zap_lookup_norm+0xba/0xf0 [zfs]
[   68.033845]  vdev_load+0x4b/0x6b0 [zfs]
[   68.033845]  ? zap_lookup+0x1a/0x30 [zfs]
[   68.033845]  ? spa_dir_prop+0x40/0xa0 [zfs]
[   68.033845]  spa_load+0xbc9/0x2210 [zfs]
[   68.033845]  ? nvt_lookup_name_type.isra.17+0xea/0x2a0 [znvpair]
[   68.033845]  spa_tryimport+0x171/0x760 [zfs]
[   68.033845]  ? spl_kmem_free_impl+0x51/0x60 [spl]
[   68.033845]  zfs_ioc_pool_tryimport+0x7d/0x110 [zfs]
[   68.033845]  zfsdev_ioctl+0x5c9/0xdd0 [zfs]
[   68.033845]  do_vfs_ioctl+0xad/0xa40
[   68.033845]  ? handle_mm_fault+0x15d/0x3b0
[   68.033845]  ? __do_page_fault+0x2b3/0x8f0
[   68.033845]  ksys_ioctl+0xc1/0xd0
[   68.033845]  __x64_sys_ioctl+0x1e/0x30
[   68.033845]  do_syscall_64+0x6f/0x1a0
[   68.033845]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

@behlendorf (Contributor)

@vthriller thanks for the gdb output, it shows that the slots value being passed to dnode_set_slots is what's out of bounds and causing the panic. It must somehow be wrong on disk, yet have a valid checksum (which is what should protect against this). There are currently at most 32 slots for a block of dnodes, far fewer than the 101 reported (the assert's 0x66 <= 0x20 is idx + slots = 1 + 101 = 102 against DNODES_PER_BLOCK = 32).

#3  0x00007f76251f0ed3 in dnode_set_slots (children=children@entry=0x2888750, idx=idx@entry=1, slots=slots@entry=101, ptr=ptr@entry=0x3) at ../../module/zfs/dnode.c:1086

My suggestion would be to first attempt to import the pool read-only from the previous TXG number, using the -T <txg> option. This should work assuming that the damaged block was written in the last TXG. If that doesn't work, then please let me know and I'll see about putting together a patch to convert the panic into an I/O error, which may allow the import to proceed.
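
(A sketch of the suggested command; <txg> is a placeholder for a txg value recorded in the uberblocks:)

# zpool import -o readonly=on -T <txg> pile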

@vthriller (Author)

@behlendorf is it OK if the label txg is smaller than the ones recorded in the uberblocks?

# for i in /dev/sd[cg]1; do zdb -llluuu $i; done | egrep '\Wname|Uber|\Wtxg' | sed '/Uber/ {N; s/\n//}' | sort -nk2 -t'[' | uniq -c
      8     name: 'pile'
      8     txg: 17457116
      8     Uberblock[0]	txg = 20101632
      8     Uberblock[1]	txg = 20101505
      8     Uberblock[2]	txg = 20101506
      8     Uberblock[3]	txg = 20101667
      8     Uberblock[4]	txg = 20086532
      8     Uberblock[5]	txg = 20086533
      8     Uberblock[6]	txg = 20086534
      8     Uberblock[7]	txg = 20101191
      8     Uberblock[8]	txg = 20101384
      8     Uberblock[9]	txg = 20101385
      8     Uberblock[10]	txg = 20100746
      8     Uberblock[11]	txg = 20100747
      8     Uberblock[12]	txg = 20100780
      8     Uberblock[13]	txg = 20101549
      8     Uberblock[14]	txg = 20101006
      8     Uberblock[15]	txg = 20101583
      8     Uberblock[16]	txg = 20101584
      8     Uberblock[17]	txg = 20069265
      8     Uberblock[18]	txg = 20101426
      8     Uberblock[19]	txg = 20100947
      8     Uberblock[20]	txg = 20069268
      8     Uberblock[21]	txg = 20069269
      8     Uberblock[22]	txg = 20069270
      8     Uberblock[23]	txg = 20068919
      8     Uberblock[24]	txg = 20069272
      8     Uberblock[25]	txg = 20101497
      8     Uberblock[26]	txg = 20100890
      8     Uberblock[27]	txg = 20101531
      8     Uberblock[28]	txg = 20069308
      8     Uberblock[29]	txg = 20101501
      8     Uberblock[30]	txg = 20101470
      8     Uberblock[31]	txg = 20100927

As for importing, I'd rather stick to zdb for now, to minimize the number of reboots (and skip setting up a qemu VM) and to get better stack traces with gdb. Will report shortly.

@vthriller (Author)

I tried every txg from the list; the result is either this:

# zdb -e pile -t 20101506 # ub 2
…
idx + slots <= DNODES_PER_BLOCK (0x66 <= 0x20)
ASSERT at ../../module/zfs/dnode.c:1086:dnode_set_slots()Aborted (core dumped)

or like this:

# zdb -e pile -t 20101505 # ub 1
…
zdb: can't open 'pile': Input/output error

ZFS_DBGMSG(zdb) START:
spa.c:5470:spa_import(): spa_import: importing pile
spa_misc.c:408:spa_load_note(): spa_load(pile, config trusted): LOADING
vdev.c:124:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/ata-TOSHIBA_MG03ACA400_158EK6CPF-part1': best uberblock found for spa pile. txg 20101505
spa_misc.c:408:spa_load_note(): spa_load(pile, config untrusted): using uberblock with txg=20101505
spa_misc.c:393:spa_load_failed(): spa_load(pile, config untrusted): FAILED: unable to open rootbp in dsl_pool_init [error=5]
spa_misc.c:408:spa_load_note(): spa_load(pile, config untrusted): UNLOADING
ZFS_DBGMSG(zdb) END

@behlendorf (Contributor)

Sure, zdb effectively does a user-space import, so it's a great choice for debugging this. Running zdb -u on the exported pool should give you the largest txg, which is where the import will start. That should match the largest txg from the uberblocks. The smaller txg in the label is fine; it indicates the last time the configuration was changed.

@vthriller (Author)

Running zdb -u on the exported pool

Quick reminder: the pool in question was not properly exported due to kernel oopses deep in the SPL code.

Besides, zdb -u fails just like regular zdb:

# zdb -eu pile
idx + slots <= DNODES_PER_BLOCK (0x66 <= 0x20)
ASSERT at ../../module/zfs/dnode.c:1086:dnode_set_slots()Aborted (core dumped)

@richardelling (Contributor)

Consider trying zdb's -A option.
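
(Per zdb(8), -A skips failed assertions, -AA enables panic recovery, and -AAA does both; a hypothetical invocation against this pool:)

# zdb -AAA -e -d pile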

@behlendorf (Contributor) commented Jun 10, 2019

Can you try the following patch with zdb? It should detect your damaged dnode block and convert the panic into an I/O error. This will allow ZFS to try the other copies, which may be intact, or, failing that, allow the rollback mechanism to try earlier TXGs.

diff --git a/module/zfs/dnode.c b/module/zfs/dnode.c
index c06f614..f5bd10d 100644
--- a/module/zfs/dnode.c
+++ b/module/zfs/dnode.c
@@ -1381,6 +1381,16 @@ dnode_hold_impl(objset_t *os, uint64_t object, int flag, int slo
                        if (dn_block[i].dn_type != DMU_OT_NONE) {
                                int interior = dn_block[i].dn_extra_slots;
 
+                               if (i + interior >= DNODES_PER_BLOCK) {
+                                       for (int j = 0; j < i; j++)
+                                               zrl_destroy(&dnh[j].dnh_zrlock);
+
+                                       kmem_free(dnc, sizeof (dnode_children_t) +
+                                           epb * sizeof (dnode_handle_t));
+                                       dbuf_rele(db, FTAG);
+
+                                       return (SET_ERROR(ECKSUM));
+                               }
+
                                dnode_set_slots(dnc, i, 1, DN_SLOT_ALLOCATED);
                                dnode_set_slots(dnc, i + 1, interior,
                                    DN_SLOT_INTERIOR);

@vthriller (Author)

Consider trying zdb's -A option

Neither -A, -AA, nor -AAA has any effect.

@vthriller (Author)

@behlendorf this is weird: with this patch, zdb does… nothing. It prints the import configuration, then just sits in S state consuming a negligible amount of CPU time. This is what I see after attaching with gdb:

(gdb) bt
#0  pthread_cond_wait () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f8682b02e49 in cv_wait (cv=cv@entry=0x135d298, mp=mp@entry=0x135d1f0) at kernel.c:338
#2  0x00007f8682b04dab in taskq_wait (tq=0x135d1d0) at taskq.c:193
#3  0x00007f8682b35891 in dmu_buf_user_evict_wait () at ../../module/zfs/dbuf.c:3616
#4  0x00007f8682b7a9d2 in dsl_pool_close (dp=0x13cb000) at ../../module/zfs/dsl_pool.c:414
#5  0x00007f8682ba94e6 in spa_unload (spa=spa@entry=0x1362e30) at ../../module/zfs/spa.c:1521
#6  0x00007f8682bb1aa6 in spa_import (pool=pool@entry=0x7ffc96d3c016 "pile", config=0x13618b0, props=props@entry=0x0, flags=flags@entry=36) at ../../module/zfs/spa.c:5509
#7  0x0000000000407394 in main (argc=<optimized out>, argv=0x7ffc96d3a9a8) at zdb.c:6126

(I'm not showing thread apply all bt here since there are 610 (!) mostly idling threads.)

@behlendorf (Contributor)

Whoops, it looks like I forgot to release a hold. I've updated the patch above accordingly (note the additional dbuf_rele() line).

@vthriller (Author)

Thank you.

The txg that previously tripped the assert now fails differently:

# ./cmd/zdb/zdb -e pile 
…
zdb: can't open 'pile': Invalid exchange

ZFS_DBGMSG(zdb) START:
spa.c:5470:spa_import(): spa_import: importing pile
spa_misc.c:408:spa_load_note(): spa_load(pile, config trusted): LOADING
vdev.c:124:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/ata-TOSHIBA_MG03ACA400_158EK6CPF-part1': best uberblock found for spa pile. txg 20101667
spa_misc.c:408:spa_load_note(): spa_load(pile, config untrusted): using uberblock with txg=20101667
vdev.c:129:vdev_dbgmsg(): mirror-0 vdev (guid 5098405799333023431): metaslab_init failed [error=52]
vdev.c:129:vdev_dbgmsg(): mirror-0 vdev (guid 5098405799333023431): vdev_load: metaslab_init failed [error=52]
spa_misc.c:393:spa_load_failed(): spa_load(pile, config trusted): FAILED: vdev_load failed [error=52]
spa_misc.c:408:spa_load_note(): spa_load(pile, config trusted): UNLOADING
ZFS_DBGMSG(zdb) END

Output for -t 20101505 did not change.

@behlendorf (Contributor)

You can pass -F to zdb to attempt rewinding the pool. If that also fails, I'd suggest trying to import the pool read-only. This may succeed, since the damaged dnode block was for the metaslab object, which doesn't need to be accessed for a read-only pool. Assuming that works, we'll know the damage is limited and we may be able to manually fix up the damaged dnode block.
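
(Sketches of both suggestions, using the pool name from earlier in the thread:)

# zdb -e -F pile
# zpool import -o readonly=on pile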

@vthriller (Author)

This is interesting. I rebooted into the system with both patches (#8736 and the one above) applied, and the initial zpool import listed all three pools, one of which was, of course, FAULTED. But before trying zdb -F or zpool import -F, I decided to load zfs with zfs_dbgmsg_enable=1 zfs_flags=1, and got an unexpected panic after another zpool import attempt:

[  776.713700] ZFS: Loaded module v0.8.0-r1-gentoo (DEBUG mode), ZFS pool version 5000, ZFS filesystem version 5
[  780.792502] VERIFY3(hdr->b_l1hdr.b_freeze_cksum != NULL) failed (0000000000000000!= 0000000000000000)
[  780.792509] PANIC at arc.c:2261:arc_buf_fill()
[  780.792512] Showing stack for process 3395
[  780.792516] CPU: 1 PID: 3395 Comm: z_rd_int Tainted: P           O      4.18.12-gentoo #2
[  780.792517] Hardware name:  /DX79TO, BIOS SIX7910J.86A.0460.2012.0327.1627 03/27/2012
[  780.792519] Call Trace:
[  780.792532]  dump_stack+0x85/0xba
[  780.792552]  spl_dumpstack+0x35/0x40 [spl]
[  780.792562]  spl_panic+0xe6/0x150 [spl]
[  780.792702]  ? abd_iterate_func+0x18b/0x4b0 [zfs]
[  780.792714]  ? spl_kmem_cache_alloc+0xc4/0x1160 [spl]
[  780.792718]  ? __slab_alloc+0x26/0x40
[  780.792721]  ? kmem_cache_alloc+0x27c/0x320
[  780.792824]  arc_buf_fill+0xb10/0x21c0 [zfs]
[  780.792928]  arc_buf_alloc_impl+0x3b9/0x950 [zfs]
[  780.793032]  arc_read_done+0x331/0xff0 [zfs]
[  780.793167]  zio_done+0xbc7/0x2980 [zfs]
[  780.793172]  ? kfree+0x246/0x290
[  780.793174]  ? kfree+0x246/0x290
[  780.793308]  zio_execute+0x13e/0x3a0 [zfs]
[  780.793321]  taskq_thread+0x3c1/0x790 [spl]
[  780.793326]  ? try_to_wake_up+0x730/0x730
[  780.793460]  ? zio_taskq_member.isra.1.constprop.2+0x90/0x90 [zfs]
[  780.793473]  ? taskq_thread_spawn+0x70/0x70 [spl]
[  780.793477]  kthread+0x16b/0x1a0
[  780.793480]  ? kthread_flush_work+0x180/0x180
[  780.793484]  ret_from_fork+0x35/0x40

I get this trace both after an rmmod/modprobe cycle (twice, each time following a clean zfs load with a working zpool import) and after booting straight into the system with said options.

@vthriller (Author)

PANIC at arc.c:2261:arc_buf_fill()

This corresponds to https://github.com/zfsonlinux/zfs/blob/zfs-0.8.0/module/zfs/arc.c#L2256 in tag zfs-0.8.0.

@vthriller (Author) commented Jun 11, 2019

Anyway, I went ahead and tried zpool import -o readonly=on -T $txg pile (-F is implied by -T; I'm not sure that combination with readonly is even expected to work), and it didn't work, which is somewhat expected since I had already tried the txgs from all the uberblocks manually with zdb, with nothing fruitful in the end. The same goes for a plain zpool import -Fn pile, which just silently exited with code 1.

@kusumi (Member) commented Jun 28, 2019

I always see this with ZoL master on Fedora (5.1.11-200.fc29.x86_64) when echo 1 > /sys/module/zfs/parameters/zfs_flags is set, but never with the default value.

[12997.652196] (!copied) is equivalent to (hdr->b_l1hdr.b_freeze_cksum == NULL)
[12997.652198] PANIC at arc.c:1875:arc_buf_try_copy_decompressed_data()

@kusumi (Member) commented Jul 3, 2019

I always see this with ZoL master on Fedora (5.1.11-200.fc29.x86_64) when echo 1 > /sys/module/zfs/parameters/zfs_flags is set, but never when using a default value.

I can also easily reproduce this with upstream 5.2.0-rc7.

@behlendorf (Contributor)

PANIC at arc.c:1875:arc_buf_try_copy_decompressed_data() was resolved in master by commit 46db9d6, but the original PANIC reported here has not been addressed.

stale bot commented Aug 24, 2020

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

stale bot added the Status: Stale label on Aug 24, 2020
stale bot closed this as completed on Nov 26, 2020