
ZFS Test Suite #6

Closed
behlendorf opened this issue May 14, 2010 · 3 comments
Labels
Component: Test Suite (indicates an issue with the test framework or a test case) · Type: Feature (feature request or new feature)
Comments

@behlendorf
Contributor

While I have tried to provide some ZFS test suite coverage as I develop, it is still sparse. Sure, I check that ztest runs in user space without error, I verify that you can successfully create zpools on top of various types of block devices (disk, loopback, md, ram, etc.), and I make an effort to push some real data through the system using zpios. But that's not enough. Happily, it looks like much of the work for a solution is out there already: the community has provided its own test suite, which looks pretty darn good. It's going to be a fair chunk of work to integrate it with Linux, but it's something which needs to be done. Of course, the tests which we expect to fail because that functionality isn't implemented yet should be disabled for now.

http://hub.opensolaris.org/bin/view/Community+Group+zfs/zfstestsuite

@behlendorf
Contributor Author

The KQ guys have started a nice test project here:
https://github.com/zfs-linux/test

@timbrody mentioned this issue May 30, 2012
dechamps added a commit to dechamps/zfs that referenced this issue Sep 25, 2012
trim_map_segment_add() can now be called from the ARC when buffers are
released from the L2ARC, so protect against direct reclaim.
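
A hedged sketch of the allocation pattern such a change relies on (illustrative only; the trim_seg_t structure and its fields are hypothetical, not the actual trim_map code). With the SPL kmem API, code reachable from the txg_sync/ARC paths allocates with KM_PUSHPAGE so the allocation cannot recurse into filesystem reclaim:

    /* Sketch only: KM_PUSHPAGE keeps the allocation from triggering
     * direct reclaim back into the filesystem (the PF_NOFS case the
     * SPL error below complains about). */
    #include <sys/kmem.h>

    typedef struct trim_seg {
            uint64_t ts_start;      /* hypothetical fields */
            uint64_t ts_end;
    } trim_seg_t;

    static trim_seg_t *
    trim_seg_alloc(uint64_t start, uint64_t end)
    {
            trim_seg_t *ts = kmem_alloc(sizeof (trim_seg_t), KM_PUSHPAGE);

            ts->ts_start = start;
            ts->ts_end = end;
            return (ts);
    }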

This fixes the following issue:

	SPLError: 22911:0:(kmem.h:90:sanitize_flags()) FATAL allocation for task txg_sync (22911) which used GFP flags 0x5345943c with PF_NOFS set
	SPLError: 22911:0:(kmem.h:90:sanitize_flags()) SPL PANIC
	SPL: Showing stack for process 22911
	Pid: 22911, comm: txg_sync Tainted: P           O 3.2.28-zfsdev-std-ipv6-64 #6
	Call Trace:
	 [<ffffffffa0294df7>] spl_debug_dumpstack+0x27/0x40 [spl]
	 [<ffffffffa02962cc>] spl_debug_bug+0x7c/0xe0 [spl]
	 [<ffffffffa029dabb>] kmem_alloc_debug+0x51b/0x530 [spl]
	 [<ffffffffa040dfb1>] trim_map_segment_add+0x211/0x350 [zfs]
	 [<ffffffffa040e1e9>] trim_map_free_locked+0xf9/0x150 [zfs]
	 [<ffffffffa040e2dd>] trim_map_free+0x9d/0x130 [zfs]
	 [<ffffffffa0380f9f>] arc_release+0x3cf/0x940 [zfs]
	 [<ffffffffa03917f8>] dbuf_dirty+0xdc8/0x1980 [zfs]
	 [<ffffffffa0393192>] dmu_buf_will_dirty+0x102/0x1c0 [zfs]
	 [<ffffffffa039c938>] dmu_write+0x98/0x260 [zfs]
	 [<ffffffffa0405295>] spa_history_write+0x175/0x200 [zfs]
	 [<ffffffffa0405d53>] spa_history_log_sync+0x313/0x960 [zfs]
	 [<ffffffffa03e3c29>] dsl_sync_task_group_sync+0x139/0x380 [zfs]
	 [<ffffffffa03d94fd>] dsl_pool_sync+0x2ed/0x860 [zfs]
	 [<ffffffffa03f50f7>] spa_sync+0x3b7/0xc50 [zfs]
	 [<ffffffffa04103fb>] txg_sync_thread+0x2cb/0x560 [zfs]
	 [<ffffffffa029f051>] thread_generic_wrapper+0x81/0xe0 [spl]
	 [<ffffffff810c1e56>] kthread+0x96/0xa0
	 [<ffffffff81c20d74>] kernel_thread_helper+0x4/0x10
@wizeman

wizeman commented Nov 22, 2012

FreeBSD has a limited set of regression tests, although it seems it hasn't been updated in the last 4 years (assuming I'm looking at the latest code):

http://svnweb.freebsd.org/base/head/tools/regression/zfs/

At one point I ported these tests to Solaris and Linux, so that they worked with the Lustre version of the userspace DMU.

I submitted back my portability changes to FreeBSD, which apparently were committed, so the scripts should still work fine on Linux.

It's pretty incomplete (it only tests zpool commands), but if porting the Solaris-based ZFS test suite mentioned in this bug is too much work, maybe this can be a starting point for adding more regression tests.

@behlendorf
Contributor Author

@wizeman Nice! I didn't know where to find these FreeBSD tests. I was able to download the suite from http://svn0.us-west.freebsd.org/base/head/tools/regression/zfs and, with one minor fix, run them under Linux. I haven't looked into any of the failures yet to see how many are legitimate, but this at least gives us a battery of 2850 tests to start with. Here are the initial results from 0.6.0-rc12. Many of the failures seem to be due to slightly different whitespace formatting.

$ sudo prove zpool/*
zpool/add/cache.t ..................................... Failed 5/33 subtests 
zpool/add/disks.t ..................................... Failed 2/19 subtests 
zpool/add/doesnt_exist.t .............................. ok   
zpool/add/files.t ..................................... Failed 9/54 subtests 
zpool/add/log.t ....................................... Failed 12/66 subtests 
zpool/add/mirror.t .................................... Failed 3/15 subtests 
zpool/add/option-f_inuse.t ............................ Failed 53/263 subtests 
zpool/add/option-f_replication_level_mismatch_0.t ..... Failed 7/40 subtests 
zpool/add/option-f_replication_level_mismatch_1.t ..... Failed 34/182 subtests 
zpool/add/option-f_size_mismatch.t .................... Failed 19/100 subtests 
zpool/add/option-f_type_mismatch.t .................... Failed 20/100 subtests 
zpool/add/option-n.t .................................. Failed 1/5 subtests 
zpool/add/raidz1.t .................................... Failed 2/10 subtests 
zpool/add/raidz2.t .................................... Failed 2/10 subtests 
zpool/add/spare.t ..................................... Failed 6/31 subtests 
zpool/attach/log.t .................................... Failed 12/34 subtests 
zpool/attach/mirror.t ................................. Failed 12/34 subtests 
zpool/attach/option-f_inuse.t ......................... Failed 28/141 subtests 
zpool/create/already_exists.t ......................... Failed 2/5 subtests 
zpool/create/automount.t .............................. ok   
zpool/create/cache.t .................................. Failed 5/35 subtests 
zpool/create/disks.t .................................. Failed 2/14 subtests 
zpool/create/files.t .................................. Failed 8/59 subtests 
zpool/create/log.t .................................... Failed 8/56 subtests 
zpool/create/mirror.t ................................. Failed 3/22 subtests 
zpool/create/option-R.t ............................... ok   
zpool/create/option-f_inuse.t ......................... ok       
zpool/create/option-f_replication_level_mismatch_0.t .. Failed 7/70 subtests 
zpool/create/option-f_replication_level_mismatch_1.t .. Failed 18/180 subtests 
zpool/create/option-f_size_mismatch.t ................. Failed 17/104 subtests 
zpool/create/option-f_type_mismatch.t ................. Failed 34/160 subtests 
zpool/create/option-m.t ............................... Failed 3/28 subtests 
zpool/create/option-n.t ............................... ok   
zpool/create/option-o.t ............................... ok     
zpool/create/raidz1.t ................................. Failed 5/37 subtests 
zpool/create/raidz2.t ................................. Failed 3/23 subtests 
zpool/create/spare.t .................................. Failed 4/28 subtests 
zpool/offline/io.t .................................... Failed 4/31 subtests 
        (2 TODO tests unexpectedly succeeded)
zpool/offline/log.t ................................... Failed 11/67 subtests 
zpool/offline/mirror.t ................................ Failed 5/47 subtests 
        (4 TODO tests unexpectedly succeeded)
zpool/offline/option-t.t .............................. Failed 40/219 subtests 
        (6 TODO tests unexpectedly succeeded)
zpool/offline/raidz1.t ................................ Failed 6/35 subtests 
zpool/offline/raidz2.t ................................ Failed 4/33 subtests 
        (3 TODO tests unexpectedly succeeded)
zpool/remove/cache.t .................................. Failed 3/9 subtests 
zpool/remove/spare.t .................................. Failed 6/18 subtests 
zpool/replace/cache.t ................................. Failed 2/6 subtests 
zpool/replace/disk.t .................................. Failed 2/10 subtests 
zpool/replace/log.t ................................... Failed 5/27 subtests 
zpool/replace/mirror.t ................................ Failed 6/27 subtests 
zpool/replace/raidz1.t ................................ Failed 6/27 subtests 
zpool/replace/raidz2.t ................................ Failed 24/115 subtests 
        (1 TODO test unexpectedly succeeded)
zpool/replace/spare.t ................................. Failed 2/6 subtests 

FransUrbo pushed a commit to FransUrbo/zfs that referenced this issue Mar 30, 2013
kernelOfTruth pushed a commit to kernelOfTruth/zfs that referenced this issue Mar 1, 2015
Check at ./configure time that the kernel was built with kallsyms
support.  If the kernel doesn't have CONFIG_KALLSYMS defined the
modules will still compile cleanly but will not be loadable.  So
we really want to catch this early during ./configure.  Note that
we do not require CONFIG_KALLSYMS_ALL but it may be safely defined.

Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#6
jkryl referenced this issue in mayadata-io/cstor Feb 27, 2018
[CCS-68] travis integration with ZoL
richardelling pushed a commit to richardelling/zfs that referenced this issue Oct 15, 2018
[CCS-68] travis integration with ZoL
richardelling pushed a commit to richardelling/zfs that referenced this issue Oct 15, 2018
- API to get condensed tree of modified blocks between two txg
Signed-off-by: mayank <[email protected]>
jdike added a commit to jdike/zfs that referenced this issue Aug 1, 2019
This patch is an RFC.  There are at least two things it could be
getting wrong:
    1 - The use of a mutex to protect an increment; I couldn't see how
        to get atomic operations in this code
    2 - There could easily be a better fix than putting each newly
        allocated sa_os_t in its own lockdep class

This fix does eliminate the lockdep warnings, so there's that.

The two lockdep reports are below.  They show two different deadlock
scenarios, but they share a common link, which is
    thread 1 holding sa_lock and trying to get zap->zap_rwlock:
       zap_lockdir_impl+0x858/0x16c0 [zfs]
       zap_lockdir+0xd2/0x100 [zfs]
       zap_lookup_norm+0x7f/0x100 [zfs]
       zap_lookup+0x12/0x20 [zfs]
       sa_setup+0x902/0x1380 [zfs]
       zfsvfs_init+0x3d6/0xb20 [zfs]
       zfsvfs_create+0x5dd/0x900 [zfs]
       zfs_domount+0xa3/0xe20 [zfs]

    thread 2 trying to get sa_lock, either in sa_setup:
       sa_setup+0x742/0x1380 [zfs]
       zfsvfs_init+0x3d6/0xb20 [zfs]
       zfsvfs_create+0x5dd/0x900 [zfs]
       zfs_domount+0xa3/0xe20 [zfs]
    or in sa_build_index:
       sa_build_index+0x13d/0x790 [zfs]
       sa_handle_get_from_db+0x368/0x500 [zfs]
       zfs_znode_sa_init.isra.0+0x24b/0x330 [zfs]
       zfs_znode_alloc+0x3da/0x1a40 [zfs]
       zfs_zget+0x39a/0x6e0 [zfs]
       zfs_root+0x101/0x160 [zfs]
       zfs_domount+0x91f/0xea0 [zfs]

AFAICT, sa_os_t is unique to its zfsvfs, so if we have two stacks
calling zfs_domount, each has a different zfsvfs and thus a different
sa, and there is no real deadlock here.

The sa_setup vs sa_setup case is easy, since each is referring to a
newly allocated sa_os_t.

In the sa_build_index vs sa_setup case, we need to reason that the
sa_os_t is unique to a zfsvfs.
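
For illustration, a minimal sketch of the generic Linux mechanism for giving each dynamically allocated object its own lockdep class (the second option mentioned above). This is not the ZFS/SPL patch: the sa_like structure is hypothetical, and dynamic keys need lockdep_register_key(), which only exists on newer kernels:

    #include <linux/lockdep.h>
    #include <linux/mutex.h>
    #include <linux/slab.h>

    /* Hypothetical stand-in for an sa_os_t-like object. */
    struct sa_like {
            struct mutex lock;
            struct lock_class_key lock_key;     /* one lockdep key per instance */
    };

    static struct sa_like *sa_like_alloc(void)
    {
            struct sa_like *sa = kzalloc(sizeof(*sa), GFP_KERNEL);

            if (!sa)
                    return NULL;

            mutex_init(&sa->lock);
            /* Attach a per-instance key so lockdep stops conflating this
             * object's lock with every other instance's lock. */
            lockdep_register_key(&sa->lock_key);
            lockdep_set_class(&sa->lock, &sa->lock_key);
            return sa;
    }

    static void sa_like_free(struct sa_like *sa)
    {
            mutex_destroy(&sa->lock);
            lockdep_unregister_key(&sa->lock_key);
            kfree(sa);
    }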

======================================================
WARNING: possible circular locking dependency detected
4.19.55-4.19.2-debug-b494b4b34cd8ef26 #1 Tainted: G        W  O
------------------------------------------------------
kswapd0/716 is trying to acquire lock:
00000000ac111d4a (&zfsvfs->z_teardown_inactive_lock){.+.+}, at: zfs_inactive+0x132/0xb40 [zfs]

but task is already holding lock:
00000000218b764d (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x5/0x30

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #4 (fs_reclaim){+.+.}:
       kmem_cache_alloc_node_trace+0x43/0x380
       __kmalloc_node+0x3c/0x60
       spl_kmem_alloc+0xd9/0x1f0 [spl]
       zap_name_alloc+0x34/0x480 [zfs]
       zap_lookup_impl+0x27/0x3a0 [zfs]
       zap_lookup_norm+0xb9/0x100 [zfs]
       zap_lookup+0x12/0x20 [zfs]
       dsl_dir_hold+0x341/0x660 [zfs]
       dsl_dataset_hold+0xb6/0x6c0 [zfs]
       dmu_objset_hold+0xca/0x120 [zfs]
       zpl_mount+0x90/0x3b0 [zfs]
       mount_fs+0x86/0x2b0
       vfs_kern_mount+0x68/0x3c0
       do_mount+0x306/0x2550
       ksys_mount+0x7e/0xd0
       __x64_sys_mount+0xba/0x150
       do_syscall_64+0x9b/0x410
       entry_SYSCALL_64_after_hwframe+0x49/0xbe

-> #3 (&zap->zap_rwlock){++++}:
       zap_lockdir_impl+0x7ed/0x15c0 [zfs]
       zap_lockdir+0xd2/0x100 [zfs]
       zap_lookup_norm+0x7f/0x100 [zfs]
       zap_lookup+0x12/0x20 [zfs]
       sa_setup+0x902/0x1380 [zfs]
       zfsvfs_init+0x3d6/0xb20 [zfs]
       zfsvfs_create+0x5dd/0x900 [zfs]
       zfs_domount+0xa3/0xe20 [zfs]
       zpl_mount+0x270/0x3b0 [zfs]
       mount_fs+0x86/0x2b0
       vfs_kern_mount+0x68/0x3c0
       do_mount+0x306/0x2550
       ksys_mount+0x7e/0xd0
       __x64_sys_mount+0xba/0x150
       do_syscall_64+0x9b/0x410
       entry_SYSCALL_64_after_hwframe+0x49/0xbe

-> #2 (&sa->sa_lock){+.+.}:
       sa_setup+0x742/0x1380 [zfs]
       zfsvfs_init+0x3d6/0xb20 [zfs]
       zfsvfs_create+0x5dd/0x900 [zfs]
       zfs_domount+0xa3/0xe20 [zfs]
       zpl_mount+0x270/0x3b0 [zfs]
       mount_fs+0x86/0x2b0
       vfs_kern_mount+0x68/0x3c0
       do_mount+0x306/0x2550
       ksys_mount+0x7e/0xd0
       __x64_sys_mount+0xba/0x150
       do_syscall_64+0x9b/0x410
       entry_SYSCALL_64_after_hwframe+0x49/0xbe

-> #1 (&os->os_user_ptr_lock){+.+.}:
       zfs_get_vfs_flag_unmounted+0x63/0x3c0 [zfs]
       dmu_free_long_range+0x963/0xda0 [zfs]
       zfs_rmnode+0x719/0x9c0 [zfs]
       zfs_inactive+0x306/0xb40 [zfs]
       zpl_evict_inode+0xa7/0x140 [zfs]
       evict+0x212/0x570
       do_unlinkat+0x2e6/0x540
       do_syscall_64+0x9b/0x410
       entry_SYSCALL_64_after_hwframe+0x49/0xbe

-> #0 (&zfsvfs->z_teardown_inactive_lock){.+.+}:
       down_read+0x3f/0xe0
       zfs_inactive+0x132/0xb40 [zfs]
       zpl_evict_inode+0xa7/0x140 [zfs]
       evict+0x212/0x570
       dispose_list+0xfa/0x1d0
       prune_icache_sb+0xd3/0x140
       super_cache_scan+0x292/0x440
       do_shrink_slab+0x2b9/0x800
       shrink_slab+0x195/0x410
       shrink_node+0x2e1/0x10f0
       kswapd+0x71c/0x11c0
       kthread+0x2e7/0x3e0
       ret_from_fork+0x3a/0x50

other info that might help us debug this:

Chain exists of:
  &zfsvfs->z_teardown_inactive_lock --> &zap->zap_rwlock --> fs_reclaim

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(fs_reclaim);
                               lock(&zap->zap_rwlock);
                               lock(fs_reclaim);
  lock(&zfsvfs->z_teardown_inactive_lock);

 *** DEADLOCK ***

3 locks held by kswapd0/716:
 #0: 00000000218b764d (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x5/0x30
 #1: 00000000f9a6bfa1 (shrinker_rwsem){++++}, at: shrink_slab+0x109/0x410
 #2: 0000000076154958 (&type->s_umount_key#50){.+.+}, at: trylock_super+0x16/0xc0

stack backtrace:
CPU: 5 PID: 716 Comm: kswapd0 Tainted: G        W  O      4.19.55-4.19.2-debug-b494b4b34cd8ef26 #1
Hardware name: Dell Inc. PowerEdge R510/0W844P, BIOS 1.1.4 11/04/2009
Call Trace:
 dump_stack+0x91/0xeb
 print_circular_bug.isra.16+0x30b/0x5b0
 ? save_trace+0xd6/0x240
 __lock_acquire+0x41be/0x4f10
 ? debug_show_all_locks+0x2d0/0x2d0
 ? sched_clock_cpu+0x133/0x170
 ? lock_acquire+0x153/0x330
 lock_acquire+0x153/0x330
 ? zfs_inactive+0x132/0xb40 [zfs]
 down_read+0x3f/0xe0
 ? zfs_inactive+0x132/0xb40 [zfs]
 zfs_inactive+0x132/0xb40 [zfs]
 ? zfs_dirty_inode+0xa20/0xa20 [zfs]
 ? _raw_spin_unlock_irq+0x2d/0x40
 zpl_evict_inode+0xa7/0x140 [zfs]
 evict+0x212/0x570
 dispose_list+0xfa/0x1d0
 ? list_lru_walk_one+0x9c/0xd0
 prune_icache_sb+0xd3/0x140
 ? invalidate_inodes+0x370/0x370
 ? list_lru_count_one+0x179/0x310
 super_cache_scan+0x292/0x440
 do_shrink_slab+0x2b9/0x800
 shrink_slab+0x195/0x410
 ? unregister_shrinker+0x290/0x290
 shrink_node+0x2e1/0x10f0
 ? shrink_node_memcg+0x1230/0x1230
 ? zone_watermark_ok_safe+0x35/0x270
 ? lock_acquire+0x153/0x330
 ? __fs_reclaim_acquire+0x5/0x30
 ? pgdat_balanced+0x91/0xd0
 kswapd+0x71c/0x11c0
 ? mem_cgroup_shrink_node+0x460/0x460
 ? sched_clock_cpu+0x133/0x170
 ? _raw_spin_unlock_irq+0x29/0x40
 ? wait_woken+0x260/0x260
 ? check_flags.part.23+0x480/0x480
 ? __kthread_parkme+0xad/0x180
 ? mem_cgroup_shrink_node+0x460/0x460
 kthread+0x2e7/0x3e0
 ? kthread_park+0x120/0x120
 ret_from_fork+0x3a/0x50

======================================================
WARNING: possible circular locking dependency detected
4.19.55-4.19.2-debug-b494b4b34cd8ef26 #1 Tainted: G           O
------------------------------------------------------
mount.zfs/3249 is trying to acquire lock:
000000000347bea0 (&zp->z_lock){+.+.}, at: zpl_mmap+0x27e/0x550 [zfs]

but task is already holding lock:
00000000224314a3 (&mm->mmap_sem){++++}, at: vm_mmap_pgoff+0x118/0x190

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #6 (&mm->mmap_sem){++++}:
       _copy_from_user+0x20/0xd0
       scsi_cmd_ioctl+0x47d/0x620
       cdrom_ioctl+0x10b/0x29b0
       sr_block_ioctl+0x107/0x150 [sr_mod]
       blkdev_ioctl+0x946/0x1600
       block_ioctl+0xdd/0x130
       do_vfs_ioctl+0x176/0xf70
       ksys_ioctl+0x66/0x70
       __x64_sys_ioctl+0x6f/0xb0
       do_syscall_64+0x9b/0x410
       entry_SYSCALL_64_after_hwframe+0x49/0xbe

-> #5 (sr_mutex){+.+.}:
       sr_block_open+0x104/0x1a0 [sr_mod]
       __blkdev_get+0x249/0x11c0
       blkdev_get+0x280/0x7a0
       do_dentry_open+0x7ee/0x1020
       path_openat+0x11a7/0x2500
       do_filp_open+0x17f/0x260
       do_sys_open+0x195/0x300
       __se_sys_open+0xbf/0xf0
       do_syscall_64+0x9b/0x410
       entry_SYSCALL_64_after_hwframe+0x49/0xbe

-> #4 (&bdev->bd_mutex){+.+.}:
       __blkdev_get+0x383/0x11c0
       blkdev_get+0x3bc/0x7a0
       blkdev_get_by_path+0x73/0xc0
       vdev_disk_open+0x4c8/0x12e0 [zfs]
       vdev_open+0x34c/0x13e0 [zfs]
       vdev_open_child+0x46/0xd0 [zfs]
       taskq_thread+0x979/0x1480 [spl]
       kthread+0x2e7/0x3e0
       ret_from_fork+0x3a/0x50

-> #3 (&vd->vd_lock){++++}:
       vdev_disk_io_start+0x13e/0x2230 [zfs]
       zio_vdev_io_start+0x358/0x990 [zfs]
       zio_nowait+0x1f4/0x3a0 [zfs]
       vdev_mirror_io_start+0x211/0x7b0 [zfs]
       zio_vdev_io_start+0x7d3/0x990 [zfs]
       zio_nowait+0x1f4/0x3a0 [zfs]
       arc_read+0x1782/0x43a0 [zfs]
       dbuf_read_impl.constprop.13+0xcb4/0x1fe0 [zfs]
       dbuf_read+0x2c8/0x12a0 [zfs]
       dmu_buf_hold_by_dnode+0x6d/0xd0 [zfs]
       zap_get_leaf_byblk.isra.6.part.7+0xd3/0x9d0 [zfs]
       zap_deref_leaf+0x1f3/0x290 [zfs]
       fzap_lookup+0x13c/0x340 [zfs]
       zap_lookup_impl+0x84/0x3a0 [zfs]
       zap_lookup_norm+0xb9/0x100 [zfs]
       zap_lookup+0x12/0x20 [zfs]
       spa_dir_prop+0x56/0xa0 [zfs]
       spa_ld_trusted_config+0xd0/0xe70 [zfs]
       spa_ld_mos_with_trusted_config+0x2b/0xb0 [zfs]
       spa_load+0x14d/0x27d0 [zfs]
       spa_tryimport+0x32e/0xa90 [zfs]
       zfs_ioc_pool_tryimport+0x107/0x190 [zfs]
       zfsdev_ioctl+0x1047/0x1370 [zfs]
       do_vfs_ioctl+0x176/0xf70
       ksys_ioctl+0x66/0x70
       __x64_sys_ioctl+0x6f/0xb0
       do_syscall_64+0x9b/0x410
       entry_SYSCALL_64_after_hwframe+0x49/0xbe

-> #2 (&zap->zap_rwlock){++++}:
       zap_lockdir_impl+0x858/0x16c0 [zfs]
       zap_lockdir+0xd2/0x100 [zfs]
       zap_lookup_norm+0x7f/0x100 [zfs]
       zap_lookup+0x12/0x20 [zfs]
       sa_setup+0x902/0x1380 [zfs]
       zfsvfs_init+0x6c8/0xc70 [zfs]
       zfsvfs_create_impl+0x5cf/0x970 [zfs]
       zfsvfs_create+0xc6/0x130 [zfs]
       zfs_domount+0x16f/0xea0 [zfs]
       zpl_mount+0x270/0x3b0 [zfs]
       mount_fs+0x86/0x2b0
       vfs_kern_mount+0x68/0x3c0
       do_mount+0x306/0x2550
       ksys_mount+0x7e/0xd0
       __x64_sys_mount+0xba/0x150
       do_syscall_64+0x9b/0x410
       entry_SYSCALL_64_after_hwframe+0x49/0xbe

-> #1 (&sa->sa_lock){+.+.}:
       sa_build_index+0x13d/0x790 [zfs]
       sa_handle_get_from_db+0x368/0x500 [zfs]
       zfs_znode_sa_init.isra.0+0x24b/0x330 [zfs]
       zfs_znode_alloc+0x3da/0x1a40 [zfs]
       zfs_zget+0x39a/0x6e0 [zfs]
       zfs_root+0x101/0x160 [zfs]
       zfs_domount+0x91f/0xea0 [zfs]
       zpl_mount+0x270/0x3b0 [zfs]
       mount_fs+0x86/0x2b0
       vfs_kern_mount+0x68/0x3c0
       do_mount+0x306/0x2550
       ksys_mount+0x7e/0xd0
       __x64_sys_mount+0xba/0x150
       do_syscall_64+0x9b/0x410
       entry_SYSCALL_64_after_hwframe+0x49/0xbe

-> #0 (&zp->z_lock){+.+.}:
       __mutex_lock+0xef/0x1380
       zpl_mmap+0x27e/0x550 [zfs]
       mmap_region+0x8fa/0x1150
       do_mmap+0x89a/0xd60
       vm_mmap_pgoff+0x14a/0x190
       ksys_mmap_pgoff+0x16b/0x490
       do_syscall_64+0x9b/0x410
       entry_SYSCALL_64_after_hwframe+0x49/0xbe

other info that might help us debug this:

Chain exists of:
  &zp->z_lock --> sr_mutex --> &mm->mmap_sem

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&mm->mmap_sem);
                               lock(sr_mutex);
                               lock(&mm->mmap_sem);
  lock(&zp->z_lock);

 *** DEADLOCK ***

1 lock held by mount.zfs/3249:
 #0: 00000000224314a3 (&mm->mmap_sem){++++}, at: vm_mmap_pgoff+0x118/0x190

stack backtrace:
CPU: 3 PID: 3249 Comm: mount.zfs Tainted: G           O      4.19.55-4.19.2-debug-b494b4b34cd8ef26 #1
Hardware name: Dell Inc. PowerEdge R510/0W844P, BIOS 1.1.4 11/04/2009
Call Trace:
 dump_stack+0x91/0xeb
 print_circular_bug.isra.16+0x30b/0x5b0
 ? save_trace+0xd6/0x240
 __lock_acquire+0x41be/0x4f10
 ? debug_show_all_locks+0x2d0/0x2d0
 ? sched_clock_cpu+0x18/0x170
 ? sched_clock_cpu+0x18/0x170
 ? __lock_acquire+0xe3b/0x4f10
 ? reacquire_held_locks+0x191/0x430
 ? reacquire_held_locks+0x191/0x430
 ? lock_acquire+0x153/0x330
 lock_acquire+0x153/0x330
 ? zpl_mmap+0x27e/0x550 [zfs]
 ? zpl_mmap+0x27e/0x550 [zfs]
 __mutex_lock+0xef/0x1380
 ? zpl_mmap+0x27e/0x550 [zfs]
 ? __mutex_add_waiter+0x160/0x160
 ? zpl_mmap+0x27e/0x550 [zfs]
 ? sched_clock+0x5/0x10
 ? sched_clock_cpu+0x18/0x170
 ? __mutex_add_waiter+0x160/0x160
 ? touch_atime+0xcd/0x230
 ? atime_needs_update+0x540/0x540
 ? do_raw_spin_unlock+0x54/0x250
 ? zpl_mmap+0x27e/0x550 [zfs]
 zpl_mmap+0x27e/0x550 [zfs]
 ? memset+0x1f/0x40
 mmap_region+0x8fa/0x1150
 ? arch_get_unmapped_area+0x460/0x460
 ? vm_brk+0x10/0x10
 ? lock_acquire+0x153/0x330
 ? lock_acquire+0x153/0x330
 ? security_mmap_addr+0x56/0x80
 ? get_unmapped_area+0x222/0x350
 do_mmap+0x89a/0xd60
 ? proc_keys_start+0x3d0/0x3d0
 vm_mmap_pgoff+0x14a/0x190
 ? vma_is_stack_for_current+0x90/0x90
 ? __ia32_sys_dup3+0xb0/0xb0
 ? vfs_statx_fd+0x49/0x80
 ? __se_sys_newfstat+0x75/0xa0
 ksys_mmap_pgoff+0x16b/0x490
 ? find_mergeable_anon_vma+0x90/0x90
 ? trace_hardirqs_on_thunk+0x1a/0x1c
 ? do_syscall_64+0x18/0x410
 do_syscall_64+0x9b/0x410
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Not-signed-off-by: Jeff Dike <[email protected]>
ahrens referenced this issue in ahrens/zfs Dec 9, 2019
After spa_vdev_remove_aux() is called, the config nvlist is no longer
valid, as it's been replaced by the new one (with the specified device
removed).  Therefore any pointers into the nvlist are no longer valid.
So we can't save the result of `fnvlist_lookup_string(nv, ZPOOL_CONFIG_PATH)`
(in vd_path) across the call to spa_vdev_remove_aux().

Instead, use spa_strdup() to save a copy of the string before calling
spa_vdev_remove_aux.

Found by AddressSanitizer:

ERROR: AddressSanitizer: heap-use-after-free on address 0x608000a1fcd0 at pc 0x7fe88b0c166e bp 0x7fe878414ad0 sp 0x7fe878414278
READ of size 34 at 0x608000a1fcd0 thread T686
    #0 0x7fe88b0c166d  (/usr/lib/x86_64-linux-gnu/libasan.so.4+0x5166d)
    #1 0x7fe88a5acd6e in spa_strdup ../../module/zfs/spa_misc.c:1447
    #2 0x7fe88a688034 in spa_vdev_remove ../../module/zfs/vdev_removal.c:2259
    #3 0x55ffbc7748f8 in ztest_vdev_aux_add_remove /export/home/delphix/zfs/cmd/ztest/ztest.c:3229
    #4 0x55ffbc769fba in ztest_execute /export/home/delphix/zfs/cmd/ztest/ztest.c:6714
    #5 0x55ffbc779a90 in ztest_thread /export/home/delphix/zfs/cmd/ztest/ztest.c:6761
    #6 0x7fe889cbc6da in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76da)
    #7 0x7fe8899e588e in __clone (/lib/x86_64-linux-gnu/libc.so.6+0x12188e)

0x608000a1fcd0 is located 48 bytes inside of 88-byte region [0x608000a1fca0,0x608000a1fcf8)
freed by thread T686 here:
    #0 0x7fe88b14e7b8 in __interceptor_free (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xde7b8)
    #1 0x7fe88ae541c5 in nvlist_free ../../module/nvpair/nvpair.c:874
    #2 0x7fe88ae543ba in nvpair_free ../../module/nvpair/nvpair.c:844
    #3 0x7fe88ae57400 in nvlist_remove_nvpair ../../module/nvpair/nvpair.c:978
    #4 0x7fe88a683c81 in spa_vdev_remove_aux ../../module/zfs/vdev_removal.c:185
    #5 0x7fe88a68857c in spa_vdev_remove ../../module/zfs/vdev_removal.c:2221
    #6 0x55ffbc7748f8 in ztest_vdev_aux_add_remove /export/home/delphix/zfs/cmd/ztest/ztest.c:3229
    #7 0x55ffbc769fba in ztest_execute /export/home/delphix/zfs/cmd/ztest/ztest.c:6714
    #8 0x55ffbc779a90 in ztest_thread /export/home/delphix/zfs/cmd/ztest/ztest.c:6761
    #9 0x7fe889cbc6da in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76da)
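
A short sketch of the pattern the commit message describes (paraphrased, not the exact diff; the surrounding removal code is elided):

    /* Copy the path out of the config nvlist before it is replaced;
     * any pointer returned by fnvlist_lookup_string() into 'nv' is
     * dangling once spa_vdev_remove_aux() swaps in the new config. */
    char *vd_path = spa_strdup(fnvlist_lookup_string(nv, ZPOOL_CONFIG_PATH));

    /* ... spa_vdev_remove_aux() is called here ... */

    /* ... log or report vd_path instead of the dangling nvlist string ... */
    spa_strfree(vd_path);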
ahrens referenced this issue in ahrens/zfs Dec 9, 2019
After spa_vdev_remove_aux() is called, the config nvlist is no longer
valid, as it's been replaced by the new one (with the specified device
removed).  Therefore any pointers into the nvlist are no longer valid.
So we can't save the result of `fnvlist_lookup_string(nv, ZPOOL_CONFIG_PATH)`
(in vd_path) across the call to spa_vdev_remove_aux().

Instead, use spa_strdup() to save a copy of the string before calling
spa_vdev_remove_aux.

Found by AddressSanitizer:

ERROR: AddressSanitizer: heap-use-after-free on address 0x608000a1fcd0 at pc 0x7fe88b0c166e bp 0x7fe878414ad0 sp 0x7fe878414278
READ of size 34 at 0x608000a1fcd0 thread T686
    #0 0x7fe88b0c166d  (/usr/lib/x86_64-linux-gnu/libasan.so.4+0x5166d)
    #1 0x7fe88a5acd6e in spa_strdup ../../module/zfs/spa_misc.c:1447
    #2 0x7fe88a688034 in spa_vdev_remove ../../module/zfs/vdev_removal.c:2259
    #3 0x55ffbc7748f8 in ztest_vdev_aux_add_remove /export/home/delphix/zfs/cmd/ztest/ztest.c:3229
    #4 0x55ffbc769fba in ztest_execute /export/home/delphix/zfs/cmd/ztest/ztest.c:6714
    #5 0x55ffbc779a90 in ztest_thread /export/home/delphix/zfs/cmd/ztest/ztest.c:6761
    #6 0x7fe889cbc6da in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76da)
    #7 0x7fe8899e588e in __clone (/lib/x86_64-linux-gnu/libc.so.6+0x12188e)

0x608000a1fcd0 is located 48 bytes inside of 88-byte region [0x608000a1fca0,0x608000a1fcf8)
freed by thread T686 here:
    #0 0x7fe88b14e7b8 in __interceptor_free (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xde7b8)
    #1 0x7fe88ae541c5 in nvlist_free ../../module/nvpair/nvpair.c:874
    #2 0x7fe88ae543ba in nvpair_free ../../module/nvpair/nvpair.c:844
    #3 0x7fe88ae57400 in nvlist_remove_nvpair ../../module/nvpair/nvpair.c:978
    #4 0x7fe88a683c81 in spa_vdev_remove_aux ../../module/zfs/vdev_removal.c:185
    #5 0x7fe88a68857c in spa_vdev_remove ../../module/zfs/vdev_removal.c:2221
    #6 0x55ffbc7748f8 in ztest_vdev_aux_add_remove /export/home/delphix/zfs/cmd/ztest/ztest.c:3229
    #7 0x55ffbc769fba in ztest_execute /export/home/delphix/zfs/cmd/ztest/ztest.c:6714
    #8 0x55ffbc779a90 in ztest_thread /export/home/delphix/zfs/cmd/ztest/ztest.c:6761
    #9 0x7fe889cbc6da in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76da)

Signed-off-by: Matthew Ahrens <[email protected]>
ahrens referenced this issue in ahrens/zfs Dec 10, 2019
After spa_vdev_remove_aux() is called, the config nvlist is no longer
valid, as it's been replaced by the new one (with the specified device
removed).  Therefore any pointers into the nvlist are no longer valid.
So we can't save the result of
`fnvlist_lookup_string(nv, ZPOOL_CONFIG_PATH)` (in vd_path) across the
call to spa_vdev_remove_aux().

Instead, use spa_strdup() to save a copy of the string before calling
spa_vdev_remove_aux.

Found by AddressSanitizer:

ERROR: AddressSanitizer: heap-use-after-free on address ...
READ of size 34 at 0x608000a1fcd0 thread T686
    #0 0x7fe88b0c166d  (/usr/lib/x86_64-linux-gnu/libasan.so.4+0x5166d)
    #1 0x7fe88a5acd6e in spa_strdup spa_misc.c:1447
    #2 0x7fe88a688034 in spa_vdev_remove vdev_removal.c:2259
    #3 0x55ffbc7748f8 in ztest_vdev_aux_add_remove ztest.c:3229
    #4 0x55ffbc769fba in ztest_execute ztest.c:6714
    #5 0x55ffbc779a90 in ztest_thread ztest.c:6761
    #6 0x7fe889cbc6da in start_thread
    #7 0x7fe8899e588e in __clone

0x608000a1fcd0 is located 48 bytes inside of 88-byte region
freed by thread T686 here:
    #0 0x7fe88b14e7b8 in __interceptor_free
    #1 0x7fe88ae541c5 in nvlist_free nvpair.c:874
    #2 0x7fe88ae543ba in nvpair_free nvpair.c:844
    #3 0x7fe88ae57400 in nvlist_remove_nvpair nvpair.c:978
    #4 0x7fe88a683c81 in spa_vdev_remove_aux vdev_removal.c:185
    #5 0x7fe88a68857c in spa_vdev_remove vdev_removal.c:2221
    #6 0x55ffbc7748f8 in ztest_vdev_aux_add_remove ztest.c:3229
    #7 0x55ffbc769fba in ztest_execute ztest.c:6714
    #8 0x55ffbc779a90 in ztest_thread ztest.c:6761
    #9 0x7fe889cbc6da in start_thread

Signed-off-by: Matthew Ahrens <[email protected]>
behlendorf pushed a commit that referenced this issue Dec 11, 2019
After spa_vdev_remove_aux() is called, the config nvlist is no longer
valid, as it's been replaced by the new one (with the specified device
removed).  Therefore any pointers into the nvlist are no longer valid.
So we can't save the result of
`fnvlist_lookup_string(nv, ZPOOL_CONFIG_PATH)` (in vd_path) across the
call to spa_vdev_remove_aux().

Instead, use spa_strdup() to save a copy of the string before calling
spa_vdev_remove_aux.

Found by AddressSanitizer:

ERROR: AddressSanitizer: heap-use-after-free on address ...
READ of size 34 at 0x608000a1fcd0 thread T686
    #0 0x7fe88b0c166d  (/usr/lib/x86_64-linux-gnu/libasan.so.4+0x5166d)
    #1 0x7fe88a5acd6e in spa_strdup spa_misc.c:1447
    #2 0x7fe88a688034 in spa_vdev_remove vdev_removal.c:2259
    #3 0x55ffbc7748f8 in ztest_vdev_aux_add_remove ztest.c:3229
    #4 0x55ffbc769fba in ztest_execute ztest.c:6714
    #5 0x55ffbc779a90 in ztest_thread ztest.c:6761
    #6 0x7fe889cbc6da in start_thread
    #7 0x7fe8899e588e in __clone

0x608000a1fcd0 is located 48 bytes inside of 88-byte region
freed by thread T686 here:
    #0 0x7fe88b14e7b8 in __interceptor_free
    #1 0x7fe88ae541c5 in nvlist_free nvpair.c:874
    #2 0x7fe88ae543ba in nvpair_free nvpair.c:844
    #3 0x7fe88ae57400 in nvlist_remove_nvpair nvpair.c:978
    #4 0x7fe88a683c81 in spa_vdev_remove_aux vdev_removal.c:185
    #5 0x7fe88a68857c in spa_vdev_remove vdev_removal.c:2221
    #6 0x55ffbc7748f8 in ztest_vdev_aux_add_remove ztest.c:3229
    #7 0x55ffbc769fba in ztest_execute ztest.c:6714
    #8 0x55ffbc779a90 in ztest_thread ztest.c:6761
    #9 0x7fe889cbc6da in start_thread

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
Closes #9706
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Dec 26, 2019
After spa_vdev_remove_aux() is called, the config nvlist is no longer
valid, as it's been replaced by the new one (with the specified device
removed).  Therefore any pointers into the nvlist are no longer valid.
So we can't save the result of
`fnvlist_lookup_string(nv, ZPOOL_CONFIG_PATH)` (in vd_path) across the
call to spa_vdev_remove_aux().

Instead, use spa_strdup() to save a copy of the string before calling
spa_vdev_remove_aux.

Found by AddressSanitizer:

ERROR: AddressSanitizer: heap-use-after-free on address ...
READ of size 34 at 0x608000a1fcd0 thread T686
    #0 0x7fe88b0c166d  (/usr/lib/x86_64-linux-gnu/libasan.so.4+0x5166d)
    #1 0x7fe88a5acd6e in spa_strdup spa_misc.c:1447
    #2 0x7fe88a688034 in spa_vdev_remove vdev_removal.c:2259
    #3 0x55ffbc7748f8 in ztest_vdev_aux_add_remove ztest.c:3229
    #4 0x55ffbc769fba in ztest_execute ztest.c:6714
    #5 0x55ffbc779a90 in ztest_thread ztest.c:6761
    #6 0x7fe889cbc6da in start_thread
    #7 0x7fe8899e588e in __clone

0x608000a1fcd0 is located 48 bytes inside of 88-byte region
freed by thread T686 here:
    #0 0x7fe88b14e7b8 in __interceptor_free
    #1 0x7fe88ae541c5 in nvlist_free nvpair.c:874
    #2 0x7fe88ae543ba in nvpair_free nvpair.c:844
    #3 0x7fe88ae57400 in nvlist_remove_nvpair nvpair.c:978
    #4 0x7fe88a683c81 in spa_vdev_remove_aux vdev_removal.c:185
    #5 0x7fe88a68857c in spa_vdev_remove vdev_removal.c:2221
    #6 0x55ffbc7748f8 in ztest_vdev_aux_add_remove ztest.c:3229
    #7 0x55ffbc769fba in ztest_execute ztest.c:6714
    #8 0x55ffbc779a90 in ztest_thread ztest.c:6761
    #9 0x7fe889cbc6da in start_thread

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
Closes openzfs#9706
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Dec 27, 2019
After spa_vdev_remove_aux() is called, the config nvlist is no longer
valid, as it's been replaced by the new one (with the specified device
removed).  Therefore any pointers into the nvlist are no longer valid.
So we can't save the result of
`fnvlist_lookup_string(nv, ZPOOL_CONFIG_PATH)` (in vd_path) across the
call to spa_vdev_remove_aux().

Instead, use spa_strdup() to save a copy of the string before calling
spa_vdev_remove_aux.

Found by AddressSanitizer:

ERROR: AddressSanitizer: heap-use-after-free on address ...
READ of size 34 at 0x608000a1fcd0 thread T686
    #0 0x7fe88b0c166d  (/usr/lib/x86_64-linux-gnu/libasan.so.4+0x5166d)
    #1 0x7fe88a5acd6e in spa_strdup spa_misc.c:1447
    #2 0x7fe88a688034 in spa_vdev_remove vdev_removal.c:2259
    #3 0x55ffbc7748f8 in ztest_vdev_aux_add_remove ztest.c:3229
    #4 0x55ffbc769fba in ztest_execute ztest.c:6714
    #5 0x55ffbc779a90 in ztest_thread ztest.c:6761
    #6 0x7fe889cbc6da in start_thread
    #7 0x7fe8899e588e in __clone

0x608000a1fcd0 is located 48 bytes inside of 88-byte region
freed by thread T686 here:
    #0 0x7fe88b14e7b8 in __interceptor_free
    #1 0x7fe88ae541c5 in nvlist_free nvpair.c:874
    #2 0x7fe88ae543ba in nvpair_free nvpair.c:844
    #3 0x7fe88ae57400 in nvlist_remove_nvpair nvpair.c:978
    #4 0x7fe88a683c81 in spa_vdev_remove_aux vdev_removal.c:185
    #5 0x7fe88a68857c in spa_vdev_remove vdev_removal.c:2221
    #6 0x55ffbc7748f8 in ztest_vdev_aux_add_remove ztest.c:3229
    #7 0x55ffbc769fba in ztest_execute ztest.c:6714
    #8 0x55ffbc779a90 in ztest_thread ztest.c:6761
    #9 0x7fe889cbc6da in start_thread

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
Closes openzfs#9706
tonyhutter pushed a commit that referenced this issue Jan 23, 2020
After spa_vdev_remove_aux() is called, the config nvlist is no longer
valid, as it's been replaced by the new one (with the specified device
removed).  Therefore any pointers into the nvlist are no longer valid.
So we can't save the result of
`fnvlist_lookup_string(nv, ZPOOL_CONFIG_PATH)` (in vd_path) across the
call to spa_vdev_remove_aux().

Instead, use spa_strdup() to save a copy of the string before calling
spa_vdev_remove_aux.

Found by AddressSanitizer:

ERROR: AddressSanitizer: heap-use-after-free on address ...
READ of size 34 at 0x608000a1fcd0 thread T686
    #0 0x7fe88b0c166d  (/usr/lib/x86_64-linux-gnu/libasan.so.4+0x5166d)
    #1 0x7fe88a5acd6e in spa_strdup spa_misc.c:1447
    #2 0x7fe88a688034 in spa_vdev_remove vdev_removal.c:2259
    #3 0x55ffbc7748f8 in ztest_vdev_aux_add_remove ztest.c:3229
    #4 0x55ffbc769fba in ztest_execute ztest.c:6714
    #5 0x55ffbc779a90 in ztest_thread ztest.c:6761
    #6 0x7fe889cbc6da in start_thread
    #7 0x7fe8899e588e in __clone

0x608000a1fcd0 is located 48 bytes inside of 88-byte region
freed by thread T686 here:
    #0 0x7fe88b14e7b8 in __interceptor_free
    #1 0x7fe88ae541c5 in nvlist_free nvpair.c:874
    #2 0x7fe88ae543ba in nvpair_free nvpair.c:844
    #3 0x7fe88ae57400 in nvlist_remove_nvpair nvpair.c:978
    #4 0x7fe88a683c81 in spa_vdev_remove_aux vdev_removal.c:185
    #5 0x7fe88a68857c in spa_vdev_remove vdev_removal.c:2221
    #6 0x55ffbc7748f8 in ztest_vdev_aux_add_remove ztest.c:3229
    #7 0x55ffbc769fba in ztest_execute ztest.c:6714
    #8 0x55ffbc779a90 in ztest_thread ztest.c:6761
    #9 0x7fe889cbc6da in start_thread

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
Closes #9706
markroper added a commit to markroper/zfs that referenced this issue Feb 12, 2020
Using zfs with Lustre, an arc_read can trigger kernel memory allocation
that in turn leads to a memory reclaim callback and a deadlock within a
single zfs process. This change uses spl_fstrans_mark and
spl_fstrans_unmark to prevent the reclaim attempt and the deadlock
(https://zfsonlinux.topicbox.com/groups/zfs-devel/T4db2c705ec1804ba).
The stack trace observed is:

     #0 [ffffc9002b98adc8] __schedule at ffffffff81610f2e
     #1 [ffffc9002b98ae68] schedule at ffffffff81611558
     #2 [ffffc9002b98ae70] schedule_preempt_disabled at ffffffff8161184a
     #3 [ffffc9002b98ae78] __mutex_lock at ffffffff816131e8
     #4 [ffffc9002b98af18] arc_buf_destroy at ffffffffa0bf37d7 [zfs]
     #5 [ffffc9002b98af48] dbuf_destroy at ffffffffa0bfa6fe [zfs]
     #6 [ffffc9002b98af88] dbuf_evict_one at ffffffffa0bfaa96 [zfs]
     #7 [ffffc9002b98afa0] dbuf_rele_and_unlock at ffffffffa0bfa561 [zfs]
     #8 [ffffc9002b98b050] dbuf_rele_and_unlock at ffffffffa0bfa32b [zfs]
     #9 [ffffc9002b98b100] osd_object_delete at ffffffffa0b64ecc [osd_zfs]
    #10 [ffffc9002b98b118] lu_object_free at ffffffffa06d6a74 [obdclass]
    #11 [ffffc9002b98b178] lu_site_purge_objects at ffffffffa06d7fc1 [obdclass]
    #12 [ffffc9002b98b220] lu_cache_shrink_scan at ffffffffa06d81b8 [obdclass]
    #13 [ffffc9002b98b278] shrink_slab at ffffffff811ca9d8
    #14 [ffffc9002b98b338] shrink_node at ffffffff811cfd94
    #15 [ffffc9002b98b3b8] do_try_to_free_pages at ffffffff811cfe63
    #16 [ffffc9002b98b408] try_to_free_pages at ffffffff811d01c4
    #17 [ffffc9002b98b488] __alloc_pages_slowpath at ffffffff811be7f2
    #18 [ffffc9002b98b580] __alloc_pages_nodemask at ffffffff811bf3ed
    #19 [ffffc9002b98b5e0] new_slab at ffffffff81226304
    #20 [ffffc9002b98b638] ___slab_alloc at ffffffff812272ab
    #21 [ffffc9002b98b6f8] __slab_alloc at ffffffff8122740c
    #22 [ffffc9002b98b708] kmem_cache_alloc at ffffffff81227578
    #23 [ffffc9002b98b740] spl_kmem_cache_alloc at ffffffffa048a1fd [spl]
    #24 [ffffc9002b98b780] arc_buf_alloc_impl at ffffffffa0befba2 [zfs]
    #25 [ffffc9002b98b7b0] arc_read at ffffffffa0bf0924 [zfs]
    #26 [ffffc9002b98b858] dbuf_read at ffffffffa0bf9083 [zfs]
    #27 [ffffc9002b98b900] dmu_buf_hold_by_dnode at ffffffffa0c04869 [zfs]

Signed-off-by: Mark Roper <[email protected]>
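
For reference, a minimal sketch of the spl_fstrans_mark()/spl_fstrans_unmark() pattern this commit applies (the wrapper function here is hypothetical; the real change marks the arc_read path):

    #include <sys/kmem.h>       /* fstrans_cookie_t, spl_fstrans_mark() */

    static void
    alloc_in_reclaim_sensitive_path(void)       /* hypothetical example */
    {
            fstrans_cookie_t cookie = spl_fstrans_mark();

            /* Allocations made while marked will not recurse into
             * filesystem reclaim, avoiding the deadlock shown above. */
            void *buf = kmem_alloc(128, KM_SLEEP);
            kmem_free(buf, 128);

            spl_fstrans_unmark(cookie);
    }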
allanjude pushed a commit to KlaraSystems/zfs that referenced this issue Apr 28, 2020
SPPSUP-1149: Fix sharenfs issues with locking
problame added a commit to problame/zfs that referenced this issue Oct 10, 2020
This is a fixup of commit 0fdd610

See added test case for a reproducer.

Stack trace:

    panic: VERIFY3(nvlist_next_nvpair(redactnvl, pair) == NULL) failed (0xfffff80003ce5d18x == 0x)

    cpuid = 7
    time = 1602212370
    KDB: stack backtrace:
    #0 0xffffffff80c1d297 at kdb_backtrace+0x67
    #1 0xffffffff80bd05cd at vpanic+0x19d
    #2 0xffffffff828446fa at spl_panic+0x3a
    #3 0xffffffff828af85d at dmu_redact_snap+0x39d
    #4 0xffffffff829c0370 at zfs_ioc_redact+0xa0
    #5 0xffffffff829bba44 at zfsdev_ioctl_common+0x4a4
    #6 0xffffffff8284c3ed at zfsdev_ioctl+0x14d
    #7 0xffffffff80a85ead at devfs_ioctl+0xad
    #8 0xffffffff8122a46c at VOP_IOCTL_APV+0x7c
    #9 0xffffffff80cb0a3a at vn_ioctl+0x16a
    #10 0xffffffff80a8649f at devfs_ioctl_f+0x1f
    #11 0xffffffff80c3b55e at kern_ioctl+0x2be
    #12 0xffffffff80c3b22d at sys_ioctl+0x15d
    #13 0xffffffff810a88e4 at amd64_syscall+0x364
    #14 0xffffffff81082330 at fast_syscall_common+0x101

Signed-off-by: Christian Schwarz <[email protected]>
rob-wing pushed a commit to KlaraSystems/zfs that referenced this issue Feb 17, 2023
Under certain loads, the following panic is hit:

    panic: VERIFY3(vrecycle(vp) == 1) failed (0 == 1)
    cpuid = 17
    KDB: stack backtrace:
    #0 0xffffffff805e29c5 at kdb_backtrace+0x65
    #1 0xffffffff8059620f at vpanic+0x17f
    #2 0xffffffff81a27f4a at spl_panic+0x3a
    #3 0xffffffff81a3a4d0 at zfsctl_snapshot_inactive+0x40
    #4 0xffffffff8066fdee at vinactivef+0xde
    #5 0xffffffff80670b8a at vgonel+0x1ea
    #6 0xffffffff806711e1 at vgone+0x31
    #7 0xffffffff8065fa0d at vfs_hash_insert+0x26d
    #8 0xffffffff81a39069 at sfs_vgetx+0x149
    #9 0xffffffff81a39c54 at zfsctl_snapdir_lookup+0x1e4
    #10 0xffffffff80661c2c at lookup+0x45c
    #11 0xffffffff80660e59 at namei+0x259
    #12 0xffffffff8067e3d3 at kern_statat+0xf3
    #13 0xffffffff8067eacf at sys_fstatat+0x2f
    #14 0xffffffff808b5ecc at amd64_syscall+0x10c
    #15 0xffffffff8088f07b at fast_syscall_common+0xf8

A race condition can occur when allocating a new vnode and adding that
vnode to the vfs hash. If the newly created vnode loses the race when
being inserted into the vfs hash, it will not be recycled as its
usecount is greater than zero, hitting the above assertion.

Fix this by dropping the assertion.

FreeBSD-issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=252700

Signed-off-by:  Rob Wing <[email protected]>
Sponsored-by:   rsync.net
Sponsored-by:   Klara, Inc.
rob-wing pushed a commit to KlaraSystems/zfs that referenced this issue Feb 17, 2023
Under certain loads, the following panic is hit:

    panic: page fault
    KDB: stack backtrace:
    #0 0xffffffff805db025 at kdb_backtrace+0x65
    #1 0xffffffff8058e86f at vpanic+0x17f
    #2 0xffffffff8058e6e3 at panic+0x43
    #3 0xffffffff808adc15 at trap_fatal+0x385
    #4 0xffffffff808adc6f at trap_pfault+0x4f
    #5 0xffffffff80886da8 at calltrap+0x8
    #6 0xffffffff80669186 at vgonel+0x186
    #7 0xffffffff80669841 at vgone+0x31
    #8 0xffffffff8065806d at vfs_hash_insert+0x26d
    #9 0xffffffff81a39069 at sfs_vgetx+0x149
    #10 0xffffffff81a39c54 at zfsctl_snapdir_lookup+0x1e4
    #11 0xffffffff8065a28c at lookup+0x45c
    #12 0xffffffff806594b9 at namei+0x259
    #13 0xffffffff80676a33 at kern_statat+0xf3
    #14 0xffffffff8067712f at sys_fstatat+0x2f
    #15 0xffffffff808ae50c at amd64_syscall+0x10c
    #16 0xffffffff808876bb at fast_syscall_common+0xf8

The page fault occurs because vgonel() will call VOP_CLOSE() for active
vnodes. For this reason, define vop_close for zfsctl_ops_snapshot. While
here, define vop_open for consistency.
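
As a rough sketch of that change (the handler names and vector layout here are assumptions modeled on the other .zfs control-directory vop vectors, not the verbatim diff):

    /*
     * Sketch (assumed names): teach the snapshot vop vector to answer
     * VOP_OPEN()/VOP_CLOSE(), so vgonel() no longer faults when it
     * closes an active snapshot vnode.
     */
    struct vop_vector zfsctl_ops_snapshot = {
            .vop_default =  &default_vnodeops,
            .vop_open =     zfsctl_common_open,     /* added */
            .vop_close =    zfsctl_common_close,    /* added */
            .vop_inactive = zfsctl_snapshot_inactive,
            /* ... remaining entries unchanged ... */
    };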

After adding the necessary vop, the bug progresses to the following
panic:

    panic: VERIFY3(vrecycle(vp) == 1) failed (0 == 1)
    cpuid = 17
    KDB: stack backtrace:
    #0 0xffffffff805e29c5 at kdb_backtrace+0x65
    #1 0xffffffff8059620f at vpanic+0x17f
    #2 0xffffffff81a27f4a at spl_panic+0x3a
    #3 0xffffffff81a3a4d0 at zfsctl_snapshot_inactive+0x40
    #4 0xffffffff8066fdee at vinactivef+0xde
    #5 0xffffffff80670b8a at vgonel+0x1ea
    #6 0xffffffff806711e1 at vgone+0x31
    #7 0xffffffff8065fa0d at vfs_hash_insert+0x26d
    #8 0xffffffff81a39069 at sfs_vgetx+0x149
    #9 0xffffffff81a39c54 at zfsctl_snapdir_lookup+0x1e4
    #10 0xffffffff80661c2c at lookup+0x45c
    #11 0xffffffff80660e59 at namei+0x259
    #12 0xffffffff8067e3d3 at kern_statat+0xf3
    #13 0xffffffff8067eacf at sys_fstatat+0x2f
    #14 0xffffffff808b5ecc at amd64_syscall+0x10c
    #15 0xffffffff8088f07b at fast_syscall_common+0xf8

This is caused by a race condition that can occur when allocating a new
vnode and adding that vnode to the vfs hash. If the newly created vnode
loses the race when being inserted into the vfs hash, it will not be
recycled as its usecount is greater than zero, hitting the above
assertion.

Fix this by dropping the assertion.

FreeBSD-issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=252700

Signed-off-by:  Rob Wing <[email protected]>
Submitted-by:   Klara, Inc.
Sponsored-by:   rsync.net
behlendorf pushed a commit that referenced this issue Feb 22, 2023
Under certain loads, the following panic is hit:

    panic: page fault
    KDB: stack backtrace:
    #0 0xffffffff805db025 at kdb_backtrace+0x65
    #1 0xffffffff8058e86f at vpanic+0x17f
    #2 0xffffffff8058e6e3 at panic+0x43
    #3 0xffffffff808adc15 at trap_fatal+0x385
    #4 0xffffffff808adc6f at trap_pfault+0x4f
    #5 0xffffffff80886da8 at calltrap+0x8
    #6 0xffffffff80669186 at vgonel+0x186
    #7 0xffffffff80669841 at vgone+0x31
    #8 0xffffffff8065806d at vfs_hash_insert+0x26d
    #9 0xffffffff81a39069 at sfs_vgetx+0x149
    #10 0xffffffff81a39c54 at zfsctl_snapdir_lookup+0x1e4
    #11 0xffffffff8065a28c at lookup+0x45c
    #12 0xffffffff806594b9 at namei+0x259
    #13 0xffffffff80676a33 at kern_statat+0xf3
    #14 0xffffffff8067712f at sys_fstatat+0x2f
    #15 0xffffffff808ae50c at amd64_syscall+0x10c
    #16 0xffffffff808876bb at fast_syscall_common+0xf8

The page fault occurs because vgonel() will call VOP_CLOSE() for active
vnodes. For this reason, define vop_close for zfsctl_ops_snapshot. While
here, define vop_open for consistency.

After adding the necessary vop, the bug progresses to the following
panic:

    panic: VERIFY3(vrecycle(vp) == 1) failed (0 == 1)
    cpuid = 17
    KDB: stack backtrace:
    #0 0xffffffff805e29c5 at kdb_backtrace+0x65
    #1 0xffffffff8059620f at vpanic+0x17f
    #2 0xffffffff81a27f4a at spl_panic+0x3a
    #3 0xffffffff81a3a4d0 at zfsctl_snapshot_inactive+0x40
    #4 0xffffffff8066fdee at vinactivef+0xde
    #5 0xffffffff80670b8a at vgonel+0x1ea
    #6 0xffffffff806711e1 at vgone+0x31
    #7 0xffffffff8065fa0d at vfs_hash_insert+0x26d
    #8 0xffffffff81a39069 at sfs_vgetx+0x149
    #9 0xffffffff81a39c54 at zfsctl_snapdir_lookup+0x1e4
    #10 0xffffffff80661c2c at lookup+0x45c
    #11 0xffffffff80660e59 at namei+0x259
    #12 0xffffffff8067e3d3 at kern_statat+0xf3
    #13 0xffffffff8067eacf at sys_fstatat+0x2f
    #14 0xffffffff808b5ecc at amd64_syscall+0x10c
    #15 0xffffffff8088f07b at fast_syscall_common+0xf8

This is caused by a race condition that can occur when allocating a new
vnode and adding that vnode to the vfs hash. If the newly created vnode
loses the race when being inserted into the vfs hash, it will not be
recycled as its usecount is greater than zero, hitting the above
assertion.

Fix this by dropping the assertion.

FreeBSD-issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=252700
Reviewed-by: Andriy Gapon <[email protected]>
Reviewed-by: Mateusz Guzik <[email protected]>
Reviewed-by: Alek Pinchuk <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Signed-off-by: Rob Wing <[email protected]>
Co-authored-by: Rob Wing <[email protected]>
Submitted-by: Klara, Inc.
Sponsored-by: rsync.net
Closes #14501
behlendorf pushed a commit that referenced this issue May 30, 2023
EchterAgo pushed a commit to EchterAgo/zfs that referenced this issue Aug 4, 2023