Messy locking in zfs_inode_update #4578

Closed
lorddoskias opened this issue Apr 29, 2016 · 3 comments

lorddoskias (Contributor) commented Apr 29, 2016

I'm creating this issue so we can continue the discussion of the problems observed in zfs_inode_update in #4538. Here is what I've discovered:

  1. In zfs_mkdir, zfs_inode_update(dzp) is called to update the state of the parent in which we are creating the directory, but zfs_inode_update(zp) is also called to update the state of the newly created inode. I think this second call is redundant, since zfs_inode_update_new(zp) is already called in zfs_mknode (see the sketch after this list).
  2. The same pattern can be observed in zfs_create after the out: label. Is there really a point in calling zfs_inode_update for the newly created zp?
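
To make point 1 concrete, here is a rough, from-memory sketch of the tail of zfs_mkdir (not verbatim ZFS source, error handling omitted):

	/*
	 * Sketch only: zp is the znode created by zfs_mknode() a few lines
	 * earlier, dzp is the parent directory's znode.
	 */
	zfs_inode_update(dzp);	/* parent changed: link count, mtime/ctime */
	zfs_inode_update(zp);	/* arguably redundant: zfs_inode_update_new(zp)
				 * already ran inside zfs_mknode() */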

Furthermore, there are cases where zfs_inode_update is rightfully called, yet i_mutex is not held, so we would have to take it ourselves. Example:

[   22.987395] VERIFY(mutex_is_locked(&zp->z_inode.i_mutex)) failed
[   22.987930] PANIC at zfs_znode.c:582:zfs_inode_update()
[   22.988500] Showing stack for process 1122
[   22.988858] CPU: 0 PID: 1122 Comm: stat Tainted: G           OE   4.6.0-rc5-nbor #4
[   22.989509] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[   22.990301]  0000000000000000 ffff880038a8b9d8 ffffffff813c31e3 ffffffffa022d472
[   22.990958]  0000000000000246 ffff880038a8b9e8 ffffffffa0006ab2 ffff880038a8bb70
[   22.991634]  ffffffffa0006b7b 0000000000000001 0000000000000028 ffff880038a8bb80
[   22.992289] Call Trace:
[   22.992512]  [<ffffffff813c31e3>] dump_stack+0x85/0xc2
[   22.992941]  [<ffffffffa0006ab2>] spl_dumpstack+0x42/0x50 [spl]
[   22.993425]  [<ffffffffa0006b7b>] spl_panic+0xbb/0xf0 [spl]
[   22.993866]  [<ffffffff810a2cb6>] ? trace_hardirqs_on_caller+0x16/0x1c0
[   22.994382]  [<ffffffff810a2cb6>] ? trace_hardirqs_on_caller+0x16/0x1c0
[   22.994902]  [<ffffffff810a2e6d>] ? trace_hardirqs_on+0xd/0x10
[   22.995363]  [<ffffffffa000117a>] ? spl_kmem_free+0x2a/0x40 [spl]
[   22.995890]  [<ffffffffa0185129>] ? zfs_dirent_unlock+0x219/0x340 [zfs]
[   22.996454]  [<ffffffffa0185c86>] ? zfs_dirlook+0x136/0x3d0 [zfs]
[   22.996959]  [<ffffffffa01ab2cb>] zfs_inode_update+0x4b/0x60 [zfs]
[   22.997469]  [<ffffffffa019dcd0>] zfs_lookup+0x2c0/0x2e0 [zfs]
[   22.997957]  [<ffffffffa01c8b5e>] zpl_lookup+0xae/0x2b0 [zfs]
[   22.998410]  [<ffffffff811c0cdd>] lookup_real+0x1d/0x60
[   22.998820]  [<ffffffff811c171f>] lookup_slow+0x5f/0xf0
[   22.999230]  [<ffffffff811c49ed>] walk_component+0x1bd/0x280
[   22.999677]  [<ffffffff811c2bb5>] ? path_init+0x555/0x750
[   23.000100]  [<ffffffff811c509d>] path_lookupat+0x5d/0x110
[   23.000530]  [<ffffffff811c6b8a>] filename_lookup+0x9a/0x110
[   23.000975]  [<ffffffff8107b9e9>] ? __might_sleep+0x49/0x80
[   23.001460]  [<ffffffff8119a9b1>] ? kmem_cache_alloc+0x1f1/0x2d0
[   23.001942]  [<ffffffff8117a1a8>] ? handle_mm_fault+0xd18/0x1ac0
[   23.002412]  [<ffffffff811c6863>] ? getname_flags+0x53/0x1a0
[   23.002856]  [<ffffffff811c6cb6>] user_path_at_empty+0x36/0x40
[   23.003310]  [<ffffffff811bbaf3>] vfs_fstatat+0x53/0xa0
[   23.003732]  [<ffffffff8109eb1f>] ? up_read+0x1f/0x40
[   23.004127]  [<ffffffff811bc017>] SyS_newlstat+0x27/0x40
[   23.004543]  [<ffffffff810a2cb6>] ? trace_hardirqs_on_caller+0x16/0x1c0
[   23.005059]  [<ffffffff8100201b>] ? trace_hardirqs_on_thunk+0x1b/0x1d
[   23.005562]  [<ffffffff815fed40>] entry_SYSCALL_64_fastpath+0x23/0xc1
[   23.006063]  [<ffffffff8109ff7f>] ? trace_hardirqs_off_caller+0x1f/0xc0

Or

[   54.521005]  [<ffffffff813c31e3>] dump_stack+0x85/0xc2
[   54.521409]  [<ffffffffa0006ab2>] spl_dumpstack+0x42/0x50 [spl]
[   54.521867]  [<ffffffffa0006b7b>] spl_panic+0xbb/0xf0 [spl]
[   54.522300]  [<ffffffff810a2cb6>] ? trace_hardirqs_on_caller+0x16/0x1c0
[   54.522824]  [<ffffffff815fc93b>] ? __mutex_unlock_slowpath+0xeb/0x1a0
[   54.523319]  [<ffffffff810a2cb6>] ? trace_hardirqs_on_caller+0x16/0x1c0
[   54.523845]  [<ffffffff810a2e6d>] ? trace_hardirqs_on+0xd/0x10
[   54.524283]  [<ffffffffa000117a>] ? spl_kmem_free+0x2a/0x40 [spl]
[   54.524778]  [<ffffffffa01ab2fb>] zfs_inode_update+0x4b/0x60 [zfs]
[   54.525269]  [<ffffffffa01a4862>] zfs_write+0x2a2/0xf50 [zfs]
[   54.525715]  [<ffffffff8109ff7f>] ? trace_hardirqs_off_caller+0x1f/0xc0
[   54.526256]  [<ffffffffa01c6cef>] zpl_write_common_iovec+0x7f/0x100 [zfs]
[   54.526793]  [<ffffffffa01c6f4d>] zpl_write+0x7d/0xa0 [zfs]
[   54.527223]  [<ffffffff811b5d28>] __vfs_write+0x28/0xf0
[   54.527617]  [<ffffffff8109ecfe>] ? update_fast_ctr+0x1e/0x40
[   54.528050]  [<ffffffff8109eda7>] ? percpu_down_read+0x57/0xa0
[   54.528489]  [<ffffffff811ba437>] ? __sb_start_write+0xb7/0xf0
[   54.528989]  [<ffffffff811ba437>] ? __sb_start_write+0xb7/0xf0
[   54.529458]  [<ffffffff811b60cb>] vfs_write+0xbb/0x1f0
[   54.529874]  [<ffffffff811b7e39>] SyS_write+0x49/0xb0
[   54.530269]  [<ffffffff815fed40>] entry_SYSCALL_64_fastpath+0x23/0xc1
[   54.530766]  [<ffffffff8109ff7f>] ? trace_hardirqs_off_caller+0x1f/0xc0

I guess we really need to do a mutex_trylock/mutex_lock in zfs_inode_update, or identify all such call sites and take the mutex before calling into zfs_inode_update.
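
A minimal sketch of what the trylock approach might look like (purely illustrative, not a tested patch; note that mutex_trylock cannot tell "the caller already holds i_mutex" apart from "some other task holds it", which is part of why this is messy):

void
zfs_inode_update(znode_t *zp)
{
	struct inode *ip = ZTOI(zp);
	int locked;

	/*
	 * Take i_mutex only when it is not already held.  If the caller
	 * (typically the VFS) holds it, trylock fails and we rely on that
	 * caller's lock; if some other task holds it, we silently skip
	 * locking, which is exactly the kind of messiness removing the
	 * function would avoid.
	 */
	locked = mutex_trylock(&ip->i_mutex);

	/* ... copy the znode fields into the generic inode as before ... */

	if (locked)
		mutex_unlock(&ip->i_mutex);
}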

tuxoko (Contributor) commented Apr 29, 2016

The correct way to do this is to remove the function entirely. When the Linux VFS wants to update fields in the inode, it grabs the required lock itself. We really shouldn't be messing around with the inode fields and i_mutex.
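
As a simplified illustration of that calling convention (not actual kernel source): on the setattr path, for example, the VFS caller of notify_change() already holds i_mutex, so the filesystem's ->setattr can update i_size, i_mtime, etc. under that lock without taking it again.

	/* Simplified illustration only -- the lock is taken by the VFS,
	 * not by the filesystem, before it calls into ->setattr. */
	mutex_lock(&inode->i_mutex);
	error = notify_change(dentry, &attr, NULL);	/* ends up in fs ->setattr */
	mutex_unlock(&inode->i_mutex);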

behlendorf (Contributor) commented

Removing it would be best and is what I always intended; see issue #227. It has always been an ugly solution.

lorddoskias (Contributor, Author) commented

I agree with you, but until that time comes (I intend to work on this) I think the locking needs to be fixed.
