Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use after free by zio_done #6401

Closed
verygreen opened this issue Jul 25, 2017 · 7 comments
Closed

Use after free by zio_done #6401

verygreen opened this issue Jul 25, 2017 · 7 comments

Comments

@verygreen
Copy link
Contributor

I started to do Lustre on zfs testing on a kernel with debug_pagealloc enabled (the option that unmaps memory once it is freed to catch use after free cases) and I have been having a lot of this sort of crashes.
This is an otherwise standard rhel 7.2 kernel version 3.10.0-327.22.2.el7 with just a lot of debug enabled and mutex_lock_nested converted to EXPORT_SYMBOL (without GPL).
In particular LOCK_DEP is enabled

I will try plain zfs to see if I can arrive at a plain posix reproducer, but just wanted to throw it here in case somebody sees an obvious problem right away.

[ 6647.603631] Lustre: DEBUG MARKER: == sanity test 123a: verify statahead work =========================================================== 15:40:02 (1501011602)
[ 6650.248434] Lustre: DEBUG MARKER: ls 100 files without statahead: 1 sec
[ 6652.471285] Lustre: DEBUG MARKER: ls 100 files with statahead: 1 sec
[ 6678.762741] Lustre: DEBUG MARKER: ls 1000 files without statahead: 11 sec
[ 6690.524214] Lustre: DEBUG MARKER: ls 1000 files with statahead: 10 sec
[ 6790.373734] Lustre: DEBUG MARKER: ls 4000 files without statahead: 47 sec
[ 6835.412014] Lustre: DEBUG MARKER: ls 4000 files with statahead: 43 sec
[ 6835.584783] Lustre: DEBUG MARKER: ls done
[ 8199.809947] BUG: unable to handle kernel paging request at ffff8802886bee00
[ 8199.811714] IP: [<ffffffffa0b9f4e6>] zio_done+0x486/0x1210 [zfs]
[ 8199.812701] PGD 2e75067 PUD 33ebfa067 PMD 33ebb6067 PTE 80000002886be060
[ 8199.813934] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 8199.814807] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_zfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) osc(OE) mdc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) ext4 mbcache loop zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) zlib_deflate jbd2 syscopyarea sysfillrect sysimgblt ata_generic ttm pata_acpi drm_kms_helper drm ata_piix i2c_piix4 serio_raw pcspkr virtio_console i2c_core libata virtio_balloon virtio_blk floppy nfsd ip_tables rpcsec_gss_krb5 [last unloaded: libcfs]
[ 8199.824444] CPU: 4 PID: 23725 Comm: z_wr_int_2 Tainted: P           OE  ------------   3.10.0-debug #2
[ 8199.825633] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 8199.826182] task: ffff8802db3ea680 ti: ffff8802a5a24000 task.ti: ffff8802a5a24000
[ 8199.836424] RIP: 0010:[<ffffffffa0b9f4e6>]  [<ffffffffa0b9f4e6>] zio_done+0x486/0x1210 [zfs]
[ 8199.837546] RSP: 0018:ffff8802a5a27d10  EFLAGS: 00010246
[ 8199.838046] RAX: ffff8802a5a27fd8 RBX: ffff8802ea7169b8 RCX: 0000000000000001
[ 8199.838537] RDX: 0000000000000020 RSI: 0000000000000001 RDI: ffff8802886bedc0
[ 8199.839074] RBP: ffff8802a5a27d80 R08: 0000000000000001 R09: 0000000000000001
[ 8199.839600] R10: 0000000000000000 R11: ffff8802e9a12ed0 R12: ffff8802886bed38
[ 8199.840196] R13: 0000000000000000 R14: ffff8802886be9b8 R15: ffff8802886bedc0
[ 8199.840730] FS:  0000000000000000(0000) GS:ffff88033e480000(0000) knlGS:0000000000000000
[ 8199.841639] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 8199.842157] CR2: ffff8802886bee00 CR3: 000000009dce0000 CR4: 00000000000006e0
[ 8199.842694] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 8199.843266] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 8199.843775] Stack:
[ 8199.844221]  ffff8802e9920dc0 ffff8802ea716dc0 00000000000a0000 ffff8802a5a27d48
[ 8199.845178]  ffff8802ea716d38 ffff8802db3ea680 ffff8802a5a27d48 0000000000000000
[ 8199.846177]  00000000a0d391b8 0000000000800000 ffff8802ea7169b8 00000000ffffffff
[ 8199.847119] Call Trace:
[ 8199.847601]  [<ffffffffa0b997ac>] zio_execute+0x9c/0x100 [zfs]
[ 8199.848126]  [<ffffffffa099b356>] taskq_thread+0x256/0x490 [spl]
[ 8199.848614]  [<ffffffff810b7cc0>] ? wake_up_state+0x20/0x20
[ 8199.849153]  [<ffffffffa099b100>] ? taskq_cancel_id+0x160/0x160 [spl]
[ 8199.849645]  [<ffffffff810a2eba>] kthread+0xea/0xf0
[ 8199.850127]  [<ffffffff810a2dd0>] ? kthread_create_on_node+0x140/0x140
[ 8199.850764]  [<ffffffff8170fb98>] ret_from_fork+0x58/0x90
[ 8199.851349]  [<ffffffff810a2dd0>] ? kthread_create_on_node+0x140/0x140
[ 8199.851886] Code: 00 4d 8d be 08 04 00 00 4c 89 ff e8 35 6d b6 e0 4c 89 e7 49 c7 86 40 04 00 00 00 00 00 00 e8 62 2c b6 e0 4c 89 ff e8 ca 6e b6 e0 <41> 83 be 48 04 00 00 03 0f 84 74 01 00 00 4d 85 ed 0f 84 f4 05 
[ 8199.894246] RIP  [<ffffffffa0b9f4e6>] zio_done+0x486/0x1210 [zfs]
[ 8199.894803]  RSP <ffff8802a5a27d10>
[ 8199.895234] CR2: ffff8802886bee00

This is in

(gdb) l *(zio_done+0x486)
0xee4e6 is in zio_done (/usr/local/src/spl-0.7.0/include/sys/mutex.h:100).
95			lockdep_off();					\
96	}
97	static inline void
98	spl_mutex_lockdep_on_maybe(kmutex_t *mp)			\
99	{								\
100		if (mp && mp->m_type == MUTEX_NOLOCKDEP)		\
101			lockdep_on();					\
102	}
103	#else  /* CONFIG_LOCKDEP */
104	#define spl_mutex_set_type(mp, type)
(gdb)

So it seems the mutex itself is freed by the time we are trying to check it? This certainly is not great.

@verygreen
Copy link
Contributor Author

Hm, as I was writing that bugreport, also had a very related crash:

[10809.136280] BUG: unable to handle kernel paging request at ffff8802d7656e00
[10809.139337] IP: [<ffffffffa0bcb33c>] zio_wait_for_children+0xbc/0x170 [zfs]
[10809.143212] PGD 2e75067 PUD 33e9f9067 PMD 33e93d067 PTE 80000002d7656060
[10809.145772] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[10809.146692] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_zfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) osc(OE) mdc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) zlib_deflate jbd2 syscopyarea sysfillrect sysimgblt ttm ata_generic drm_kms_helper pata_acpi i2c_piix4 drm ata_piix virtio_blk i2c_core virtio_balloon pcspkr serio_raw floppy virtio_console libata nfsd ip_tables rpcsec_gss_krb5 [last unloaded: libcfs]
[10809.166853] CPU: 11 PID: 17000 Comm: mmp Tainted: P           OE  ------------   3.10.0-debug #2
[10809.168431] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[10809.168967] task: ffff88031d1ca140 ti: ffff8800aec4c000 task.ti: ffff8800aec4c000
[10809.169954] RIP: 0010:[<ffffffffa0bcb33c>]  [<ffffffffa0bcb33c>] zio_wait_for_children+0xbc/0x170 [zfs]
[10809.172351] RSP: 0018:ffff8800aec4fc88  EFLAGS: 00010246
[10809.172936] RAX: ffff8800aec4ffd8 RBX: ffff8802d76569b8 RCX: 0000000000000001
[10809.173485] RDX: 000000000000000c RSI: 0000000000000001 RDI: ffff8802d7656dc0
[10809.174113] RBP: ffff8800aec4fca8 R08: 0000000000000001 R09: 0000000000000001
[10809.174589] R10: 0000000000000000 R11: 0000000000015340 R12: ffff8802d7656d38
[10809.175227] R13: 0000000000000001 R14: ffff8802d7656dc0 R15: 0000000000000003
[10809.175732] FS:  0000000000000000(0000) GS:ffff88033e560000(0000) knlGS:0000000000000000
[10809.176840] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[10809.177389] CR2: ffff8802d7656e00 CR3: 0000000001c0e000 CR4: 00000000000006e0
[10809.178003] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[10809.178547] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[10809.179323] Stack:
[10809.180203]  ffff8802d76569b8 0000000000000000 00000000ffffffff ffff8802f3fa4000
[10809.181226]  ffff8800aec4fd28 ffffffffa0bd20a3 ffff8800aec4fd28 ffffffffa0bd35ad
[10809.182264]  ffff8800aec4c000 ffff8800aec4fd38 ffff8800aec4fcf0 ffff88031d1ca140
[10809.183450] Call Trace:
[10809.184254]  [<ffffffffa0bd20a3>] zio_done+0x43/0x1210 [zfs]
[10809.184985]  [<ffffffffa0bd35ad>] ? zio_ready+0x33d/0x5a0 [zfs]
[10809.185678]  [<ffffffffa0bd04cc>] zio_nowait+0xbc/0x170 [zfs]
[10809.186505]  [<ffffffffa0b4b511>] mmp_thread+0x611/0xb60 [zfs]
[10809.187373]  [<ffffffffa0b4acc0>] ? metaslab_check_free+0x190/0x190 [zfs]
[10809.188204]  [<ffffffffa0b4af00>] ? mmp_write_done+0x240/0x240 [zfs]
[10809.188976]  [<ffffffffa0984091>] thread_generic_wrapper+0x71/0x80 [spl]
[10809.189664]  [<ffffffffa0984020>] ? __thread_exit+0x20/0x20 [spl]
[10809.190412]  [<ffffffff810a2eba>] kthread+0xea/0xf0
[10809.191724]  [<ffffffff810a2dd0>] ? kthread_create_on_node+0x140/0x140
[10809.192482]  [<ffffffff8170fb98>] ret_from_fork+0x58/0x90
[10809.193408]  [<ffffffff810a2dd0>] ? kthread_create_on_node+0x140/0x140
[10809.194192] Code: 00 4c 8d b3 08 04 00 00 4c 89 f7 e8 df ae b3 e0 4c 89 e7 48 c7 83 40 04 00 00 00 00 00 00 e8 0c 6e b3 e0 4c 89 f7 e8 74 b0 b3 e0 <83> bb 48 04 00 00 03 74 13 5b 41 5c 44 89 e8 41 5d 41 5e 5d c3 
[10809.197216] RIP  [<ffffffffa0bcb33c>] zio_wait_for_children+0xbc/0x170 [zfs]
[10809.198043]  RSP <ffff8800aec4fc88>
[10809.198673] CR2: ffff8802d7656e00
(gdb) l *(zio_wait_for_children+0xbc)
0xe733c is in zio_wait_for_children (/usr/local/src/spl-0.7.0/include/sys/mutex.h:100).
95			lockdep_off();					\
96	}
97	static inline void
98	spl_mutex_lockdep_on_maybe(kmutex_t *mp)			\
99	{								\
100		if (mp && mp->m_type == MUTEX_NOLOCKDEP)		\
101			lockdep_on();					\
102	}
103	#else  /* CONFIG_LOCKDEP */
104	#define spl_mutex_set_type(mp, type)

So it looks like something is actively freeing a mutex as it's being used by the io thread?

@verygreen
Copy link
Contributor Author

verygreen commented Jul 26, 2017

and I had another one in the same mutex reference:

[36355.029844] BUG: unable to handle kernel paging request at ffff8802bced3ee0
[36355.031067] IP: [<ffffffffa0ac98b4>] dbuf_rele_and_unlock+0x134/0x5e0 [zfs]
[36355.031666] PGD 2e75067 PUD 33ebfa067 PMD 33ea12067 PTE 80000002bced3060
[36355.032165] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[36355.032656] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_zfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) osc(OE) mdc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) zlib_deflate jbd2 ata_generic syscopyarea pata_acpi sysfillrect sysimgblt ttm drm_kms_helper ata_piix i2c_piix4 drm virtio_blk libata floppy serio_raw virtio_balloon pcspkr virtio_console i2c_core nfsd ip_tables rpcsec_gss_krb5 [last unloaded: libcfs]
[36355.037013] CPU: 3 PID: 26415 Comm: dmu_objset_find Tainted: P           OE  ------------   3.10.0-debug #2
[36355.038018] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[36355.038557] task: ffff8802f13d65c0 ti: ffff880303cfc000 task.ti: ffff880303cfc000
[36355.039469] RIP: 0010:[<ffffffffa0ac98b4>]  [<ffffffffa0ac98b4>] dbuf_rele_and_unlock+0x134/0x5e0 [zfs]
[36355.040478] RSP: 0018:ffff880303cff868  EFLAGS: 00010246
[36355.041051] RAX: ffff880303cfffd8 RBX: ffff8802bced3dc0 RCX: ffff8802f13d6eb8
[36355.041737] RDX: 000000000000006c RSI: 0000000000000000 RDI: ffff8802bced3ea0
[36355.042561] RBP: ffff880303cff918 R08: 0000000000000001 R09: 0000000000000000
[36355.043314] R10: 0000000000000000 R11: ffff8802f13d6ee8 R12: ffff8802bced3ea0
[36355.044024] R13: ffff8802bced3e18 R14: 0000000000000000 R15: 0000000000000000
[36355.044709] FS:  0000000000000000(0000) GS:ffff88033e460000(0000) knlGS:0000000000000000
[36355.045963] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[36355.046658] CR2: ffff8802bced3ee0 CR3: 0000000001c0e000 CR4: 00000000000006e0
[36355.047363] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[36355.048085] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[36355.050975] Stack:
[36355.051609]  ffff8802bced3e80 ffff8802bced3dc0 ffff8802f13d65c0 ffff8802bced3e18
[36355.053060]  0000000000000000 ffff880303cfffd8 ffff880303cff918 0000000000000246
[36355.054541]  ffffffffa0ac9f0c ffff880303cff920 ffff880303cfc000 ffffffff00000130
[36355.057218] Call Trace:
[36355.057970]  [<ffffffffa0ac9f0c>] ? dbuf_rele+0x2c/0x90 [zfs]
[36355.058678]  [<ffffffffa099adca>] ? spl_kmem_free+0x2a/0x40 [spl]
[36355.059352]  [<ffffffffa0ac9f30>] dbuf_rele+0x50/0x90 [zfs]
[36355.060083]  [<ffffffffa0ac9f7e>] dmu_buf_rele+0xe/0x10 [zfs]
[36355.060847]  [<ffffffffa0b60840>] zap_unlockdir+0x60/0xb0 [zfs]
[36355.061593]  [<ffffffffa0b609e0>] zap_lookup_norm+0xa0/0xd0 [zfs]
[36355.062315]  [<ffffffffa0b60a8b>] zap_contains+0x3b/0x60 [zfs]
[36355.064650]  [<ffffffffa0b09cf5>] dsl_prop_get_dd+0x135/0x240 [zfs]
[36355.065508]  [<ffffffffa0b09f76>] dsl_prop_get_ds+0x176/0x220 [zfs]
[36355.066237]  [<ffffffffa0b0a090>] dsl_prop_register+0x50/0x2b0 [zfs]
[36355.066921]  [<ffffffffa0ad9830>] ? secondary_cache_changed_cb+0x20/0x20 [zfs]
[36355.069952]  [<ffffffffa0adc2ce>] dmu_objset_open_impl+0x74e/0x920 [zfs]
[36355.070671]  [<ffffffffa0adc654>] dmu_objset_from_ds+0x1b4/0x320 [zfs]
[36355.071685]  [<ffffffffa0adc8bd>] dmu_objset_own_impl+0x2d/0xc0 [zfs]
[36355.072381]  [<ffffffffa0ade8a4>] dmu_objset_own_obj+0x54/0x80 [zfs]
[36355.073134]  [<ffffffffa0b99c88>] zil_claim+0x58/0x270 [zfs]
[36355.073985]  [<ffffffffa0adb7d0>] dmu_objset_find_dp_impl+0x210/0x4b0 [zfs]
[36355.074896]  [<ffffffffa0adba9a>] dmu_objset_find_dp_cb+0x2a/0x40 [zfs]
[36355.075605]  [<ffffffffa099e356>] taskq_thread+0x256/0x490 [spl]
[36355.076520]  [<ffffffff810b7cc0>] ? wake_up_state+0x20/0x20
[36355.077215]  [<ffffffffa099e100>] ? taskq_cancel_id+0x160/0x160 [spl]
[36355.077942]  [<ffffffff810a2eba>] kthread+0xea/0xf0
[36355.078607]  [<ffffffff810a2dd0>] ? kthread_create_on_node+0x140/0x140
[36355.079372]  [<ffffffff8170fb98>] ret_from_fork+0x58/0x90
[36355.080106]  [<ffffffff810a2dd0>] ? kthread_create_on_node+0x140/0x140
[36355.085013] Code: 00 4c 8d a3 e0 00 00 00 4c 89 e7 e8 67 c9 c3 e0 4c 89 ef 48 c7 83 18 01 00 00 00 00 00 00 e8 94 88 c3 e0 4c 89 e7 e8 fc ca c3 e0 <83> bb 20 01 00 00 03 0f 84 a7 02 00 00 48 8b 45 d0 65 48 33 04 
[36355.087627] RIP  [<ffffffffa0ac98b4>] dbuf_rele_and_unlock+0x134/0x5e0 [zfs]
[36355.088419]  RSP <ffff880303cff868>
[36355.100086] CR2: ffff8802bced3ee0

this is again in the same spl_mutex_lockdep_on_maybe() place

@ironMann
Copy link
Contributor

ironMann commented Aug 1, 2017

This race happens when a mutex is used for refcounting. To prevent it, these two lines should be reversed (see the comment above).
Also, the race should not occur with openzfs/spl#587

@verygreen
Copy link
Contributor Author

Thanks! Reverting the two lines is an effective fix for me and looking at them the current way looks to be very incorrect indeed.
So I wonder why the fix is still not included in one form or another?
Use after free are nasty bugs that manifest themselves in mysterious ways if you are not specially equipped to catch them.

@behlendorf
Copy link
Contributor

So I wonder why the fix is still not included in one form or another?

It was unfortunately overlooked since it was part of a larger change. I think the right way to move forward here would be to get a minimal fix in to the 0.7 release branch and master. Then work on getting the larger change reviewed and merged to master.

@ironMann it would be great if you could open a PR with the minimal fix. It sounds like @verygreen can confirm it does resolve the issue as expected.

@tuxoko
Copy link
Contributor

tuxoko commented Aug 2, 2017

Flipping those to line will indeed fix this, though I think you need to flip the top two lines as well. Otherwise, the lock/unlock will become unbalanced in lockdep.

ironMann added a commit to ironMann/spl that referenced this issue Aug 2, 2017
Prevent race on accessing kmutex_t when the mutex is
embedded in a ref counted structure.

Issue openzfs/zfs#6401

Signed-off-by: Gvozden Neskovic <[email protected]>
@ironMann
Copy link
Contributor

ironMann commented Aug 2, 2017

@verygreen kudos for finding this, also for the intro to debug_pagealloc. I would appreciate if you give the openzfs/spl#637 a try.

@tuxoko I believe you're right 👍

ironMann added a commit to ironMann/spl that referenced this issue Aug 2, 2017
Prevent race on accessing kmutex_t when the mutex is
embedded in a ref counted structure.

Issue openzfs/zfs#6401

Signed-off-by: Gvozden Neskovic <[email protected]>
tonyhutter pushed a commit to openzfs/spl that referenced this issue Aug 3, 2017
Prevent race on accessing kmutex_t when the mutex is
embedded in a ref counted structure.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Chunwei Chen <[email protected]>
Signed-off-by: Gvozden Neskovic <[email protected]>
Closes openzfs/zfs#6401
Closes #637
ofaaland pushed a commit to ofaaland/spl that referenced this issue Aug 3, 2017
Prevent race on accessing kmutex_t when the mutex is
embedded in a ref counted structure.

Issue openzfs/zfs#6401

Signed-off-by: Gvozden Neskovic <[email protected]>
Fabian-Gruenbichler pushed a commit to Fabian-Gruenbichler/spl that referenced this issue Sep 6, 2017
Prevent race on accessing kmutex_t when the mutex is
embedded in a ref counted structure.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Chunwei Chen <[email protected]>
Signed-off-by: Gvozden Neskovic <[email protected]>
Closes openzfs/zfs#6401
Closes openzfs#637
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants