Skip to content

Commit

Permalink
Fixing race condition in dmu_read_abd()
Browse files Browse the repository at this point in the history
It is important to hold the dbuf mutex (db_mtx) when creating ZIO's in
dmu_read_abd(). The BP that is returned dmu_buf_get_gp_from_dbuf() may
come from a previous direct IO write. In this case, it is attached to a
dirty record in the dbuf. When zio_read() is called, a copy of the BP is
made through io_bp_copy to io_bp in zio_create(). Without holding the
db_mtx though, the dirty record may be freed in dbuf_read_done(). This
can result in garbage beening place BP for the ZIO creatd through
zio_read(). By holding the db_mtx, this race can be avoided. Below is a
stack trace of the issue that was occuring in vdev_mirror_child_select()
without holding the db_mtx and creating the the ZIO.

[29882.427056] VERIFY(zio->io_bp == NULL ||
BP_PHYSICAL_BIRTH(zio->io_bp) == txg) failed
[29882.434884] PANIC at vdev_mirror.c:545:vdev_mirror_child_select()
[29882.440976] Showing stack for process 1865540
[29882.445336] CPU: 57 PID: 1865540 Comm: fio Kdump: loaded Tainted: P
OE    --------- -  - 4.18.0-408.el8.x86_64 openzfs#1
[29882.456457] Hardware name: GIGABYTE R272-Z32-00/MZ32-AR0-00, BIOS R21
10/08/2020
[29882.463844] Call Trace:
[29882.466296]  dump_stack+0x41/0x60
[29882.469618]  spl_panic+0xd0/0xe8 [spl]
[29882.473376]  ? __dprintf+0x10e/0x180 [zfs]
[29882.477674]  ? kfree+0xd3/0x250
[29882.480819]  ? __dprintf+0x10e/0x180 [zfs]
[29882.485103]  ? vdev_mirror_map_alloc+0x29/0x50 [zfs]
[29882.490250]  ? vdev_lookup_top+0x20/0x90 [zfs]
[29882.494878]  spl_assert+0x17/0x20 [zfs]
[29882.498893]  vdev_mirror_child_select+0x279/0x300 [zfs]
[29882.504289]  vdev_mirror_io_start+0x11f/0x2b0 [zfs]
[29882.509336]  zio_vdev_io_start+0x3ee/0x520 [zfs]
[29882.514137]  zio_nowait+0x105/0x290 [zfs]
[29882.518330]  dmu_read_abd+0x196/0x460 [zfs]
[29882.522691]  dmu_read_uio_direct+0x6d/0xf0 [zfs]
[29882.527472]  dmu_read_uio_dnode+0x12a/0x140 [zfs]
[29882.532345]  dmu_read_uio_dbuf+0x3f/0x60 [zfs]
[29882.536953]  zfs_read+0x238/0x3f0 [zfs]
[29882.540976]  zpl_iter_read_direct+0xe0/0x180 [zfs]
[29882.545952]  ? rrw_exit+0xc6/0x200 [zfs]
[29882.550058]  zpl_iter_read+0x90/0xb0 [zfs]
[29882.554340]  new_sync_read+0x10f/0x150
[29882.558094]  vfs_read+0x91/0x140
[29882.561325]  ksys_read+0x4f/0xb0
[29882.564557]  do_syscall_64+0x5b/0x1a0
[29882.568222]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[29882.573267] RIP: 0033:0x7f7fe0fa6ab4

Signed-off-by: Brian Atkinson <[email protected]>
  • Loading branch information
bwatkinson committed Sep 12, 2024
1 parent 5c27ab3 commit d697eeb
Showing 1 changed file with 16 additions and 3 deletions.
19 changes: 16 additions & 3 deletions module/zfs/dmu_direct.c
Original file line number Diff line number Diff line change
Expand Up @@ -349,12 +349,25 @@ dmu_read_abd(dnode_t *dn, uint64_t offset, uint64_t size,

mbuf = make_abd_for_dbuf(db, data, offset, size);
ASSERT3P(mbuf, !=, NULL);

/*
* The dbuf mutex (db_mtx) must be held when creating the ZIO
* for the read. The BP returned from
* dmu_buf_get_bp_from_dbuf() could be from a previous Direct
* I/O write that is in the dbuf's dirty record. When
* zio_read() is called, zio_create() will make a copy of the
* BP. However, if zio_read() is called without the mutex
* being held then the dirty record from the dbuf could be
* freed in dbuf_write_done() resulting in garbage being set
* for the zio BP.
*/
zio_t *cio = zio_read(rio, spa, bp, mbuf, db->db.db_size,
dmu_read_abd_done, NULL, ZIO_PRIORITY_SYNC_READ,
ZIO_FLAG_CANFAIL, &zb);
mutex_exit(&db->db_mtx);

zfs_racct_read(spa, db->db.db_size, 1, flags);
zio_nowait(zio_read(rio, spa, bp, mbuf, db->db.db_size,
dmu_read_abd_done, NULL, ZIO_PRIORITY_SYNC_READ,
ZIO_FLAG_CANFAIL, &zb));
zio_nowait(cio);
}

dmu_buf_rele_array(dbp, numbufs, FTAG);
Expand Down

0 comments on commit d697eeb

Please sign in to comment.