Skip to content

Commit

Permalink
More speculative prefetcher improvements
Browse files Browse the repository at this point in the history
- Make prefetch distance adaptive: up to 4MB prefetch doubles for
every, hit same as before, but after that it grows by 1/8 every time
the prefetch read does not complete in time to satisfy the demand.
My tests show that 4MB is sufficient for wide NVMe pool to saturate
single reader thread at 2.5GB/s, while new 64MB maximum allows the
same thread to reach 1.5GB/s on wide HDD pool.  Further distance
increase may increase speed even more, but less dramatic and with
higher latency.

 - Allow early reuse of inactive prefetch streams: streams that never
saw hits can be reused immediately if there is a demand, while others
can be reused after 1s of inactivity, starting with the oldest.  After
2s of inactivity streams are deleted to free resources same as before.
This allows by several times increase strided read performance on HDD
pool in presence of simultaneous random reads, previously filling the
zfetch_max_streams limit for seconds and so blocking most of prefetch.

 - Always issue intermediate indirect block reads with SYNC priority.
Each of those reads if delayed for longer may delay up to 1024 other
block prefetches, that may be not good for wide pools.

Reviewed-by: Allan Jude <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Sponsored-By: iXsystems, Inc.
Closes openzfs#13452
  • Loading branch information
amotin authored and andrewc12 committed Sep 23, 2022
1 parent 7992cb3 commit 679064b
Show file tree
Hide file tree
Showing 5 changed files with 133 additions and 101 deletions.
2 changes: 1 addition & 1 deletion include/sys/dbuf.h
Original file line number Diff line number Diff line change
Expand Up @@ -329,7 +329,7 @@ typedef struct dbuf_hash_table {
krwlock_t hash_rwlocks[DBUF_RWLOCKS] ____cacheline_aligned;
} dbuf_hash_table_t;

typedef void (*dbuf_prefetch_fn)(void *, boolean_t);
typedef void (*dbuf_prefetch_fn)(void *, uint64_t, uint64_t, boolean_t);

uint64_t dbuf_whichblock(const struct dnode *di, const int64_t level,
const uint64_t offset);
Expand Down
16 changes: 7 additions & 9 deletions include/sys/dmu_zfetch.h
Original file line number Diff line number Diff line change
Expand Up @@ -49,20 +49,18 @@ typedef struct zfetch {

typedef struct zstream {
uint64_t zs_blkid; /* expect next access at this blkid */
uint64_t zs_pf_blkid1; /* first block to prefetch */
uint64_t zs_pf_blkid; /* block to prefetch up to */

/*
* We will next prefetch the L1 indirect block of this level-0
* block id.
*/
uint64_t zs_ipf_blkid1; /* first block to prefetch */
uint64_t zs_ipf_blkid; /* block to prefetch up to */
unsigned int zs_pf_dist; /* data prefetch distance in bytes */
unsigned int zs_ipf_dist; /* L1 prefetch distance in bytes */
uint64_t zs_pf_start; /* first data block to prefetch */
uint64_t zs_pf_end; /* data block to prefetch up to */
uint64_t zs_ipf_start; /* first data block to prefetch L1 */
uint64_t zs_ipf_end; /* data block to prefetch L1 up to */

list_node_t zs_node; /* link for zf_stream */
hrtime_t zs_atime; /* time last prefetch issued */
zfetch_t *zs_fetch; /* parent fetch */
boolean_t zs_missed; /* stream saw cache misses */
boolean_t zs_more; /* need more distant prefetch */
zfs_refcount_t zs_callers; /* number of pending callers */
/*
* Number of stream references: dnode, callers and pending blocks.
Expand Down
17 changes: 14 additions & 3 deletions man/man4/zfs.4
Original file line number Diff line number Diff line change
Expand Up @@ -487,7 +487,15 @@ However, this is limited by
.It Sy zfetch_array_rd_sz Ns = Ns Sy 1048576 Ns B Po 1 MiB Pc Pq ulong
If prefetching is enabled, disable prefetching for reads larger than this size.
.
.It Sy zfetch_max_distance Ns = Ns Sy 8388608 Ns B Po 8 MiB Pc Pq uint
.It Sy zfetch_min_distance Ns = Ns Sy 4194304 Ns B Po 4 MiB Pc Pq uint
Min bytes to prefetch per stream.
Prefetch distance starts from the demand access size and quickly grows to
this value, doubling on each hit.
After that it may grow further by 1/8 per hit, but only if some prefetch
since last time haven't completed in time to satisfy demand request, i.e.
prefetch depth didn't cover the read latency or the pool got saturated.
.
.It Sy zfetch_max_distance Ns = Ns Sy 67108864 Ns B Po 64 MiB Pc Pq uint
Max bytes to prefetch per stream.
.
.It Sy zfetch_max_idistance Ns = Ns Sy 67108864 Ns B Po 64 MiB Pc Pq uint
Expand All @@ -496,8 +504,11 @@ Max bytes to prefetch indirects for per stream.
.It Sy zfetch_max_streams Ns = Ns Sy 8 Pq uint
Max number of streams per zfetch (prefetch streams per file).
.
.It Sy zfetch_min_sec_reap Ns = Ns Sy 2 Pq uint
Min time before an active prefetch stream can be reclaimed
.It Sy zfetch_min_sec_reap Ns = Ns Sy 1 Pq uint
Min time before inactive prefetch stream can be reclaimed
.
.It Sy zfetch_max_sec_reap Ns = Ns Sy 2 Pq uint
Max time before inactive prefetch stream can be deleted
.
.It Sy zfs_abd_scatter_enabled Ns = Ns Sy 1 Ns | Ns 0 Pq int
Enables ARC from using scatter/gather lists and forces all allocations to be
Expand Down
14 changes: 9 additions & 5 deletions module/zfs/dbuf.c
Original file line number Diff line number Diff line change
Expand Up @@ -3185,8 +3185,10 @@ typedef struct dbuf_prefetch_arg {
static void
dbuf_prefetch_fini(dbuf_prefetch_arg_t *dpa, boolean_t io_done)
{
if (dpa->dpa_cb != NULL)
dpa->dpa_cb(dpa->dpa_arg, io_done);
if (dpa->dpa_cb != NULL) {
dpa->dpa_cb(dpa->dpa_arg, dpa->dpa_zb.zb_level,
dpa->dpa_zb.zb_blkid, io_done);
}
kmem_free(dpa, sizeof (*dpa));
}

Expand Down Expand Up @@ -3320,7 +3322,8 @@ dbuf_prefetch_indirect_done(zio_t *zio, const zbookmark_phys_t *zb,
dpa->dpa_zb.zb_object, dpa->dpa_curlevel, nextblkid);

(void) arc_read(dpa->dpa_zio, dpa->dpa_spa,
bp, dbuf_prefetch_indirect_done, dpa, dpa->dpa_prio,
bp, dbuf_prefetch_indirect_done, dpa,
ZIO_PRIORITY_SYNC_READ,
ZIO_FLAG_CANFAIL | ZIO_FLAG_SPECULATIVE,
&iter_aflags, &zb);
}
Expand Down Expand Up @@ -3455,7 +3458,8 @@ dbuf_prefetch_impl(dnode_t *dn, int64_t level, uint64_t blkid,
SET_BOOKMARK(&zb, ds != NULL ? ds->ds_object : DMU_META_OBJSET,
dn->dn_object, curlevel, curblkid);
(void) arc_read(dpa->dpa_zio, dpa->dpa_spa,
&bp, dbuf_prefetch_indirect_done, dpa, prio,
&bp, dbuf_prefetch_indirect_done, dpa,
ZIO_PRIORITY_SYNC_READ,
ZIO_FLAG_CANFAIL | ZIO_FLAG_SPECULATIVE,
&iter_aflags, &zb);
}
Expand All @@ -3467,7 +3471,7 @@ dbuf_prefetch_impl(dnode_t *dn, int64_t level, uint64_t blkid,
return (1);
no_issue:
if (cb != NULL)
cb(arg, B_FALSE);
cb(arg, level, blkid, B_FALSE);
return (0);
}

Expand Down
Loading

0 comments on commit 679064b

Please sign in to comment.