Metaslab max_size should be persisted while unloaded #9055

Merged: 2 commits merged into openzfs:master on Aug 5, 2019

Conversation

pcd1193182 (Contributor)

Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: Sebastien Roy <[email protected]>
Reviewed-by: Serapheim Dimitropoulos <[email protected]>

Motivation and Context

When we unload metaslabs today in ZFS, the cached max_size value is discarded. We instead use the histogram to determine whether or not we think we can satisfy an allocation from the metaslab. This can result in situations where, if we're doing I/Os of a size not aligned to a histogram bucket, a metaslab is loaded even though it cannot satisfy the allocation we think it can. For example, a metaslab with 16 entries in the 16k-32k bucket may contain only 16kB entries. If we try to allocate a 24kB buffer, we will load that metaslab because we think it should be able to handle the allocation. Doing so is expensive in CPU time, disk reads, and average I/O latency. This is exacerbated if the write being attempted is a sync write.
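
To make the mismatch concrete, here is a small self-contained C sketch (illustrative only, not ZFS code): a power-of-two histogram bucket records how many free segments fall in [2^b, 2^(b+1)), so 16 segments of exactly 16kB populate the 16k-32k bucket and make a 24kB request look satisfiable even though no single segment can hold it.

/*
 * Illustrative only -- not ZFS code. A power-of-two histogram bucket
 * records how many free segments fall in [2^b, 2^(b+1)); it says nothing
 * about where inside that range they fall. Here every free segment is
 * exactly 16 KiB, so the 16k-32k bucket is populated, yet a 24 KiB
 * request cannot actually be satisfied.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NBUCKETS	64
#define NSEGS		16

/* Bucket index = floor(log2(size)). */
static int
bucket_of(uint64_t size)
{
	int b = 0;

	while (size >>= 1)
		b++;
	return (b);
}

int
main(void)
{
	uint64_t histogram[NBUCKETS] = { 0 };
	uint64_t segs[NSEGS];
	uint64_t asize = 24 * 1024;
	bool histogram_says_yes = false;
	bool actually_fits = false;

	/* 16 free segments, all exactly 16 KiB, land in the 16k-32k bucket. */
	for (int i = 0; i < NSEGS; i++) {
		segs[i] = 16 * 1024;
		histogram[bucket_of(segs[i])]++;
	}

	/* Histogram-based guess: some bucket >= bucket_of(asize) is nonzero. */
	for (int b = bucket_of(asize); b < NBUCKETS; b++) {
		if (histogram[b] != 0)
			histogram_says_yes = true;
	}

	/* Ground truth: is any individual segment >= asize? */
	for (int i = 0; i < NSEGS; i++) {
		if (segs[i] >= asize)
			actually_fits = true;
	}

	/* Prints: histogram says 1, reality says 0. */
	printf("histogram says %d, reality says %d\n",
	    histogram_says_yes, actually_fits);
	return (0);
}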

Description

We cache the max_size after the metaslab is unloaded. If we ever get a free (or a coalesced group of frees) larger than the max_size, we will update it. Otherwise, we leave it as is. When attempting to allocate, we use the max_size as a lower bound, and respect it unless we are in try_hard. However, we do age the max_size out at some point, since we expect the actual max_size to increase as we do more frees. A more sophisticated algorithm here might be helpful, but this works reasonably well.
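
The sketch below is a minimal, self-contained illustration of this policy, assuming a simplified metaslab structure; it is not the OpenZFS implementation, and the names (metaslab_sketch, cached_max_size, unload_time, max_size_cache_sec) and the 3600-second aging default are stand-ins for the real ms_max_size, ms_unload_time, and the associated tunable.

/*
 * Minimal sketch of the policy described above -- not the OpenZFS
 * implementation. Names (metaslab_sketch, cached_max_size, unload_time,
 * max_size_cache_sec) and the 3600-second aging default are illustrative
 * stand-ins, not the real ms_max_size / ms_unload_time fields or tunable.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define NSEC_PER_SEC	1000000000ULL

static uint64_t max_size_cache_sec = 3600;	/* assumed aging period */

struct metaslab_sketch {
	bool		loaded;
	uint64_t	cached_max_size;	/* largest known free segment */
	uint64_t	unload_time;		/* timestamp of last unload, ns */
};

static uint64_t
now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ((uint64_t)ts.tv_sec * NSEC_PER_SEC + (uint64_t)ts.tv_nsec);
}

/* On unload, keep the max_size instead of discarding it. */
static void
metaslab_unload_sketch(struct metaslab_sketch *ms, uint64_t current_max)
{
	ms->loaded = false;
	ms->cached_max_size = current_max;
	ms->unload_time = now_ns();
}

/* A free (or coalesced group of frees) can only raise the cached value. */
static void
metaslab_free_sketch(struct metaslab_sketch *ms, uint64_t seg_size)
{
	if (seg_size > ms->cached_max_size)
		ms->cached_max_size = seg_size;
}

/*
 * Loaded: the value is exact. Unloaded: it is a lower bound, honored
 * unless we are in try_hard or the value has aged out, in which case we
 * fall back to an optimistic (histogram-style) answer.
 */
static bool
metaslab_should_allocate_sketch(struct metaslab_sketch *ms, uint64_t asize,
    bool try_hard)
{
	if (ms->loaded)
		return (ms->cached_max_size >= asize);

	if (ms->cached_max_size != 0 && !try_hard &&
	    now_ns() < ms->unload_time + max_size_cache_sec * NSEC_PER_SEC)
		return (ms->cached_max_size >= asize);

	return (true);	/* stand-in for the histogram-based check */
}

int
main(void)
{
	struct metaslab_sketch ms = { 0 };

	/* Unload with a largest free segment of 16 KiB. */
	metaslab_unload_sketch(&ms, 16 * 1024);
	printf("24K, normal pass:    %d\n",	/* 0: bound says it won't fit */
	    metaslab_should_allocate_sketch(&ms, 24 * 1024, false));

	/* A 32 KiB free raises the cached lower bound. */
	metaslab_free_sketch(&ms, 32 * 1024);
	printf("24K after 32K free:  %d\n",	/* 1 */
	    metaslab_should_allocate_sketch(&ms, 24 * 1024, false));

	/* try_hard ignores the cached bound and stays optimistic. */
	printf("24K, try_hard:       %d\n",	/* 1 */
	    metaslab_should_allocate_sketch(&ms, 24 * 1024, true));
	return (0);
}

The aging check matters because, while the metaslab is unloaded, only individual (or coalesced) frees can raise the cached value; the true largest free segment may grow past it over time, so the stale bound is eventually dropped rather than trusted forever.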

How Has This Been Tested?

Extensive zfs-test suite and zloop testing has been run. Performance testing was done on Illumos. Under normal circumstances with a random read/write workload, the change produced a small (~5%) performance increase. In situations with heavy fragmentation, performance increased on the order of 30%. With the change in place, metaslab loads resulting in failed allocation attempts dropped to approximately zero.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

@behlendorf added the Status: Code Review Needed label on Jul 17, 2019
c0d3z3r0 (Contributor) commented on Jul 19, 2019

conflicts with #9045

@@ -2519,9 +2594,16 @@ metaslab_segment_weight(metaslab_t *msp)
  * weights we rely on the entire weight (excluding the weight-type bit).
  */
 boolean_t
-metaslab_should_allocate(metaslab_t *msp, uint64_t asize)
+metaslab_should_allocate(metaslab_t *msp, uint64_t asize, boolean_t try_hard)
 {
-       if (msp->ms_loaded) {
+       /*
+        * If the metaslab is loaded, ms_max_size is definitive and we can use
+        * the fast check. If it's not, the ms_max_size is a lower bound (once
+        * set), and we should use the fast check unless we're in try_hard.
+        */
+       if (msp->ms_loaded ||
+           (msp->ms_max_size != 0 && !try_hard &&
+           gethrtime() < msp->ms_unload_time + SEC2NSEC(max_size_cache_sec)))
                return (msp->ms_max_size >= asize);
-       } else {
-               ASSERT0(msp->ms_max_size);
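
As a usage note, the try_hard argument threaded through the hunk above fits a two-pass pattern: a normal pass that respects each metaslab's cached lower bound, and a retry pass that does not. The following hypothetical caller reuses the illustrative metaslab_sketch types from the sketch in the Description section; it is not the real ZFS allocation path.

/*
 * Hypothetical caller, reusing the illustrative types from the earlier
 * sketch; not the real ZFS allocation path. The first pass trusts each
 * metaslab's cached lower bound; if no candidate is found, a try_hard
 * pass ignores the (possibly stale) cached bounds.
 */
static struct metaslab_sketch *
pick_metaslab_sketch(struct metaslab_sketch *msl, int count, uint64_t asize)
{
	for (int pass = 0; pass < 2; pass++) {
		bool try_hard = (pass == 1);

		for (int i = 0; i < count; i++) {
			if (metaslab_should_allocate_sketch(&msl[i], asize,
			    try_hard))
				return (&msl[i]);	/* load and allocate */
		}
	}
	return (NULL);	/* allocation fails even with try_hard */
}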

behlendorf (Contributor)

@c0d3z3r0 that's right. Since #9045 is ready to go I'll get it merged so this PR can be rebased.

@openzfs deleted a comment from the codecov bot on Jul 24, 2019
codecov bot commented on Jul 27, 2019

Codecov Report

Merging #9055 into master will increase coverage by 7.94%.
The diff coverage is 96.55%.


@@            Coverage Diff             @@
##           master    #9055      +/-   ##
==========================================
+ Coverage   70.83%   78.77%   +7.94%     
==========================================
  Files         351      400      +49     
  Lines      115431   121049    +5618     
==========================================
+ Hits        81760    95352   +13592     
+ Misses      33671    25697    -7974
Flag      Coverage Δ
#kernel   79.51% <98.24%> (-0.14%) ⬇️
#user     66.51% <94.82%> (+10.53%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Signed-off-by: Paul Dagnelie <[email protected]>
@pcd1193182 added the Status: Accepted label and removed the Status: Code Review Needed label on Aug 2, 2019
@ahrens merged commit c81f179 into openzfs:master on Aug 5, 2019
allanjude pushed a commit to allanjude/zfs that referenced this pull request Aug 12, 2019
When we unload metaslabs today in ZFS, the cached max_size value is
discarded. We instead use the histogram to determine whether or not we
think we can satisfy an allocation from the metaslab. This can result in
situations where, if we're doing I/Os of a size not aligned to a
histogram bucket, a metaslab is loaded even though it cannot satisfy the
allocation we think it can. For example, a metaslab with 16 entries in
the 16k-32k bucket may have entirely 16kB entries. If we try to allocate
a 24kB buffer, we will load that metaslab because we think it should be
able to handle the allocation. Doing so is expensive in CPU time, disk
reads, and average IO latency. This is exacerbated if the write being
attempted is a sync write.

This change makes ZFS cache the max_size after the metaslab is
unloaded. If we ever get a free (or a coalesced group of frees) larger
than the max_size, we will update it. Otherwise, we leave it as is. When
attempting to allocate, we use the max_size as a lower bound, and
respect it unless we are in try_hard. However, we do age the max_size
out at some point, since we expect the actual max_size to increase as we
do more frees. A more sophisticated algorithm here might be helpful, but
this works reasonably well.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Matt Ahrens <[email protected]>
Signed-off-by: Paul Dagnelie <[email protected]>
Closes openzfs#9055
allanjude pushed a commit to allanjude/zfs that referenced this pull request Oct 13, 2019
allanjude pushed a commit to KlaraSystems/zfs that referenced this pull request Apr 28, 2020