Metaslab max_size should be persisted while unloaded #9055

Merged: 2 commits merged into openzfs:master on Aug 5, 2019

Conversation

pcd1193182 (Contributor)

Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: Sebastien Roy <[email protected]>
Reviewed-by: Serapheim Dimitropoulos <[email protected]>

Motivation and Context

When we unload metaslabs today in ZFS, the cached max_size value is discarded. We instead use the histogram to determine whether or not we think we can satisfy an allocation from the metaslab. This can result in situations where, if we're doing I/Os of a size not aligned to a histogram bucket, a metaslab is loaded even though it cannot satisfy the allocation we think it can. For example, a metaslab with 16 entries in the 16k-32k bucket may contain only 16kB entries. If we try to allocate a 24kB buffer, we will load that metaslab because we think it should be able to handle the allocation. Doing so is expensive in CPU time, disk reads, and average I/O latency. This is exacerbated if the write being attempted is a sync write.
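
To make the mismatch concrete, here is a small self-contained C sketch (illustrative only, not ZFS code): a power-of-two histogram bucket records how many free segments fall in [2^b, 2^(b+1)), so 16 segments of exactly 16kB populate the 16k-32k bucket and make a 24kB request look satisfiable even though no single segment can hold it.

/*
 * Illustrative only -- not ZFS code. A power-of-two histogram bucket
 * records how many free segments fall in [2^b, 2^(b+1)); it says nothing
 * about where inside that range they fall. Here every free segment is
 * exactly 16 KiB, so the 16k-32k bucket is populated, yet a 24 KiB
 * request cannot actually be satisfied.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NBUCKETS	64
#define NSEGS		16

/* Bucket index = floor(log2(size)). */
static int
bucket_of(uint64_t size)
{
	int b = 0;

	while (size >>= 1)
		b++;
	return (b);
}

int
main(void)
{
	uint64_t histogram[NBUCKETS] = { 0 };
	uint64_t segs[NSEGS];
	uint64_t asize = 24 * 1024;
	bool histogram_says_yes = false;
	bool actually_fits = false;

	/* 16 free segments, all exactly 16 KiB, land in the 16k-32k bucket. */
	for (int i = 0; i < NSEGS; i++) {
		segs[i] = 16 * 1024;
		histogram[bucket_of(segs[i])]++;
	}

	/* Histogram-based guess: some bucket >= bucket_of(asize) is nonzero. */
	for (int b = bucket_of(asize); b < NBUCKETS; b++) {
		if (histogram[b] != 0)
			histogram_says_yes = true;
	}

	/* Ground truth: is any individual segment >= asize? */
	for (int i = 0; i < NSEGS; i++) {
		if (segs[i] >= asize)
			actually_fits = true;
	}

	/* Prints: histogram says 1, reality says 0. */
	printf("histogram says %d, reality says %d\n",
	    histogram_says_yes, actually_fits);
	return (0);
}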

Description

We cache the max_size after the metaslab is unloaded. If we ever get a free (or a coalesced group of frees) larger than the max_size, we will update it. Otherwise, we leave it as is. When attempting to allocate, we use the max_size as a lower bound, and respect it unless we are in try_hard. However, we do age the max_size out at some point, since we expect the actual max_size to increase as we do more frees. A more sophisticated algorithm here might be helpful, but this works reasonably well.
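
The sketch below is a minimal, self-contained illustration of this policy, assuming a simplified metaslab structure; it is not the OpenZFS implementation, and the names (metaslab_sketch, cached_max_size, unload_time, max_size_cache_sec) and the 3600-second aging default are stand-ins for the real ms_max_size, ms_unload_time, and the associated tunable.

/*
 * Minimal sketch of the policy described above -- not the OpenZFS
 * implementation. Names (metaslab_sketch, cached_max_size, unload_time,
 * max_size_cache_sec) and the 3600-second aging default are illustrative
 * stand-ins, not the real ms_max_size / ms_unload_time fields or tunable.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define NSEC_PER_SEC	1000000000ULL

static uint64_t max_size_cache_sec = 3600;	/* assumed aging period */

struct metaslab_sketch {
	bool		loaded;
	uint64_t	cached_max_size;	/* largest known free segment */
	uint64_t	unload_time;		/* timestamp of last unload, ns */
};

static uint64_t
now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ((uint64_t)ts.tv_sec * NSEC_PER_SEC + (uint64_t)ts.tv_nsec);
}

/* On unload, keep the max_size instead of discarding it. */
static void
metaslab_unload_sketch(struct metaslab_sketch *ms, uint64_t current_max)
{
	ms->loaded = false;
	ms->cached_max_size = current_max;
	ms->unload_time = now_ns();
}

/* A free (or coalesced group of frees) can only raise the cached value. */
static void
metaslab_free_sketch(struct metaslab_sketch *ms, uint64_t seg_size)
{
	if (seg_size > ms->cached_max_size)
		ms->cached_max_size = seg_size;
}

/*
 * Loaded: the value is exact. Unloaded: it is a lower bound, honored
 * unless we are in try_hard or the value has aged out, in which case we
 * fall back to an optimistic (histogram-style) answer.
 */
static bool
metaslab_should_allocate_sketch(struct metaslab_sketch *ms, uint64_t asize,
    bool try_hard)
{
	if (ms->loaded)
		return (ms->cached_max_size >= asize);

	if (ms->cached_max_size != 0 && !try_hard &&
	    now_ns() < ms->unload_time + max_size_cache_sec * NSEC_PER_SEC)
		return (ms->cached_max_size >= asize);

	return (true);	/* stand-in for the histogram-based check */
}

int
main(void)
{
	struct metaslab_sketch ms = { 0 };

	/* Unload with a largest free segment of 16 KiB. */
	metaslab_unload_sketch(&ms, 16 * 1024);
	printf("24K, normal pass:    %d\n",	/* 0: bound says it won't fit */
	    metaslab_should_allocate_sketch(&ms, 24 * 1024, false));

	/* A 32 KiB free raises the cached lower bound. */
	metaslab_free_sketch(&ms, 32 * 1024);
	printf("24K after 32K free:  %d\n",	/* 1 */
	    metaslab_should_allocate_sketch(&ms, 24 * 1024, false));

	/* try_hard ignores the cached bound and stays optimistic. */
	printf("24K, try_hard:       %d\n",	/* 1 */
	    metaslab_should_allocate_sketch(&ms, 24 * 1024, true));
	return (0);
}

The aging check matters because, while the metaslab is unloaded, only individual (or coalesced) frees can raise the cached value; the true largest free segment may grow past it over time, so the stale bound is eventually dropped rather than trusted forever.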

How Has This Been Tested?

Extensive zfs-test suite and zloop testing has been run. Performance testing was done on Illumos. Under normal circumstances with a random read/write workload, the change produced a small (~5%) performance increase. In situations with heavy fragmentation, performance increased on the order of 30%. With the change in place, metaslab loads resulting in failed allocation attempts dropped to approximately zero.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

@behlendorf added the Status: Code Review Needed label on Jul 17, 2019
c0d3z3r0 (Contributor) commented on Jul 19, 2019

conflicts with #9045

@@ -2519,9 +2594,16 @@ metaslab_segment_weight(metaslab_t *msp)
  * weights we rely on the entire weight (excluding the weight-type bit).
  */
 boolean_t
-metaslab_should_allocate(metaslab_t *msp, uint64_t asize)
+metaslab_should_allocate(metaslab_t *msp, uint64_t asize, boolean_t try_hard)
 {
-       if (msp->ms_loaded) {
+       /*
+        * If the metaslab is loaded, ms_max_size is definitive and we can use
+        * the fast check. If it's not, the ms_max_size is a lower bound (once
+        * set), and we should use the fast check unless we're in try_hard.
+        */
+       if (msp->ms_loaded ||
+           (msp->ms_max_size != 0 && !try_hard &&
+           gethrtime() < msp->ms_unload_time + SEC2NSEC(max_size_cache_sec)))
                return (msp->ms_max_size >= asize);
-       } else {
-               ASSERT0(msp->ms_max_size);
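
As a usage note, the try_hard argument threaded through the hunk above fits a two-pass pattern: a normal pass that respects each metaslab's cached lower bound, and a retry pass that does not. The following hypothetical caller reuses the illustrative metaslab_sketch types from the sketch in the Description section; it is not the real ZFS allocation path.

/*
 * Hypothetical caller, reusing the illustrative types from the earlier
 * sketch; not the real ZFS allocation path. The first pass trusts each
 * metaslab's cached lower bound; if no candidate is found, a try_hard
 * pass ignores the (possibly stale) cached bounds.
 */
static struct metaslab_sketch *
pick_metaslab_sketch(struct metaslab_sketch *msl, int count, uint64_t asize)
{
	for (int pass = 0; pass < 2; pass++) {
		bool try_hard = (pass == 1);

		for (int i = 0; i < count; i++) {
			if (metaslab_should_allocate_sketch(&msl[i], asize,
			    try_hard))
				return (&msl[i]);	/* load and allocate */
		}
	}
	return (NULL);	/* allocation fails even with try_hard */
}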

behlendorf (Contributor)

@c0d3z3r0 that's right. Since #9045 is ready to go I'll get it merged so this PR can be rebased.

@openzfs deleted a comment from the codecov bot on Jul 24, 2019
codecov bot commented on Jul 27, 2019

Codecov Report

Merging #9055 into master will increase coverage by 7.94%.
The diff coverage is 96.55%.


@@            Coverage Diff             @@
##           master    #9055      +/-   ##
==========================================
+ Coverage   70.83%   78.77%   +7.94%     
==========================================
  Files         351      400      +49     
  Lines      115431   121049    +5618     
==========================================
+ Hits        81760    95352   +13592     
+ Misses      33671    25697    -7974
Flag      Coverage Δ
#kernel   79.51% <98.24%> (-0.14%) ⬇️
#user     66.51% <94.82%> (+10.53%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Signed-off-by: Paul Dagnelie <[email protected]>
@pcd1193182 added the Status: Accepted label and removed the Status: Code Review Needed label on Aug 2, 2019
@ahrens merged commit c81f179 into openzfs:master on Aug 5, 2019
allanjude pushed a commit to allanjude/zfs that referenced this pull request Aug 12, 2019
When we unload metaslabs today in ZFS, the cached max_size value is
discarded. We instead use the histogram to determine whether or not we
think we can satisfy an allocation from the metaslab. This can result in
situations where, if we're doing I/Os of a size not aligned to a
histogram bucket, a metaslab is loaded even though it cannot satisfy the
allocation we think it can. For example, a metaslab with 16 entries in
the 16k-32k bucket may have entirely 16kB entries. If we try to allocate
a 24kB buffer, we will load that metaslab because we think it should be
able to handle the allocation. Doing so is expensive in CPU time, disk
reads, and average IO latency. This is exacerbated if the write being
attempted is a sync write.

This change makes ZFS cache the max_size after the metaslab is
unloaded. If we ever get a free (or a coalesced group of frees) larger
than the max_size, we will update it. Otherwise, we leave it as is. When
attempting to allocate, we use the max_size as a lower bound, and
respect it unless we are in try_hard. However, we do age the max_size
out at some point, since we expect the actual max_size to increase as we
do more frees. A more sophisticated algorithm here might be helpful, but
this works reasonably well.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Matt Ahrens <[email protected]>
Signed-off-by: Paul Dagnelie <[email protected]>
Closes openzfs#9055
allanjude pushed a commit to allanjude/zfs that referenced this pull request Oct 13, 2019
allanjude pushed a commit to KlaraSystems/zfs that referenced this pull request Apr 28, 2020