Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

looping in metaslab_block_picker impacts performance on fragmented pools #8877

Merged
merged 1 commit into from
Jun 13, 2019

Conversation

ahrens
Copy link
Member

@ahrens ahrens commented Jun 10, 2019

Motivation and Context

On fragmented pools with high-performance storage, the looping in
metaslab_block_picker() can become the performance-limiting bottleneck.
When looking for a larger block (e.g. a 128K block for the ZIL), we may
search through many free segments (up to hundreds of thousands) to find
one that is large enough to satisfy the allocation. This can take a long
time (up to dozens of ms), and is done while holding the ms_lock, which
other threads may spin waiting for.

When this performance problem is encountered, profiling will show
high CPU time in metaslab_block_picker, as well as in mutex_enter from
various callers.

The problem is very evident on a test system with a sync write workload
with 8K writes to a recordsize=8k filesystem, with 4TB of SSD storage,
84% full and 88% fragmented. It has also been observed on production
systems with 90TB of storage, 76% full and 87% fragmented.

Description

The fix is to change metaslab_df_alloc() to search only up to 16MB from
the previous allocation (of this alignment). After that, we will pick a
segment that is of the exact size requested (or larger). This reduces
the number of iterations to a few hundred on fragmented pools (a ~100x
improvement).

External-issue: DLPX-62324

How Has This Been Tested?

Mostly on illumos. Used at Delphix since February 2019.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

On fragmented pools with high-performance storage, the looping in
metaslab_block_picker() can become the performance-limiting bottleneck.
When looking for a larger block (e.g. a 128K block for the ZIL), we may
search through many free segments (up to hundreds of thousands) to find
one that is large enough to satisfy the allocation. This can take a long
time (up to dozens of ms), and is done while holding the ms_lock, which
other threads may spin waiting for.

When this performance problem is encountered, profiling will show
high CPU time in metaslab_block_picker, as well as in mutex_enter from
various callers.

The problem is very evident on a test system with a sync write workload
with 8K writes to a recordsize=8k filesystem, with 4TB of SSD storage,
84% full and 88% fragmented. It has also been observed on production
systems with 90TB of storage, 76% full and 87% fragmented.

The fix is to change metaslab_df_alloc() to search only up to 16MB from
the previous allocation (of this alignment). After that, we will pick a
segment that is of the exact size requested (or larger). This reduces
the number of iterations to a few hundred on fragmented pools (a ~100x
improvement).

External-issue: DLPX-62324
Signed-off-by: Matthew Ahrens <[email protected]>
@ahrens ahrens added Type: Performance Performance improvement or performance problem Status: Code Review Needed Ready for review and testing labels Jun 10, 2019
@codecov
Copy link

codecov bot commented Jun 10, 2019

Codecov Report

Merging #8877 into master will increase coverage by 0.08%.
The diff coverage is 93.75%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #8877      +/-   ##
==========================================
+ Coverage   78.65%   78.73%   +0.08%     
==========================================
  Files         382      382              
  Lines      117778   117782       +4     
==========================================
+ Hits        92638    92737      +99     
+ Misses      25140    25045      -95
Flag Coverage Δ
#kernel 79.4% <92.85%> (+0.04%) ⬆️
#user 67.4% <93.75%> (+0.19%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 893a6d6...0e2daaa. Read the comment docs.

@ahrens
Copy link
Member Author

ahrens commented Jun 12, 2019

Reviewed at Delphix by @pcd1193182 @tonynguien @grwilson @sdimitro

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Jun 12, 2019
@behlendorf behlendorf merged commit d3230d7 into openzfs:master Jun 13, 2019
@ahrens ahrens deleted the DLPX-62324 branch June 13, 2019 20:07
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Dec 24, 2019
On fragmented pools with high-performance storage, the looping in
metaslab_block_picker() can become the performance-limiting bottleneck.
When looking for a larger block (e.g. a 128K block for the ZIL), we may
search through many free segments (up to hundreds of thousands) to find
one that is large enough to satisfy the allocation. This can take a long
time (up to dozens of ms), and is done while holding the ms_lock, which
other threads may spin waiting for.

When this performance problem is encountered, profiling will show
high CPU time in metaslab_block_picker, as well as in mutex_enter from
various callers.

The problem is very evident on a test system with a sync write workload
with 8K writes to a recordsize=8k filesystem, with 4TB of SSD storage,
84% full and 88% fragmented. It has also been observed on production
systems with 90TB of storage, 76% full and 87% fragmented.

The fix is to change metaslab_df_alloc() to search only up to 16MB from
the previous allocation (of this alignment). After that, we will pick a
segment that is of the exact size requested (or larger). This reduces
the number of iterations to a few hundred on fragmented pools (a ~100x
improvement).

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Paul Dagnelie <[email protected]>
Reviewed-by: Tony Nguyen <[email protected]>
Reviewed-by: George Wilson <[email protected]>
Reviewed-by: Serapheim Dimitropoulos <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
External-issue: DLPX-62324
Closes openzfs#8877
tonyhutter pushed a commit that referenced this pull request Jan 23, 2020
On fragmented pools with high-performance storage, the looping in
metaslab_block_picker() can become the performance-limiting bottleneck.
When looking for a larger block (e.g. a 128K block for the ZIL), we may
search through many free segments (up to hundreds of thousands) to find
one that is large enough to satisfy the allocation. This can take a long
time (up to dozens of ms), and is done while holding the ms_lock, which
other threads may spin waiting for.

When this performance problem is encountered, profiling will show
high CPU time in metaslab_block_picker, as well as in mutex_enter from
various callers.

The problem is very evident on a test system with a sync write workload
with 8K writes to a recordsize=8k filesystem, with 4TB of SSD storage,
84% full and 88% fragmented. It has also been observed on production
systems with 90TB of storage, 76% full and 87% fragmented.

The fix is to change metaslab_df_alloc() to search only up to 16MB from
the previous allocation (of this alignment). After that, we will pick a
segment that is of the exact size requested (or larger). This reduces
the number of iterations to a few hundred on fragmented pools (a ~100x
improvement).

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Paul Dagnelie <[email protected]>
Reviewed-by: Tony Nguyen <[email protected]>
Reviewed-by: George Wilson <[email protected]>
Reviewed-by: Serapheim Dimitropoulos <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
External-issue: DLPX-62324
Closes #8877
allanjude pushed a commit to KlaraSystems/zfs that referenced this pull request Apr 28, 2020
On fragmented pools with high-performance storage, the looping in
metaslab_block_picker() can become the performance-limiting bottleneck.
When looking for a larger block (e.g. a 128K block for the ZIL), we may
search through many free segments (up to hundreds of thousands) to find
one that is large enough to satisfy the allocation. This can take a long
time (up to dozens of ms), and is done while holding the ms_lock, which
other threads may spin waiting for.

When this performance problem is encountered, profiling will show
high CPU time in metaslab_block_picker, as well as in mutex_enter from
various callers.

The problem is very evident on a test system with a sync write workload
with 8K writes to a recordsize=8k filesystem, with 4TB of SSD storage,
84% full and 88% fragmented. It has also been observed on production
systems with 90TB of storage, 76% full and 87% fragmented.

The fix is to change metaslab_df_alloc() to search only up to 16MB from
the previous allocation (of this alignment). After that, we will pick a
segment that is of the exact size requested (or larger). This reduces
the number of iterations to a few hundred on fragmented pools (a ~100x
improvement).

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Paul Dagnelie <[email protected]>
Reviewed-by: Tony Nguyen <[email protected]>
Reviewed-by: George Wilson <[email protected]>
Reviewed-by: Serapheim Dimitropoulos <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
External-issue: DLPX-62324
Closes openzfs#8877
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested) Type: Performance Performance improvement or performance problem
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants