-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
looping in metaslab_block_picker impacts performance on fragmented pools #8877
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
On fragmented pools with high-performance storage, the looping in metaslab_block_picker() can become the performance-limiting bottleneck. When looking for a larger block (e.g. a 128K block for the ZIL), we may search through many free segments (up to hundreds of thousands) to find one that is large enough to satisfy the allocation. This can take a long time (up to dozens of ms), and is done while holding the ms_lock, which other threads may spin waiting for. When this performance problem is encountered, profiling will show high CPU time in metaslab_block_picker, as well as in mutex_enter from various callers. The problem is very evident on a test system with a sync write workload with 8K writes to a recordsize=8k filesystem, with 4TB of SSD storage, 84% full and 88% fragmented. It has also been observed on production systems with 90TB of storage, 76% full and 87% fragmented. The fix is to change metaslab_df_alloc() to search only up to 16MB from the previous allocation (of this alignment). After that, we will pick a segment that is of the exact size requested (or larger). This reduces the number of iterations to a few hundred on fragmented pools (a ~100x improvement). External-issue: DLPX-62324 Signed-off-by: Matthew Ahrens <[email protected]>
ahrens
added
Type: Performance
Performance improvement or performance problem
Status: Code Review Needed
Ready for review and testing
labels
Jun 10, 2019
Codecov Report
@@ Coverage Diff @@
## master #8877 +/- ##
==========================================
+ Coverage 78.65% 78.73% +0.08%
==========================================
Files 382 382
Lines 117778 117782 +4
==========================================
+ Hits 92638 92737 +99
+ Misses 25140 25045 -95
Continue to review full report at Codecov.
|
Reviewed at Delphix by @pcd1193182 @tonynguien @grwilson @sdimitro |
behlendorf
approved these changes
Jun 12, 2019
behlendorf
added
Status: Accepted
Ready to integrate (reviewed, tested)
and removed
Status: Code Review Needed
Ready for review and testing
labels
Jun 12, 2019
tonyhutter
pushed a commit
to tonyhutter/zfs
that referenced
this pull request
Dec 24, 2019
On fragmented pools with high-performance storage, the looping in metaslab_block_picker() can become the performance-limiting bottleneck. When looking for a larger block (e.g. a 128K block for the ZIL), we may search through many free segments (up to hundreds of thousands) to find one that is large enough to satisfy the allocation. This can take a long time (up to dozens of ms), and is done while holding the ms_lock, which other threads may spin waiting for. When this performance problem is encountered, profiling will show high CPU time in metaslab_block_picker, as well as in mutex_enter from various callers. The problem is very evident on a test system with a sync write workload with 8K writes to a recordsize=8k filesystem, with 4TB of SSD storage, 84% full and 88% fragmented. It has also been observed on production systems with 90TB of storage, 76% full and 87% fragmented. The fix is to change metaslab_df_alloc() to search only up to 16MB from the previous allocation (of this alignment). After that, we will pick a segment that is of the exact size requested (or larger). This reduces the number of iterations to a few hundred on fragmented pools (a ~100x improvement). Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Paul Dagnelie <[email protected]> Reviewed-by: Tony Nguyen <[email protected]> Reviewed-by: George Wilson <[email protected]> Reviewed-by: Serapheim Dimitropoulos <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> External-issue: DLPX-62324 Closes openzfs#8877
tonyhutter
pushed a commit
that referenced
this pull request
Jan 23, 2020
On fragmented pools with high-performance storage, the looping in metaslab_block_picker() can become the performance-limiting bottleneck. When looking for a larger block (e.g. a 128K block for the ZIL), we may search through many free segments (up to hundreds of thousands) to find one that is large enough to satisfy the allocation. This can take a long time (up to dozens of ms), and is done while holding the ms_lock, which other threads may spin waiting for. When this performance problem is encountered, profiling will show high CPU time in metaslab_block_picker, as well as in mutex_enter from various callers. The problem is very evident on a test system with a sync write workload with 8K writes to a recordsize=8k filesystem, with 4TB of SSD storage, 84% full and 88% fragmented. It has also been observed on production systems with 90TB of storage, 76% full and 87% fragmented. The fix is to change metaslab_df_alloc() to search only up to 16MB from the previous allocation (of this alignment). After that, we will pick a segment that is of the exact size requested (or larger). This reduces the number of iterations to a few hundred on fragmented pools (a ~100x improvement). Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Paul Dagnelie <[email protected]> Reviewed-by: Tony Nguyen <[email protected]> Reviewed-by: George Wilson <[email protected]> Reviewed-by: Serapheim Dimitropoulos <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> External-issue: DLPX-62324 Closes #8877
allanjude
pushed a commit
to KlaraSystems/zfs
that referenced
this pull request
Apr 28, 2020
On fragmented pools with high-performance storage, the looping in metaslab_block_picker() can become the performance-limiting bottleneck. When looking for a larger block (e.g. a 128K block for the ZIL), we may search through many free segments (up to hundreds of thousands) to find one that is large enough to satisfy the allocation. This can take a long time (up to dozens of ms), and is done while holding the ms_lock, which other threads may spin waiting for. When this performance problem is encountered, profiling will show high CPU time in metaslab_block_picker, as well as in mutex_enter from various callers. The problem is very evident on a test system with a sync write workload with 8K writes to a recordsize=8k filesystem, with 4TB of SSD storage, 84% full and 88% fragmented. It has also been observed on production systems with 90TB of storage, 76% full and 87% fragmented. The fix is to change metaslab_df_alloc() to search only up to 16MB from the previous allocation (of this alignment). After that, we will pick a segment that is of the exact size requested (or larger). This reduces the number of iterations to a few hundred on fragmented pools (a ~100x improvement). Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Paul Dagnelie <[email protected]> Reviewed-by: Tony Nguyen <[email protected]> Reviewed-by: George Wilson <[email protected]> Reviewed-by: Serapheim Dimitropoulos <[email protected]> Signed-off-by: Matthew Ahrens <[email protected]> External-issue: DLPX-62324 Closes openzfs#8877
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
Status: Accepted
Ready to integrate (reviewed, tested)
Type: Performance
Performance improvement or performance problem
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Motivation and Context
On fragmented pools with high-performance storage, the looping in
metaslab_block_picker() can become the performance-limiting bottleneck.
When looking for a larger block (e.g. a 128K block for the ZIL), we may
search through many free segments (up to hundreds of thousands) to find
one that is large enough to satisfy the allocation. This can take a long
time (up to dozens of ms), and is done while holding the ms_lock, which
other threads may spin waiting for.
When this performance problem is encountered, profiling will show
high CPU time in metaslab_block_picker, as well as in mutex_enter from
various callers.
The problem is very evident on a test system with a sync write workload
with 8K writes to a recordsize=8k filesystem, with 4TB of SSD storage,
84% full and 88% fragmented. It has also been observed on production
systems with 90TB of storage, 76% full and 87% fragmented.
Description
The fix is to change metaslab_df_alloc() to search only up to 16MB from
the previous allocation (of this alignment). After that, we will pick a
segment that is of the exact size requested (or larger). This reduces
the number of iterations to a few hundred on fragmented pools (a ~100x
improvement).
External-issue: DLPX-62324
How Has This Been Tested?
Mostly on illumos. Used at Delphix since February 2019.
Types of changes
Checklist:
Signed-off-by
.