
Concurrent small allocation defeats large allocation #8843

Merged
merged 3 commits into openzfs:master
Jun 26, 2019

Conversation

pcd1193182
Contributor

Motivation and Context

With the new parallel allocators scheme, two threads allocating from the same allocator at the same time can conflict with each other. There are two primary cases to worry about. First, another thread working on another allocator activates the same metaslab that the first thread was trying to activate. This forces the first thread to go back and reselect a new metaslab, even though it may have waited a long time for this metaslab to load. Second, another thread working on the same allocator may have activated a different metaslab while the first thread was waiting for its metaslab to load. Both of these cases can significantly delay the first thread in issuing its IOs. The second case can also cause metaslab load/unload churn: because the metaslab is loaded but never fully activated, we never set its selected_txg, so the metaslab is immediately unloaded again. This process can repeat many times, wasting disk and CPU resources. It is more likely to happen when the first thread's IO is a large one (like a ZIL write) and the other thread is doing a smaller write, because the smaller write is more likely to find an acceptable metaslab quickly.

Description

There are two primary changes. The first is to always proceed with the allocation when returning from metaslab_activate if we were preempted in either of the ways described in the previous section. The second is to set the selected_txg before we call metaslab_activate, so that even if the metaslab is not used for an allocation, we won't immediately attempt to unload it.
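
Below is a minimal, self-contained C sketch of the resulting allocation loop. It is illustrative only: metaslab_t, select_metaslab(), try_activate(), and alloc_from() are simplified stand-ins rather than the actual ZFS interfaces, and only the control flow mirrors the two changes described above.

```c
/* Illustrative sketch of the two changes; not the actual ZFS code. */
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

typedef struct metaslab {
	int		ms_activated;		/* activated by this or another thread */
	uint64_t	ms_selected_txg;	/* nonzero: don't unload immediately */
} metaslab_t;

/* Two fake metaslabs; the first was already activated by "another thread". */
static metaslab_t pool[2] = { { 1, 0 }, { 0, 0 } };
static int cursor;

static metaslab_t *
select_metaslab(void)
{
	return (cursor < 2 ? &pool[cursor++] : NULL);
}

/* Returns 0 if we activated it, EBUSY if another thread beat us to it. */
static int
try_activate(metaslab_t *ms)
{
	if (ms->ms_activated)
		return (EBUSY);
	ms->ms_activated = 1;
	return (0);
}

static int
alloc_from(metaslab_t *ms, uint64_t size)
{
	(void) ms;
	(void) size;
	return (0);		/* pretend the allocation succeeds */
}

static int
allocate(uint64_t size, uint64_t txg)
{
	metaslab_t *ms;

	while ((ms = select_metaslab()) != NULL) {
		/*
		 * Change 2: record selected_txg before activation, so a
		 * metaslab that loses the activation race is not treated
		 * as idle and unloaded immediately.
		 */
		ms->ms_selected_txg = txg;

		/*
		 * Change 1: being preempted (EBUSY) no longer sends us
		 * back to reselect; we proceed with the allocation.  Any
		 * other activation failure still would.
		 */
		int error = try_activate(ms);
		if (error != 0 && error != EBUSY)
			continue;

		if (alloc_from(ms, size) == 0)
			return (0);
	}
	return (ENOSPC);
}

int
main(void)
{
	/* The first metaslab returns EBUSY, but we still allocate from it. */
	printf("allocate() returned %d\n", allocate(4096, 100));
	return (0);
}
```

With this ordering, a metaslab that loses the activation race still has its selected_txg set, so it is not immediately unloaded, and the preempted thread simply allocates from it instead of starting the selection over.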

How Has This Been Tested?

Passes the ZFS test suite and zloop, and has been performance-tested extensively on Illumos, where it resolved a number of performance anomalies.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

@ahrens
Member

ahrens commented Jun 2, 2019

FYI, this is a port of openzfs/openzfs#732

@behlendorf added the Status: Code Review Needed (Ready for review and testing) label Jun 5, 2019
@behlendorf
Contributor

@pcd1193182 it looks like the CI didn't get notified of this PR. Would you mind force-updating the PR to trigger it to run?

module/zfs/metaslab.c (resolved review comments)
@behlendorf added the Status: Revision Needed (Changes are required for the PR to be accepted) label and removed the Status: Code Review Needed (Ready for review and testing) label Jun 13, 2019
Signed-off-by: Paul Dagnelie <[email protected]>
External-issue: DLPX-61314
@codecov

codecov bot commented Jun 22, 2019

Codecov Report

Merging #8843 into master will increase coverage by 0.13%.
The diff coverage is 81.94%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #8843      +/-   ##
==========================================
+ Coverage   78.48%   78.62%   +0.13%     
==========================================
  Files         388      388              
  Lines      120013   120064      +51     
==========================================
+ Hits        94197    94405     +208     
+ Misses      25816    25659     -157
Flag Coverage Δ
#kernel 79.46% <81.42%> (ø) ⬆️
#user 66.17% <81.69%> (+0.31%) ⬆️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a370182...2e1152c.

module/zfs/metaslab.c (outdated review comment, resolved)
@behlendorf added the Status: Code Review Needed (Ready for review and testing) label and removed the Status: Revision Needed (Changes are required for the PR to be accepted) label Jun 24, 2019
@behlendorf added the Status: Accepted (Ready to integrate; reviewed, tested) label and removed the Status: Code Review Needed (Ready for review and testing) label Jun 26, 2019
@behlendorf merged commit 679b0f2 into openzfs:master Jun 26, 2019
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Dec 24, 2019
With the new parallel allocators scheme, there is a possibility for
a problem where two threads, allocating from the same allocator at
the same time, conflict with each other. There are two primary cases
to worry about. First, another thread working on another allocator
activates the same metaslab that the first thread was trying to
activate. This results in the first thread needing to go back and
reselect a new metaslab, even though it may have waited a long time
for this metaslab to load. Second, another thread working on the same
allocator may have activated a different metaslab while the first
thread was waiting for its metaslab to load. Both of these cases
can cause the first thread to be significantly delayed in issuing
its IOs. The second case can also cause metaslab load/unload churn;
because the metaslab is loaded but not fully activated, we never set
the selected_txg, which results in the metaslab being immediately
unloaded again. This process can repeat many times, wasting disk and
cpu resources. This is more likely to happen when the IO of the first
thread is a larger one (like a ZIL write) and the other thread is
doing a smaller write, because it is more likely to find an
acceptable metaslab quickly.

There are two primary changes. The first is to always proceed with
the allocation when returning from metaslab_activate if we were
preempted in either of the ways described in the previous section.
The second change is to set the selected_txg before we do the call
to activate so that even if the metaslab is not used for an
allocation, we won't immediately attempt to unload it.

Reviewed by: Jerry Jelinek <[email protected]>
Reviewed by: Matt Ahrens <[email protected]>
Reviewed by: Serapheim Dimitropoulos <[email protected]>
Reviewed by: Brian Behlendorf <[email protected]>
Signed-off-by: Paul Dagnelie <[email protected]>
External-issue: DLPX-61314
Closes openzfs#8843
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Dec 27, 2019
tonyhutter pushed a commit that referenced this pull request Jan 23, 2020
allanjude pushed a commit to KlaraSystems/zfs that referenced this pull request Apr 28, 2020

Signed-off-by: Bryant G. Ly <[email protected]>

Conflicts:
	module/zfs/metaslab.c
Labels
Status: Accepted (Ready to integrate; reviewed, tested)
3 participants