-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't activate metaslabs with weight 0 #8968
Conversation
Signed-off-by: Paul Dagnelie <[email protected]>
Codecov Report
@@ Coverage Diff @@
## master #8968 +/- ##
==========================================
- Coverage 78.74% 78.67% -0.07%
==========================================
Files 388 388
Lines 119972 119975 +3
==========================================
- Hits 94466 94391 -75
- Misses 25506 25584 +78
Continue to review full report at Codecov.
|
Hey @pcd1193182 , we've actually seen this before internally within Delphix (see DLPX-61947 and relevant fix). It may be preferable upstreaming the existing fix from DelphixOS for the following reasons: If you agree I can open a PR to upstream that change |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my comment above
@sdimitro That change wouldn't fix this bug. The second call to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change looks good to me, one small nit.
In metaslab_group_alloc_normal()
there exists this comment:
int activation_error =
metaslab_activate(msp, allocator, activation_weight);
metaslab_active_mask_verify(msp);
/*
* If the metaslab was activated by another thread for
* another allocator or activation_weight (EBUSY), or it
* failed because another metaslab was assigned as primary
* for this allocator (EEXIST) we continue using this
* metaslab for our allocation, rather than going on to a
* worse metaslab (we waited for that metaslab to be loaded
* after all).
*
* If the activation failed due to an I/O error we skip to
* the next metaslab.
*/
boolean_t activated;
if (activation_error == 0) {
activated = B_TRUE;
} else if (activation_error == EBUSY ||
activation_error == EEXIST) {
activated = B_FALSE;
} else {
mutex_exit(&msp->ms_lock);
continue;
}
We may want to add the ENOSPC
case together with the EIO
one, e.g.
... <cropped> ...
* If the activation failed due to an ENOSPC or I/O error we skip
* to the next metaslab.
*/
@sdimitro Agreed and done. |
@pcd1193182 when you get a chance can you push the updated version. |
We return ENOSPC in metaslab_activate if the metaslab has weight 0, to avoid activating a metaslab with no space available. For sanity checking, we also assert that there is no free space in the range tree in that case. Reviewed-by: Igor Kozhukhov <[email protected]> Reviewed by: Matt Ahrens <[email protected]> Reviewed by: Serapheim Dimitropoulos <[email protected]> Reviewed by: Brian Behlendorf <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> Closes openzfs#8968
We return ENOSPC in metaslab_activate if the metaslab has weight 0, to avoid activating a metaslab with no space available. For sanity checking, we also assert that there is no free space in the range tree in that case. Reviewed-by: Igor Kozhukhov <[email protected]> Reviewed by: Matt Ahrens <[email protected]> Reviewed by: Serapheim Dimitropoulos <[email protected]> Reviewed by: Brian Behlendorf <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> Closes openzfs#8968
We return ENOSPC in metaslab_activate if the metaslab has weight 0, to avoid activating a metaslab with no space available. For sanity checking, we also assert that there is no free space in the range tree in that case. Reviewed-by: Igor Kozhukhov <[email protected]> Reviewed by: Matt Ahrens <[email protected]> Reviewed by: Serapheim Dimitropoulos <[email protected]> Reviewed by: Brian Behlendorf <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> Closes openzfs#8968
We return ENOSPC in metaslab_activate if the metaslab has weight 0, to avoid activating a metaslab with no space available. For sanity checking, we also assert that there is no free space in the range tree in that case. Reviewed-by: Igor Kozhukhov <[email protected]> Reviewed by: Matt Ahrens <[email protected]> Reviewed by: Serapheim Dimitropoulos <[email protected]> Reviewed by: Brian Behlendorf <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> Closes openzfs#8968
We return ENOSPC in metaslab_activate if the metaslab has weight 0, to avoid activating a metaslab with no space available. For sanity checking, we also assert that there is no free space in the range tree in that case. Reviewed-by: Igor Kozhukhov <[email protected]> Reviewed by: Matt Ahrens <[email protected]> Reviewed by: Serapheim Dimitropoulos <[email protected]> Reviewed by: Brian Behlendorf <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> Closes #8968
We return ENOSPC in metaslab_activate if the metaslab has weight 0, to avoid activating a metaslab with no space available. For sanity checking, we also assert that there is no free space in the range tree in that case. Reviewed-by: Igor Kozhukhov <[email protected]> Reviewed by: Matt Ahrens <[email protected]> Reviewed by: Serapheim Dimitropoulos <[email protected]> Reviewed by: Brian Behlendorf <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> Closes openzfs#8968
Signed-off-by: Paul Dagnelie [email protected]
Reviewed By: Igor Kozhukhov [email protected]
Motivation and Context
During testing on DilOS, Igor noticed a consistent panic in the checkpoint tests where a metaslab had an activation weight 0. He shared the core file with me, and we discovered that the metaslab's weight was also 0. This means we activated a metaslab with no space available; there appear to be no safeguards against this behavior. Once that happens, any attempt to passivate that metaslab will result in an assertion failure.
Description
We return ENOSPC in metaslab_activate if the metaslab has weight 0, to avoid this issue. For sanity checking, we also assert that there is no free space in the range tree in that case.
How Has This Been Tested?
Tested on DilOS on the system that was consistently failing; test runs since then have passed.
Types of changes
Checklist:
Signed-off-by
.