zpool remove fails with "Out of space" #11356
Comments
This is caused by the relatively small size of the disks, in this case 64M. In practice, for this test to pass the disks must be at least 300M in size. I agree the error here isn't very helpful, and we should look into whether this can be improved for very small vdevs.
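For anyone wanting to try this locally, here is a minimal sketch using file-backed vdevs sized above the ~300M threshold mentioned above (pool name and file paths are arbitrary):

```sh
# Create two file-backed vdevs comfortably above the ~300M minimum.
truncate -s 320M /var/tmp/vdev1.img /var/tmp/vdev2.img

# Build a two-vdev striped pool, then remove one top-level vdev.
sudo zpool create testpool /var/tmp/vdev1.img /var/tmp/vdev2.img
sudo zpool remove testpool /var/tmp/vdev2.img

# Check the removal status, then clean up.
zpool status testpool
sudo zpool destroy testpool
rm -f /var/tmp/vdev1.img /var/tmp/vdev2.img
```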
Amazing, it works! Thanks. Indeed, a different message would be helpful.
I'm having the same problem, but with much larger vdevs. I accidentally added an 8TB mirror to the wrong pool, and now I can't remove it. This is a rootfs pool (but not a boot pool)... could that be related?
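For context, the failure mode being described looks roughly like this (the pool name and device names here are hypothetical):

```sh
# The mirror was meant for another pool, but got added here by mistake.
sudo zpool add rpool mirror /dev/sdx /dev/sdy

# Attempting to undo the mistake fails with an "out of space" error,
# even though the newly added mirror holds no data.
sudo zpool remove rpool mirror-1
```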
The available space check in spa_vdev_remove_top_check() is intended to verify there is enough available space on the other devices before starting the removal process. This is obviously a good idea, but the current check can significantly overestimate the space requirements and often prevents removal when it clearly should be possible.

This change reworks the check to use the per-vdev vs_stats. This is sufficiently accurate and is convenient because it allows a direct comparison to the allocated space. If using dsl_dir_space_available(), then the available space of the device being removed is also included in the return value and must somehow be accounted for.

Additionally, we reduce the slop space requirement to be in line with the capacity of the pool after the device has been removed. This way, if a large device is accidentally added to a small pool, the vastly increased slop requirement of the larger pool won't prevent removal. For example, it was previously possible that even newly added empty vdevs couldn't be removed due to the increased slop requirement. This was particularly unfortunate since one of the main motivations for this feature was to allow such mistakes to be corrected.

Lastly, it's worth mentioning that by allowing the removal to start with close to the minimum required free space, it is more likely that an active pool close to capacity may fail the removal process. This failure case has always been possible since we can't know what new data will be written during the removal process. It is correctly handled and there's no danger to the pool.

Signed-off-by: Brian Behlendorf <[email protected]> Issue openzfs#11356
@WhittlesJr yes, it sounds like you've hit this same issue. I've opened PR #11409 with a proposed fix for this. If you're comfortable rolling your own build of ZFS you could apply it, remove the vdev using the patched version, then switch back to your current version.
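For anyone who does want to test the proposed fix, one way to get it is to fetch the PR head from GitHub and build from source. This is only a sketch; the branch name is arbitrary, and you should consult the project's build documentation for the required prerequisites:

```sh
git clone https://github.com/openzfs/zfs.git
cd zfs

# Fetch the proposed fix from PR #11409 into a local branch.
git fetch origin pull/11409/head:pr-11409
git checkout pr-11409

# Standard out-of-tree build.
./autogen.sh
./configure
make -j"$(nproc)"
```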
@behlendorf, this pool is a bit too sensitive to try experimental builds on, I'm afraid. I've already rebuilt and restored it, so I'm good now. But I'm grateful that you've proposed a fix to the issue. If I can help test it in other ways, let me know.
@behlendorf, another one, with the updated test:
@ikozhukhov it looks like this test failure is also occurring due to the minimum "slop" requirement. Even with the patch, there must be a minimum of "vdev-allocated-bytes + 2 * slop-bytes" of available space on the remaining vdevs. Since by default there's a minimum "slop" size of 128M, the small vdevs used by this test can't satisfy that requirement.
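As a worked example of why tiny test vdevs trip over this check (the 16M allocated figure is illustrative; the 128M minimum slop and 256M vdev size come from the discussion below):

```sh
# Removing a vdev with 16M allocated, given the 128M minimum slop:
#   required free space = allocated + 2 * slop
echo $(( 16 + 2 * 128 ))   # prints 272 (MiB)

# A remaining 256M vdev can never offer 272M of free space, so the
# check fails no matter how empty the pool actually is.
```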
The available space check in spa_vdev_remove_top_check() is intended to verify there is enough available space on the other devices before starting the removal process. This is obviously a good idea, but the current check can significantly overestimate the space requirements and often prevents removal when it clearly should be possible. This change reworks the check slightly.

First, we reduce the slop space requirement to be in line with the capacity of the pool after the device has been removed. This way, if a large device is accidentally added to a small pool, the vastly increased slop requirement of the larger pool won't prevent removal. For example, it was previously possible that even newly added empty vdevs couldn't be removed due to the increased slop requirement. This was particularly unfortunate since one of the main motivations for this feature was to allow such mistakes to be corrected.

Second, we handle the case of very small pools where the minimum slop size of 128M represents a large fraction (>25%) of the total pool capacity. In this case, it's reasonable to reduce the extra slop requirement.

Lastly, it's worth mentioning that by allowing the removal to start with close to the minimum required free space, it is more likely that an active pool close to capacity may fail the removal process. This failure case has always been possible since we can't know what new data will be written during the removal process. It is correctly handled and there's no danger to the pool.

Signed-off-by: Brian Behlendorf <[email protected]> Issue openzfs#11356
The available space check in spa_vdev_remove_top_check() is intended to verify there is enough available space on the other devices before starting the removal process. This is obviously a good idea, but the current check can significantly overestimate the space requirements for small pools, preventing removal when it clearly should be possible. This change reworks the check to drop the additional slop space from the calculation. This solves two problems:

1) If a large device is accidentally added to a small pool, the slop requirement is dramatically increased. This in turn can prevent removal of that larger device even if it has never been used. This was particularly unfortunate since a main motivation for this feature was to allow such mistakes to be corrected.

2) For very small pools with 256M vdevs, the minimum slop size of 128M represents a huge fraction of the total pool capacity. Requiring twice this amount of space to be available effectively prevents device removal from ever working with these tiny pools.

Signed-off-by: Brian Behlendorf <[email protected]> Issue openzfs#11356
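With the slop term dropped from the check, removal on tiny pools like the ones exercised by the test suite should go through. A sketch of the scenario the fix enables (sizes, pool name, and paths arbitrary):

```sh
# Three 256M file-backed vdevs, matching the tiny-pool case above.
truncate -s 256M /var/tmp/small1.img /var/tmp/small2.img /var/tmp/small3.img
sudo zpool create tiny /var/tmp/small1.img /var/tmp/small2.img

# Add a third vdev, then remove it again. Before the fix this failed
# with "out of space" because 2 * 128M of slop exceeded the capacity.
sudo zpool add tiny /var/tmp/small3.img
sudo zpool remove tiny /var/tmp/small3.img
```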
System information
Describe the problem you're observing
According to #6900, it should be possible to remove a top-level vdev (implemented in v0.8.5). However, zpool remove results in "out of space".

Describe how to reproduce the problem
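The original reproduction steps did not survive here; a stand-in consistent with the thread (the reporter's disks were 64M, per the first reply above) might look like:

```sh
# 64M file-backed vdevs, well under the ~300M removal needs to succeed.
truncate -s 64M /var/tmp/d1.img /var/tmp/d2.img
sudo zpool create testpool /var/tmp/d1.img /var/tmp/d2.img

# Attempting to remove a top-level vdev fails with "out of space".
sudo zpool remove testpool /var/tmp/d2.img
```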
Include any warning/errors/backtraces from the system logs