vdev_random_leaf() can loop forever #6631
Comments
@ofaaland can you please take a look at this. It looks to me like we want to update
@behlendorf it does that now, so something else is amiss. I'll take a look.
@ofaaland I believe it will spin if you keep hitting the
@behlendorf yes, but if that happens I believe that means Perhaps I'm wrong. That is, that a vdev can be writeable when its children are not writeable, presumably because an error has not yet percolated up the vdev tree. Do you know? Otherwise, it seems like I'll have to look at how the vdev state changes occur to be able to tell whether this is a ztest flaw, state change flaw, or
I don't believe
OK, that certainly would explain this.
Rename it as mmp_random_leaf() since it is defined in mmp.c.

The earlier implementation could end up spinning forever if a pool had a vdev marked writeable, none of whose children were writeable. It also did not guarantee that if a writeable leaf vdev existed, it would be found.

Reimplement to recursively walk the device tree to select the leaf. It searches the entire tree, so that a return value of (NULL) indicates there were no usable leaves in the pool; all were either not writeable or had pending mmp writes.

It still chooses the starting child randomly at each level of the tree, so if the pool's devices are healthy, the mmp writes go to random leaves with an even distribution. This was verified by testing using zfs_multihost_history enabled.

Reviewed by: Thomas Caputi <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Giuseppe Di Natale <[email protected]>
Signed-off-by: Olaf Faaland <[email protected]>
Closes #6631
Closes #6665
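
For readers who want to see the shape of the approach the commit message describes, here is a minimal sketch of a recursive leaf search that starts at a random child on each level and visits every child at most once. It is not the ZFS source: toy_vdev_t, its fields, and toy_random_leaf() are hypothetical stand-ins invented for this illustration, and rand() stands in for whatever random source the real code uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a vdev; not the real vdev_t. */
typedef struct toy_vdev {
	bool writeable;              /* stand-in for a writeability check */
	bool mmp_pending;            /* stand-in for "has a pending mmp write" */
	size_t nchildren;            /* 0 means this node is a leaf */
	struct toy_vdev **children;  /* unused when nchildren == 0 */
} toy_vdev_t;

/*
 * Return a usable leaf under vd, or NULL if the subtree has none.
 * Each child is tried at most once, beginning at a random offset and
 * wrapping around, so the walk always terminates.
 */
static toy_vdev_t *
toy_random_leaf(toy_vdev_t *vd)
{
	assert(vd != NULL);

	/* Assumption: a non-writeable node rules out its whole subtree. */
	if (!vd->writeable)
		return (NULL);

	/* A leaf is usable only if it has no pending mmp write. */
	if (vd->nchildren == 0)
		return (vd->mmp_pending ? NULL : vd);

	size_t start = (size_t)rand() % vd->nchildren;
	for (size_t i = 0; i < vd->nchildren; i++) {
		toy_vdev_t *child = vd->children[(start + i) % vd->nchildren];
		toy_vdev_t *leaf = toy_random_leaf(child);
		if (leaf != NULL)
			return (leaf);
	}

	/* Every child was tried and none yielded a usable leaf. */
	return (NULL);
}
```

Because the loop is bounded by the child count, a writeable mirror whose only child is not writeable yields NULL instead of spinning, and a NULL from the root means the whole tree was searched without finding a usable leaf.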
Describe the problem you're observing
While adding encrypted dataset support to ztest I came across an issue with the mmp code. ztest began to eat 100% CPU without making any forward progress. When I examined the process with gdb, I found that vdev_random_leaf() was looping infinitely. It seems that ztest had gotten the pool into a state where it had one top-level mirror vdev with only one child, which was not writable. This causes the following code to loop infinitely:
Describe how to reproduce the problem
This came up twice while doing lots of zloop runs over the course of a week. I suspect it should be fairly easy to reproduce by creating a pool with the layout I described above.
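
To make the failure mode concrete, here is a toy model of how a "retry a random child until one is writeable" descent can spin on that layout. This is only an assumed shape for such a loop, not the code the report refers to: spin_vdev_t, spin_random_leaf(), and the max_tries guard are all hypothetical and exist only so the demonstration terminates.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy node; not the real vdev_t. */
typedef struct spin_vdev {
	bool writeable;
	size_t nchildren;            /* 0 means this node is a leaf */
	struct spin_vdev **children;
} spin_vdev_t;

/*
 * Descend by picking random children, retrying whenever the pick is
 * not writeable. max_tries is an illustrative guard added for this
 * demo; without it the loop never exits when a node has children but
 * none of them is writeable.
 * Returns the leaf reached, or NULL if the guard tripped.
 */
static spin_vdev_t *
spin_random_leaf(spin_vdev_t *vd, unsigned max_tries)
{
	unsigned tries = 0;

	while (vd->nchildren > 0) {
		if (tries++ >= max_tries)
			return (NULL);

		spin_vdev_t *child =
		    vd->children[(size_t)rand() % vd->nchildren];
		if (!child->writeable)
			continue;    /* pick again; nothing ever changes */
		vd = child;
	}
	return (vd);
}
```

With a writeable mirror whose single child is not writeable, every pick lands on the same unusable child, so only the guard ends the loop. On a healthy mirror the first writeable pick is returned.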