-
Notifications
You must be signed in to change notification settings - Fork 111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(device_remove): hot remove of spdk bdev should drop descriptor #1553
Conversation
bors try |
tryBuild succeeded: |
670fd33
to
1aa5adf
Compare
Signed-off-by: Diwakar Sharma <[email protected]>
1aa5adf
to
0bd4b3b
Compare
bors try |
bors try |
tryAlready running a review |
// will send us here and keeping the device/descriptor isn't of | ||
// any use. | ||
debug!("{self:?}: dropping block device"); | ||
self.device = None; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm if there's a hot removal this way, does this mean we don't go through the pause dance?
If so then we need to cater for that as well..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah we don't do that pause here because child retire isn't done. There are more nuances involved here than that make me think we are better maintain status quo with this code. I'm inclined to park this change for now.
- A way to simulate hot remove is doing
bdev destroy
on pool disk rather thanpool destroy
. In this case, both kind of child(nvmf and spdk bdev) behave similarly and are unplugged fine, without going via retire path and pause. - The
pool destroy
orreplica destroy
are to be rather considered as fault. nvmf child receivesAdminCommandCompletionFailure
for these, deems the child faulted, retires it, and then triggersDeviceRemoved
event. For spdk bdev, we directly getDeviceRemoved
in these cases without child having gone through retire. - Practically, 1 is more likely than 2 in actual system.
tryBuild failed: |
not pursuing at the moment. |
If a bdev replica(non-nvmf) is destroyed, the device removal path doesn't close the device as in case of nvmf child. As a result, the child state isn't set as Destroyed, which results in bdev descriptor being kept opened and hence spdk internally stucks into a ebusy state for that bdev.