fix(device_remove): hot remove of spdk bdev should drop descriptor #1553

Closed
wants to merge 1 commit into from

@dsharma-dc dsharma-dc commented Nov 29, 2023

If a bdev replica (non-nvmf) is destroyed, the device removal path doesn't close the device as it does for an nvmf child. As a result, the child state isn't set to Destroyed, so the bdev descriptor is kept open and SPDK internally gets stuck in an EBUSY state for that bdev.
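
To make the failure mode concrete, here is a minimal toy model in plain Rust. It is a sketch only, not Mayastor or SPDK code: `ToyBdev`, `ToyDescriptor`, `ToyChild` and `on_device_removed` are hypothetical names, and the rule "destroy fails with EBUSY while any descriptor is open" is a simplifying assumption used to mirror the behaviour described above.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Toy stand-in for a bdev that tracks how many descriptors are open on it.
/// (Hypothetical type; not the SPDK or Mayastor API.)
struct ToyBdev {
    open_descriptors: Rc<Cell<usize>>,
}

/// Toy descriptor: closing it is modelled by `Drop`.
struct ToyDescriptor {
    open_descriptors: Rc<Cell<usize>>,
}

impl Drop for ToyDescriptor {
    fn drop(&mut self) {
        // Dropping the descriptor releases its claim on the bdev.
        self.open_descriptors.set(self.open_descriptors.get() - 1);
    }
}

impl ToyBdev {
    fn new() -> Self {
        ToyBdev { open_descriptors: Rc::new(Cell::new(0)) }
    }

    /// Open a descriptor on the bdev.
    fn open(&self) -> ToyDescriptor {
        self.open_descriptors.set(self.open_descriptors.get() + 1);
        ToyDescriptor { open_descriptors: Rc::clone(&self.open_descriptors) }
    }

    /// Assumption for this sketch: destroy is refused while descriptors are open.
    fn try_destroy(&self) -> Result<(), &'static str> {
        if self.open_descriptors.get() > 0 {
            Err("EBUSY")
        } else {
            Ok(())
        }
    }
}

#[derive(Debug, PartialEq)]
enum ToyChildState {
    Open,
    Destroyed,
}

/// Toy child that holds an optional descriptor, like `self.device` in the diff.
struct ToyChild {
    state: ToyChildState,
    device: Option<ToyDescriptor>,
}

impl ToyChild {
    fn open(bdev: &ToyBdev) -> Self {
        ToyChild { state: ToyChildState::Open, device: Some(bdev.open()) }
    }

    /// Hypothetical removal handler: drop the descriptor and mark the child destroyed.
    fn on_device_removed(&mut self) {
        self.device = None;
        self.state = ToyChildState::Destroyed;
    }
}
```

In this model, `try_destroy` returns `Err("EBUSY")` while a `ToyChild` still holds its descriptor, and succeeds once `on_device_removed` has set `device` to `None`.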

@dsharma-dc

bors try

bors-openebs-mayastor bot pushed a commit that referenced this pull request Nov 29, 2023
@bors-openebs-mayastor

try

Build succeeded:

@dsharma-dc dsharma-dc marked this pull request as ready for review November 29, 2023 17:04
@auto-assign auto-assign bot requested a review from chriswldenyer November 29, 2023 17:04
@dsharma-dc

bors try

bors-openebs-mayastor bot pushed a commit that referenced this pull request Dec 8, 2023
@dsharma-dc

bors try

@bors-openebs-mayastor

try

Already running a review

@dsharma-dc dsharma-dc changed the title from "fix(device_remove): close bdev during device removal for spdk bdev" to "fix(device_remove): hot remove of spdk bdev should drop descriptor" on Dec 8, 2023
// will send us here and keeping the device/descriptor isn't of
// any use.
debug!("{self:?}: dropping block device");
self.device = None;

Hmm, if there's a hot removal this way, does this mean we don't go through the pause dance?
If so, then we need to cater for that as well.

@dsharma-dc dsharma-dc Dec 20, 2023

Yeah, we don't do that pause here because child retire isn't done. There are more nuances involved here that make me think we are better off maintaining the status quo with this code. I'm inclined to park this change for now.

  1. A way to simulate hot remove is to do a bdev destroy on the pool disk rather than a pool destroy. In this case, both kinds of child (nvmf and spdk bdev) behave similarly and are unplugged fine, without going via the retire path and pause.
  2. Pool destroy or replica destroy should rather be considered a fault. An nvmf child receives AdminCommandCompletionFailure for these, deems the child faulted, retires it, and then triggers the DeviceRemoved event. For an spdk bdev, we get DeviceRemoved directly in these cases without the child having gone through retire.
  3. Practically, 1 is more likely than 2 in an actual system.
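
The cases above can be written down as a small lookup, purely as an illustration of the ordering described in this comment. Every name here (`ChildKind`, `Trigger`, `Step`, `removal_steps`, `goes_through_retire`) is hypothetical and not taken from the Mayastor code base; the `Unplugged` step for case 1 is a placeholder, since the comment only says the child is unplugged without retire or pause.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum ChildKind {
    Nvmf,
    SpdkBdev,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Trigger {
    /// Case 1: bdev destroy on the pool disk (simulated hot remove).
    BdevDestroyOnPoolDisk,
    /// Case 2: pool destroy or replica destroy (treated as a fault).
    PoolOrReplicaDestroy,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Step {
    AdminCommandCompletionFailure,
    Faulted,
    Retired,
    DeviceRemoved,
    /// Placeholder for case 1; the exact event is not named in the thread.
    Unplugged,
}

/// Ordered steps a child goes through, as described in the comment.
fn removal_steps(kind: ChildKind, trigger: Trigger) -> Vec<Step> {
    match (kind, trigger) {
        // Case 1: both kinds of child behave the same, no retire and no pause.
        (_, Trigger::BdevDestroyOnPoolDisk) => vec![Step::Unplugged],
        // Case 2, nvmf: I/O failure, fault, retire, then DeviceRemoved.
        (ChildKind::Nvmf, Trigger::PoolOrReplicaDestroy) => vec![
            Step::AdminCommandCompletionFailure,
            Step::Faulted,
            Step::Retired,
            Step::DeviceRemoved,
        ],
        // Case 2, spdk bdev: DeviceRemoved arrives directly, without retire.
        (ChildKind::SpdkBdev, Trigger::PoolOrReplicaDestroy) => vec![Step::DeviceRemoved],
    }
}

/// True when the path includes retire (and therefore the pause).
fn goes_through_retire(kind: ChildKind, trigger: Trigger) -> bool {
    removal_steps(kind, trigger).contains(&Step::Retired)
}
```

Only the nvmf child under pool or replica destroy passes through `Retired` in this table, which is the asymmetry that makes dropping the descriptor on DeviceRemoved for an spdk bdev skip the pause.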

@bors-openebs-mayastor

try

Build failed:

@dsharma-dc

not pursuing at the moment.

@dsharma-dc dsharma-dc closed this Feb 21, 2024
2 participants