Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ BUGFIX ] Fabrics connect timeouts #1711

Merged
merged 5 commits into from
Aug 9, 2024
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
fix(nexus-child/unplug): remove usage of block_on
Initially this block_on was added because the remove callback was running in blocking
fashion, but this has since changed and unplug is actually called from async context.
As such, we don't need the block_on and simply call the async code directly.
Also, simplify complete notification, as we can simply close the sender.

Signed-off-by: Tiago Castro <[email protected]>
  • Loading branch information
tiagolobocastro committed Aug 6, 2024
commit ad5c31a6fdbf9d069f5d322dc0818afeff821ce8
2 changes: 1 addition & 1 deletion io-engine/src/bdev/nexus/nexus_bdev_children.rs
Original file line number Diff line number Diff line change
Expand Up @@ -900,7 +900,7 @@ impl<'n> Nexus<'n> {
nexus_name,
child_device, "Unplugging nexus child device",
);
child.unplug();
child.unplug().await;
}
None => {
warn!(
Expand Down
34 changes: 12 additions & 22 deletions io-engine/src/bdev/nexus/nexus_child.rs
Original file line number Diff line number Diff line change
Expand Up @@ -24,8 +24,6 @@ use crate::{
BlockDeviceHandle,
CoreError,
DeviceEventSink,
Reactor,
Reactors,
VerboseError,
},
eventing::replica_events::state_change_event_meta,
Expand Down Expand Up @@ -1109,7 +1107,7 @@ impl<'c> NexusChild<'c> {
/// underlying device is removed.
///
/// Note: The descriptor *must* be dropped for the unplug to complete.
pub(crate) fn unplug(&mut self) {
pub(crate) async fn unplug(&mut self) {
info!("{self:?}: unplugging child...");

let state = self.state();
Expand Down Expand Up @@ -1139,12 +1137,10 @@ impl<'c> NexusChild<'c> {
// device-related events directly.
if state != ChildState::Faulted(FaultReason::IoError) {
let nexus_name = self.parent.clone();
Reactor::block_on(async move {
match nexus_lookup_mut(&nexus_name) {
Some(n) => n.reconfigure(DrEvent::ChildUnplug).await,
None => error!("Nexus '{nexus_name}' not found"),
}
});
match nexus_lookup_mut(&nexus_name) {
Some(n) => n.reconfigure(DrEvent::ChildUnplug).await,
None => error!("Nexus '{nexus_name}' not found"),
}
}

if is_destroying {
Expand All @@ -1153,22 +1149,16 @@ impl<'c> NexusChild<'c> {
self.device_descriptor.take();
}

self.unplug_complete();
info!("{self:?}: child successfully unplugged");
self.unplug_complete().await;
}

/// Signal that the child unplug is complete.
fn unplug_complete(&self) {
let sender = self.remove_channel.0.clone();
let name = self.name.clone();
Reactors::current().send_future(async move {
if let Err(e) = sender.send(()).await {
error!(
"Failed to send unplug complete for child '{}': {}",
name, e
);
}
});
async fn unplug_complete(&self) {
if let Err(error) = self.remove_channel.0.send(()).await {
info!("{self:?}: failed to send unplug complete: {error}");
} else {
info!("{self:?}: child successfully unplugged");
}
}

/// create a new nexus child
Expand Down