Blocking nvme disconnect fixes #776

Merged
merged 3 commits into from
Mar 8, 2024

tiagolobocastro
Contributor

fix(agents/ha/cluster): increase path replacement timeout

In a system test run we've seen the disconnect of an old path take 50s
to complete.
We're not sure why, as we cannot reproduce it yet, but for now we can
increase the timeout.
This is probably fine: if we can't talk to the node where the app is,
then we can't do much more anyway.

We should also consider whether we need to wait for the disconnect at
all. If we can find out whether the disconnect has been started, then
we can probably just complete it asynchronously and let the replace
succeed!

Signed-off-by: Tiago Castro <[email protected]>
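
The idea raised above, bounding the wait and letting a disconnect that has already started finish on its own, could look roughly like the sketch below. It is an illustration only and not the agent's code: `slow_disconnect`, `replace_path`, `DisconnectWait` and all durations are hypothetical, and a plain thread with a channel stands in for whatever async machinery the agent actually uses.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Outcome of waiting for the old path to be disconnected.
#[derive(Debug, PartialEq)]
enum DisconnectWait {
    /// The disconnect finished within the timeout.
    Completed,
    /// The disconnect was started but is still running in the background.
    StillRunning,
    /// The disconnect thread went away without reporting completion.
    Failed,
}

/// Hypothetical stand-in for a slow disconnect of the old path.
fn slow_disconnect(takes: Duration) {
    thread::sleep(takes);
}

/// Starts the disconnect on its own thread and waits at most `timeout` for it.
/// On timeout the thread is not cancelled; it is left to finish by itself.
fn replace_path(disconnect_takes: Duration, timeout: Duration) -> DisconnectWait {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        slow_disconnect(disconnect_takes);
        // The receiver may be gone if the caller already timed out; ignore that.
        let _ = tx.send(());
    });
    match rx.recv_timeout(timeout) {
        Ok(()) => DisconnectWait::Completed,
        Err(RecvTimeoutError::Timeout) => DisconnectWait::StillRunning,
        // The sender was dropped without sending, e.g. the thread panicked.
        Err(RecvTimeoutError::Disconnected) => DisconnectWait::Failed,
    }
}
```

Whether `StillRunning` may be treated as success for the replace is exactly the open question in the commit message; the sketch only shows that the two outcomes can be told apart.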

fix(agents/ha/node): disconnect controller async

The nvmeadm crate we depend on is blocking, and certain operations,
such as disconnecting the controller, can take quite some time.
As a workaround, we run the disconnect in a tokio blocking spawn.
TODO: Ideally we should either make nvmeadm async or somehow provide
an async alternative, which won't be great as the code will likely
get duplicated...

Signed-off-by: Tiago Castro <[email protected]>
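
As a rough illustration of the workaround described above, the sketch below moves a blocking call onto its own thread and collects the result later. To stay dependency-free it uses `std::thread::spawn` in place of tokio's blocking spawn, so it shows the shape of the technique rather than the actual change. `blocking_disconnect`, `spawn_disconnect`, `finish_disconnect` and the NQN string are hypothetical and do not reflect the nvmeadm API.

```rust
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical stand-in for a blocking disconnect call; returns the number
/// of controllers that were disconnected.
fn blocking_disconnect(nqn: &str) -> Result<usize, String> {
    // Simulate a call that blocks the current thread for a while.
    thread::sleep(Duration::from_millis(50));
    if nqn.is_empty() {
        return Err("empty NQN".to_string());
    }
    Ok(1)
}

/// Moves the blocking call onto its own thread and returns immediately,
/// so the caller is free to keep doing other work.
fn spawn_disconnect(nqn: String) -> JoinHandle<Result<usize, String>> {
    thread::spawn(move || blocking_disconnect(&nqn))
}

/// Collects the result, turning a panic in the worker thread into an error.
fn finish_disconnect(handle: JoinHandle<Result<usize, String>>) -> Result<usize, String> {
    handle
        .join()
        .map_err(|_| "disconnect thread panicked".to_string())?
}
```

In an async context, `tokio::task::spawn_blocking` plays the role of `spawn_disconnect` here: it runs the closure on a thread meant for blocking work and returns a handle that can be awaited instead of joined.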

@tiagolobocastro tiagolobocastro changed the title Blocking nme disconnect fixes Blocking nvme disconnect fixes Mar 7, 2024
@tiagolobocastro
Contributor Author

bors merge

@bors-openebs-mayastor

Build succeeded:

@bors-openebs-mayastor bors-openebs-mayastor bot merged commit 3eaf3ed into develop Mar 8, 2024
4 checks passed
@bors-openebs-mayastor bors-openebs-mayastor bot deleted the blocking-disc branch March 8, 2024 14:32