Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: graceful shutdown #1994

Merged
merged 9 commits into from
Nov 18, 2020
Merged

fix: graceful shutdown #1994

merged 9 commits into from
Nov 18, 2020

Conversation

LePremierHomme
Copy link
Contributor

@LePremierHomme LePremierHomme commented Nov 16, 2020

An attempt to solve #1668.

Various fixes were made to prevent delays in shutdown:

  • Revoke connection retries of closed peers.
  • When using Tor, first create a plain socket connection with the local service, before passing it to the SOCKS client. This enables us to keep track of the raw socket while SOCKS connection to destination is pending, and to close it on shutdown.
  • Keep track of the reachability verification dummy peer, so it can be closed on shutdown.
  • Keep track of ConnextClient's pending HTTP requests, and abort them on shutdown (besides the critical ones). A jest test was added, but it's using an HTTP client mock (which works according to nodejs documentation). This is not ideal, but I thought that using a real HTTP client/server for this test would be an overkill.
    This behaviour still need to be manually tested.

Pending swaps are still expected to (intentionally) cause delay in shutdown.
There might be other issues which cause a delay which I haven't encountered.
So any help in testing would be appriciated. Please paste your logs here (even if it works).

To launch via xud-docker:

$ bash ~/xud.sh -b fix_shutdown

@LePremierHomme LePremierHomme requested review from a user, sangaman, kilrau and raladev November 16, 2020 19:09
@sangaman sangaman added the code quality Improving code structure, organization, and clarity label Nov 17, 2020
Copy link
Collaborator

@sangaman sangaman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice, just see one suggestion about how we track/reference pending connext requests.

lib/connextclient/ConnextClient.ts Outdated Show resolved Hide resolved
Copy link

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking forward to getting this in... 👍

raladev
raladev previously approved these changes Nov 17, 2020
@kilrau kilrau requested review from a user and sangaman November 17, 2020 15:17
@kilrau
Copy link
Contributor

kilrau commented Nov 17, 2020

Fix is in, could you give this another look? @sangaman

Copy link
Collaborator

@sangaman sangaman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great now. This is a nice approach and good sleuth work to identify these issues, I'm expecting this to make a real difference.

@rsercano
Copy link
Collaborator

Another case (don't know if we want to stay it like this) is that somebody calling streamorders or new stream grpc call subscribe alerts (#1984) then unless stream is closed xud is not being closed.

Ignore if this's intentional.

@kilrau
Copy link
Contributor

kilrau commented Nov 18, 2020

I think we should to close streams as part of the shutdown procedure. If that's not the case, want to take care of it as a separate PR? @rsercano

@rsercano
Copy link
Collaborator

I think we should to close streams as part of the shutdown procedure. If that's not the case, want to take care of it as a separate PR? @rsercano

Sure as long as @LePremierHomme doesn't want to address it, I can check too.

@kilrau
Copy link
Contributor

kilrau commented Nov 18, 2020

Definitely not as part of this PR, so go ahead 👍

Copy link

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works very nicely for me. Thanks for this!

@rsercano
Copy link
Collaborator

Definitely not as part of this PR, so go ahead 👍

After digging into this, I realized this has already been addressed for streamorders and my bad not to handle on subscribealerts by adding stream to current list, Im now adding it and should be handled on close.

@raladev raladev merged commit cde3203 into master Nov 18, 2020
sangaman added a commit that referenced this pull request Dec 3, 2020
This fixes a regression introduced in #1994 that prevents enabling the
stall timer for peers that checks whether peers are not responding to
our packets in a reasonable timeframe and disconnecting those that are
not. Without these checks, peers that have gone offline may remain in
the list of connected peers, resulting in unexpected behavior and also
preventing those peers from establishing new connections with us once
they come back online.

Closes #2010.
raladev pushed a commit that referenced this pull request Dec 3, 2020
This fixes a regression introduced in #1994 that prevents enabling the
stall timer for peers that checks whether peers are not responding to
our packets in a reasonable timeframe and disconnecting those that are
not. Without these checks, peers that have gone offline may remain in
the list of connected peers, resulting in unexpected behavior and also
preventing those peers from establishing new connections with us once
they come back online.

Closes #2010.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
code quality Improving code structure, organization, and clarity
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants