Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix connection bugs resulting from short timeout on NodeManager.pingNode #473

Closed
tegefaulkes opened this issue Oct 5, 2022 · 3 comments
Closed
Assignees
Labels
bug Something isn't working r&d:polykey:core activity 4 End to End Networking behind Consumer NAT Devices

Comments

@tegefaulkes
Copy link
Contributor

Specification

During PR #445 encountered some connection bugs. The bugs started after updating the NodeManager.pingNode method to use timedCancellable. The running theory is that cancelling the ping when it is in the process of establishing the connection leads to an unhanded condition within the Proxy code resulting in an error.

At the time to complete the PR we disabled any failing tests and changed the timeout for pingNode to 20000ms from 2000ms. The longer delay seemed to fix the symptom. To re-create the bug I think you only need to set the pingNode timeout to something short such as 2000ms and that should cause the error again.

This bug needs to be fixed. If possible a test created to test the specific problem though given the nature of race conditions it might not be possible to create a reliable test.

Additional context

Tasks

  1. Recreate the bug and narrow down the cause.
  2. If possible create a test that can reliable trigger the bug.
  3. Fix the bug.
@tegefaulkes tegefaulkes added the development Standard development label Oct 5, 2022
@tegefaulkes tegefaulkes self-assigned this Oct 5, 2022
@tegefaulkes
Copy link
Contributor Author

I've seen this error again. It's coming out of the race condition during ConnectionForward.start. The errorP is rejecting with this error. The problem is, in that case it's caught and re-thrown as ErrorConnectionStart. The error is also ending up at the unhandledRejectionHandler. So it's in a rejected promise that is being handled but AlSO in a promise I can't find easily that is rejecting and not being handled.

This is a tricky one to fix given how unhelpful unhandled rejection errors are. We don't have a stack to work from so we're groping in the dark here.

We can re-create this problem by causing the proxy connections to timeout very quickly. I think it's the reverse connection rejecting during establishment that is causing the error on the forward connection.

@tegefaulkes
Copy link
Contributor Author

Relevant comment #489 (comment)

@tegefaulkes
Copy link
Contributor Author

It was caused by doing void errorP.then(); when I added debug logging outputs. This was orphaning the promise when it rejected resulting in the un-handled promise rejection. In the future we should never use promise.then() for logging when a promise resolved. If you do then at the very least add a catch handler.

@CMCDragonkai CMCDragonkai added bug Something isn't working and removed development Standard development labels Oct 26, 2022
@CMCDragonkai CMCDragonkai added the r&d:polykey:core activity 4 End to End Networking behind Consumer NAT Devices label Jul 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working r&d:polykey:core activity 4 End to End Networking behind Consumer NAT Devices
Development

No branches or pull requests

2 participants