This repository has been archived by the owner on Aug 23, 2019. It is now read-only.

Attempt to dial all addresses in parallel and use the first success #193

Closed · wants to merge 1 commit from patch-1

Conversation

jackkleeman

ipfs/js-ipfs#812

We were not dialling all addresses in sequence: currently only the first address was ever dialled, leading to regular failures. This dials all addresses in parallel and uses the first success (or the first timeout).
```diff
@@ -53,11 +54,18 @@ module.exports = function (swarm) {
     // specific for the transport we are using
     const proxyConn = new Connection()

-    next(multiaddrs.shift())
+    each(multiaddrs, next)
```
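
For context, a rough sketch of the dial loop this change produces, reconstructed from the fragments quoted in this conversation (the `success` flag and the `t.dial` callback appear in the review below; `proxyConn.setInnerConn` and the outer `callback` are assumptions about the surrounding swarm code):

```js
let success = false

function next (multiaddr, cb) {
  const conn = t.dial(multiaddr, (err) => {
    if (err) { return cb(err) } // an immediate failure aborts each(); see the discussion below
    if (success) { return cb() } // another address already won the race
    success = true
    proxyConn.setInnerConn(conn) // first successful dial wins
    callback(null, proxyConn)
    cb()
  })
}

each(multiaddrs, next) // dial every address concurrently
```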
Member

I think you could use async.race instead, then you don't have to do the manual success tracking
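
For reference, a sketch of what that could look like; `async.race` runs its tasks in parallel and settles with whichever calls back first, value or error (the `t.dial` wrapping here is assumed):

```js
const race = require('async/race')

race(
  multiaddrs.map((addr) => (cb) => {
    const conn = t.dial(addr, (err) => cb(err, conn))
  }),
  (err, conn) => {
    // settles on the FIRST task to call back; note that an early
    // error also ends the race, which is the concern raised below
  }
)
```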

dryajov (Member) · Mar 26, 2017

Won't async.race fail on the first failed attempt? I think we'd want all the addresses to be dialed, and to be able to inspect those that worked rather than stopping on the first failure, à la bluebird's inspection; async.parallel seems to do just that. Also, I'm not sure dialing in parallel is the best course of action either, unless we want to open connections on all available transports. At the very least, it should try ALL the transport addresses until one works; the fact that it didn't is a bug.
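
For comparison, the inspect-style behaviour described here maps to `async.reflect`, which wraps a task so its error is reported as data rather than aborting the run (a sketch; the `t.dial` wrapping is assumed):

```js
const parallel = require('async/parallel')
const reflect = require('async/reflect')

const tasks = multiaddrs.map((addr) => reflect((cb) => {
  const conn = t.dial(addr, (err) => cb(err, conn))
}))

parallel(tasks, (err, results) => {
  // err is always null: every dial runs to completion, and each
  // result is either { value: conn } or { error: dialError }
  const winner = results.find((res) => !res.error)
})
```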

jackkleeman (Author)

Currently I think it just takes the first return, which is generally either a success or a timeout, and that works well for me. I think dialling in parallel is needed because the WebSocket timeout is around ten seconds, so we really want to try every address at once and then move forward with whatever sticks.

Member

Mmm... that makes sense.

Does it make sense to keep all the available connections around? Would it make sense to switch between connections depending on latency (probably overkill)?

In fairness though, the first one to come back is probably the fastest one already 🍭

jackkleeman (Author)

My thinking exactly

jackkleeman (Author)

Oh, and I don't want you to slow down on the relay, as it will be game-changing for my project!!

Member

@jackkleeman - yep, that was a great catch overall. I've patched my local fork with the simpler sequential next(shift) so far. This might have been the culprit of a bunch of weird issues I was running into at some point with the WS vs TCP order of dialing: libp2p/js-libp2p-circuit#1 (comment) 🥇
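
A sketch of that sequential next(shift) variant: one dial in flight at a time, falling through to the next address on error until the list is exhausted (surrounding names assumed as in the sketch above):

```js
function next (multiaddr) {
  const conn = t.dial(multiaddr, (err) => {
    if (err) {
      if (multiaddrs.length === 0) {
        return callback(new Error('Could not dial any of the addresses'))
      }
      return next(multiaddrs.shift()) // fall through to the next address
    }
    proxyConn.setInnerConn(conn)
    callback(null, proxyConn)
  })
}

next(multiaddrs.shift())
```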

Member

@diasdavid @dignifiedquire +1 to merging this as is for now.

I've created an issue to track the enhancement proposed by @dignifiedquire

Member

I am going to implement the new dialing today as I need to go into that code anyway

daviddias (Member) left a comment

Thank you for detecting this and PR'ing a fix, @jackkleeman. Just a small comment that needs to be addressed; otherwise LGTM.

```js
const conn = t.dial(multiaddr, () => {
  if (success) {
    return;
```
Member

This is leaving open sockets behind. We need to check: If I have successfully opened a socket before, then I need to close this one.
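
A minimal sketch of the check being asked for; the close helper is hypothetical, since the exact close API varies by transport:

```js
const conn = t.dial(multiaddr, (err) => {
  if (err) { return cb(err) }
  if (success) {
    // another address already won the race: close this connection
    // rather than leaving the socket open
    closeConn(conn) // hypothetical helper; not part of the PR
    return cb()
  }
  success = true
  proxyConn.setInnerConn(conn)
  callback(null, proxyConn)
  cb()
})
```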

jackkleeman (Author)

If @dignifiedquire is replacing this PR today should I ignore this?

Member

Feel free to ignore this, yes; I will ping you once the new implementation is up.

jackkleeman (Author)

Thanks! As it turns out, my each solution is already failing in a few cases (when one address fails immediately instead of slowly timing out).
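
That failure mode fits how `async.each` behaves: its final callback fires as soon as any iteratee reports an error, so one address failing instantly settles the dial before a slower address can succeed. A sketch of a workaround that swallows per-address errors, so the iteration only finishes once every dial has completed (names assumed as in the sketches above):

```js
each(multiaddrs, (addr, cb) => {
  next(addr, () => cb()) // never propagate the per-dial error
}, () => {
  if (!success) {
    callback(new Error('Could not dial any of the addresses'))
  }
})
```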

dignifiedquire added a commit that referenced this pull request Mar 27, 2017
jackkleeman (Author)

Outdone by #195

@jackkleeman jackkleeman deleted the patch-1 branch March 27, 2017 12:31