Attempt to dial all addresses in parallel and use the first success #193
Conversation
ipfs/js-ipfs#812 We were not dialling all addresses in sequence; only the first address was ever dialled, leading to regular failures. This dials all addresses in parallel and uses the first success (or the first timeout).
```diff
@@ -53,11 +54,18 @@ module.exports = function (swarm) {
   // specific for the transport we are using
   const proxyConn = new Connection()

-  next(multiaddrs.shift())
+  each(multiaddrs, next)
```
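For context, a minimal sketch of the "dial everything, first success wins" idea behind this change, assuming a hypothetical `dialOne(addr, cb)` helper in place of the real transport dial (the actual swarm code works against a `proxyConn`, not a plain callback):

```js
// Sketch only: dial every multiaddr in parallel, keep the first success,
// ignore the rest. `dialOne` is a hypothetical stand-in for the transport dial.
const each = require('async/each')

function dialFirstSuccess (multiaddrs, dialOne, callback) {
  let success = false

  each(multiaddrs, (addr, next) => {
    dialOne(addr, (err, conn) => {
      if (success) return next()   // another address already won
      if (err) return next()       // swallow individual failures
      success = true
      callback(null, conn)         // first working connection wins
      next()
    })
  }, (err) => {
    // Only reached if no dial succeeded
    if (!success) callback(err || new Error('could not dial any address'))
  })
}
```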
I think you could use `async.race` instead, then you don't have to do the manual success tracking.
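For reference, roughly what an `async.race` version might look like (again using the hypothetical `dialOne` helper); note that `race` calls back with whichever task finishes first, whether that is a success or an error:

```js
const race = require('async/race')

// Sketch only: race all dials; the first task to finish, error or success,
// decides the outcome.
race(
  multiaddrs.map((addr) => (cb) => dialOne(addr, cb)),
  (err, conn) => {
    if (err) return callback(err) // the fastest dial happened to fail
    callback(null, conn)
  }
)
```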
Won't `async.race` fail on the first failed attempt? I think we'd want all the addresses to be dialed and to be able to inspect those that worked, not stop on the first failure - a la the Bluebird inspect approach; `async.parallel` seems to do just that? Also, I'm not sure dialing in parallel is the best course of action either, unless we want to open connections on all available transports. At the very least, it should try ALL the transport addresses until one works; the fact that it didn't is a bug.
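For illustration, an `async.reflect` + `async.parallel` sketch of the "dial everything and inspect the results" idea (helper names remain hypothetical):

```js
const parallel = require('async/parallel')
const reflect = require('async/reflect')

// Sketch only: every dial completes, failures included, and the results are
// inspected afterwards instead of stopping on the first error.
parallel(
  multiaddrs.map((addr) => reflect((cb) => dialOne(addr, cb))),
  (_err, results) => {
    // `reflect` never propagates errors, so each entry is { error } or { value }
    const conns = results.filter((r) => r.value).map((r) => r.value)
    if (conns.length === 0) return callback(new Error('all dials failed'))
    callback(null, conns[0]) // the others could be kept around or closed
  }
)
```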
Currently I think it just takes the first return, which is generally either a success or a timeout, and that works well for me. I think dialling in parallel is needed because the websocket timeout is around ten seconds, so we really want to try every address at once and then move forward with whatever sticks.
Mmm... that makes sense.
Does it make sense to keep all the available connections around? Would it make sense to switch between connections depending on latency (probably overkill)?
In fairness though, the first one to come back is probably the fastest one already 🍭
My thinking exactly
Oh, and I don't want you to slow down on the relay, as it will be game-changing for my project!!
@jackkleeman - yep, that was a great catch overall. I've patched my local fork with the simpler sequential `next(shift)` so far. This might have been the culprit of a bunch of weird issues I was running into at some point with the WS vs TCP order of dialing libp2p/js-libp2p-circuit#1 (comment) 🥇
@diasdavid @dignifiedquire +1 to merging this as is for now.
I've created an issue to track the enhancement proposed by @dignifiedquire
See: #193 (comment)
I am going to implement the new dialing today as I need to go into that code anyway
Thank you for detecting this and PR'ing a fix @jackkleeman. Just a small comment that needs to be addressed, otherwise LGTM.
```js
const conn = t.dial(multiaddr, () => {
  if (success) {
    return
  }
```
This is leaving open sockets behind. We need to check: If I have successfully opened a socket before, then I need to close this one.
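Something along these lines, building on the snippet above; treating the connection as having a close method is an assumption about the transport API, not the actual swarm code:

```js
const conn = t.dial(multiaddr, (err) => {
  if (success) {
    // Another address already won; don't leave this socket dangling.
    // Assumption: the transport connection exposes a close()/destroy() method.
    if (!err && typeof conn.close === 'function') {
      conn.close()
    }
    return
  }
  // ... first-success handling as before
})
```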
If @dignifiedquire is replacing this PR today, should I ignore this?
Feel free to ignore this, yes; I will ping you once the new implementation is up.
Thanks! As it turns out, my `each` solution is already failing in a few cases (when one address fails immediately instead of slowly timing out).
Replaces #193 Fixes #194
Outdone by #195