IPFS starts too many new connections on add #4102

Closed
magik6k opened this issue Jul 28, 2017 · 6 comments · Fixed by libp2p/go-libp2p-swarm#26 or #4111
Labels
kind/bug A bug in existing code (including security flaws) status/in-progress In progress

@magik6k
Member

magik6k commented Jul 28, 2017

Version information:

current master at 181dd00

Type:

Bug

Severity:

High

Description:

When adding a large file, the daemon starts creating many new connections and eventually fails with a "too many open files" error. This can be easily reproduced by running the daemon and then running dd bs=1M if=/dev/urandom count=1000 | ipfs add --pin=false

Script/command for daemon monitoring:

export DPID=$(pidof ipfs); watch -n0 'printf "sockets: %s\nleveldb: %s\nflatfs: %s\n" $(ls /proc/${DPID}/fd/ -l | grep "socket:" | wc -l) $(ls /proc/${DPID}/fd/ -l | grep "\\/datastore\\/" | wc -l) $(ls /proc/${DPID}/fd/ -l | grep "\\/blocks\\/" | wc -l); netstat -anpt 2>/dev/null | grep "$DPID/ipfs" | sort -k6 | column -N "a,b,c,d,e,f,g" -J | jq ".table[].f" --raw-output | uniq -c'

While the add is running, the daemon sometimes starts creating far more new connections (SYN_SENT) than it should, up to 1500, at which point it runs into the FD limit.

This is directly causing #3792 and is likely one of the reasons why many lower-end routers can't handle ipfs.
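
For readers who would rather not parse the shell one-liner, here is a minimal Go sketch of the same bucketing that the monitoring command above applies to the daemon's file descriptors. It is only an illustration: the type and function names (fdCounts, classifyFDs) and the sample paths are hypothetical and are not part of go-ipfs.

```go
package main

import (
	"fmt"
	"strings"
)

// fdCounts mirrors the three counters printed by the watch command.
// The type and its field names are hypothetical, chosen for this sketch.
type fdCounts struct {
	Sockets int
	LevelDB int
	FlatFS  int
}

// classifyFDs tallies symlink targets as they appear under /proc/<pid>/fd.
// It uses the same substrings the grep calls look for: "socket:",
// "/datastore/" and "/blocks/".
func classifyFDs(targets []string) fdCounts {
	var c fdCounts
	for _, t := range targets {
		switch {
		case strings.Contains(t, "socket:"):
			c.Sockets++
		case strings.Contains(t, "/datastore/"):
			c.LevelDB++
		case strings.Contains(t, "/blocks/"):
			c.FlatFS++
		}
	}
	return c
}

func main() {
	// Made-up link targets; real ones could be gathered by calling
	// os.Readlink on each entry of /proc/<pid>/fd (Linux only).
	targets := []string{
		"socket:[123456]",
		"socket:[123457]",
		"/home/user/.ipfs/datastore/000012.ldb",
		"/home/user/.ipfs/blocks/AB/example.data",
		"/dev/null",
	}
	fmt.Printf("%+v\n", classifyFDs(targets))
}
```

Unlike the three independent grep calls, the switch puts each descriptor in at most one bucket, which should not matter for the paths involved here.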

@magik6k magik6k added the kind/bug A bug in existing code (including security flaws) label Jul 28, 2017
@Kubuxu
Member

Kubuxu commented Jul 28, 2017

This means that the connection listener is not working in go-libp2p-swarm.

I suspect something might be wrong with mafmt.TCP, or we are passing the wrong multiaddr there. There is a similar weird issue with the fallback dialer being selected even though there is a perfectly OK dialer to use.

@magik6k
Member Author

magik6k commented Jul 28, 2017

After digging a bit more, turns out that:

  • dialLimiter.finishDial gets called with dialJobs that are on waitingOnFd list
  • dialLimiter.finishDial can get called 10x for one job
  • In both cases this lowers dl.fdConsuming; after a while (<1 min since daemon start) the daemon 'consumes' -10000 connections
  • (I wonder if it may overflow after running the daemon for a long time and cause it to not be able to connect to any new peers)
  • Another, less significant issue is that addrutil.IsFDCostlyTransport (which uses mafmt.TCP) doesn't match /ip4/1.2.3.4/tcp/123/ws (or /utp) addrs. Dials to such addrs ignore the FD limit too, but they add far less to the used FDs
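
The first three points can be reproduced in a toy model. The following Go sketch is not the go-libp2p-swarm code and does not claim to be the fix from libp2p/go-libp2p-swarm#26. It is a simplified, single-goroutine model that borrows only the names fdConsuming and waitingOnFd from the list above. Everything else (the limiter and dialJob types, the holdsFD flag, the simulate helper) is an assumption made for illustration.

```go
package main

import "fmt"

// dialJob is a hypothetical stand-in for a pending dial.
type dialJob struct {
	holdsFD bool // true only while the job actually occupies an FD slot
}

// limiter is a simplified model of an FD-limited dialer (no locking,
// and waiting jobs are never promoted, to keep the sketch short).
type limiter struct {
	fdLimit     int
	fdConsuming int
	waitingOnFd []*dialJob
}

// startDial grants an FD slot if one is free, otherwise queues the job.
func (l *limiter) startDial(j *dialJob) {
	if l.fdConsuming >= l.fdLimit {
		l.waitingOnFd = append(l.waitingOnFd, j)
		return
	}
	l.fdConsuming++
	j.holdsFD = true
}

// finishDialBuggy decrements unconditionally, so finishing a waiting job,
// or the same job several times, drives the counter below zero.
func (l *limiter) finishDialBuggy(j *dialJob) {
	l.fdConsuming--
}

// finishDialGuarded releases a slot only if the job holds one, which
// makes repeated calls harmless.
func (l *limiter) finishDialGuarded(j *dialJob) {
	if !j.holdsFD {
		return
	}
	j.holdsFD = false
	l.fdConsuming--
}

// simulate runs one scenario: job a gets the only slot, job b waits,
// then b is finished once and a is finished ten times.
func simulate(guarded bool) int {
	l := &limiter{fdLimit: 1}
	a, b := &dialJob{}, &dialJob{}
	l.startDial(a)
	l.startDial(b)
	finish := l.finishDialBuggy
	if guarded {
		finish = l.finishDialGuarded
	}
	finish(b)
	for i := 0; i < 10; i++ {
		finish(a)
	}
	return l.fdConsuming
}

func main() {
	fmt.Println(simulate(false), simulate(true)) // prints "-10 0"
}
```

Once the counter is negative, the check in startDial passes for many more dials than the limit allows, which matches the flood of SYN_SENT connections described in the report.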

@magik6k
Member Author

magik6k commented Jul 28, 2017

The fix is in libp2p/go-libp2p-swarm#26; bubbling it up is probably blocked by #4094.

@magik6k magik6k added status/blocked Unable to be worked further until needs are met status/in-progress In progress labels Jul 28, 2017
@magik6k magik6k added this to the Ipfs 0.4.11 milestone Jul 28, 2017
@Kubuxu
Member

Kubuxu commented Jul 28, 2017

It should be bubbled up in 0.4.10; it is a big bug that hits many users and makes go-ipfs (1) hard to use and (2) seem more resource-expensive than it needs to be.

@Kubuxu
Member

Kubuxu commented Jul 28, 2017

Oh, I forgot 0.4.10 is out.

@magik6k
Member Author

magik6k commented Jul 31, 2017

Reopening, as libp2p/go-libp2p-swarm#26 hasn't propagated yet and this got auto-closed.

@magik6k magik6k reopened this Jul 31, 2017
@Stebalien Stebalien removed the status/blocked Unable to be worked further until needs are met label Jul 31, 2017
Stebalien added a commit that referenced this issue Jul 31, 2017
fixes #4102 (fixed in go-libp2p-swarm)

License: MIT
Signed-off-by: Steven Allen <[email protected]>