too many open files - ipfs cluster unusable #4339

Closed
agriffaut opened this issue Oct 24, 2017 · 4 comments

@agriffaut

Version information:

go-ipfs version: 0.4.11-
Repo version: 6
System version: amd64/linux
Golang version: go1.9

ipfs-cluster-service version 0.1.0

Type:

Bug

Severity:

High

Related Issue

Seems related to #4102, which was closed and fixed in v0.4.11.

Description:

I recently set up a small cluster of 3 ipfs nodes for testing purposes, using binaries downloaded from https://dist.ipfs.io
The cluster starts successfully, but peer connections grow until the file-descriptor limit is saturated. The daemon's open file descriptors break down as follows (a quick way to reproduce this count is sketched after the list):

sockets: 2036
leveldb: 5
flatfs: 1
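
A minimal sketch of how such a per-type file-descriptor count can be produced, assuming a Linux host and that the daemon's pid is known; the "datastore"/"blocks" path matching is an assumption about the repo layout, not something taken from the logs below:

package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// Classify the open file descriptors of a process by reading the symlinks
// under /proc/<pid>/fd (Linux only).
func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: fdcount <pid>")
		os.Exit(1)
	}
	fdDir := filepath.Join("/proc", os.Args[1], "fd")

	entries, err := os.ReadDir(fdDir)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	counts := map[string]int{}
	for _, e := range entries {
		target, err := os.Readlink(filepath.Join(fdDir, e.Name()))
		if err != nil {
			continue // the fd may have been closed between ReadDir and Readlink
		}
		switch {
		case strings.HasPrefix(target, "socket:"):
			counts["sockets"]++
		case strings.Contains(target, "datastore"): // leveldb files (assumed repo path)
			counts["leveldb"]++
		case strings.Contains(target, "blocks"): // flatfs files (assumed repo path)
			counts["flatfs"]++
		default:
			counts["other"]++
		}
	}
	fmt.Println(counts)
}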

Some logs from ipfs showing "too many open files" errors:

{"event":"tptDialReusePortBegin","raddr":{},"system":"tcp-tpt","time":"2017-10-24T11:56:36.523767592Z"}
{"duration":52725,"error":"too many open files","event":"tptDialReusePort","raddr":{},"system":"tcp-tpt","time":"2017-10-24T11:56:36.523818874Z"}
{"dial":"failure","duration":871388,"encrypted":true,"error":"dial tcp4 85.117.122.236:19383: socket: too many open files","event":"connDial","inPrivNet":false,"localPeer":"QmQfxQ94bgESRobzBdGWzajbPdNSNinJ9YnQLdxeAiPL2j","remoteAddr":"/ip4/85.117.122.236/tcp/19383","remotePeer":"QmXGtyhDfBhv44Y97MebptXSvHoQjaeAeUfVnZ6xwx4zjL","subsystem":"conn","system":"conn","time":"2017-10-24T11:56:36.523923809Z"}
{"encrypted":true,"event":"connDialBegin","inPrivNet":false,"localPeer":"QmQfxQ94bgESRobzBdGWzajbPdNSNinJ9YnQLdxeAiPL2j","remoteAddr":"/ip4/127.0.0.1/tcp/3001","remotePeer":"QmXGtyhDfBhv44Y97MebptXSvHoQjaeAeUfVnZ6xwx4zjL","subsystem":"conn","system":"conn","time":"2017-10-24T11:56:36.524054794Z"}
{"event":"tptDialReusePortBegin","raddr":{},"system":"tcp-tpt","time":"2017-10-24T11:56:36.524289849Z"}
{"duration":39437,"error":"too many open files","event":"tptDialReusePort","raddr":{},"system":"tcp-tpt","time":"2017-10-24T11:56:36.524329224Z"}
{"dial":"failure","duration":632198,"encrypted":true,"error":"dial tcp4 127.0.0.1:3001: socket: too many open files","event":"connDial","inPrivNet":false,"localPeer":"QmQfxQ94bgESRobzBdGWzajbPdNSNinJ9YnQLdxeAiPL2j","remoteAddr":"/ip4/127.0.0.1/tcp/3001","remotePeer":"QmXGtyhDfBhv44Y97MebptXSvHoQjaeAeUfVnZ6xwx4zjL","subsystem":"conn","system":"conn","time":"2017-10-24T11:56:36.524679547Z"}
{"encrypted":true,"event":"connDialBegin","inPrivNet":false,"localPeer":"QmQfxQ94bgESRobzBdGWzajbPdNSNinJ9YnQLdxeAiPL2j","remoteAddr":"/ip6/2001:0:4137:9e76:3479:3f98:aa8a:8513/tcp/3001","remotePeer":"QmXGtyhDfBhv44Y97MebptXSvHoQjaeAeUfVnZ6xwx4zjL","subsystem":"conn","system":"conn","time":"2017-10-24T11:56:36.52524456Z"}
{"event":"tptDialReusePortBegin","raddr":{},"system":"tcp-tpt","time":"2017-10-24T11:56:36.525617419Z"}
{"duration":358018,"error":"too many open files","event":"tptDialReusePort","raddr":{},"system":"tcp-tpt","time":"2017-10-24T11:56:36.525700382Z"}
{"dial":"failure","duration":679325,"encrypted":true,"error":"dial tcp6 [2001:0:4137:9e76:3479:3f98:aa8a:8513]:3001: socket: too many open files","event":"connDial","inPrivNet":false,"localPeer":"QmQfxQ94bgESRobzBdGWzajbPdNSNinJ9YnQLdxeAiPL2j","remoteAddr":"/ip6/2001:0:4137:9e76:3479:3f98:aa8a:8513/tcp/3001","remotePeer":"QmXGtyhDfBhv44Y97MebptXSvHoQjaeAeUfVnZ6xwx4zjL","subsystem":"conn","system":"conn","time":"2017-10-24T11:56:36.525833283Z"}
{"encrypted":true,"event":"connDialBegin","inPrivNet":false,"localPeer":"QmQfxQ94bgESRobzBdGWzajbPdNSNinJ9YnQLdxeAiPL2j","remoteAddr":"/ip4/85.117.122.236/tcp/21032","remotePeer":"QmXGtyhDfBhv44Y97MebptXSvHoQjaeAeUfVnZ6xwx4zjL","subsystem":"conn","system":"conn","time":"2017-10-24T11:56:36.526042995Z"}
{"event":"tptDialReusePortBegin","raddr":{},"system":"tcp-tpt","time":"2017-10-24T11:56:36.526607868Z"}
{"duration":45164,"error":"too many open files","event":"tptDialReusePort","raddr":{},"system":"tcp-tpt","time":"2017-10-24T11:56:36.526652453Z"}
{"dial":"failure","duration":970578,"encrypted":true,"error":"dial tcp4 85.117.122.236:21032: socket: too many open files","event":"connDial","inPrivNet":false,"localPeer":"QmQfxQ94bgESRobzBdGWzajbPdNSNinJ9YnQLdxeAiPL2j","remoteAddr":"/ip4/85.117.122.236/tcp/21032","remotePeer":"QmXGtyhDfBhv44Y97MebptXSvHoQjaeAeUfVnZ6xwx4zjL","subsystem":"conn","system":"conn","time":"2017-10-24T11:56:36.526985828Z"}
{"dial":"failure","duration":10288081,"encrypted":true,"error":"\u003cpeer.ID QfxQ94\u003e --\u003e \u003cpeer.ID XGtyhD\u003e dial attempt failed: dial tcp4 85.117.122.236:21032: socket: too many open files","event":"swarmDialDo","localPeer":"QmQfxQ94bgESRobzBdGWzajbPdNSNinJ9YnQLdxeAiPL2j","remotePeer":"QmXGtyhDfBhv44Y97MebptXSvHoQjaeAeUfVnZ6xwx4zjL","subsystem":"swarm","system":"swarm2","time":"2017-10-24T11:56:36.527091498Z"}
{"event":"swarmDialBackoffAdd","localPeer":"QmQfxQ94bgESRobzBdGWzajbPdNSNinJ9YnQLdxeAiPL2j","remotePeer":"QmXGtyhDfBhv44Y97MebptXSvHoQjaeAeUfVnZ6xwx4zjL","subsystem":"swarm","system":"swarm2","time":"2017-10-24T11:56:36.527142088Z"}
@hsanjuan
Contributor

Hi, how long does it take for this to happen? Does it happen by itself just by launching the ipfs nodes (without ipfs-cluster), or is something triggering it?

@whyrusleeping
Member

@agriffaut could you try this with go-ipfs built from latest master? We recently merged something that should resolve the issues you are seeing.

@agriffaut
Author

go-ipfs version: 0.4.12-dev-c0d6224

@whyrusleeping I've reinstalled my little cluster with the latest master, and after 20 min open sockets are around 750. Better than before!
I'll play with it tomorrow and let you know.
Thanks

@Stebalien
Member

By default, we've set the connection limit to ~900. This isn't a hard limit; when we hit it, we start killing connections until we get down to ~600, and then we wait until we climb back up to ~900.

Relevant PR: #4324
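
A minimal, self-contained sketch of that high/low watermark behaviour; this is only an illustration of the idea, not the actual go-libp2p connection manager code, with the numbers matching the defaults mentioned above:

package main

import "fmt"

const (
	highWater = 900 // start trimming once we exceed this many connections
	lowWater  = 600 // trim back down to roughly this many
)

type conn struct{ peer string }

// maybeTrim closes connections only when the count exceeds the high
// watermark, and then drops enough of them to fall back to the low
// watermark. A real implementation scores connections (recency, protected
// peers, usefulness) before picking victims; here we simply drop the oldest.
func maybeTrim(conns []conn) []conn {
	if len(conns) <= highWater {
		return conns
	}
	drop := len(conns) - lowWater
	for _, c := range conns[:drop] {
		fmt.Println("closing connection to", c.peer)
	}
	return conns[drop:]
}

func main() {
	conns := make([]conn, 0, 1000)
	for i := 0; i < 1000; i++ {
		conns = append(conns, conn{peer: fmt.Sprintf("peer-%d", i)})
	}
	conns = maybeTrim(conns)
	fmt.Println("connections remaining:", len(conns)) // prints 600
}

(In later go-ipfs versions these watermarks are exposed in the daemon config under Swarm.ConnMgr, if I recall correctly.)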
