
Error: Too many files open #92

Closed
bretrouse opened this issue Aug 20, 2013 · 4 comments

Comments

@bretrouse

Hello,

When using Locust with 1 master and 4 slaves, running 50,000 users at 200 hatched per second, I'm receiving the following error:

'ConnectionError(MaxRetryError("HTTPConnectionPool(host='rewresnwww6ld', port=80): Max retries exceeded with url: /api/activities (Caused by <class 'socket.error'>: [Errno 24] Too many open files)",),)'

This seems to be coming from the requests library. My ulimit is unlimited, and I've applied the other settings below, taken from a tuning post:

echo "10152 65535" > /proc/sys/net/ipv4/ip_local_port_range
sysctl -w fs.file-max=128000
sysctl -w net.ipv4.tcp_keepalive_time=300
sysctl -w net.core.somaxconn=250000
sysctl -w net.ipv4.tcp_max_syn_backlog=2500
sysctl -w net.core.netdev_max_backlog=2500
ulimit -n 10240

Any ideas? I can't effectively load test at this point, as the error rate climbs after ~5000 users have been generated.

@cgbystrom
Member

It's likely that your sockets end up in the TIME_WAIT state, which effectively blocks them from re-use for a period of time.

See http://serverfault.com/questions/212093/how-to-reduce-number-of-sockets-in-time-wait and http://www.lognormal.com/blog/2012/09/27/linux-tcpip-tuning/ for more info.

One could argue that Locust should re-use sockets when running big tests. We've been considering that for testing Battlelog (multi-million-user tests) to reduce this behavior, even though re-using sockets too quickly isn't ideal (hence the default TIME_WAIT timeout). However, re-using sockets won't exercise the actual TCP accept handshake, which also puts stress on your system. In most cases, though, that isn't your actual bottleneck anyway.
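
A quick way to see whether that's what is happening, assuming a Linux host, is to count the sockets that /proc/net/tcp reports in state 06 (TIME_WAIT); a minimal sketch:

# Count IPv4 sockets in TIME_WAIT (state code "06") by reading /proc/net/tcp.
# Linux only; IPv6 sockets live in /proc/net/tcp6.
def count_time_wait(path="/proc/net/tcp"):
    with open(path) as f:
        next(f)  # skip the header line
        return sum(1 for line in f if line.split()[3] == "06")

print("sockets in TIME_WAIT:", count_time_wait())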

@Jahaja
Member

Jahaja commented Aug 20, 2013

This shouldn't be a case of TCP port exhaustion, as that wouldn't generate this error. (Rather, it would generate EAGAIN on connect().)

I think it's more likely that your Python processes don't actually have the intended resource limit. You could confirm this by printing it out in your locustfile:

import resource
# Print the (soft, hard) limit on open file descriptors for this process
print(resource.getrlimit(resource.RLIMIT_NOFILE))

> However, reusing sockets won't test the actual TCP accept handshake which also puts stress on your system. But in most cases, this isn't your actual bottleneck anyway.

I think it would, actually. The peer would most likely be gone, and the connection would have to be re-established from scratch. The only thing that would be reused is probably the kernel resources allocated for that socket. That said, I'd imagine reusing the sockets could create quite strange errors on a shaky network.

@bretrouse
Author

That appears to have been the issue. Once I added this call to my locustfile, I was able to bring my servers down. Thanks for the help. I'm still unsure why the Python process wasn't respecting my ulimit settings, but I'm able to work around it for now.

import resource
resource.setrlimit(resource.RLIMIT_NOFILE, (999999, 999999))

Thanks guys.
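
For reference, a minimal sketch of placing that workaround at the top of a locustfile, assuming you only want to raise the soft limit up to the current hard limit (raising the hard limit itself generally requires root):

import resource

# Raise the soft open-file limit to the current hard limit for this process.
# Raising the hard limit beyond its current value normally requires root.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
print(resource.getrlimit(resource.RLIMIT_NOFILE))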

@thehackercat

First, check whether the socket is actually closed. In Python you should call socket.close() after socket.shutdown(2); only then is the connection torn down and its file descriptor released.

Then enlarge the maximum number of open files in /etc/security/limits.conf.
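
A minimal sketch of that shutdown-then-close pattern, assuming a plain socket and a placeholder host/port:

import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("example.com", 80))  # placeholder host and port
s.sendall(b"HEAD / HTTP/1.0\r\nHost: example.com\r\n\r\n")
s.recv(4096)
s.shutdown(socket.SHUT_RDWR)  # SHUT_RDWR == 2: no further sends or receives
s.close()                     # release the file descriptor back to the OS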
