Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Too many open files strikes back #262

Closed
Mygod opened this issue May 23, 2020 · 22 comments
Closed

Too many open files strikes back #262

Mygod opened this issue May 23, 2020 · 22 comments

Comments

@Mygod
Copy link
Contributor

Mygod commented May 23, 2020

Similar to #106, I am seeing a lot of errors related to too many open files in shadowsocks-android, more frequently in Xiaomi phones. Probably have to do with UDP association.

@zonyitoo
Copy link
Collaborator

zonyitoo commented May 23, 2020

Related to #252 .

{
    "udp_timeout": 5,  // Idling UDP associations will be kept only 5 seconds
    "udp_max_associations": 512,  // Maximum UDP associations to be kept, unlimited by default
}

shadowsocks-libev limits UDP associations to 512 on server, 256 on client.

@Mygod
Copy link
Contributor Author

Mygod commented May 23, 2020

Yes. Using max associations sounds like a hack instead of a proper fix though.

@zonyitoo
Copy link
Collaborator

Maybe we can set udp_max_associations' default value to be 1/3 of rlimit.

@Mygod
Copy link
Contributor Author

Mygod commented May 23, 2020

I think there is something else wrong. Do we handle closing udp sockets correctly? (I haven't checked but it might be the case)

@zonyitoo
Copy link
Collaborator

It's a problem. But it should be handled correctly. UDP's sockets are actually stored inside a tokio's task, which will be killed when ProxyAssociation is dropped.

BTW, #259 is probably not a good idea, which is causing timed out associations couldn't be released in time.

@zonyitoo
Copy link
Collaborator

For reference, you should see DEBUG logs like:

UDP association xxx <- ... task is closing

when associations are dropped.

@Mygod
Copy link
Contributor Author

Mygod commented May 23, 2020

Connections should be dropped when new connection comes in. I don't think that is an issue. Also it is only a server side change.

@zonyitoo
Copy link
Collaborator

zonyitoo commented May 23, 2020

It seems that getrlimit won't work on Android. (from search results in Google).

So it is not possible to set a default proper size limit for udp associations. right?

@Mygod
Copy link
Contributor Author

Mygod commented May 23, 2020

Where did you find that getrlimit does not work?

Either way, maybe there are two three things we can do:

  1. Drop associations when we get EMFILE (however, this error can arise anywhere in the code).
  2. (Unix specific?) If UDP request originates from localhost (or in Android's VPN mode, a local client connects to the tun interface), we can determine when the UDP connection has gone away by reading /proc/net/udp (or ConnectivityManager.getConnectionOwnerUid for Android 10+).
  3. (local-local/server-local only) Drop UDP sockets actively when we receive ICMP unreachable messages. However, this would require a recvmsg syscall which is neither supported by std nor tokio. Handling ICMP messages in UdpSocket::recv/recv_from rust-lang/rfcs#2931

Finally using SOCKS5 UDP ASSOCIATE would help too, but I don't think anyone is connecting to sslocal UDP directly (i.e. without using some tun2socks).

@zonyitoo
Copy link
Collaborator

Drops association when UDP ASSOCIATE's TCP connection is closed?

@Mygod
Copy link
Contributor Author

Mygod commented May 23, 2020

Yes but it requires the application to support this UDP ASSOCIATE command as well, in particular, tun interfaces operates on IP level so it will not know when UDP socket is closed either. An alternative is to use other kernel API like /proc/net/udp.

On the other hand, maybe if we use UdpSocket::connect it will properly return ECONNREFUSED when we send/recv? That would be useful too.

@zonyitoo
Copy link
Collaborator

zonyitoo commented May 23, 2020

On the other hand, maybe if we use UdpSocket::connect it will properly return ECONNREFUSED when we send/recv? That would be useful too.

This is the most prefered solution.

@zonyitoo
Copy link
Collaborator

zonyitoo commented May 23, 2020

let ret = WSAIoctl(
handle,
SIO_UDP_CONNRESET,
&mut enable as *mut _ as LPVOID,
mem::size_of_val(&enable) as DWORD,
ptr::null_mut(),
0,
&mut bytes_returned as *mut _ as LPDWORD,
ptr::null_mut(),
None,
);

ICMP port unreachable errors are closed on purpose.

On Windows, ICMP port unreachable errors will cause sockets unusable for all subsequence calls send and recv.

@Mygod
Copy link
Contributor Author

Mygod commented May 23, 2020

However, we only use UdpSocket::connect for sslocal to ssserver connections. Every other UDP socket needs to communicate with multiple clients so it will not help much, i.e. we can only use it to close connections when ssserver is unreachable (e.g. due to connectivity change). Other sockets still need recvmsg to determine which client has gone away.

@zonyitoo
Copy link
Collaborator

It is ok to add a customized function to call recvmsg, as I have already done for *-redir implementations.

@Mygod
Copy link
Contributor Author

Mygod commented May 23, 2020

We need to integrate it with mio too to make it async.

@zonyitoo
Copy link
Collaborator

zonyitoo commented May 23, 2020

fn poll_recv_from(
&self,
cx: &mut Context<'_>,
buf: &mut [u8],
) -> Poll<io::Result<(usize, SocketAddr, SocketAddr)>> {
ready!(self.io.poll_read_ready(cx, mio::Ready::readable()))?;
match recv_from_with_destination(self.io.get_ref(), buf) {
Err(ref e) if e.kind() == ErrorKind::WouldBlock => {
self.io.clear_read_ready(cx, mio::Ready::readable())?;
Poll::Pending
}
x => Poll::Ready(x),
}
}
}
#[async_trait]
impl UdpSocketRedirExt for UdpRedirSocket {
async fn recv_from_redir(&mut self, buf: &mut [u8]) -> io::Result<(usize, SocketAddr, SocketAddr)> {
poll_fn(|cx| self.poll_recv_from(cx, buf)).await
}
}

No you don't need mio.

BTW, please fix #265 first. @Mygod

@zonyitoo
Copy link
Collaborator

zonyitoo commented May 24, 2020

Tests on macOS:

nc -> sslocal -> ssserver -> nc

Settings: udp_timeout: 5, udp_max_associations: 10.

After the first packet, both sslocal and ssserver have created new fds:

lsof -p ${sslocal_pid}:

sslocal 46977 zonyitoo   13u    IPv4 0xe8869e9b57cb177b       0t0                 UDP localhost:61627->localhost:8288
sslocal 46977 zonyitoo   14u    IPv4 0xe8869e9b57cb0bdb       0t0                 UDP *:52299

One for communicating with ssserver, and the other one is for bypassing targets.

lsof -p ${ssserver_pid}:

ssserver 46453 zonyitoo   13u    IPv4 0xe8869e9b6368131b      0t0                 UDP *:49907

After 5 seconds, all of these newly created fds are released.

So I think there is no fd leaks in UDP association implementation.

@zonyitoo
Copy link
Collaborator

Maybe default timeout of udp associations shouldn't be 5 minutes, which is too long.

UDP associations are used mostly for sending one packet.

@Mygod
Copy link
Contributor Author

Mygod commented May 24, 2020

UDP sockets can usually send more than one packets... Our implementation of UDP DNS is pretty wasteful. On the other hand, I suspect the issue on those phones are erroneous implementations of UPnP (or something similar) creating a lot of short-lived sessions.

@zonyitoo
Copy link
Collaborator

zonyitoo commented May 24, 2020

I have tested with v2ray-plugin's QUIC mode:

CLIENT -> sslocal1(v2ray-plugin) -> sslocal2(tunnel) -> ssserver2(tunnel)  -> ssserver1(v2ray-plugin) -> REMOTE

And I saw both sslocal2 and ssserver2 kept only 1 association all the time.

EDIT: it seems that QUIC won't just keep 1 socket all the time. But the number of sockets are very limited.

@zonyitoo
Copy link
Collaborator

Reverted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants