File descriptor (socket) leak #126
Comments
Thanks for reporting this issue. Based on your information, I just ran Valgrind on the data-channels-flow-control example, and it detected at least two leaks. I will fix these leaks ASAP.
I was thinking of checking the Arc usage too. In another project of mine (using GStreamer for WebRTC), Arc downgrade/upgrade solved the cycle problem.
I think I have made some progress on fixing the reference-cycle memory leaks. reflect and data-channels-flow-control in the examples v0.3.0 branch should be memory-leak free now. I have checked with a nightly Rust build, and it reports no memory leak. I will keep checking and fixing the other examples to make sure there are no reference-cycle memory leaks.
Unfortunately, I don't think the file descriptor leak is fixed yet (on the master branch). I am going to look further into it on Monday.
All examples in the v0.3.0 branch of the examples repo now pass the rc-cycle check. Please note that this requires some changes to the callback functions: for example, we have to use a Weak pointer to peer_connection in its on_track callback, otherwise it will cause a reference-cycle leak.
Could you check whether your code has such a callback?
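For reference, a minimal self-contained sketch of that pattern with illustrative types (this is not the webrtc-rs API): the callback captures only a Weak pointer and upgrades it on use, so dropping the last external Arc still frees the connection.

use std::sync::{Arc, Mutex, Weak};

// Illustrative stand-in for a connection object that stores a user callback.
struct Connection {
    on_event: Mutex<Option<Box<dyn Fn() + Send>>>,
}

impl Drop for Connection {
    fn drop(&mut self) {
        println!("Connection dropped");
    }
}

fn main() {
    let conn = Arc::new(Connection {
        on_event: Mutex::new(None),
    });

    // Capture only a Weak pointer in the callback and upgrade it on use.
    // Capturing the Arc itself would form a cycle (connection -> callback ->
    // connection), so the connection would never be dropped.
    let weak: Weak<Connection> = Arc::downgrade(&conn);
    let cb: Box<dyn Fn() + Send> = Box::new(move || {
        if let Some(strong) = weak.upgrade() {
            println!("callback ran, connection still alive");
            drop(strong); // released as soon as the callback is done with it
        }
    });
    *conn.on_event.lock().unwrap() = Some(cb);

    // The callback still works while the connection is alive...
    if let Some(cb) = conn.on_event.lock().unwrap().as_ref() {
        cb();
    }

    // ...and dropping the last external Arc actually frees the connection.
    drop(conn); // prints "Connection dropped"
}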
Steps to reproduce:
if let Some(conn) = &self.conn {
    let _ = conn.close().await;

    use std::sync::Weak;
    use std::time::Duration;

    // Poll the reference counts after close(): if the strong count never
    // reaches 0, something is still holding the connection and it will
    // never be dropped (and its sockets never freed).
    let conn_weak = Arc::downgrade(conn);
    tokio::spawn(async move {
        let mut int = tokio::time::interval(Duration::from_secs(1));
        loop {
            int.tick().await;
            dbg!(Weak::weak_count(&conn_weak));
            if dbg!(Weak::strong_count(&conn_weak)) == 0 {
                break;
            }
        }
    });
}
I reproduced using ~50 clients.
@qbx2, I tried to reproduce this issue in the ice crate with the ping_pong example. From the new ping_pong example, the ice crate does not look like the root cause of such a leak. Therefore, I think the leak may come from the webrtc crate itself, so I fixed one possible issue in 0944adf, where the endpoint forgot to call the underlying conn.close(). Could you try again? In addition, to make this issue easier to reproduce, is there any example in https://github.com/webrtc-rs/examples that could be hacked to reproduce it? With the same codebase, it is easier to debug. Thank you.
I tried webrtc on the master branch, but had no luck. :(
@rainliu Please check this out: qbx2/webrtc-rs-examples@5f5c3a5
@qbx2, thanks for this example; I can reproduce this issue.
@qbx2, I think I found the root cause of the bug. e7d72d2 should fix this issue. The root cause is that the cancel_tx signal must be assigned to internal.cancel_tx before calling agent.dial or agent.accept; otherwise, calling agent.close() is a no-op with respect to cancelling dial/accept.
Also, in your example you still need to send the done signal so that peer_connection.close().await gets called; otherwise it won't call self.internal.ice_transport.stop().await, which is what cancels agent.dial/accept.
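For illustration, a rough self-contained sketch of that ordering issue (Internal, dial, and close are hypothetical stand-ins, not the actual webrtc-rs internals): the cancel sender is stored into the shared state before the cancellable work starts, so close() can interrupt it; storing it afterwards would make close() a no-op.

use std::sync::{Arc, Mutex};
use std::time::Duration;
use tokio::sync::mpsc;

// Hypothetical stand-in for the internal state shared by dial() and close().
struct Internal {
    cancel_tx: Mutex<Option<mpsc::Sender<()>>>,
}

impl Internal {
    // close() can only cancel dial() if cancel_tx was stored before dial() started.
    async fn close(&self) {
        let tx = self.cancel_tx.lock().unwrap().take();
        if let Some(tx) = tx {
            let _ = tx.send(()).await;
        }
    }

    async fn dial(&self) {
        let (tx, mut rx) = mpsc::channel(1);
        // Store the cancel sender *first*, then start the cancellable work.
        *self.cancel_tx.lock().unwrap() = Some(tx);
        tokio::select! {
            _ = rx.recv() => println!("dial cancelled"),
            _ = tokio::time::sleep(Duration::from_secs(30)) => println!("dial finished"),
        }
    }
}

#[tokio::main]
async fn main() {
    let internal = Arc::new(Internal { cancel_tx: Mutex::new(None) });
    let dialer = internal.clone();
    let handle = tokio::spawn(async move { dialer.dial().await });

    tokio::time::sleep(Duration::from_millis(100)).await;
    internal.close().await; // interrupts dial() because cancel_tx was already set
    let _ = handle.await;   // prints "dial cancelled" almost immediately
}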
By the way, in your 50-client case, does each client have one peer_connection? If yes, once the connection-state callback reports Failed, you may need to close the peer_connection for that client.
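A minimal illustration of that advice with hypothetical types (ConnState is a stand-in, not the webrtc-rs API): the state-change callback only signals, and the task that owns the connection reacts, which is where peer_connection.close().await would go.

use tokio::sync::mpsc;

// Hypothetical connection-state enum standing in for the real state type.
#[derive(Debug, PartialEq)]
enum ConnState {
    Connected,
    Failed,
}

#[tokio::main]
async fn main() {
    let (done_tx, mut done_rx) = mpsc::channel::<()>(1);

    // Stand-in for registering the connection-state callback: it only signals,
    // it does not try to close the connection from inside the callback.
    let on_state_change = move |state: ConnState| {
        if state == ConnState::Failed {
            let _ = done_tx.try_send(());
        }
    };

    // Simulate the transport reporting states for this client.
    on_state_change(ConnState::Connected);
    on_state_change(ConnState::Failed);

    // The owner of the peer connection reacts to the signal; this is where
    // peer_connection.close().await would be called for that client.
    if done_rx.recv().await.is_some() {
        println!("state reported Failed -> close this client's peer connection");
    }
}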
You're right. The reason I commented that line out was to keep the process alive so that the leak could be confirmed. Thanks a lot for the advice; I will take care of closing the peer_connection.
Yes, they do. However, the peer_connections are designed to be dropped when they fail to connect (I use Weak). That's not a problem, right?
Thanks for confirming. I just released v0.3.1 to include this fix.
Sockets opened in ICEGatherer seem to be leaked. They are never closed, even when the PeerConnections are closed. Eventually my server becomes unavailable with a "too many open files" error.

I've investigated webrtc's source code and found that CandidateBase.close() is a no-op. Of course, that is because tokio's UdpSocket does not provide a close() method. Even so, a socket should be closed when it is dropped, so I guess the socket is simply not being dropped.

RTCPeerConnection holds an Arc<PeerConnectionInternal>, and there are other holders too. In v0.2.1 the peer connection had no such problem and was dropped correctly. However, the other holders do not seem to drop it. I have no idea where all of them are, but internal_rtcp_writer (peer_connection.rs, line 170) or pci (peer_connection.rs, lines 1163, 1345, 1393) may have created a reference cycle.

If possible, those references should be replaced by std::sync::Weak to break the cycle. Pion and other webrtc libraries written in garbage-collected languages may not suffer from this issue because the GC can detect and free such circular references. Because Rust does not have one, we should use weak references to avoid this problem. It would also fix other memory leaks.
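A small self-contained sketch of the point about dropping: tokio's UdpSocket has no close(), and the descriptor is released only when the socket value is dropped, which is why a reference cycle that keeps the gatherer alive also leaks its sockets.

use tokio::net::UdpSocket;

#[tokio::main]
async fn main() -> std::io::Result<()> {
    // Bind to an ephemeral port and remember the address we got.
    let sock = UdpSocket::bind("127.0.0.1:0").await?;
    let addr = sock.local_addr()?;

    // There is no sock.close(); the file descriptor is released here, on drop.
    drop(sock);

    // Rebinding the exact same address works only because the previous
    // descriptor was actually closed when the socket was dropped.
    let _sock2 = UdpSocket::bind(addr).await?;
    println!("rebound {addr} after drop, so the fd was released");
    Ok(())
}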