-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Thread and memory safety fixes #562
Conversation
… earlier to fix accessing freed memory as reported by valgrind
@@ -63,6 +63,7 @@ namespace Channel | |||
virtual ~ChannelSocket(); | |||
|
|||
public: | |||
void Close(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this? We have a destructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was necessary to close the socket, but not destroy it (see Worker::Close()
method changes). Otherwise libuv's callbacks were leading to access of consumer/producer sockets that were already freed, thus causing use-after-free I mentioned.
So I had to add explicit Close()
method that closes producer and consumer sockets using their Close()
methods, while keeping everything in memory, so things like IsClosed()
in consumer callback still access valid memory location and returns true
instead of undefined behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because we call Worker::Close()
explicitly in a few places, while libuv is fully running and can still trigger various callbacks on its own.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is the diff in Worker::Close()
in this PR:
// Close the Channel.
- delete this->channel;
+ this->channel->Close();
and ChannelSocket::Close()
looks as follows:
void ChannelSocket::Close()
{
MS_TRACE_STD();
this->consumerSocket.Close();
this->producerSocket.Close();
}
So what I mean is: why don't we just delete this->channel
in the Worker
destructor and make ChannelSocket
do this?
void ChannelSocket::Close()
{
MS_TRACE_STD();
std::free(this->WriteBuffer);
this->consumerSocket.Close();
this->producerSocket.Close();
}
I don't like having a Close()
method if we can live with just the destructor. We avoided Close()
methods in the past for a reason: they are error prune: you call Close()
on something and then what? can you reuse it? should you also call delete instance
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I understand we need to close consumer and producer sockets explicitly or else libuv will not stop event loop. Hence we do need to have explicit closing procedure for worker, namely Worker::Close()
. It is just that freeing memory for things that could still be accessed is wrong.
I do not know if delete this->channel
is still necessary in worker's destructor with these changes, I though it will be freed automatically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the worker crashed in Node even when terminating the Node process, I'd have my ~/Library/Logs/DiagnosticReports/
folder full of mediasoup-worker core dumps :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the shutdown order is somehow slightly different between Rust and Node versions. Node uses SIGTERM
and Rust uses worker.close
command, which are slightly different code paths.
Anyway, what I did seems to work, once you have time to take a closer look, let me know what needs to be changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the shutdown order is somehow slightly different between Rust and Node versions. Node uses
SIGTERM
and Rust usesworker.close
command, which are slightly different code paths.
This is what I wanted to read!
Ok, true, it makes sense. I just need to check the code very carefully (it's been long since all that logic was done).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any chance you may have time to look at this in coming days?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, HYPER busy this week. I'll do tomorrow Friday at some random hour. Promised.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks really good. Just one question done in comments before approval.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
BTW I've pushed this to v3 branch: b13b5d5 (it seems to not affect this PR). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, let's merge and release
So is this ready to run in macOS Rust once this is merged? |
Almost, there is still one more batch of additional changes that I will submit soon. |
ok ok, then I won't release yet. |
First commit adds some more
thread_local
and other minor tweaks to make sure things don't go wrong when multiple workers are running in different threads concurrently, I believe all cases should now be covered.Next 2 commits fix some interesting use-after-free memory issues that were reported by
valgrind
. They weren't causing crashes under Linux, at least most of the time, but were crashing on macOS most of the time. Now things should work well, but please look carefully because I don't know what I'm doing when it comes to C++ 😅This is the last prerequisite to introduction of macOS support in Rust version, which will be a separate PR.