-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Attempt to fix the infamous DTLS decrypt alert error (issues #136 and #316) #394
Conversation
It'll be a while before I get an appropriate testing version out and have it stewing for long enough to see for sure, but I think this is the right approach (and this specific implementation looks good to me). Much appreciated 😄 |
Awesome, good find on the discussion, thanks for working on this Lorenzo |
I have just deployed it. We are planning a heavy load this week. I will let you know how it behaves. |
I have been running this for 110 hours and haven't seen the issue, so it's safe to say things improved, as I couldn't run it for much longer than 2-24 hours until handshakes started locking up with OpenSSL before this pull request Connections seem to fail less in general, though that might be placebo, people had to retry camming up sometimes but that seems less often now |
@LetsVape thanks for the quick feedback, and really excited about the results! |
👍 Thanks for that! Same here, I hope we can keep this process going and beat the BoringSSL uptime :) |
After three days of usage (heavier usage than usual) I have noticed no slow down, I have experienced no crash, I have not needed to restart the service in order for people to connect and there is no trace of the DTLS decrypt error in the logs. I would call this a huge success. Thanks a lot!!! |
Cool, thanks for the update, good to have other feedback that confirms this works! |
congratulations on the resolution! |
I got 29 days and 23 hours uptime out of this, I then got the following core dump, hope it tells you something useful @lminiero , otherwise I'll try again with a newer commit later, maybe the upcoming reference counters will improve things a bit 👍 https://gist.github.com/LetsVape/80880c1bdc92411e5018 Thanks! |
@LetsVape looks like some internal Glib stuff, not really sure how I can debug this (won't happen before a couple of weeks anyway). If it is some kind of race condition, then yes, a polished and stable version of the reference counters branch may help on that. ~30 days without an issue seems like a good improvement though 😄 |
Thanks for the reply, no worries at all, just to make you aware in case this backtrace was useful, as running it in AddressSanitizer for 30 days might be a bit much. Cheers and happy new year |
This is an attempt to fix the infamous issues #136 and #316, that is the "tlsv1 alert decrypt error" randomly happening when doing handshakes that force Janus to be restarted (issue affecting OpenSSL but NOT BoringSSL, apparently). The cause might be related to race conditions, and in fact OpenSSL docs state that:
The fix proposed here is heavily derived from a discussion related to RTPEngine, where it was mentioned the issue was fixed in this commit, which does indeed implement the callbacks the OpenSSL docs suggest.
Please test this estensively with OpenSSL (not BoringSSL) and let us know if this does fix it for you. Mentioning those who experienced the issue in the past in particular (at least those who discussed it on those issues pages), that is @ancorgs @ploxiln @LetsVape @Will-I4M @jacobhub @frk2 @amnonbb @Computician @jing3018 @tuijldert @saghul