Fix Race Condition #637
if (dropped_last_ref) {
The reason for this boolean and copying the old state was to avoid doing this work within the protection of the mutex. If this is moved into that scope, then the code can be simplified to do the destruction directly without the state copying.
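The trade-off described in this comment can be sketched as follows. This is a minimal illustration, not the libmongocrypt code: `shared_state`, `g_state`, `g_mtx`, `acquire_ref`, `release_ref_deferred`, and `release_ref_locked` are all hypothetical names, and `malloc`/`free` stand in for loading and closing the library. Variant A copies the state out and destroys it after unlocking; variant B destroys it inside the mutex-protected scope, which removes the need for the copy.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a ref-counted global; not libmongocrypt's type. */
typedef struct {
    int refcount;
    void *resource; /* plays the role of the loaded library handle */
} shared_state;

static pthread_mutex_t g_mtx = PTHREAD_MUTEX_INITIALIZER;
static shared_state g_state = {0, NULL};

/* Take a reference, creating the resource on the first one. */
static void acquire_ref(void) {
    pthread_mutex_lock(&g_mtx);
    if (g_state.refcount == 0) {
        g_state.resource = malloc(16);
        assert(g_state.resource != NULL);
    }
    g_state.refcount++;
    pthread_mutex_unlock(&g_mtx);
}

/* Variant A: copy the state under the lock, destroy it after unlocking.
 * Keeps the critical section short, but the destruction itself is no
 * longer serialized with other threads.
 * Returns true if this call destroyed the resource. */
static bool release_ref_deferred(void) {
    shared_state old_state = {0, NULL};
    bool dropped_last_ref = false;

    pthread_mutex_lock(&g_mtx);
    assert(g_state.refcount > 0);
    if (--g_state.refcount == 0) {
        old_state = g_state;
        g_state.resource = NULL;
        dropped_last_ref = true;
    }
    pthread_mutex_unlock(&g_mtx);

    if (dropped_last_ref) {
        free(old_state.resource); /* runs outside the mutex */
    }
    return dropped_last_ref;
}

/* Variant B: destroy directly while still holding the lock.
 * No state copy is needed.
 * Returns true if this call destroyed the resource. */
static bool release_ref_locked(void) {
    bool dropped_last_ref = false;

    pthread_mutex_lock(&g_mtx);
    assert(g_state.refcount > 0);
    if (--g_state.refcount == 0) {
        free(g_state.resource); /* runs inside the mutex */
        g_state.resource = NULL;
        dropped_last_ref = true;
    }
    pthread_mutex_unlock(&g_mtx);
    return dropped_last_ref;
}
```

Whether deferring the destruction is safe depends on what is being destroyed. For a shared library handle, the PR description indicates that the teardown had to move into the mutex-protected scope.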
src/mongocrypt.c
Outdated
 #ifndef __linux__
-mcr_dll_close(old_state.dll);
+mcr_dll_close(g_csfle_state.dll);
 #endif
-/// NOTE: On Linux, skip closing the CSFLE library itself, since a bug in
-/// the way ld-linux and GCC interact causes static destructors to not run
-/// during dlclose(). Still, free the error string:
-mstr_free(old_state.dll.error_string);
+/// NOTE: On Linux, skip closing the CSFLE library itself, since a bug in
+/// the way ld-linux and GCC interact causes static destructors to not run
+/// during dlclose(). Still, free the error string:
+mstr_free(g_csfle_state.dll.error_string);
 }
Do you recall how this dlclose() bug manifested itself? If I get rid of the #ifndef __linux__ above and run it on Linux, I don't see any issues. I'm considering removing the #ifndef guard.
This is a very obscure, painful bug that appears related to application shutdown. The crypt_shared library depends on global destructors, and the behavior of Linux dynamic libraries does not play nicely with C++ static lifetime semantics. The static destructors remain registered with atexit() even if we unload the library, leading to crashes during shutdown. I wasn't able to replicate it myself, but downstream drivers ran into it almost immediately, and this was the only reasonable fix.
Gotcha! I'll restore the #ifndef __linux__ guard.
You can add a link to this ticket: https://jira.mongodb.org/browse/SERVER-63710
Correction: Not a crash, but faulty leak-detection warnings.
Yes, please add a link to the ticket – I still think this is only showing on Linux because that’s where leak detection runs, not because Linux is doing something buggy.
This reverts commit f50e64e.
LGTM with a fix to a possible double-free.
/// NOTE: On Linux, skip closing the CSFLE library itself, since a bug in
/// the way ld-linux and GCC interact causes static destructors to not run
/// during dlclose(). Still, free the error string:
mstr_free(g_csfle_state.dll.error_string);
The mcr_dll_close call also frees g_csfle_state.dll.error_string. I think this needs to be in an #else:
#ifndef __linux__
mcr_dll_close(g_csfle_state.dll);
#else
/// NOTE: On Linux, skip closing the CSFLE library itself, since a bug in
/// the way ld-linux and GCC interact causes static destructors to not run
/// during dlclose(). Still, free the error string:
mstr_free(g_csfle_state.dll.error_string);
#endif
I do not know why, but ASAN does not report this as a double free for me.
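The effect of the suggested #else can be checked in isolation with a small mock. This is only a sketch under stated assumptions: `fake_dll`, `fake_dll_close`, `fake_free_error_string`, `teardown`, and the `g_error_string_frees` counter are hypothetical stand-ins, not the real mcr_dll_close or mstr_free. The only behavior borrowed from the thread is that closing the library also frees the error string.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the DLL handle struct; not the real mcr_dll. */
typedef struct {
    void *handle;
    char *error_string;
} fake_dll;

/* Instrumentation for the sketch: counts how often the string is freed. */
static int g_error_string_frees = 0;

static void fake_free_error_string(fake_dll *dll) {
    free(dll->error_string);
    dll->error_string = NULL;
    g_error_string_frees++;
}

/* Like the close call described above: releases the handle AND the string. */
static void fake_dll_close(fake_dll *dll) {
    free(dll->handle);
    dll->handle = NULL;
    fake_free_error_string(dll);
}

/* With #ifndef/#else, exactly one branch is compiled, so the error string
 * is freed once on every platform. Without the #else, non-Linux builds
 * would free it twice: once in the close call and once directly. */
static void teardown(fake_dll *dll) {
#ifndef __linux__
    fake_dll_close(dll);
#else
    /* On Linux the handle is deliberately left open; free only the string. */
    fake_free_error_string(dll);
#endif
}
```

On Linux the mock leaves `handle` untouched, mirroring the decision to skip closing the library there.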
Related: MONGOCRYPT-526
Fix race condition by moving critical section into the mutex protected scope.
See the following repo for a tool to test the race condition fixed in this PR: https://github.com/kkloberdanz/libmongocrypt_stresstest