Fix Race Condition #637

Merged: 7 commits, May 5, 2023
Changes from 5 commits
40 changes: 16 additions & 24 deletions src/mongocrypt.c
@@ -597,36 +597,28 @@ static bool _validate_csfle_singleton(mongocrypt_t *crypt, _loaded_csfle found)
 static void _csfle_drop_global_ref(void) {
     mlib_call_once(&g_csfle_init_flag, init_csfle_state);

-    bool dropped_last_ref = false;
-    csfle_global_lib_state old_state = {.refcount = 0};
     MONGOCRYPT_WITH_MUTEX(g_csfle_state.mtx) {
         assert(g_csfle_state.refcount > 0);
         int new_rc = --g_csfle_state.refcount;
         if (new_rc == 0) {
-            old_state = g_csfle_state;
-            dropped_last_ref = true;
-        }
-    }
-
-    if (dropped_last_ref) {
Contributor

The reason for this boolean and copying the old state was to avoid doing this work within the protection of the mutex. If this is moved into that scope, then the code can be simplified to do the destruction directly without the state copying.
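
To make the trade-off concrete, here is a minimal sketch of the "tear down while still holding the mutex" shape. It is not the libmongocrypt code: `demo_state`, `demo_take_ref`, and `demo_drop_ref` are hypothetical names, a plain pthread mutex stands in for `MONGOCRYPT_WITH_MUTEX`, and a boolean plus a counter stand in for the loaded library and its `lib_destroy()` call.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the global library state (assumption, not the real struct). */
typedef struct {
    pthread_mutex_t mtx;
    int refcount;
    bool lib_loaded;   /* models the loaded library handle */
    int destroy_calls; /* counts teardowns, for illustration only */
} demo_state;

static demo_state g_demo = {PTHREAD_MUTEX_INITIALIZER, 0, false, 0};

/* Take a reference, "loading" the library when the count goes from 0 to 1. */
static void demo_take_ref(void) {
    pthread_mutex_lock(&g_demo.mtx);
    if (g_demo.refcount++ == 0) {
        g_demo.lib_loaded = true;
    }
    pthread_mutex_unlock(&g_demo.mtx);
}

/* Drop a reference. The teardown runs before the mutex is released, so a
 * concurrent demo_take_ref() cannot re-initialize the state while it is
 * still being destroyed, and no copy of the old state is needed. */
static void demo_drop_ref(void) {
    pthread_mutex_lock(&g_demo.mtx);
    assert(g_demo.refcount > 0);
    if (--g_demo.refcount == 0) {
        g_demo.lib_loaded = false; /* stands in for lib_destroy() and the DLL close */
        g_demo.destroy_calls++;
    }
    pthread_mutex_unlock(&g_demo.mtx);
}
```

The cost of this shape is that the destroy work now happens under the lock, which is exactly what the boolean-and-copy version was trying to avoid.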

-        mongo_crypt_v1_status *status = old_state.vtable.status_create();
-        const int destroy_rc = old_state.vtable.lib_destroy(old_state.csfle_lib, status);
-        if (destroy_rc != MONGO_CRYPT_V1_SUCCESS && status) {
-            fprintf(stderr,
-                    "csfle lib_destroy() failed: %s [Error %d, code %d]\n",
-                    old_state.vtable.status_get_explanation(status),
-                    old_state.vtable.status_get_error(status),
-                    old_state.vtable.status_get_code(status));
-        }
-        old_state.vtable.status_destroy(status);

+            mongo_crypt_v1_status *status = g_csfle_state.vtable.status_create();
+            const int destroy_rc = g_csfle_state.vtable.lib_destroy(g_csfle_state.csfle_lib, status);
+            if (destroy_rc != MONGO_CRYPT_V1_SUCCESS && status) {
+                fprintf(stderr,
+                        "csfle lib_destroy() failed: %s [Error %d, code %d]\n",
+                        g_csfle_state.vtable.status_get_explanation(status),
+                        g_csfle_state.vtable.status_get_error(status),
+                        g_csfle_state.vtable.status_get_code(status));
+            }
+            g_csfle_state.vtable.status_destroy(status);
 #ifndef __linux__
-        mcr_dll_close(old_state.dll);
+            mcr_dll_close(g_csfle_state.dll);
 #endif
-        /// NOTE: On Linux, skip closing the CSFLE library itself, since a bug in
-        /// the way ld-linux and GCC interact causes static destructors to not run
-        /// during dlclose(). Still, free the error string:
-        mstr_free(old_state.dll.error_string);
+            /// NOTE: On Linux, skip closing the CSFLE library itself, since a bug in
+            /// the way ld-linux and GCC interact causes static destructors to not run
+            /// during dlclose(). Still, free the error string:
+            mstr_free(g_csfle_state.dll.error_string);
Contributor

The mcr_dll_close call also frees g_csfle_state.dll.error_string. I think this needs to be in an #else:

#ifndef __linux__
            mcr_dll_close(g_csfle_state.dll);
#else
            /// NOTE: On Linux, skip closing the CSFLE library itself, since a bug in
            /// the way ld-linux and GCC interact causes static destructors to not run
            /// during dlclose(). Still, free the error string:
            mstr_free(g_csfle_state.dll.error_string);
#endif

I do not know why, but ASAN does not report this as a double free for me.
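
A small model of the concern, in case it helps: the sketch below assumes (as stated above) that the close call also releases the error string, and shows that with the `#else` the string is released exactly once on either platform. All names here (`demo_dll`, `demo_str_free`, `demo_dll_close`, `demo_teardown`) are hypothetical stand-ins, not the real `mcr_dll` or `mstr` APIs.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a loaded-library handle (assumption). */
typedef struct {
    void *handle;
    char *error_string;
} demo_dll;

/* Counts how many times an error string is released. */
static int g_error_string_frees = 0;

static void demo_str_free(char *s) {
    free(s);
    g_error_string_frees++;
}

/* Models a close call that also releases the error string. A real
 * implementation would unload the library via dll.handle here as well. */
static void demo_dll_close(demo_dll dll) {
    demo_str_free(dll.error_string);
}

/* The handle is passed by value, so the callee cannot null out the caller's
 * pointer. Calling both demo_dll_close() and demo_str_free() on the same
 * handle would therefore free the string twice; the #else prevents that. */
static void demo_teardown(demo_dll dll) {
#ifndef __linux__
    demo_dll_close(dll);             /* releases the error string too */
#else
    demo_str_free(dll.error_string); /* library stays loaded; free only the string */
#endif
}
```

With the unconditional `mstr_free` after the `#endif`, the non-Linux path in this model would release the string twice.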

}
Contributor Author

Do you recall how this dlclose() bug manifested itself? If I get rid of the #ifndef __linux__ above and run it on Linux, I don't see any issues. I'm considering removing the #ifndef guard.

Contributor

This is a very obscure, painful bug that appears related to application shutdown. The crypt_shared library depends on global destructors, and the behavior of Linux dynamic libraries does not play nicely with C++ static lifetime semantics. The static destructors remain registered with atexit() even if we unload the library, leading to crashes during shutdown. I wasn't able to replicate it myself, but downstream drivers ran into it almost immediately, and this was the only reasonable fix.

Contributor Author

Gotcha! I'll restore the #ifndef __linux__ guard.

Contributor

@vector-of-bool May 4, 2023

You can add a link to this ticket: https://jira.mongodb.org/browse/SERVER-63710

Correction: Not a crash, but faulty leak-detection warnings.

Contributor

Yes, please add a link to the ticket – I still think this is only showing on Linux because that’s where leak detection runs, not because Linux is doing something buggy.

}
}
