Fix user tokens not being threadsafe #4944
This could manifest as a segfault if the access token refresh handler was triggered at a specific time while the user was being logged out from a different thread. Specifically, the refresh JWT could be reset at the same time that it was being copied out by `SyncUser::refresh_jwt()`: the util::Optional<BsonDocument> was read as valid just as it was being set to util::none, so the thread requesting the JWT ended up copying invalid random data.
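For readers following along, here is a minimal sketch of the general pattern that avoids this kind of race: every read and write of the optional token happens under the same mutex, and the getter returns a copy made while the lock is held. `TokenHolder` and its members are hypothetical stand-ins for illustration, not the actual `SyncUser` code, and `std::optional<std::string>` is swapped in for `util::Optional<BsonDocument>`.

```cpp
#include <mutex>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for the user object; not the real SyncUser.
class TokenHolder {
public:
    void set_refresh_token(std::string token)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_refresh_token = std::move(token);
    }

    // Clears the token, as a log-out would.
    void log_out()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_refresh_token.reset();
    }

    // Returns a copy made while the mutex is held, so a concurrent
    // log_out() cannot reset the optional in the middle of the copy.
    std::optional<std::string> refresh_token() const
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_refresh_token;
    }

private:
    mutable std::mutex m_mutex;
    std::optional<std::string> m_refresh_token;
};
```

Returning by value costs a copy, but it means callers never hold a reference into state that another thread may be resetting.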
    return do_is_logged_in(lock);
}

bool SyncUser::do_is_logged_in(std::unique_lock<std::mutex>& lock) const
If we aren't going to mutate the lock, we can take it by const ref.
@@ -362,7 +362,13 @@ void SyncUser::log_out()

 bool SyncUser::is_logged_in() const
 {
-    std::lock_guard<std::mutex> lock(m_mutex);
+    std::unique_lock<std::mutex> lock(m_mutex);
Why did this change from a lock_guard to a unique_lock? It doesn't look like we unlock it anywhere.
I wanted to assert that the lock was held inside of do_is_logged_in(), and a simple lock_guard doesn't provide that ability.
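As a rough sketch of that idea (the `Session` class and its members are hypothetical, not the code in this PR): `std::unique_lock` exposes `owns_lock()` and `mutex()`, so the private helper can check at runtime that the caller really locked the right mutex. The lock is taken by const reference because the helper never mutates it.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical class for illustration; not the real SyncUser.
class Session {
public:
    bool is_logged_in() const
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        return do_is_logged_in(lock);
    }

    void log_out()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_logged_in = false;
    }

private:
    // Requires the caller to hold m_mutex; checked at runtime in debug builds.
    bool do_is_logged_in(const std::unique_lock<std::mutex>& lock) const
    {
        assert(lock.owns_lock());          // the lock is actually held
        assert(lock.mutex() == &m_mutex);  // and it guards this object
        return m_logged_in;
    }

    mutable std::mutex m_mutex;
    bool m_logged_in = true;
};
```

A `std::lock_guard` has no equivalent of `owns_lock()`, which is why it cannot be used for this kind of check. Note that the assertions only fire in builds where `NDEBUG` is not defined.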
This'd probably be a good spot to use the stuff in checked_mutex.hpp, which would let you statically verify that everything which calls do_is_logged_in() acquires the mutex first.
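To illustrate what static verification of this kind can look like, here is a sketch built directly on Clang's thread safety analysis attributes. This is an assumption about the general technique, not the contents of checked_mutex.hpp: the `TSA` macro, `Mutex`, `LockGuard`, and `CheckedSession` are all hypothetical names. On compilers other than Clang the annotations expand to nothing, so the code still builds but is not checked.

```cpp
#include <mutex>

// Annotations are only understood by Clang (enable with -Wthread-safety).
#if defined(__clang__)
#define TSA(x) __attribute__((x))
#else
#define TSA(x)
#endif

// Hypothetical annotated wrapper around std::mutex.
class TSA(capability("mutex")) Mutex {
public:
    void lock() TSA(acquire_capability()) TSA(no_thread_safety_analysis) { m_impl.lock(); }
    void unlock() TSA(release_capability()) TSA(no_thread_safety_analysis) { m_impl.unlock(); }

private:
    std::mutex m_impl;
};

// Hypothetical RAII guard that the analysis understands.
class TSA(scoped_lockable) LockGuard {
public:
    explicit LockGuard(Mutex& m) TSA(acquire_capability(m)) : m_mutex(m) { m_mutex.lock(); }
    ~LockGuard() TSA(release_capability()) { m_mutex.unlock(); }

private:
    Mutex& m_mutex;
};

class CheckedSession {
private:
    mutable Mutex m_mutex;
    bool m_logged_in TSA(guarded_by(m_mutex)) = true;

    // Calling this without holding m_mutex is a compile-time warning under Clang.
    bool do_is_logged_in() const TSA(requires_capability(m_mutex)) { return m_logged_in; }

public:
    bool is_logged_in() const
    {
        LockGuard lock(m_mutex);
        return do_is_logged_in();
    }

    void log_out()
    {
        LockGuard lock(m_mutex);
        m_logged_in = false;
    }
};
```

Compared with passing a `std::unique_lock` and asserting on it, the requirement here is part of the function's declaration, so a missing lock is reported when compiling rather than when a test happens to hit the path.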
test/object-store/sync/user.cpp
using namespace std::chrono_literals;
std::chrono::system_clock::time_point now = std::chrono::system_clock::now();
int64_t valid_time = std::chrono::system_clock::to_time_t(now + 30min);
Use either `auto` or `time_t` for valid_time here; time_t's underlying type is implementation-defined.
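A small sketch of that suggestion, with `expiry_from` as a hypothetical helper name: keeping the result as `std::time_t` (or letting `auto` deduce it) avoids assuming that it is the same type as `int64_t`.

```cpp
#include <chrono>
#include <ctime>

// Hypothetical helper: the timestamp `minutes_ahead` minutes after `now`,
// returned as std::time_t rather than a fixed-width integer type.
inline std::time_t expiry_from(std::chrono::system_clock::time_point now, int minutes_ahead)
{
    return std::chrono::system_clock::to_time_t(now + std::chrono::minutes(minutes_ahead));
}
```

At the call site, `auto valid_time = expiry_from(std::chrono::system_clock::now(), 30);` keeps whatever type the platform uses for `time_t`.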
{
    using namespace std::chrono;
    std::lock_guard<std::mutex> guard(m_mutex);
    return m_refresh_token.expires_at < duration_cast<seconds>(system_clock::now().time_since_epoch()).count();
Do we need to check whether the user is logged in here at all?
test/object-store/sync/user.cpp
constexpr size_t num_iterations = 1000;
auto shared_code = [&]() {
    for (size_t i = 0; i < num_iterations; ++i) {
        bool should_refresh = user->access_token_refresh_required();
Just to clarify - the race is inside this function because we read from the access token outside of the lock, and this races with the other thread because it's constantly updating the tokens?
How often did this test fail without the fix in?
So I double checked and this test does not actually trigger the crash. It does exercise the race, but the actual crash was due to copying the JWT via SyncUser::refresh_jwt() while it was being mutated by another thread, and now that I've removed that method entirely it's effectively impossible to trigger. Since this was caught by CI via an existing test, I now think it is best to remove this test and rely on the existing coverage.
A better long term solution is to investigate the TSAN failures in the object store tests and get that running on CI.
Alright, I've removed the inert test and followed @tgoyne's recommendation to add the |
* Fix user tokens not being threadsafe
* use static checking for protected SyncUser members
The bug was discovered by our CI, and I was then able to reproduce it locally by running the test `app: app destroyed during token refresh` many times.
☑️ ToDos