Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix user tokens not being threadsafe #4944

Merged
merged 4 commits into from
Oct 12, 2021
Merged

Fix user tokens not being threadsafe #4944

merged 4 commits into from
Oct 12, 2021

Conversation

ironage
Copy link
Contributor

@ironage ironage commented Oct 7, 2021

This could manifest in a segfault if the access token refresh handler
was triggered at a specific time while the user is logged out from a
different thread. Specifically, resetting the refresh jwt at the same
time that it was being copied out by SyncUser::refresh_jwt() and
the util::Optional was read as being valid but it was just
set to util::none, and now the thread requesting the jwt triggers a copy
of invalid random data.

This was discovered by our CI which I was then able to reproduce by running the test app: app destroyed during token refresh many times locally.

☑️ ToDos

  • 📝 Changelog update
  • 🚦 Tests (or not relevant)

This could manifest in a segfault if the access token refresh handler
was triggered at a specific time while the user is logged out from a
different thread. Specifically, resetting the refresh jwt at the same
time that it was being copied out by `SyncUser::refresh_jwt()` and
the util::Optional<BsonDocument> was read as being valid but it was just
set to util::none, and now the thread requesting the jwt triggers a copy
of invalid random data.
@ironage ironage requested review from jbreams and jedelbo October 7, 2021 01:35
@ironage ironage self-assigned this Oct 7, 2021
@ironage
Copy link
Contributor Author

ironage commented Oct 7, 2021

Screen Shot 2021-10-06 at 5 16 58 PM

Stack trace of the crash that this fixes for kicks.

return do_is_logged_in(lock);
}

bool SyncUser::do_is_logged_in(std::unique_lock<std::mutex>& lock) const
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we aren't going to mutate the lock, we can take it by const ref.

@@ -362,7 +362,13 @@ void SyncUser::log_out()

bool SyncUser::is_logged_in() const
{
std::lock_guard<std::mutex> lock(m_mutex);
std::unique_lock<std::mutex> lock(m_mutex);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did this change from a lock_guard to a unique_lock? It doesn't look like we unlock it anywhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to assert that the lock was held inside of do_is_logged_in() and a simple lock_guard doesn't provide that ability.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This'd probably be a good spot to use the stuff in checked_mutex.hpp, which would let you statically verify that everything which calls do_is_logged_in() acquires the mutex first.


using namespace std::chrono_literals;
std::chrono::system_clock::time_point now = std::chrono::system_clock::now();
int64_t valid_time = std::chrono::system_clock::to_time_t(now + 30min);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

either auto or time_t valid_time - time_t's underlying type is implementation defined.

{
using namespace std::chrono;
std::lock_guard<std::mutex> guard(m_mutex);
return m_refresh_token.expires_at < duration_cast<seconds>(system_clock::now().time_since_epoch()).count();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to check whether the user is logged in here at all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the intention of the place this is called, no. Honestly this should all change in the near future given #4882 and #4943 because checking the refresh token expiry time locally is dubious. But I wanted to just fix the crash in this change.

constexpr size_t num_iterations = 1000;
auto shared_code = [&]() {
for (size_t i = 0; i < num_iterations; ++i) {
bool should_refresh = user->access_token_refresh_required();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify - the race is inside this function because we read from the access token outside of the lock, and this races with the other thread because it's constantly updating the tokens?

How often did this test fail without the fix in?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I double checked and this test does not actually trigger the crash. It does exercise the race, but the actual crash was due to copying the JWT via SyncUser::refresh_jwt() while it was being mutated by another thread and now that I've remove that method entirely it's kinda impossible to trigger. Since this was caught by CI via an existing test I now think it will be best to remove this test and rely on existing coverage since it's technically covered.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A better long term solution is to investigate the TSAN failures in the object store tests and get that running on CI.

@cla-bot cla-bot bot added the cla: yes label Oct 8, 2021
@ironage
Copy link
Contributor Author

ironage commented Oct 8, 2021

Alright, I've removed the inert test and followed @tgoyne's recommendation to add the CheckedMutex to statically guard the SyncUser tokens. If we had this in place previously, we would have been alerted about the race, so I think this is sufficient in lieu of an actual unit test.

@ironage ironage requested review from jbreams and tgoyne October 8, 2021 00:28
@jedelbo jedelbo removed their request for review October 12, 2021 07:26
@ironage ironage merged commit 2f7ad33 into master Oct 12, 2021
@ironage ironage deleted the js/guard-user-tokens branch October 12, 2021 21:25
dianaafanador3 pushed a commit that referenced this pull request Oct 27, 2021
* Fix user tokens not being threadsafe

This could manifest in a segfault if the access token refresh handler
was triggered at a specific time while the user is logged out from a
different thread. Specifically, resetting the refresh jwt at the same
time that it was being copied out by `SyncUser::refresh_jwt()` and
the util::Optional<BsonDocument> was read as being valid but it was just
set to util::none, and now the thread requesting the jwt triggers a copy
of invalid random data.

* use static checking for protected SyncUser members
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 22, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants