Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix blocked DB::open on multiprocess access on exFAT filesystem #6959

Merged
merged 9 commits into from
Oct 20, 2023

Conversation

kiburtse
Copy link
Contributor

@kiburtse kiburtse commented Sep 7, 2023

What, How & Why?

Opening same realm with multiple processes on fat32 and exfat file systems may hang. Every open initializes multiple management files for interprocess communication. Truncate on open on exfat leads to the new inode number being assigned to the file, hence mixing internal cache of shared lock data within process. Avoid doing so in InterprocessMutex.

Fixes #6739

☑️ ToDos

  • 📝 Changelog update
  • 🚦 Tests (or not relevant)
    * [ ] C-API, if public C++ API changed.

test/test_file.cpp Outdated Show resolved Hide resolved
// exFAT does not allocate a unique id for the file until it's non-empty
m_lock_info->m_file.resize(1);
m_fileuid = m_lock_info->m_file.get_unique_id();

REALM_ASSERT_RELEASE(s_info_map->find(m_fileuid) == s_info_map->end());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there some other places where this could be a problem? We do use open with truncate in a few other places.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't found anything anywhere else.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implies that the fix in #3883 was insufficient and that lines 267-268 are now unnecessary. Can you add some stronger checks on an InterprocessMutex to make sure that its file handle is allocated correctly on open?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How it is so? Other fix was targeting different issue but essentially dealing with the same peculiarity.

'Append' mode doesn't resize newly created file (it just avoids truncating it on open and ultimately making uid available for other files). It's still needed to resize file to 1 to get real uid.

Which stronger checks do you have in mind?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, I misunderstood what the problem was here.
I was wondering if it might be possible to come up with a deterministic test case using InterprocessMutex and the SpawnedProcess test helper. But unless you can come up with something creative, I think it is just inherently racy. I suppose our existing multiprocess tests are actually hitting this and causing the hang which is reassuring, so technically it is covered.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, i've looked at it a bit again and added somewhat reliable test using just InterprocessMutex and spawn_process. Seems like we don't really have something like this at all. Should be beneficial. The logic is similar with how this is used in DB and it mimics almost the same pattern as with multiprocess tests from lang bindings test cases. It's still racy but somewhat reliably hits added assertions for me on exfat sdcard on macos, but needs a lot of iterations (>1000 sometimes) which takes tens of seconds. I've put just 10 to keep added time to milliseconds. If it hits an assertion or hangs on some run, we should be better equipped in the future to investigate as this time around. Would it be fine to merge?

@kiburtse kiburtse marked this pull request as ready for review October 12, 2023 20:33
test/test_file.cpp Show resolved Hide resolved
// exFAT does not allocate a unique id for the file until it's non-empty
m_lock_info->m_file.resize(1);
m_fileuid = m_lock_info->m_file.get_unique_id();

REALM_ASSERT_RELEASE(s_info_map->find(m_fileuid) == s_info_map->end());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implies that the fix in #3883 was insufficient and that lines 267-268 are now unnecessary. Can you add some stronger checks on an InterprocessMutex to make sure that its file handle is allocated correctly on open?

@coveralls-official
Copy link

coveralls-official bot commented Oct 12, 2023

Pull Request Test Coverage Report for Build kirill.burtsev_95

  • 59 of 82 (71.95%) changed or added relevant lines in 5 files are covered.
  • 51 unchanged lines in 10 files lost coverage.
  • Overall coverage increased (+0.01%) to 91.605%

Changes Missing Coverage Covered Lines Changed/Added Lines %
test/test_file.cpp 9 13 69.23%
test/util/test_path.cpp 5 9 55.56%
test/test_shared.cpp 39 54 72.22%
Files with Coverage Reduction New Missed Lines %
test/util/test_path.cpp 1 79.77%
src/realm/array_string.cpp 2 87.78%
src/realm/table_view.cpp 2 94.18%
src/realm/sync/client.cpp 3 90.75%
src/realm/query_expression.hpp 4 93.43%
src/realm/sync/network/websocket.cpp 4 75.26%
src/realm/sync/network/network.cpp 6 89.86%
src/realm/sync/noinst/server/server.cpp 6 76.41%
test/fuzz_group.cpp 10 54.68%
src/realm/sync/network/default_socket.cpp 13 76.24%
Totals Coverage Status
Change from base Build 1752: 0.01%
Covered Lines: 230632
Relevant Lines: 251769

💛 - Coveralls

src/realm/util/interprocess_mutex.hpp Outdated Show resolved Hide resolved
// exFAT does not allocate a unique id for the file until it's non-empty
m_lock_info->m_file.resize(1);
m_fileuid = m_lock_info->m_file.get_unique_id();

REALM_ASSERT_RELEASE(s_info_map->find(m_fileuid) == s_info_map->end());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, I misunderstood what the problem was here.
I was wondering if it might be possible to come up with a deterministic test case using InterprocessMutex and the SpawnedProcess test helper. But unless you can come up with something creative, I think it is just inherently racy. I suppose our existing multiprocess tests are actually hitting this and causing the hang which is reassuring, so technically it is covered.

Copy link
Contributor

@finnschiermer finnschiermer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a little thing with the test :-), otherwise looking good.

spawned.emplace_back(
test_util::spawn_process(test_context.test_details.test_name, util::format("child [%1]", i)));

if (spawned.back()->is_child()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it timing dependent what spawned.back() refers to when it executes in a child while the parent is still spawning other children?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't test_util::spawn_process essentially no-op in child process? It checks if REALM_CHILD_IDENT is present in env for a process and returns uninitialized SpawnedProcess. So first iteration in child process should get it, check the same thing again, execute and quit. That's the pattern in other test from what i can tell. May be i don't understand something here, but how parent process would influence this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I just (mis)read the spawn_process as a fork() system call...

Copy link
Contributor

@finnschiermer finnschiermer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kiburtse kiburtse merged commit 7820ded into master Oct 20, 2023
2 checks passed
@kiburtse kiburtse deleted the kb/fix-interprocess-lock-exfat-multiprocess branch October 20, 2023 08:58
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 21, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TSAN theLangBindHelper_ImplicitTransactions_InterProcess failure
3 participants