Fix blocked DB::open on multiprocess access on exFAT filesystem #6959
realm access on fat32 and exfat file systems.
// exFAT does not allocate a unique id for the file until it's non-empty
m_lock_info->m_file.resize(1);
m_fileuid = m_lock_info->m_file.get_unique_id();

REALM_ASSERT_RELEASE(s_info_map->find(m_fileuid) == s_info_map->end());
Are there some other places where this could be a problem? We do use open with truncate in a few other places.
Haven't found anything anywhere else.
This implies that the fix in #3883 was insufficient and that lines 267-268 are now unnecessary. Can you add some stronger checks on an InterprocessMutex to make sure that its file handle is allocated correctly on open?
How so? The other fix targeted a different issue, though it essentially dealt with the same peculiarity.
'Append' mode doesn't resize a newly created file (it just avoids truncating it on open, which would ultimately make its uid available for other files). It's still necessary to resize the file to 1 to get the real uid.
Which stronger checks do you have in mind?
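To make the point above concrete, here is a minimal POSIX sketch of "open without truncating, make sure the file is non-empty, then read the id". It is not realm-core's implementation: `FileId` and `open_nonempty_and_get_id` are hypothetical names, and using the `(st_dev, st_ino)` pair from `fstat` as the unique id is an assumption made for illustration.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical stand-in for a file's unique id: the (device, inode) pair.
struct FileId {
    dev_t device;
    ino_t inode;
};

inline bool operator==(const FileId& a, const FileId& b)
{
    return a.device == b.device && a.inode == b.inode;
}

// Opens (or creates) `path` without O_TRUNC, grows it to one byte if it is
// empty, and only then queries the id. The descriptor is returned via fd_out.
inline FileId open_nonempty_and_get_id(const std::string& path, int& fd_out)
{
    // No O_TRUNC: an existing file keeps its contents and its identity.
    int fd = ::open(path.c_str(), O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        throw std::runtime_error("open failed: " + path);

    struct stat st;
    bool ok = ::fstat(fd, &st) == 0;
    if (ok && st.st_size == 0) {
        // Make the file non-empty before asking for its id, then re-read it.
        ok = ::ftruncate(fd, 1) == 0 && ::fstat(fd, &st) == 0;
    }
    if (!ok) {
        ::close(fd);
        throw std::runtime_error("could not prepare file: " + path);
    }

    assert(st.st_size >= 1); // the id is read from a non-empty file
    fd_out = fd;
    return FileId{st.st_dev, st.st_ino};
}
```

On a filesystem with stable inode numbers, two calls on the same path return equal ids; the sketch only shows the order of operations the comment argues for, not exFAT's behaviour itself.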
I see, I misunderstood what the problem was here.
I was wondering if it might be possible to come up with a deterministic test case using InterprocessMutex and the SpawnedProcess test helper. But unless you can come up with something creative, I think it is just inherently racy. I suppose our existing multiprocess tests are actually hitting this and causing the hang which is reassuring, so technically it is covered.
OK, I've looked at it again and added a somewhat reliable test using just InterprocessMutex and spawn_process. It seems we don't have anything like this at all, so it should be beneficial. The logic is similar to how this is used in DB, and it mimics almost the same pattern as the multiprocess tests from the language bindings' test cases. It's still racy, but it fairly reliably hits the added assertions for me on an exFAT SD card on macOS, though it needs a lot of iterations (sometimes >1000), which takes tens of seconds. I've set it to just 10 to keep the added time to milliseconds. If it hits an assertion or hangs on some run, we should be better equipped to investigate in the future than we were this time around. Would it be fine to merge?
Co-authored-by: James Stone <[email protected]>
Just a little thing with the test :-), otherwise looking good.
spawned.emplace_back(
    test_util::spawn_process(test_context.test_details.test_name, util::format("child [%1]", i)));

if (spawned.back()->is_child()) {
Isn't it timing dependent what spawned.back() refers to when it executes in a child while the parent is still spawning other children?
Isn't test_util::spawn_process essentially a no-op in the child process? It checks whether REALM_CHILD_IDENT is present in the process's environment and returns an uninitialized SpawnedProcess. So the first iteration in the child process should get it, check the same thing again, execute, and quit. That's the pattern in the other tests from what I can tell. Maybe I don't understand something here, but how would the parent process influence this?
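The pattern described above can be sketched roughly as follows. This is not the real test helper: `SpawnedProcessSketch` and `spawn_process_sketch` are hypothetical names, the actual launching of a process is omitted, and comparing the ident against the value of `REALM_CHILD_IDENT` is an assumption about how a child recognises its own branch.

```cpp
#include <cassert>
#include <cstdlib>
#include <memory>
#include <string>

// Hypothetical stand-in for the SpawnedProcess test helper.
struct SpawnedProcessSketch {
    bool child = false;
    bool is_child() const { return child; }
};

inline std::unique_ptr<SpawnedProcessSketch> spawn_process_sketch(const std::string& ident)
{
    assert(!ident.empty()); // every child needs a distinguishable ident

    auto result = std::make_unique<SpawnedProcessSketch>();
    if (const char* env = std::getenv("REALM_CHILD_IDENT")) {
        // Already running as a child: nothing is spawned here. The returned
        // object only records whether this ident is the one the child was
        // started for (assumed behaviour).
        result->child = (ident == env);
        return result;
    }

    // Parent: a real helper would start a new copy of the test binary with
    // REALM_CHILD_IDENT set to `ident`. That step is left out of this sketch.
    return result;
}
```

Because the child decides its role from its own environment rather than from shared state, what `spawned.back()` refers to in the child does not depend on how far the parent has progressed with spawning.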
I think I just (mis)read the spawn_process as a fork() system call...
LGTM
What, How & Why?
Opening the same realm from multiple processes on FAT32 and exFAT file systems may hang. Every open initializes multiple management files for interprocess communication. On exFAT, truncating on open causes a new inode number to be assigned to the file, which mixes up the in-process cache of shared lock data. Avoid doing so in InterprocessMutex.
Fixes #6739
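As a rough illustration of why a changing inode number matters, consider a per-process cache of lock state keyed by file id. This is a simplified sketch, not the actual `s_info_map` code: `FileKey`, `SharedLockState`, and `LockStateCache` are hypothetical names, and thread safety is ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <utility>

// Hypothetical key: (device, inode) as reported by the filesystem.
using FileKey = std::pair<std::uint64_t, std::uint64_t>;

// Hypothetical per-file state shared by all users inside one process.
struct SharedLockState {
    int users = 0;
};

class LockStateCache {
public:
    // Returns the state registered for `key`, creating it on first use.
    std::shared_ptr<SharedLockState> acquire(const FileKey& key)
    {
        std::shared_ptr<SharedLockState> state = m_entries[key].lock();
        if (!state) {
            state = std::make_shared<SharedLockState>();
            m_entries[key] = state;
        }
        assert(state != nullptr);
        ++state->users;
        return state;
    }

private:
    // Weak references: an entry expires once its last user releases it.
    std::map<FileKey, std::weak_ptr<SharedLockState>> m_entries;
};
```

With a stable id, two opens of the same file share one `SharedLockState`. If the filesystem hands out a different inode number after a truncate, the second open lands on a different key and gets separate state, and a reused number can make two different files share one entry. Either way, users no longer coordinate through the state they expect.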
☑️ ToDos
* [ ] C-API, if public C++ API changed.