Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix error happening when online compaction kicks in on async commits #6962

Merged
merged 7 commits into from
Sep 13, 2023

Conversation

jedelbo
Copy link
Contributor

@jedelbo jedelbo commented Sep 8, 2023

What, How & Why?

The issue here is that online compaction kicks in at the same time as async commits is used from 2 different threads. When the changes are going to be committed to disk, a GroupWriter objects is created. It discovers that online compaction is in progress and creates some additional info in the top level array. It causes a COW on the to array which is why it is being reported released in the SlabAllocator. Those changes are never written to disk, as the commit has already been created and just waiting to be written. When another transaction starts a new write transaction and makes some modifications, the same top level array will be COW again, which will trigger the assertion.

        Transaction 1                         Transaction 2
start write tr
    low_level_commit (no write)
        reset free space tracker

    commit to disk
        COW top array
end write
                                          start write
                                              COW top array
                                              BAAAMMM!!!

Fixes #6340
Fixes #6399
Fixes #6536
Fixes #6593
Fixes #6922

☑️ ToDos

  • 📝 Changelog update
  • 🚦 Tests (or not relevant)
  • C-API, if public C++ API changed.

@cla-bot cla-bot bot added the cla: yes label Sep 8, 2023
@jedelbo jedelbo force-pushed the je/fix-issue-of-the-year branch 2 times, most recently from 1a4bcd1 to 9a25b01 Compare September 8, 2023 14:11
@jedelbo jedelbo force-pushed the je/fix-issue-of-the-year branch from 9a25b01 to 1ffa87c Compare September 8, 2023 14:16

#if REALM_HAVE_UV
namespace realm::util {
class UvLoopScheduler final : public util::Scheduler {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, do rewrite this test before merging to be scheduler agnostic so that we could verify our abstractions for Scheduler and EventLoop as they are supposed to be used on all platforms with sdks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure that can be done. Our UV scheduler only supports one thread (As I understand it). But perhaps we don't need different threads. I will investigate that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have now rewritten the test case. It was not required to create a second thread. Just a second realm.

test/object-store/realm.cpp Outdated Show resolved Hide resolved
@nielsenko
Copy link
Contributor

Fixes realm/realm-dart#1396

src/realm/db.cpp Outdated
@@ -909,7 +909,7 @@ void DB::open(const std::string& path, bool no_create_file, const DBOptions& opt
REALM_ASSERT(path.size());

m_db_path = path;
m_path_hash = StringData(path).hash() & 0xffff;
m_path_hash = (size_t(this) >> 4) & 0xffff;
Copy link
Member

@nicola-cab nicola-cab Sep 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this line doing? It is not clear...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just hashing the file name was not good enough. If two DBs were opening the same file, they would get the same hash. Here we just take some bytes out of the object pointer. That should be fine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make that into a comment :-)

Copy link
Member

@nicola-cab nicola-cab Sep 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, maybe I am wrong, but it seems that this is basically an important part for fixing this issue. And it is really really difficult to make sense of this line just reading it, without any added context.
Can you please add a comment or move this logic for hashing the db object into some helper method or some class.
Also, m_path_hash is no longer a good name for this, since you are basically hashing the ptr.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

m_path_hash is only used for the prefix of the logger. It was changed for debugging purposes so that the log output can be differentiated between DB instances. In that way it is not critical to the fix, but it helped find out what was happening. The essence of the fix is separation of GroupWriter and GroupCommitter, so that async commits can just make a commit without doing any of the online compaction.

src/realm/db.cpp Outdated Show resolved Hide resolved
};
} // namespace realm::util

TEST_CASE("Concurrent commits") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment about what this test do?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

@finnschiermer finnschiermer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rock'n roll, Baby!

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 21, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
7 participants