Simplify internal commit notification #7031

Merged
merged 4 commits into from
Oct 11, 2023

@tgoyne (Member) commented Oct 4, 2023

The concrete bug this fixes is that `await realm.subscriptions.update {}; await realm.refresh()` would hang forever. SubscriptionStore writes failed to notify RealmCoordinator of the write, so the async refresh would see that the Realm was not on the latest version and register a handler to be called when the autorefresh happened, but nothing would ever schedule the autorefresh.
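
As a rough illustration of that hang, here is a toy model rather than Realm's actual code. `ToyCoordinator`, `refresh_completes()` and every member below are hypothetical names invented for this sketch; the only thing taken from the description is the sequence of events: a write that may or may not notify the coordinator, followed by a refresh that waits for an autorefresh.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for RealmCoordinator. It only schedules an
// autorefresh when it is told about a commit.
struct ToyCoordinator {
    bool autorefresh_scheduled = false;
    std::vector<std::function<void()>> refresh_waiters;

    void on_commit() { autorefresh_scheduled = true; }

    // One turn of a pretend event loop: run the autorefresh if one was
    // scheduled, and complete everything that was waiting for it.
    void run_event_loop_once()
    {
        if (!autorefresh_scheduled)
            return;
        autorefresh_scheduled = false;
        auto waiters = std::move(refresh_waiters);
        refresh_waiters.clear();
        for (auto& waiter : waiters)
            waiter();
    }
};

// Models "write to the subscription store, then refresh".
// Returns whether the refresh ever completes.
inline bool refresh_completes(bool write_notifies_coordinator)
{
    ToyCoordinator coordinator;
    std::uint64_t realm_version = 1;
    std::uint64_t db_version = 1;
    bool completed = false;

    // The subscription store write produces a new version.
    ++db_version;
    if (write_notifies_coordinator)
        coordinator.on_commit();

    // The async refresh: done immediately if already current, otherwise
    // wait for the next autorefresh.
    if (realm_version == db_version) {
        completed = true;
    }
    else {
        coordinator.refresh_waiters.push_back([&] {
            realm_version = db_version;
            completed = true;
        });
    }

    coordinator.run_event_loop_once();
    return completed;
}
```

In this model `refresh_completes(false)` leaves the waiter registered forever, which corresponds to the hang, while `refresh_completes(true)` finishes on the next event loop turn.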

The sync client needs to be notified of non-sync writes, and it needs to notify non-sync components when it performs writes itself. When the sync client was first written, DB did not exist yet, so this was orchestrated via RealmCoordinator. However, that's a very awkward place to do it: not all writes go via RealmCoordinator, and the lifetime of sync sessions isn't actually tied to a coordinator. Nowadays we do have DB, and handling commit notifications there greatly simplifies everything.
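
The general shape of "handle commit notifications on the database object" can be sketched as a plain observer pattern. This is an assumption-laden toy, not the code in this PR: `ToyDB`, `ToyCommitListener` and `ToySyncSession` are hypothetical names, and the real DB has threading and lifetime concerns that are ignored here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical listener interface: anything that wants to hear about commits.
struct ToyCommitListener {
    virtual ~ToyCommitListener() = default;
    virtual void on_commit(std::uint64_t new_version) = 0;
};

// Hypothetical database object. Every commit goes through it, so it is the
// one place that can reliably tell listeners about every write.
class ToyDB {
public:
    void add_commit_listener(ToyCommitListener* listener)
    {
        m_listeners.push_back(listener);
    }

    void remove_commit_listener(ToyCommitListener* listener)
    {
        m_listeners.erase(std::remove(m_listeners.begin(), m_listeners.end(), listener),
                          m_listeners.end());
    }

    // Commits a (pretend) write and notifies every registered listener.
    std::uint64_t commit()
    {
        ++m_version;
        for (auto* listener : m_listeners)
            listener->on_commit(m_version);
        return m_version;
    }

private:
    std::uint64_t m_version = 1;
    std::vector<ToyCommitListener*> m_listeners;
};

// Hypothetical sync session that just records the newest version it saw.
struct ToySyncSession : ToyCommitListener {
    std::uint64_t last_version_available = 0;

    void on_commit(std::uint64_t new_version) override
    {
        last_version_available = std::max(last_version_available, new_version);
    }
};

// Two commits while listening, one after unregistering.
inline std::uint64_t demo_commit_listener()
{
    ToyDB db;
    ToySyncSession session;
    db.add_commit_listener(&session);
    db.commit();
    db.commit();
    db.remove_commit_listener(&session);
    db.commit();
    return session.last_version_available;
}
```

Because the listener is registered on the database object rather than on a coordinator, a writer never has to remember to notify anyone, and the listener's lifetime is independent of any coordinator.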

There was also a second mechanism for notifying the sync client of writes which modified the subscription store. This appears to have been mostly redundant and unnecessary. The only additional information it conveyed was a number only used in some assertions.

Sync progress notifications relied to some extent on the fact that some of the sync client's internal writes didn't trigger them, so this change caused some useless notifications to be sent. To fix this, commits now only trigger notifications if they changed the uploadable bytes, i.e. empty changesets don't produce notifications.
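
A minimal sketch of the "empty changesets don't produce notifications" rule, assuming a notifier that is told the changeset size of each commit. `ToyUploadProgressNotifier` and its members are hypothetical and only illustrate the filtering idea, not how the sync client actually tracks progress.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical upload progress notifier.
class ToyUploadProgressNotifier {
public:
    using Callback = std::function<void(std::uint64_t uploaded, std::uint64_t uploadable)>;

    explicit ToyUploadProgressNotifier(Callback callback)
        : m_callback(std::move(callback))
    {
    }

    // Called after every commit with the size of the changeset it produced.
    void on_commit(std::uint64_t changeset_bytes)
    {
        // An empty changeset leaves the uploadable byte count unchanged,
        // so there is nothing new to report.
        if (changeset_bytes == 0)
            return;
        m_uploadable += changeset_bytes;
        m_callback(m_uploaded, m_uploadable);
    }

private:
    Callback m_callback;
    std::uint64_t m_uploaded = 0;
    std::uint64_t m_uploadable = 0;
};

// Four commits, two of them empty: counts how many notifications fire.
inline int count_notifications_for_mixed_commits()
{
    int calls = 0;
    ToyUploadProgressNotifier notifier([&](std::uint64_t, std::uint64_t) {
        ++calls;
    });
    notifier.on_commit(0);   // empty internal write: no notification
    notifier.on_commit(128); // real changeset: notification
    notifier.on_commit(0);   // empty again: no notification
    notifier.on_commit(64);  // real changeset: notification
    return calls;
}
```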

Ideally ExternalCommitHelper would live on DB rather than RealmCoordinator and nonsync_transact_notify() could go away entirely, but that looks like it'd be a pretty complicated change.

@tgoyne tgoyne self-assigned this Oct 4, 2023
coveralls-official bot commented Oct 5, 2023

Pull Request Test Coverage Report for Build github_pull_request_279324

  • 425 of 432 (98.38%) changed or added relevant lines in 20 files are covered.
  • 59 unchanged lines in 14 files lost coverage.
  • Overall coverage increased (+0.02%) to 91.586%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
| --- | --- | --- | --- |
| test/object-store/sync/flx_sync.cpp | 7 | 8 | 87.5% |
| test/test_transform.cpp | 19 | 20 | 95.0% |
| test/test_sync.cpp | 230 | 235 | 97.87% |

| Files with Coverage Reduction | New Missed Lines | % |
| --- | --- | --- |
| test/object-store/sync/flx_sync.cpp | 1 | 98.36% |
| test/test_index_string.cpp | 1 | 94.13% |
| src/realm/sync/network/http.hpp | 2 | 80.87% |
| src/realm/table_view.cpp | 2 | 94.18% |
| test/fuzz_group.cpp | 2 | 54.4% |
| src/realm/sort_descriptor.cpp | 3 | 93.7% |
| src/realm/util/future.hpp | 3 | 95.81% |
| test/test_thread.cpp | 3 | 66.67% |
| src/realm/util/file.cpp | 4 | 81.25% |
| src/realm/sync/network/websocket.cpp | 5 | 74.74% |

| Totals | Coverage Status |
| --- | --- |
| Change from base Build 1745: | 0.02% |
| Covered Lines: | 230468 |
| Relevant Lines: | 251641 |

💛 - Coveralls

@tgoyne tgoyne force-pushed the tg/commit-notify branch 2 times, most recently from 747823f to 24f055f Compare October 6, 2023 23:01
Comment on lines -655 to +652
// Ensure the notifiers aren't holding on to Transactions after we destroy
// the History object the DB depends on
// If there's any active NotificationTokens they'll keep the notifiers alive,
// so tell the notifiers to release their Transactions so that the DB can
// be closed immediately.
Member Author

This comment was stale and the reason why we originally needed release_data() no longer applies, but it is still required for other reasons.

Comment on lines -1109 to -1111
// In general, `m_upload_target_version` follows `m_last_version_available`
// as it is increased, but in some cases, `m_upload_target_version` will be
// kept fixed for a while in order to constrain the uploading process.
Member Author

I think this was once true, but by now the only times m_upload_target_version differed from m_last_version_available were times when we couldn't be uploading changesets anyway (such as while applying a client reset recovery on the sync worker thread).

Comment on lines -2007 to -2010
if (!m_pending_flx_sub_set || m_pending_flx_sub_set->snapshot_version < m_upload_progress.client_version) {
m_pending_flx_sub_set = get_flx_subscription_store()->get_next_pending_version(
m_last_sent_flx_query_version, m_upload_progress.client_version);
}
Member Author

This code was dead: the single caller of send_upload_message() (send_message()) ensures m_pending_flx_sub_set is up to date as part of deciding whether it should call send_upload_message() in the first place, so it never needs to be refreshed here.

@@ -231,25 +219,24 @@ TEST(ClientReset_InitialLocalChanges)
ClientServerFixture fixture(dir, test_context);
fixture.start();

Session session_1 = fixture.make_session(path_1, server_path);
DBRef db_1 = DB::create(make_client_replication(), path_1);
Member Author

This test predates core 6 and was doing something which is now quite weird (writing to a Realm via a second DB not linked to the sync session and not via a RealmCoordinator). It would have continued to work unchanged by continuing to use nonsync_transact_notify(), but since it isn't actually trying to test multiprocess things I made it normal instead.

Comment on lines -822 to -829
// NOTE: There was a race condition with `write_transaction_notifying_session` where session_2
// was completing sync before the write transaction was completed, leading to a
// `realm::TableNameInUse` exception. Broke up this function and moved the call to
// `nonsync_transact_notify()` to after the write transactions.
auto version_1 = perform_write_transaction(db_1, std::move(fn_1));
auto version_2 = perform_write_transaction(db_2, std::move(fn_2));
session_1.nonsync_transact_notify(version_1);
session_2.nonsync_transact_notify(version_2);
Member Author

This race goes away by just moving the writes to before binding.

else {
CHECK_GREATER(progress_version, 0);
CHECK_GREATER(snapshot_version, 3);
switch (entry_1) {
Member Author

This more precise test also passes on master (with some nonsync_transact_notify()s added).

@@ -3783,6 +3771,76 @@ TEST(Sync_UploadDownloadProgress_7)
// down the session that is in the process of being created.
}

TEST(Sync_UploadProgress_EmptyCommits)
Member Author

This test doesn't pass on master, but I think the new behavior is sensible.

Collaborator

There seems to be no issue with the test

@tgoyne tgoyne force-pushed the tg/commit-notify branch 2 times, most recently from 30c2109 to 7fb0ce1 Compare October 9, 2023 18:25
@tgoyne tgoyne marked this pull request as ready for review October 9, 2023 21:52
if (m_pending_flx_sub_set && m_pending_flx_sub_set->snapshot_version < m_upload_target_version) {
target_upload_version = m_pending_flx_sub_set->snapshot_version;
}
version_type target_upload_version = get_db()->get_version_of_latest_snapshot();
@danieltabacaru (Collaborator) Oct 10, 2023

Shouldn't the target be m_last_version_available? Or is that no longer the case because subscriptions don't report their snapshot version anymore?

Member Author

AFAICT there's no reason to limit it to m_last_version_available. If a commit happens on another thread while we're enqueued to send, it's fine to upload that changeset while the notification is still waiting in the event loop's queue.

Collaborator

good point

@danieltabacaru (Collaborator) left a comment

LGTM

@michael-wb (Contributor) left a comment

LGTM - Nice, this simplifies some of the coordination around realm updates.

test/test_client_reset.cpp (outdated, resolved)
@ironage (Contributor) left a comment

Any simplification to the notification system is a win from my perspective 👍

src/realm/object-store/impl/realm_coordinator.hpp (outdated, resolved)
The documentation suggests there was once a mechanism for uploading up to a
specific version and then stopping, but this is now only used for sending QUERY
messages at the correct time, and that can be done more directly. This cuts
down on the amount of state that needs to be tracked and sometimes (very
insignificantly) improves upload latency.
The concrete bug which this fixes is that `await realm.subscriptions.update {};
await realm.refresh()` would hang forever. SubscriptionStore writes failed to
notify RealmCoordinator of the write, so the async refresh would see that the
Realm is not on the latest version, register a handler to be called when the
autorefresh happened, and then nothing would ever schedule the autorefresh.

The sync client needs to be notified of non-sync writes and notify non-sync
components when it performs writes. When it was first written, DB did not exist
yet and so this was orchestrated via RealmCoordinator. However, that's a very
awkward place to do it: not all writes go via RealmCoordinator, and the
lifetime of sync sessions isn't actually tied to a coordinator. Nowadays we do
have DB, and handling commit notifications there greatly simplifies everything.

There was also a *second* mechanism for notifying the sync client of writes
which modified the subscription store. This appears to have been entirely
redundant and unnecessary.
@tgoyne tgoyne merged commit 8f4f990 into master Oct 11, 2023
@tgoyne tgoyne deleted the tg/commit-notify branch October 11, 2023 20:13
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 21, 2024