-
Notifications
You must be signed in to change notification settings - Fork 168
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RCORE-1872 Sync client should allow server bootstrapping at any time #7440
Conversation
7ec0cc0
to
ae2574e
Compare
…o mwb/role-change-bootstraps
…o mwb/role-change-bootstraps
test/object-store/sync/flx_sync.cpp
Outdated
TEST_CASE("flx: role change bootstrap", "[sync][flx][baas][role_change][bootstrap]") { | ||
const Schema person_schema{{"Person", | ||
{{"_id", PropertyType::ObjectId, Property::IsPrimary{true}}, | ||
{"age", PropertyType::Int | PropertyType::Nullable}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the only field in this schema that actually matter are "role". Do we need to make up fake age
and firstName
/lastName
properties? how about just a role
and a name
property so each object is {_id: ObjectId(), role: "manager", name: "manager-1" }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going to remove the extra fields, but actually, I need the extra data so I can ensure adding the employees back to the local data generates multiple bootstrap messages.
test/object-store/sync/flx_sync.cpp
Outdated
auto logger = util::Logger::get_default_logger(); | ||
FLXSyncTestHarness harness("flx_role_change_bootstrap", {person_schema, {"role", "firstName", "lastName"}}); | ||
auto& app_session = harness.session().app_session(); | ||
// Enable the role change bootstraps |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are these commented out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They currently don't work with baasaas and generate an error...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that because this depends on https://jira.mongodb.org/browse/BAAS-28604, or something else?
If so, can you reference that in here like TODO (BAAS-28604): ...
and be sure to link that ticket to RCORE-1872?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - this code depends on using the feature flag instead of the protocol version for now. Once the feature is released, the client and server will coordinate the version bump when it will be merged from the feature branch to master. I added a comment to reflect this.
test/object-store/sync/flx_sync.cpp
Outdated
multi_msg = true; | ||
} | ||
// Verify a bootstrap occurred if multiple messages were received | ||
REQUIRE((!multi_msg || machina.get() == TestState::bs_complete)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't we need to wait for the state machine to reach bs_complete before we try to get this state value? i'm also not sure what multi_msg is supposed to mean/do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we trying to assert here whether there should be multiple messages or just checking that if there happen to be multiple messages that they end in a complete bootstrap? Can some bootstraps in this test only have one message - how do we enforce that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was using the wait_for_download()
to wait for the bootstrap to complete, since the server is not supposed to send the MARK
response message until after the role change server initiated bootstrap.
Since the integration of the bootstrap is performed when the final bootstrap message is received, the MARK response is not processed until after the bootstrap has been integrated, so the state machine will be updated to the expected state before the wait_for_download()
returns.
src/realm/sync/client.cpp
Outdated
|
||
// If this is the first LastInBatch=true message after one or more | ||
// LastInBatch=false messages, this is the end of a bootstrap | ||
if (m_last_download_batch_state == DownloadBatchState::MoreToCome) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i wish we didn't have to add this state tracking, but I think I see why we need it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah - it's so we can catch the LastInBatch=false => LastInBatch=true sequence for the server initiated bootstraps. Can you think of a better way?
…o mwb/role-change-bootstraps
test/object-store/sync/flx_sync.cpp
Outdated
FLXSyncTestHarness harness("flx_role_change_bootstrap", {person_schema, {"role", "firstName", "lastName"}}); | ||
auto& app_session = harness.session().app_session(); | ||
// Enable the role change bootstraps | ||
// REQUIRE(app_session.admin_api.set_feature_flag(app_session.server_app_id, "allow_permissions_bootstrap", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are we going to need this api in the tests? if not, i'm wondering if we should keep the code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe? - right now, the feature is enabled for v14 on the branch I am using, but it may go back to the feature flag once that branch is merged into master.
These will be no longer needed, and therefore removed, once the feature branch is ready to merge into master.
@@ -4927,6 +4927,255 @@ TEST_CASE("flx: nested collections in mixed", "[sync][flx][baas]") { | |||
CHECK(nested_list.get_any(1) == "foo"); | |||
} | |||
|
|||
TEST_CASE("flx: role change bootstrap", "[sync][flx][baas][role_change][bootstrap]") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just my 2 cents, but I think the test(s) can be simplified a great deal by doing the following:
- instead of using
register_connection_change_callback
you can use the debug hook and check the client receives a 200 error - for query version 1 you can count all non-empty download messages (the role change bootstrap essentially) and assert on that (additionally, once all messages are received (last_in_batch=true for BootstrapMessageProcessed) you can check the PendingBootstrapStore stored the right data)
- finally, you assert the expected data is in the realm
- if you want multiple bootstrap in a single test, you can optionally pause the session, change the roles, and resume the session to test role changes detection at ident
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good points - thanks for the recommendations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
didn't mention the state machine, but I'm not sure you need it, do you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The state machine is being used for two reasons:
- Wait for the 200 error to be received
- Verify the order of receiving a server initiated bootstrap with multiple messages, so we track the order/steps of MoreToCome, LastInBatch and then bootstrap processed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't wait_for_download enough for all of this? and you can do the tracking in on_sync_client_event_hook
…o mwb/role-change-bootstraps
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a hopefully useful comment about using test command futures.
test/object-store/sync/flx_sync.cpp
Outdated
REQUIRE(cur_state == TestState::reconnect_received); | ||
if (auto session = weak_session.lock()) { | ||
logger->trace("ROLE CHANGE: sending PAUSE test command after resumed"); | ||
test_command_futures.push_back(pause_download_builder(*session, true)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the test_command_futures vector and associated machinery is just to verify that you've reached the session_resumed state after the test command has been sent. So it might be simpler to let the future do that for you with something like this.
if (send_test_command) {
REQUIRE(cur_state == TestState::reconnect_received);
auto session = weak_session.lock();
REQUIRE(session);
pause_download_builder.get_async([&](StatusWith<std::string> payload) {
REQUIRE(payload.is_ok());
machina.transition_with([&](TestState state) {
return TestState::session_resumed;
});
});
return std::nullopt;
} else {
return TestState::session_resumed;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That simplifies things definitely. And the only thing the future callback needs to check for is success; it doesn't need to do anything with the state machine.
…o mwb/role-change-bootstraps
…o mwb/role-change-bootstraps
@@ -1470,6 +1467,10 @@ inline void ClientImpl::Session::connection_established(bool fast_reconnect) | |||
++m_target_download_mark; | |||
} | |||
|
|||
// Call SessionResumed before sending the BIND Message to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I understand this comment and why it has to be here (same below)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is primarily here so it doesn't get moved. The role change test needs to be notified of the session resumed/connected prior to sending the BIND message so it can queue up the test command to be sent before the IDENT message. If this notification happens after the BIND message, there isn't enough time for the test to queue up the test command before the IDENT message is sent.
I updated the message to hopefully be more clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I find it a bit hack-ish though. IIUC, there is a scheduling problem because send_test_command
posts to the event loop. What if instead we update m_pending_test_commands
directly under a lock? And then you can invoke send_test_command from the event loop when sending BIND (you actually don't need the lock if you create a new method only to be invoked from the event loop). Would that work for your tests? @jbreams what do you think of this approach?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, but there isn't a good place to do this, unless I put plumbing in the sync session to add directly to the list of test commands. This is the only thing available in the event hook callback functions.
The current approach ensures the callback to add the test command gets post
ed before the BIND message is sent and the callback for async_write_binary()
is run on the event loop to send the next message after the BIND.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fwiw, i think this is fine for now. another approach could be to send the test commands when you get the 200 disconnect since i think we preserve the list of pending test commands across disconnects, but i wouldn't want to hold this project up over it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - the list of test commands sticks around for the lifetime of the ClientImpl::Session
and the only time commands are removed are when a response is received or the session object is destroyed.
The current send_test_command()
logic checks to make sure the session is currently active, but that is easy enough to update if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the session only becomes inactive if you pause()
it or destroy the realm - i think in the tests we have right now it should survive a disconnect/reconnect.
@@ -60,6 +60,9 @@ namespace sync { | |||
// 13 Support for syncing collections (lists and dictionaries) in Mixed columns and | |||
// collections of Mixed | |||
// | |||
// 14 Support for server initiated bootstraps, including bootstraps for role/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you need to update get_current_protocol_version()
to return the new version
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was waiting until this feature was about to be merged, but it really doesn't matter, since I'm using a feature branch and the feature is behind a feature flag.
…o mwb/role-change-bootstraps
test/object-store/sync/flx_sync.cpp
Outdated
|
||
auto setup_harness = [&](FLXSyncTestHarness& harness, TestParams params) { | ||
auto& app_session = harness.session().app_session(); | ||
/** TODO: Remove when switching to use Protocol version in RCORE-1972 */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you're doing it as part of this pr.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the server hasn't been updated to use the protocol version yet... I believe it is still behind the feature flag
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - it is just controlled by a feature flag
test/object-store/sync/flx_sync.cpp
Outdated
|
||
// Add client reset callback to verify a client reset doesn't happen | ||
config.sync_config->notify_before_client_reset = [&](std::shared_ptr<Realm>) { | ||
did_client_reset = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: FAIL()
(and no need for did_client_reset)
…o mwb/role-change-bootstraps
…o mwb/role-change-bootstraps
…o mwb/role-change-bootstraps
What, How & Why?
Add the ability for the server to send a bootstrap to the server at any time, initially needed to support the handle role changes without a client reset. With these changes a single message/single changeset bootstrap for the current query version will be applied like a regular download message, however a multi-message or single message/multi changeset bootstrap for the current query version will be applied like a regular bootstrap.
These changes also include a test to validate the role change bootstrap and verify the local data is updated to reflect the new permissions. Additional tests are forthcoming in a couple of future PRs.
The version of baasaas and baas used with these changes are based off a branch that is pending review and the protocol version was bumped to v14 in order to support these changes without the feature flag.Fixes #7584, #7326
☑️ ToDos
[ ] C-API, if public C++ API changed[ ]bindgen/spec.yml
, if public C++ API changed