-
Notifications
You must be signed in to change notification settings - Fork 173
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update server version on all entries in the sync history during client reset with recovery #7291
Conversation
…t reset with recovery
Pull Request Test Coverage Report for Build daniel.tabacaru_757
💛 - Coveralls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can confirm that with these changes, the errors we saw in the server logs in Kotlin are now gone.
auto i = size_t(version - m_sync_history_base_version); | ||
util::compression::allocate_and_compress_nonportable(arena, changeset, compressed); | ||
m_arrays->changesets.set(i, BinaryData{compressed.data(), compressed.size()}); // Throws | ||
// Server version is updated for *every* entry in the sync history to ensure that server versions don't |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think doing this as two separate loops is a bit clearer (and it's trivially faster, but that doesn't matter):
for (auto& [changeset, version] : recovered_changesets) {
uploadable_bytes += changeset.size();
auto i = size_t(version - m_sync_history_base_version);
util::compression::allocate_and_compress_nonportable(arena, changeset, compressed);
m_arrays->changesets.set(i, BinaryData{compressed.data(), compressed.size()}); // Throws
}
for (size_t i = 0, j = 0, size = m_arrays->remote_versions.size(); i < size; ++i) {
m_arrays->remote_versions.set(i, server_version.version);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I initially implemented it like that, then changed my mind. I will make the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After some more testing, I'm not sure if this fully fixes the issue.
We are now seeing another test (also asymmetric) that now fails with this:
Error: client file not found. The server has forgotten about the client-side file presented by the client. This is likely due to using a synchronized realm after terminating and re-enabling sync. Please wipe the file on the client to resume synchronization. { sessionIdent: 2, clientFileIdent: 35 } (ProtocolErrorCode=208)
But I do not see how we can trigger a Client Reset as this is the first time the Client is connecting to the server
The local log is attached: log-crash.txt
In the server logs there is a missing file ident as well, but not sure how that is possible:
@cmelchior the client reset seems unrelated. It's odd that the synchronization starts (the server even sends download messages), and then sends that error. maybe the server folks should have a look if you have the logs. |
FWIW, i think there's a race in your test or the server. You get a client reset error on the main realm and then the sync client tries to start a new realm to do the client reset with
Then that second realm that's going to do the client reset gets disconnected a few times.
And then you get a client reset error for the fresh realm being downloaded and that causes the whole test to fail...
Are you sure the test is waiting for sync to be terminated/restarted before trying to client reset? |
@jbreams Interesting. This is exactly what @cmelchior is seeing. |
Yeah, those log lines were pulled from @cmelchior's log file he attached in this ticket. |
I reran the test and pulled both server and local logs. Now I saw a new error: Logs attached EDIT: I deleted my local baas install and rebuilt it from scratch, then the test succeded. Will try to experiment with the test setup |
Same test, from a fresh BAAS but with platform networkin enabled. This one just ends up hanging with no error |
I don't see any requests to the server after the request to revoke the session which succeeds, so this all looks fine from the server's perspective. |
@kmorkos Can you have a look at the other logs where |
Those are transient. It just means that the test was concurrently making dev mode additive schema changes from a device while also doing other stuff with the app via the API. We could make that retry internally, but those are not caused by the changes here so should not need to block this change IMO. |
Hmm, I only saw the error once so maybe that makes sense. I do not want to hold up this PR for too long, but something has definitely broken the Kotlin test suite after we updated to Core 13.25.0 and it is surfacing as upload listeners not being triggered which ends up timing out our tests. But I should probably just create a new issue for that. |
@cmelchior , which test is failing specifically? Can you link it here? |
We are seeing quite a lot of different ones, but the ones I have been focusing on is around asymmetric sync. Specifically these to: https://github.com/realm/realm-kotlin/blob/ec7189244d7d1c4f7c5544fdfc0a00a4cf9ceef4/packages/test-sync/src/commonTest/kotlin/io/realm/kotlin/test/mongodb/common/AsymmetricSyncTests.kt#L133 and https://github.com/realm/realm-kotlin/blob/ec7189244d7d1c4f7c5544fdfc0a00a4cf9ceef4/packages/test-sync/src/commonTest/kotlin/io/realm/kotlin/test/mongodb/common/AsymmetricSyncTests.kt#L307 There I managed to reliably reproduce a error by starting a fresh BAAS, then running the single test with Platform networking enabled. Without platform networking these tests seem to work, but then we have others failing. So I'm not sure it is related to this. Our tests will detect if the test-app they need is present and if not, use the Admin API to create it. I suspect that this is somehow causing issues as adding artificial delays to our tests after creating the app seems to make some of the errors go away. The weird thing is that all of this worked prior to Core 13.25.0, so not sure what changed there. |
I created #7297 where I tried to describe it a bit more. I'll do some more testing and add any new logs to that issue instead. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes and the test look good
What, How & Why?
When performing recovery after a client reset, the server version of the recovered changesets is updated with the latest server version. The changesets recovered are only the changesets with actual changes, but empty changesets may have been created when a new subscriptions set is committed for example. The server version of such changesets was not updated, leading to
Bad server version
errors during uploads.The server version is now updated on all entries in the sync history.
Fixes #7279.
☑️ ToDos
[ ] C-API, if public C++ API changed[ ]bindgen/spec.yml
, if public C++ API changed