Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Guarantee that bootstraps get applied if they receive the final bootstrap message #5331

Merged
merged 12 commits into from
May 25, 2022

Conversation

jbreams
Copy link
Contributor

@jbreams jbreams commented Mar 16, 2022

What, How & Why?

When FLX sync is enabled we need to defer applying download messages that contain partial bootstrap changesets until we have the entire set of changesets available. This is because if the bootstrap is interrupted, when we retry the bootstrap later on, the server may be using a different snapshot with different results than the first bootstrap attempt. By using a different snapshot, the server may have a different set of objects that are in-view between bootstrap attempts, and if we were to apply partial bootstrap download messages, those objects that were in-view in the first attempt but out-of-view in the second attempt will be orphaned on the client.

This change introduces a new PendingBootstrapStore that will store the contents of bootstrap download messages in a side-table until a download message with last_in_batch=true has been received. If the bootstrap is interrupted before a last_in_batch=true message is received, none of the prior messages shall be applied and the bootstrap will restart. If the bootstrap is interrupted after the last_in_batch=true message is received, but before the bootstrap has been fully applied, the sync client shall apply all the remaining bootstrap messages before sending the IDENT message an resuming the session.

☑️ ToDos

  • 📝 Changelog update
  • 🚦 Tests (or not relevant)

@cla-bot cla-bot bot added the cla: yes label Mar 16, 2022
VersionInfo version_info;
ClientReplication& repl = access_realm(); // Throws
if (m_wrapper.m_flx_pending_bootstrap_store) {
while (m_wrapper.m_flx_pending_bootstrap_store->has_pending()) {
Copy link
Collaborator

@danieltabacaru danieltabacaru Mar 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will we have the same problem if we don't get to apply all changeset batches here? I thought the idea was to apply all or nothing (i.e., concatenate all batches and have one single call to integrate_changesets)

@jbreams jbreams force-pushed the jbr/batch_up_bootstrap branch 3 times, most recently from 59d7b4c to 986e35d Compare May 6, 2022 17:35
@jbreams jbreams force-pushed the jbr/batch_up_bootstrap branch from a41ac48 to 765f8ac Compare May 11, 2022 18:46
@jbreams jbreams changed the title Jbr/batch up bootstrap Guarantee that bootstraps get applied if they receive the final bootstrap message May 11, 2022
@jbreams jbreams marked this pull request as ready for review May 11, 2022 19:12
@jbreams jbreams requested review from danieltabacaru and ironage May 11, 2022 19:12
@jbreams
Copy link
Contributor Author

jbreams commented May 18, 2022

@danieltabacaru and @ironage , do you think you'll have time to review this soon?

// state.
PendingBatch peek_pending(size_t limit_in_bytes);

// Removes the first set of changesets from the current pending bootstrap batch. The transaction must be int the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

int -> in

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

// state.
PendingBatch peek_pending(size_t limit_in_bytes);

// Removes the first set of changesets from the current pending bootstrap batch. The transaction must be int the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

int -> in

// writing state.
void pop_front_pending(const TransactionRef& tr, size_t count);

// Adds a set of changesets to the store and returns the local version produced by the transaction.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does not return anything

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

// writing state.
void pop_front_pending(const TransactionRef& tr, size_t count);

// Adds a set of changesets to the store and returns the local version produced by the transaction.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does not return anything

// If there are any incomplete bootstraps in the table, we need to clear them and then update our has_pending
// flag accordingly.
if (auto bootstrap_table = tr->get_table(m_table); !bootstrap_table->is_empty()) {
m_has_pending = (bootstrap_table->is_empty() == false);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't the check above conclude that bootstrap_table is not empty?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. also removed the comment above since that gets handled elsewhere now.

// If there are any incomplete bootstraps in the table, we need to clear them and then update our has_pending
// flag accordingly.
if (auto bootstrap_table = tr->get_table(m_table); !bootstrap_table->is_empty()) {
m_has_pending = (bootstrap_table->is_empty() == false);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't the check above conclude that bootstrap_table is not empty?

ret.changeset_data.push_back(util::AppendBuffer<char>());
auto& uncompressed_buffer = ret.changeset_data.back();

auto raw_changeset_data = cur_changeset.get<BinaryData>(m_changeset_data);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you could name this compressed_changeset_data

ret.changeset_data.push_back(util::AppendBuffer<char>());
auto& uncompressed_buffer = ret.changeset_data.back();

auto raw_changeset_data = cur_changeset.get<BinaryData>(m_changeset_data);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you could name this compressed_changeset_data

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

bootstrap_obj.get<int64_t>(m_query_version), changeset_list.size());
}

m_has_pending = (bootstrap_table->is_empty() == false);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not m_has_pending = !bootstrap_table->is_empty()?

bootstrap_obj.get<int64_t>(m_query_version), changeset_list.size());
}

m_has_pending = (bootstrap_table->is_empty() == false);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not m_has_pending = !bootstrap_table->is_empty()?

std::function<void()> on_download_message_received_hook;
std::function<void(const sync::SyncProgress&, int64_t, sync::DownloadBatchState)>
on_download_message_received_hook;
/// Will be called after each download message is integrated. For use in testing.
Copy link
Collaborator

@danieltabacaru danieltabacaru May 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this is a bit misleading since we do not guarantee this is called for each download messages received (i.e., we "construct" our own download messages through peek_pending)
Edit: This actually seems to be called when a download message is added to the store.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've copied the comment in src/realm/sync/config.hpp for this parameter here since I think that one is actually accurate.

std::function<void()> on_download_message_received_hook;
std::function<void(const sync::SyncProgress&, int64_t, sync::DownloadBatchState)>
on_download_message_received_hook;
/// Will be called after each download message is integrated. For use in testing.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this is a bit misleading since we do not guarantee this is called for each download messages received (i.e., we "construct" our own download messages through peek_pending)


if (batch_state == DownloadBatchState::LastInBatch) {
update_progress(progress); // Throws
if (process_flx_bootstrap_message(progress, batch_state, query_version, received_changesets)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't be pass downloadable_bytes and save it in the metadata table too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

downloadable_bytes doesn't really have meaning in FLX sync and this should always be zero.


if (batch_state == DownloadBatchState::LastInBatch) {
update_progress(progress); // Throws
if (process_flx_bootstrap_message(progress, batch_state, query_version, received_changesets)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't be pass downloadable_bytes and save it in the metadata table too?

std::function<void(std::weak_ptr<SyncSession>)> on_download_message_received_hook;
std::function<void(std::weak_ptr<SyncSession>, const sync::SyncProgress&, int64_t, sync::DownloadBatchState)>
on_download_message_received_hook;
// Will be called after each bootstrap message is added to the pending bootstrap store, but before_frozen
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment is a bit more accurate, but I'm not sure what before_frozen has to do with it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my new auto-completer in neovim is auto-completing some nonsense.

std::function<void(std::weak_ptr<SyncSession>)> on_download_message_received_hook;
std::function<void(std::weak_ptr<SyncSession>, const sync::SyncProgress&, int64_t, sync::DownloadBatchState)>
on_download_message_received_hook;
// Will be called after each bootstrap message is added to the pending bootstrap store, but before_frozen
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment is a bit more accurate, but I'm not sure what before_frozen has to do with it.

bool process_flx_bootstrap_message(const SyncProgress& progress, DownloadBatchState batch_state,
int64_t query_version, const ReceivedChangesets& received_changesets);

// Processes any pending FLX bootstraps, if one excists. Otherwise this is a noop.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exists

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

bool process_flx_bootstrap_message(const SyncProgress& progress, DownloadBatchState batch_state,
int64_t query_version, const ReceivedChangesets& received_changesets);

// Processes any pending FLX bootstraps, if one excists. Otherwise this is a noop.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exists

@@ -34,6 +34,8 @@ using TransactionRef = std::shared_ptr<Transaction>;
namespace realm::sync {
namespace internal_schema_groups {
constexpr static std::string_view c_flx_subscription_store("flx_subscription_store");
constexpr static std::string_view c_pending_bootstraps("PendingBootstraps");

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -34,6 +34,8 @@ using TransactionRef = std::shared_ptr<Transaction>;
namespace realm::sync {
namespace internal_schema_groups {
constexpr static std::string_view c_flx_subscription_store("flx_subscription_store");
constexpr static std::string_view c_pending_bootstraps("PendingBootstraps");

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove

Copy link
Contributor

@ironage ironage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

@@ -19,6 +19,7 @@
#ifndef REALM_OS_SYNC_SESSION_HPP
#define REALM_OS_SYNC_SESSION_HPP

#include "realm/sync/client_base.hpp"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no. I think this got accidentally added by clangd


auto tr = m_db->start_write();
auto bootstrap_table = tr->get_table(m_table);
auto incomplete_bootstraps = Query(bootstrap_table).less(m_query_version, query_version).find_all();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only one bootstrap can be active at a time right? Can we change less to not_equal? I haven't implemented this yet, but I think the query version may decrease in a client reset with recovery. Or actually I may need to just clear out the entire PendingBootstrap store upon reset.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmmmm... I think we need to guarantee that the query version doesn't decrease? or at least we may need to coordinate that with the server. I think the server currently assumes that the query version will never decrease over the life of a file ident.

but yes, we can change this to not_equal.

}

void PendingBootstrapStore::pop_front_pending(const TransactionRef& tr, size_t count)
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert that tr is a write transaction?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -34,6 +34,8 @@ using TransactionRef = std::shared_ptr<Transaction>;
namespace realm::sync {
namespace internal_schema_groups {
constexpr static std::string_view c_flx_subscription_store("flx_subscription_store");
constexpr static std::string_view c_pending_bootstraps("PendingBootstraps");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we doing snake_case or CamelCaps?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@jbreams jbreams force-pushed the jbr/batch_up_bootstrap branch from 0d2740b to 1266fc3 Compare May 24, 2022 19:52
@jbreams jbreams merged commit 5f7fac2 into master May 25, 2022
@jbreams jbreams deleted the jbr/batch_up_bootstrap branch May 25, 2022 03:40
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 22, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants