vsr: sync uses correct view to go into recovering_head #1705
The last commit fixes #1702 -- a semi-related VOPR seed that also trips on main.
Should we update `sync_view` during `superblock.checkpoint()`? Then we would have the invariant that `sync_view >= checkpoint.header.view`.
assert(message.header.view <= self.view);
// Pings advertise checkpoints, and current checkpoint's view might be greater than
// the replica view.
if (message.header.view > self.view) {
Should this be `> self.view_durable()`? We might have just called `transition_to_normal_from_recovering_head_status()`, so our `view_durable` is not yet updated, but our `self.view` is updated.
I don't think so. I believe that
if (message.header.view > self.view_durable()) {
    assert(self.superblock.working.vsr_state.sync_view >
        self.superblock.working.vsr_state.view);
}
would hold, but that's an almost tautological assertion.
In contrast
if (message.header.view > self.view_durable()) {
    assert(self.status == .recovering_head);
}
would not hold -- `transition_to_normal_from_recovering_head_status` changes status, but leaves `self.superblock.working.vsr_state` intact until the durable update finishes.
And
if (message.header.view > self.view) {
    assert(self.status == .recovering_head);
}
is the interesting case -- it ensures that we can't get out of the `.recovering_head` state earlier than `self.superblock.working.vsr_state.view`.
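The difference between the last two assertions can be sketched with a toy model. This is a minimal illustration, not the replica's actual code: `Replica`, `transition_to_normal`, and `durable_update_done` are hypothetical names, and the only behavior assumed is what the thread describes (status and `self.view` change at once, while the durable view lags until the superblock update finishes).

```zig
const std = @import("std");

const Status = enum { normal, recovering_head };

/// Hypothetical model of the fields discussed in this thread; not the real Replica.
const Replica = struct {
    status: Status,
    /// In-memory view (`self.view` in the thread).
    view: u32,
    /// View persisted in the superblock (`self.view_durable()` in the thread).
    view_durable: u32,

    /// Models the transition out of recovering_head: status and in-memory
    /// view change immediately, the durable view does not.
    fn transition_to_normal(self: *Replica, view_new: u32) void {
        self.status = .normal;
        self.view = view_new;
    }

    /// Models completion of the durable superblock update.
    fn durable_update_done(self: *Replica) void {
        self.view_durable = self.view;
    }
};

test "durable view lags behind status" {
    var replica = Replica{ .status = .recovering_head, .view = 3, .view_durable = 3 };
    const message_view: u32 = 5;

    // While recovering, a ping may carry a view ahead of the replica's view.
    try std.testing.expect(message_view > replica.view);
    try std.testing.expect(replica.status == .recovering_head);

    replica.transition_to_normal(5);

    // `message_view > view_durable` is still true, yet the status is already
    // normal, so asserting recovering_head under that condition would fail.
    try std.testing.expect(message_view > replica.view_durable);
    try std.testing.expect(replica.status == .normal);

    // Keyed on the in-memory view, the condition is no longer true.
    try std.testing.expect(!(message_view > replica.view));

    replica.durable_update_done();
    try std.testing.expect(replica.view_durable == 5);
}
```

In this model the condition on `view` stops holding exactly when the status changes, which is why the check on `self.view` is the useful one.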
Ahh I get it; thank you.
Excellent point! We should at least reset it to zero once the sync is done. But I think it's best not to update it, and to keep it scoped strictly to state sync, rather than mix in both sync and non-sync code paths in a single field.
@@ -958,6 +970,7 @@ pub fn SuperBlockType(comptime Storage: type) type {
vsr_state.commit_max = update.commit_max;
(A few lines above this) we can assert that `update.sync_view >= update.checkpoint.header.view`.
@@ -863,6 +873,7 @@ pub fn SuperBlockType(comptime Storage: type) type {
vsr_state.commit_max = update.commit_max;
vsr_state.sync_op_min = update.sync_op_min;
vsr_state.sync_op_max = update.sync_op_max;
vsr_state.sync_view = update.sync_view;
We could assert that `update.sync_view == 0 or update.sync_view == superblock.staging.vsr_state.sync_view`.
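For illustration, the suggested assertion might look like the sketch below. `VsrState` is reduced to a single field and `apply_sync_view` is a hypothetical helper. Neither is taken from the superblock code, so treat this as a shape, not the actual patch.

```zig
const std = @import("std");
const assert = std.debug.assert;

/// Hypothetical, one-field stand-in for the superblock's VSR state.
const VsrState = struct { sync_view: u32 };

/// Hypothetical helper: on this code path the update may either clear
/// sync_view or leave it unchanged, but never set a new value.
fn apply_sync_view(staging: *VsrState, update_sync_view: u32) void {
    assert(update_sync_view == 0 or update_sync_view == staging.sync_view);
    staging.sync_view = update_sync_view;
}

test "sync_view is either cleared or preserved" {
    var staging = VsrState{ .sync_view = 7 };
    apply_sync_view(&staging, 7); // preserved
    try std.testing.expect(staging.sync_view == 7);
    apply_sync_view(&staging, 0); // cleared once sync is done
    try std.testing.expect(staging.sync_view == 0);
}
```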
It might be the case that op_checkpoint:

* is prepared in view X
* is truncated in view X+1
* is committed in view X+2

When deciding whether to go into recovering_head state, the replica currently considers the prepared view (checkpoint.header.view), and not the committed view.

Use the correct view by:

* Adding checkpoint's view into VSR State (but _not_ checkpoint state: different replicas might commit prepares at different views)
* Populating that view from the message that informed us about the checkpoint target
  * this requires some intricate logic on pings, to make sure they indeed propagate the correct view for the checkpoint --- a replica accepts a checkpoint before it transitions to its view, and it should subsequently correctly propagate this higher view. This works because checkpoint view is also durable.

Seed: 8593423301425288917
Closes: #1703
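The scenario in the commit message can be sketched as follows. This is a simplified model under stated assumptions: `CheckpointTarget` and `needs_recovering_head` are hypothetical names, and the rule "enter recovering_head when the checkpoint's view is ahead of the replica's view" is an assumption made for the example, not a statement of the replica's exact condition.

```zig
const std = @import("std");
const assert = std.debug.assert;

/// Hypothetical, simplified checkpoint target; not TigerBeetle's actual type.
const CheckpointTarget = struct {
    /// View in which the checkpoint's op was prepared
    /// (checkpoint.header.view in the commit message).
    view_prepared: u32,
    /// View of the message that advertised the checkpoint
    /// (the sync_view discussed in the review).
    sync_view: u32,
};

/// Assumed rule for this sketch only: the replica must enter recovering_head
/// when the view associated with the checkpoint is ahead of its own view.
fn needs_recovering_head(replica_view: u32, checkpoint_view: u32) bool {
    return checkpoint_view > replica_view;
}

test "prepared view can under-report the committed view" {
    // Prepared in view X = 5, truncated in X+1, committed in X+2 = 7.
    const target = CheckpointTarget{ .view_prepared = 5, .sync_view = 7 };
    assert(target.sync_view >= target.view_prepared);

    const replica_view: u32 = 6;

    // Judged by the prepared view, the replica wrongly stays out of recovering_head.
    try std.testing.expect(!needs_recovering_head(replica_view, target.view_prepared));
    // Judged by the view learned from the advertising message, it enters it.
    try std.testing.expect(needs_recovering_head(replica_view, target.sync_view));
}
```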