Fix eager updateFromV1 #150
Conversation
Force-pushed from 99dc639 to c993195
This should fix the error reported in canonical/microceph#367, as not-upgraded cluster members will continue to function with the old table.
Why is this?
The schema upgrade process works like LXD's: each member updates its row in the cluster members table with the size of the upgrade list, then compares this value with the other rows. If the rows don't all match for that column, the node waits for a notification over the internal API. The first cluster member to see all rows match for that column applies the updates. That last node compares the upgrade count from the cluster members table to the latest entry in the `schemas` table.

So we need to split the internal/external update count outside of the main update mechanism first. Right now we do this eagerly, as soon as the first node starts. This changes the cluster members table, which means old nodes querying that table will break. If we want to run that update only after all nodes agree, we need two separate implementations for how to establish that agreement, because once the agreement is established and we run the update, the next time the nodes restart they won't be able to use the old mechanism anymore, since the schema will have changed.
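For reference, a minimal sketch of that agreement step. The table and column names follow the description above, but the helper and the polling loop are illustrative; the real implementation waits on a notification over the internal API rather than polling:

```go
package schema

import (
	"database/sql"
	"time"

	_ "github.com/mattn/go-sqlite3"
)

// waitForSchemaAgreement records this member's expected schema version
// (the size of its update list) in its own row, then blocks until every
// row in the cluster members table reports the same value.
func waitForSchemaAgreement(db *sql.DB, memberID int64, expected int) error {
	if _, err := db.Exec(
		`UPDATE internal_cluster_members SET schema = ? WHERE id = ?`,
		expected, memberID,
	); err != nil {
		return err
	}

	for {
		// Count rows that do not yet agree with our expected version.
		var lagging int
		if err := db.QueryRow(
			`SELECT count(*) FROM internal_cluster_members WHERE schema != ?`,
			expected,
		).Scan(&lagging); err != nil {
			return err
		}
		if lagging == 0 {
			return nil // All rows match; this member may apply the updates.
		}
		time.Sleep(time.Second)
	}
}
```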
Force-pushed from c993195 to 0c7efa1
@tomponline I've modified this PR with a slightly cleaner approach to the same problem. I've updated the PR description as well. I have #154 up as a draft with the old approach that uses a temporary table to compare. One caveat with this approach (which wasn't present in the temporary table version) is that there is a slight chance of incorrectly determining that two cluster members expect the same schema version.
@roosterfish If you've got any thoughts on this one versus #154, or if you've got a better idea for how to handle this problem, please let me know, thanks :)
Force-pushed from 221ea8d to 8b5634c
@tomponline @roosterfish I ended up sticking with a new table with insert/delete triggers over adding a view for two reasons:
I understand why having a view won't work, as downstreams might want to delete rows from the `internal_cluster_members` table as part of removing some nodes from the cluster.
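As a quick illustration of why a view falls short (a sketch, not project code; the view name is made up): a plain SQLite view rejects DML outright, so a downstream's row deletion would error instead of propagating:

```go
package main

import (
	"database/sql"
	"fmt"

	_ "github.com/mattn/go-sqlite3"
)

func main() {
	db, err := sql.Open("sqlite3", ":memory:")
	if err != nil {
		panic(err)
	}
	defer db.Close()

	if _, err := db.Exec(`
CREATE TABLE internal_cluster_members (id INTEGER PRIMARY KEY, name TEXT);
CREATE VIEW members_view AS SELECT id, name FROM internal_cluster_members;
`); err != nil {
		panic(err)
	}

	// Fails with "cannot modify members_view because it is a view".
	_, err = db.Exec(`DELETE FROM members_view WHERE id = 1`)
	fmt.Println(err)
}
```

With a real table kept in sync by triggers, those deletes succeed against the old table and are mirrored into the new one.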
Forcibly applying the first schema update causes issues on existing nodes, particularly when restarting or during heartbeats. Instead, we can just infer what we expect the maximum version to look like, and wait until the last version is updated to actually apply anything.

Signed-off-by: Max Asnaashari <[email protected]>

…ster members agree

Signed-off-by: Max Asnaashari <[email protected]>

So that we don't break existing online cluster members, we can't just update the schema from underneath them. Instead, we can store schema updates in a temporary table `internal_cluster_members_new`, and reference this table during the schema update process instead of the real `internal_cluster_members` table. Then, when we actually run `updateFromV1` on the final cluster member to receive the update, replace the real `internal_cluster_members` table with the one that has been keeping track of the updates.

Signed-off-by: Max Asnaashari <[email protected]>

…oesn't exist

Signed-off-by: Max Asnaashari <[email protected]>

Signed-off-by: Max Asnaashari <[email protected]>
Force-pushed from 8b5634c to a7859e3
LGTM!
`updateFromV1` was introduced by #94 because our schema update handling logic was fundamentally broken. Internal and external schema updates incremented the same row in the `schemas` table, which means that when we receive a new update, we cannot tell which list (internal or external) got incremented. It's ambiguous which update we need to run.

In order to keep the implementation cleaner, `updateFromV1` was made to run before cluster members agreed. It actually ran as soon as the first cluster member started up and saw that the update exists. I overlooked that this would break querying the cluster members table, because `updateFromV1` changes the columns of that table. This means existing cluster members who have not yet received the upgrade will break when querying the cluster members table, as a column they expect has been changed.

This PR attempts to solve that problem by making `updateFromV1` behave like all other schema updates, where all cluster members have to agree first. However, this means we need to maintain two tables to check schema updates prior to running `updateFromV1`.
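To make the ambiguity concrete, here is a small sketch; the exact pre-upgrade `schemas` shape and the query are assumptions based on this description, not microcluster's actual DDL:

```go
// Assumed pre-upgrade shape (per the description above):
//   CREATE TABLE schemas (id INTEGER PRIMARY KEY, version INTEGER, updated_at DATETIME);
// A bump from version 5 to 6 can't be attributed to either update list.
//
// Assumed post-updateFromV1 shape, with the type column:
//   CREATE TABLE schemas (id INTEGER PRIMARY KEY, type INTEGER, version INTEGER, updated_at DATETIME);
// Now each list's high-water mark is independently readable:
const latestVersionByType = `SELECT type, max(version) FROM schemas GROUP BY type`
```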
Here is how the process works (a sketch of steps 1, 2, and 4 follows the list):

1. Check whether `updateFromV1` has run, by checking if a `type` column exists on the `schemas` table. This `type` column makes a distinction between internal/external updates, and is introduced by `updateFromV1`.
2. If it hasn't run, the cluster member creates `internal_cluster_members_new`, if another cluster member has not already done so. It populates the `internal_cluster_members_new` table with all the same rows from the `internal_cluster_members` table, but fixes the schema columns. It also creates some insert/delete triggers so that the two tables remain in sync.
3. Each cluster member updates its row in the `internal_cluster_members_new` table with the schema version it expects, based on the size of the update slices.
4. Once all cluster members agree, `updateFromV1` replaces the old `internal_cluster_members` table with `internal_cluster_members_new`, and drops the triggers as well.

What this means is that cluster members who have not received the update yet can continue to function using the old table. After the update has occurred, the old table is seamlessly replaced with the new table, which will be used for all schema updates going forward.
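For illustration, a hedged Go/SQL sketch of steps 1, 2, and 4 above. The column names, trigger names, and helper functions are invented for the sake of the example; the real migration differs in detail:

```go
package example

import (
	"database/sql"

	_ "github.com/mattn/go-sqlite3"
)

// updateFromV1HasRun reports whether the type column already exists on
// the schemas table (step 1).
func updateFromV1HasRun(db *sql.DB) (bool, error) {
	var n int
	err := db.QueryRow(
		`SELECT count(*) FROM pragma_table_info('schemas') WHERE name = 'type'`,
	).Scan(&n)
	return n > 0, err
}

// prepareNewMembersTable creates internal_cluster_members_new alongside
// the live table and keeps the two in sync with triggers (step 2). The
// column list is a stand-in; the real update also fixes the schema
// columns while copying rows.
func prepareNewMembersTable(db *sql.DB) error {
	_, err := db.Exec(`
CREATE TABLE IF NOT EXISTS internal_cluster_members_new (
  id              INTEGER PRIMARY KEY,
  name            TEXT NOT NULL,
  schema_internal INTEGER NOT NULL DEFAULT 0,
  schema_external INTEGER NOT NULL DEFAULT 0
);
INSERT OR IGNORE INTO internal_cluster_members_new (id, name)
  SELECT id, name FROM internal_cluster_members;
CREATE TRIGGER IF NOT EXISTS members_sync_insert AFTER INSERT ON internal_cluster_members
BEGIN
  INSERT INTO internal_cluster_members_new (id, name) VALUES (new.id, new.name);
END;
CREATE TRIGGER IF NOT EXISTS members_sync_delete AFTER DELETE ON internal_cluster_members
BEGIN
  DELETE FROM internal_cluster_members_new WHERE id = old.id;
END;
`)
	return err
}

// swapMembersTables runs once every member agrees (step 4): drop the
// triggers and promote the new table in place of the old one.
func swapMembersTables(db *sql.DB) error {
	_, err := db.Exec(`
DROP TRIGGER IF EXISTS members_sync_insert;
DROP TRIGGER IF EXISTS members_sync_delete;
DROP TABLE internal_cluster_members;
ALTER TABLE internal_cluster_members_new RENAME TO internal_cluster_members;
`)
	return err
}
```

The `DROP TABLE` followed by `RENAME TO` in the last step is what makes the swap invisible to members that only ever query `internal_cluster_members` by name.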