
Fix eager updateFromV1 #150

Merged: 5 commits merged into canonical:main on Jun 20, 2024

Conversation

@masnax (Contributor) commented Jun 14, 2024

updateFromV1 was introduced by #94 because our schema update handling logic was fundamentally broken. Internal and external schema updates incremented the same row in the schemas table, so when we receive a new update we cannot tell which list (internal or external) was incremented, and it's ambiguous which update we need to run.

To keep the implementation cleaner, updateFromV1 was made to run before cluster members agreed; it ran as soon as the first cluster member started up and saw that the update existed. I overlooked that this would break querying the cluster members table, because updateFromV1 changes that table's columns. Existing cluster members that have not yet received the upgrade will therefore break when querying the cluster members table, as a column they expect has been changed.

This PR attempts to solve that problem by making updateFromV1 behave like all other schema updates, where all cluster members have to agree first. However, this means we need to maintain two tables to check schema updates prior to running updateFromV1.

Here is how the process works:

  • When a cluster member starts, it checks whether updateFromV1 has run by checking whether a type column exists on the schemas table (see the sketch below). This type column distinguishes internal from external updates and is introduced by updateFromV1.
  • If not, that cluster member attempts to create internal_cluster_members_new, if another cluster member has not already done so. It populates internal_cluster_members_new with the same rows as internal_cluster_members, but fixes the schema columns. It also creates insert/delete triggers so that the two tables remain in sync.
  • Each cluster member updates its row in the internal_cluster_members_new table with the schema version it expects based on the size of the update slices.
  • Once all cluster members have settled on a schema update (all schema columns are equal for every row), we proceed to run schema updates.
  • updateFromV1 replaces the old internal_cluster_members table with internal_cluster_members_new, and drops the triggers as well.

What this means is that cluster members that have not yet received the update can continue to function using the old table. After the update has occurred, the old table is seamlessly replaced with the new one, which will be used for all schema updates going forward.
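
As a concrete illustration of the first step, here is a minimal sketch of the type-column check, assuming a plain database/sql transaction; the helper name is hypothetical and the real wiring in microcluster differs:

```go
package update

import (
	"context"
	"database/sql"
)

// schemasHaveTypeColumn reports whether updateFromV1 has already run, by
// checking for a "type" column on the schemas table. pragma_table_info is
// the table-valued form of PRAGMA table_info, so the check is one query.
func schemasHaveTypeColumn(ctx context.Context, tx *sql.Tx) (bool, error) {
	var count int
	err := tx.QueryRowContext(ctx,
		"SELECT count(*) FROM pragma_table_info('schemas') WHERE name = 'type'",
	).Scan(&count)
	if err != nil {
		return false, err
	}

	return count > 0, nil
}
```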

@masnax marked this pull request as draft on June 14, 2024 22:22
@masnax force-pushed the fix-eager-schema-update branch from 99dc639 to c993195 on June 14, 2024 22:50
@masnax marked this pull request as ready for review on June 14, 2024 22:55
@masnax (Contributor, Author) commented Jun 14, 2024

This should fix the error reported in canonical/microceph#367, as not-yet-upgraded cluster members will continue to function with the old table.

@tomponline (Member) commented

> However, since this update runs before we have ensured every cluster member is ready for the update, `micro* cluster list` will be broken on all running nodes when one node is refreshed with the update. This is a particular issue for MicroOVN, whose focus for this cycle has been hassle-free updates.

Why is this?

@masnax (Contributor, Author) commented Jun 16, 2024

> > However, since this update runs before we have ensured every cluster member is ready for the update, `micro* cluster list` will be broken on all running nodes when one node is refreshed with the update. This is a particular issue for MicroOVN, whose focus for this cycle has been hassle-free updates.
>
> Why is this?

The schema upgrade process works like LXD's: each member updates its row in the cluster members table with the size of the upgrade list, then compares this value with the other rows. If the rows don't all match for that column, the node waits for a notification over the internal API. The first cluster member to see all rows match for that column applies the updates.

That last node compares the upgrade count from the cluster members table to the latest entry in the schemas table to determine which updates to run. Since the value in the old tables is an aggregate of internal and external updates, we can't accurately determine which updates we have already run: this is the example I gave, where 1 internal + 3 external and 2 internal + 2 external both sum to 4.

So we need to split the internal/external update count first, outside of the main update mechanism. Right now we do this eagerly, as soon as the first node starts. This changes the cluster members table, which means old nodes querying that table will break.

If we want to run that update after all nodes agree, we need two separate implementations for how to establish that agreement, because after the agreement is established and we run the update, the next time the nodes restart they won't be able to use the old mechanism anymore, since the schema will have changed.
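
A minimal sketch of that agreement check, assuming hypothetical schema_internal/schema_external columns on internal_cluster_members_new; the real implementation lives in microcluster's update package and differs in detail:

```go
package update

import (
	"context"
	"database/sql"
)

// schemaAgreed records the schema versions this member expects (derived
// from the sizes of its internal and external update slices) and reports
// whether every cluster member now expects the same versions.
func schemaAgreed(ctx context.Context, tx *sql.Tx, name string, internal int, external int) (bool, error) {
	_, err := tx.ExecContext(ctx,
		`UPDATE internal_cluster_members_new
		    SET schema_internal = ?, schema_external = ?
		  WHERE name = ?`, internal, external, name)
	if err != nil {
		return false, err
	}

	// Updates may only run once every row reports identical versions.
	var distinctInternal, distinctExternal int
	err = tx.QueryRowContext(ctx,
		`SELECT count(DISTINCT schema_internal), count(DISTINCT schema_external)
		   FROM internal_cluster_members_new`,
	).Scan(&distinctInternal, &distinctExternal)
	if err != nil {
		return false, err
	}

	return distinctInternal == 1 && distinctExternal == 1, nil
}
```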

@masnax (Contributor, Author) commented Jun 17, 2024

@tomponline I've modified this PR with a slightly cleaner approach to the same problem. I've updated the PR description as well. I have #154 up as a draft with the old approach that uses a temporary table to compare.

One caveat with this approach (which wasn't present in the temporary table version) is that there is a slight chance of incorrectly concluding that two cluster members expect the same schema version.

@masnax (Contributor, Author) commented Jun 17, 2024

@roosterfish If you've got any thoughts on this one versus #154, or if you've got a better idea for how to handle this problem, please let me know, thanks :)

@masnax force-pushed the fix-eager-schema-update branch from 221ea8d to 8b5634c on June 18, 2024 16:33
@masnax (Contributor, Author) commented Jun 18, 2024

@tomponline @roosterfish I ended up sticking with a new table plus insert/delete triggers, rather than adding a view, for two reasons:

  • If a cluster member is accidentally refreshed to the new version, there is no way to undo the view creation, which means the cluster will be forced to update or become read-only. With a temporary table, we can still revert the refresh until all cluster members agree; the only side effect would be a leftover table that the old implementation doesn't use anyway.

  • A view is not writable, which means that if a cluster member is in an unhealthy state during an update, the cluster will never agree on the update; we would need to add support for something like LXD's patch.global.sql to work around this. Using a temporary table means other cluster members can still remove broken nodes, and with the triggers (sketched below), those changes will be reflected in the temporary table.
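
For illustration, a rough sketch of what such sync triggers could look like; the trigger and column names here are assumptions, not the actual DDL from this PR:

```go
// Assumed DDL for the sync triggers; names and columns are illustrative.
const syncTriggersDDL = `
CREATE TRIGGER internal_cluster_members_new_on_insert
  AFTER INSERT ON internal_cluster_members
BEGIN
  -- Mirror newly joined members into the negotiation table.
  INSERT INTO internal_cluster_members_new (name, address, schema_internal, schema_external)
  VALUES (NEW.name, NEW.address, 0, 0);
END;

CREATE TRIGGER internal_cluster_members_new_on_delete
  AFTER DELETE ON internal_cluster_members
BEGIN
  -- Removing a broken member from the old table removes it here too,
  -- so the remaining members can still reach agreement.
  DELETE FROM internal_cluster_members_new WHERE name = OLD.name;
END;`
```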

@roosterfish (Contributor) left a comment

I understand why having a view won't work as downstreams might want to delete rows from the internal_cluster_members table as part of removing some nodes from the cluster.

Review threads (resolved): internal/db/update/schema.go, internal/db/update/update.go, cluster/cluster_members.go
masnax added 5 commits June 19, 2024 15:29
Forcibly applying the first schema update causes issues on existing
nodes, particularly when restarting or during heartbeats.

Instead, we can just infer what we expect the maximum version to look
like, and wait until the last version is updated to actually apply
anything.

Signed-off-by: Max Asnaashari <[email protected]>
So that we don't break existing online cluster members, we can't just
update the schema from underneath them.

Instead, we can store schema updates in a temporary table
`internal_cluster_members_new`, and reference this table during the
schema update process, instead of the real `internal_cluster_members`
table.

Then, when we actually run the `updateFromV1` on the final cluster
member to receive the update, replace the real
`internal_cluster_members` table with the one that has been keeping
track of the updates.

Signed-off-by: Max Asnaashari <[email protected]>
@masnax force-pushed the fix-eager-schema-update branch from 8b5634c to a7859e3 on June 19, 2024 15:29
@masnax requested a review from roosterfish on June 19, 2024 21:42
@roosterfish (Contributor) left a comment

LGTM!

@masnax merged commit 526a3b6 into canonical:main on Jun 20, 2024
5 checks passed