
sqlinstance: add binary_version column to instances table #98830

Merged · 1 commit · Mar 21, 2023

Conversation

Contributor

@healthy-pod healthy-pod commented Mar 17, 2023

This code change adds a binary_version column to the instances table.

This is done by adding the column to the bootstrap schema for system.sql_instances,
and piggy-backing on the existing code for the V23_1_SystemRbrReadNew migration,
which overwrites the live schema for this table with the bootstrap copy.

This redefinition of the meaning of the V23_1_SystemRbrReadNew migration is backward-incompatible
and is only possible because this commit is merged before the v23.1 branch is cut.

Release note: None
Epic: CRDB-20860
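
For context, a hedged sketch of the bootstrap-schema side of the change: the CREATE TABLE statement for system.sql_instances grows a binary_version column. The column list and primary key below are illustrative of the 23.1-era schema, not copied from pkg/sql/catalog/systemschema.

```go
// Illustrative only: the bootstrap schema for system.sql_instances with the
// new column. Exact columns and the primary key are assumptions.
const sqlInstancesTableSchema = `
CREATE TABLE system.sql_instances (
    id             INT8 NOT NULL,
    addr           STRING,
    session_id     BYTES,
    locality       JSONB,
    sql_addr       STRING,
    crdb_region    BYTES NOT NULL,
    binary_version STRING, -- new in this change
    CONSTRAINT "primary" PRIMARY KEY (crdb_region, id)
)`
```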

@healthy-pod healthy-pod requested a review from a team March 17, 2023 01:34
@healthy-pod healthy-pod requested a review from a team as a code owner March 17, 2023 01:34
@cockroach-teamcity
Member

This change is Reviewable

@healthy-pod healthy-pod marked this pull request as draft March 17, 2023 01:34
@healthy-pod healthy-pod force-pushed the migrate-instances-table branch 2 times, most recently from e447a40 to 99eb0be on March 17, 2023 02:36
@healthy-pod healthy-pod marked this pull request as ready for review March 17, 2023 03:39
@healthy-pod healthy-pod requested review from jeffswenson and knz March 17, 2023 03:39
Contributor

@knz knz left a comment


It would be good to link your PR to one of our epics.

Reviewed 23 of 23 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @healthy-pod and @jeffswenson)


pkg/sql/sqlinstance/instancestorage/row_codec.go line 341 at r1 (raw file):

		binaryVersionColumnIdx: func(datum tree.Datum) error {
			if datum != tree.DNull {
				binaryVersion = roachpb.MustParseVersion(string(tree.MustBeDString(datum)))

I recommend using the variant that checks the format and returns an error if the value is invalid.
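
A minimal sketch of that suggestion, assuming roachpb.ParseVersion as the error-returning counterpart of roachpb.MustParseVersion; decodeBinaryVersion is a hypothetical helper for illustration, not code from the PR:

```go
package instancestorage

import (
	"github.com/cockroachdb/cockroach/pkg/roachpb"
	"github.com/cockroachdb/cockroach/pkg/sql/sem/tree"
)

// decodeBinaryVersion is a hypothetical helper: roachpb.ParseVersion checks
// the format and returns an error on malformed input, where
// roachpb.MustParseVersion would panic.
func decodeBinaryVersion(datum tree.Datum) (roachpb.Version, error) {
	if datum == tree.DNull {
		// A NULL column leaves the version zero-valued.
		return roachpb.Version{}, nil
	}
	return roachpb.ParseVersion(string(tree.MustBeDString(datum)))
}
```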

// getDeprecatedSqlInstancesDescriptorWithoutBinaryVersion returns the system.sql_instances
// table descriptor that was being used before adding a new column in the
// current version.
func getDeprecatedSqlInstancesDescriptorWithoutBinaryVersion() *descpb.TableDescriptor {
Collaborator


It's possible to bootstrap the DB with the V22_2 system database. It makes writing upgrade tests a lot easier and the tests are more representative of how the upgrades will run in production. See #98613 for an example.
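
A sketch of that setup (a fragment, imports elided; the knob names are assumptions modeled on upgrade tests of this era such as #98613, not verified signatures):

```go
// Bootstrap at the v22.2 system schema so the upgrade under test runs the
// way it would in production, with a binary new enough to finalize to 23.1.
ctx := context.Background()
tc := testcluster.StartTestCluster(t, 1, base.TestClusterArgs{
	ServerArgs: base.TestServerArgs{
		Knobs: base.TestingKnobs{
			Server: &server.TestingKnobs{
				BootstrapVersionKeyOverride: clusterversion.V22_2, // assumed knob
				BinaryVersionOverride: clusterversion.ByKey(
					clusterversion.V23_1_SystemRbrReadNew),
			},
		},
	},
})
defer tc.Stopper().Stop(ctx)
```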

Contributor Author


Done

schemaExistsFn: columnExists,
},
} {
if err := migrateTable(ctx, cs, d, op, tableID, systemschema.SQLInstancesTable()); err != nil {
Collaborator


This upgrade isn't actually doing anything in production. The RBR index migration replaces the descriptor with the statically defined descriptor, which would include the column you are adding.

func sqlInstanceMigration(codec keys.SQLCodec) rbrMigration {

Contributor Author


Should I then remove the migration and keep the test case (run it after RBR index migration)? Anything else I should do? Actually I can't find the version associated with the RBR index migration. Can you please give more details?

Collaborator


This is the upgrade and version gate that will replace the descriptor with the new version.

upgrade.NewTenantUpgrade(

You should be able to delete your version key and upgrade logic. I think it is a good idea to keep the test: it documents the fact that the new column is an intended side effect of the RBR migration.
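
The kept test could then reduce to an assertion along these lines (a sketch assuming a *sqlutils.SQLRunner named sqlDB and the testify require package):

```go
// After the cluster steps through V23_1_SystemRbrReadNew, the new column
// must exist as a side effect of the descriptor overwrite.
var n int
sqlDB.QueryRow(t, `
	SELECT count(*) FROM [SHOW COLUMNS FROM system.sql_instances]
	WHERE column_name = 'binary_version'`,
).Scan(&n)
require.Equal(t, 1, n)
```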

@healthy-pod healthy-pod force-pushed the migrate-instances-table branch from 99eb0be to 0132479 on March 17, 2023 19:35
@healthy-pod
Contributor Author

> It would be good to link your PR to one of our epics.

Done

> I recommend using the variant that checks the format and returns an error if the value is invalid.

Done

@healthy-pod healthy-pod force-pushed the migrate-instances-table branch 3 times, most recently from 77240e9 to 3174505 on March 20, 2023 20:00
@healthy-pod
Contributor Author

Ready for another look (pending clarification on @knz's question here).

@healthy-pod healthy-pod requested review from knz and jeffswenson March 20, 2023 21:36
Contributor

@knz knz left a comment


If you remove the migration, then this entire PR doesn't do what it says on the label, as it does not add a binary_version column to the instances table. I'll want the clarification before I even start the review.

Reviewed 1 of 2 files at r2, 12 of 12 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @healthy-pod and @jeffswenson)

Contributor

@knz knz left a comment


Now I understand. Great work. :lgtm: modulo updating the commit message (and PR description).

(See below for another followup action)

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @healthy-pod and @jeffswenson)


-- commits line 5 at r3:
Now that I understand what this is doing, please add more words to this commit message to explain it as well:

"""
This is done by adding the column to the bootstrap schema for system.sql_instances,
and piggy-backing on the existing code for the V23_1_SystemRbrReadNew migration
that overwrites the live schema for this table by the bootstrap copy.

This redefinition of the meaning of the V23_1_SystemRbrReadNew is backward-incompatible
and is only possible because this commit is merged before the v23.1 branch is cut.
"""


pkg/upgrade/upgrades/upgrades.go line 254 at r3 (raw file):

		toCV(clusterversion.V23_1_SystemRbrReadNew),
		upgrade.NoPrecondition,
		backfillRegionalByRowIndex,

Please file an issue to be processed at very high priority by either the Serverless or Schema team, to enhance this function with a hardwired copy of the bootstrap descriptors at that version.

Otherwise, any additional changes to these tables in future CRDB versions will result in schema corruption for folks upgrading from an older version.

@healthy-pod healthy-pod force-pushed the migrate-instances-table branch from 3174505 to 03a8cf5 on March 20, 2023 22:11
Collaborator

@jeffswenson jeffswenson left a comment


LGTM

@healthy-pod
Contributor Author

TFTRs!

bors r=[knz,JeffSwenson]

@craig
Contributor

craig bot commented Mar 21, 2023

Build failed (retrying...):

@craig
Contributor

craig bot commented Mar 21, 2023

Build failed (retrying...):

@knz
Contributor

knz commented Mar 21, 2023

bors r-

(merge conflict)

@craig
Contributor

craig bot commented Mar 21, 2023

Canceled.

healthy-pod pushed a commit to healthy-pod/cockroach that referenced this pull request Jul 17, 2023
Previously, tenant upgrades in UA required a user to issue a `SET CLUSTER SETTING version =`
statement to finalize an upgrade. This UX is different from what we have in single-tenant
SH/Dedicated deployments, where auto upgrade starts an attempt to finalize the cluster
version after every node startup, in case the node was started with a new binary version
that all nodes now support upgrading to.

In UA, we have two differences:

1. What to upgrade?
    - In a multi-tenant deployment, the storage and sql layers are upgraded separately.
    - The storage layer upgrade finalization is still handled by the existing auto upgrade logic.
    - In this change, we ensure that the sql layer is also auto-upgraded when possible.
2. When to upgrade?
    - In a single-tenant deployment, all layers share the same binary version and cluster version.
      Hence, an upgrade attempt is only needed when a new node starts to ensure that the cluster is
      auto-upgraded if the new binary version supports an upgrade.
    - In a multi-tenant deployment, in addition to the condition above, the sql server upgrade is
      also constrained by the storage cluster version. It is possible for all SQL instances to have
      binary versions that support an upgrade but the upgrade will still be blocked by the storage
      cluster version if it’s equal to the current tenant cluster version.

This code change does the following:

1. Adds logic to run a SQL server upgrade attempt (mostly adopted from the original auto upgrade code)
   within the following ordered constraints (previously we merged cockroachdb#98830 to make getting the binary
   versions of instances easier):
    - Ensure that upgrade is not blocked by `preserve_downgrade_option`.
    - Exit if tenant cluster version is equal to storage cluster version [upgrade already completed].
    - Upgrade to storage cluster version if the binary version of all SQL instances supports that.
    - Exit if tenant cluster version is equal to the minimum instance binary version [upgrade blocked
      due to too low binary version].
    - Upgrade to the minimum instance binary version.

2. Runs the logic above when a SQL server is started.
    - This covers the case where a SQL server binary upgrade allows for an upgrade to the
      tenant cluster version.

3. Checks for change in storage cluster version every 30 seconds and starts an upgrade attempt if
   it was changed.
    - This covers the case where the binary versions of all SQL instances allow for an upgrade
      but it’s blocked due to the storage cluster version.

Release note: None
Epic: CRDB-20860
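
Distilled into code, the ordered constraints read roughly as follows. This is a sketch of the logic described above, not the shipped implementation; the preserve_downgrade_option check is assumed to run before this function is consulted, and the name tenantUpgradeTarget is illustrative.

```go
// tenantUpgradeTarget returns the version to finalize the tenant to, or
// ok=false when no upgrade attempt should run.
func tenantUpgradeTarget(
	tenantVersion, storageVersion, minInstanceBinaryVersion roachpb.Version,
) (target roachpb.Version, ok bool) {
	// Upgrade already completed: the tenant has caught up with storage.
	if tenantVersion == storageVersion {
		return roachpb.Version{}, false
	}
	// Every SQL instance's binary supports the storage version: finalize to it.
	if !minInstanceBinaryVersion.Less(storageVersion) {
		return storageVersion, true
	}
	// Blocked by a too-low binary version: nothing more to do once the tenant
	// version has reached the minimum instance binary version.
	if tenantVersion == minInstanceBinaryVersion {
		return roachpb.Version{}, false
	}
	return minInstanceBinaryVersion, true
}
```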
healthy-pod pushed a commit to healthy-pod/cockroach that referenced this pull request Sep 15, 2023
healthy-pod pushed a commit to healthy-pod/cockroach that referenced this pull request Sep 25, 2023
healthy-pod pushed a commit to healthy-pod/cockroach that referenced this pull request Oct 6, 2023
healthy-pod pushed a commit to healthy-pod/cockroach that referenced this pull request Oct 6, 2023
healthy-pod pushed a commit to healthy-pod/cockroach that referenced this pull request Oct 9, 2023
healthy-pod pushed a commit to healthy-pod/cockroach that referenced this pull request Oct 9, 2023
healthy-pod pushed a commit to healthy-pod/cockroach that referenced this pull request Oct 9, 2023
healthy-pod pushed a commit to healthy-pod/cockroach that referenced this pull request Oct 9, 2023
craig bot pushed a commit that referenced this pull request Oct 10, 2023
102427: pkg/server: support tenant auto-upgrade r=stevendanna,knz a=healthy-pod

Previously, tenant upgrades in UA required a user to issue a `SET CLUSTER SETTING version =`
statement to finalize an upgrade. This UX is different from what we have in single-tenant
SH/Dedicated deployments, where auto upgrade starts an attempt to finalize the cluster
version after every node startup, in case the node was started with a new binary version
that all nodes now support upgrading to.

In UA, we have two differences:

1. What to upgrade?
    - In a multi-tenant deployment, the storage and sql layers are upgraded separately.
    - The storage layer upgrade finalization is still handled by the existing auto upgrade logic.
    - In this change, we ensure that the sql layer is also auto-upgraded when possible.
2. When to upgrade?
    - In a single-tenant deployment, all layers share the same binary version and cluster version.
      Hence, an upgrade attempt is only needed when a new node starts to ensure that the cluster is
      auto-upgraded if the new binary version supports an upgrade.
    - In a multi-tenant deployment, in addition to the condition above, the sql server upgrade is
      also constrained by the storage cluster version. It is possible for all SQL instances to have
      binary versions that support an upgrade but the upgrade will still be blocked by the storage
      cluster version if it’s equal to the current tenant cluster version.

This code change does the following:

1. Adds logic to run a SQL server upgrade attempt (mostly adopted from the original auto upgrade code)
   within the following ordered constraints (previously we merged #98830 to make getting the binary
   versions of instances easier):
    - Ensure that the upgrade is not blocked by the secondary tenant's setting of preserve_downgrade_option
      or an all-tenant override of that value.
    - Exit if tenant cluster version is equal to the minimum instance binary version [upgrade already completed].
    - Upgrade to storage cluster version if the binary version of all SQL instances supports that.
    - Exit if storage cluster version is less than the minimum instance binary version [upgrade blocked
      due to low storage cluster version].
    - Upgrade to the minimum instance binary version.

2. Runs the logic above when a SQL server is started.
    - This covers the case where a SQL server binary upgrade allows for an upgrade to the
      tenant cluster version.

3. Checks for change in storage cluster version every 10 seconds and starts an upgrade attempt if
   it was changed.
    - This covers the case where the binary versions of all SQL instances allow for an upgrade
      but it’s blocked due to the storage cluster version.

Release note: Support for automatic finalization of virtual cluster version
upgrades was added. A new setting `cluster.auto_upgrade.enabled` was added to
enable and disable automatic cluster version upgrade (finalization); it
will be used with automatic upgrade at both the storage level and the virtual
cluster level (each has its own setting value).
Epic: [CRDB-20860](https://cockroachlabs.atlassian.net/browse/CRDB-20860)

Co-authored-by: healthy-pod <[email protected]>