OTA-861: Set Upgradeable=False when there is an upgrade in progress #1080

Merged
merged 7 commits into from
Nov 25, 2024

Conversation

hongkailiu
Member

@hongkailiu hongkailiu commented Aug 29, 2024

This PR adds an Upgradeable check that fails when Progressing=True in clusterversion.status.conditions.
In other words, Upgradeable=False while an upgrade is in progress, whether minor-level or patch-level.

In addition, this PR syncs Upgradeable at shutdown so that a status left unsaved by the throttle still gets saved. This part could be a separate PR too.

For example, it blocks the upgrade to 4.16.1 until the ongoing upgrade
4.14.35 -> 4.15.29 completes.

It also covers the case 4.14.15 -> 4.14.35 -> 4.15.29,
where the upgrade 4.14.35 -> 4.15.29 is blocked until the upgrade
4.14.15 -> 4.14.35 completes.
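
The check described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `Condition` type is a simplified stand-in for the ClusterVersion status condition, and `upgradeInProgressCheck`, `findCondition`, and the `UpdateInProgress` reason string are hypothetical names chosen for the example.

```go
package main

import "fmt"

// Condition is a simplified stand-in for a ClusterVersion status condition.
type Condition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

// findCondition returns the condition with the given type, or nil if absent.
func findCondition(conds []Condition, condType string) *Condition {
	for i := range conds {
		if conds[i].Type == condType {
			return &conds[i]
		}
	}
	return nil
}

// upgradeInProgressCheck returns an Upgradeable=False condition while
// Progressing=True. It returns nil when this check has no objection.
func upgradeInProgressCheck(conds []Condition) *Condition {
	p := findCondition(conds, "Progressing")
	if p == nil || p.Status != "True" {
		return nil
	}
	return &Condition{
		Type:    "Upgradeable",
		Status:  "False",
		Reason:  "UpdateInProgress", // hypothetical reason string
		Message: "An update is already in progress: " + p.Message,
	}
}

func main() {
	conds := []Condition{
		{Type: "Progressing", Status: "True", Message: "Working towards 4.15.29"},
	}
	if c := upgradeInProgressCheck(conds); c != nil {
		fmt.Printf("%s=%s (%s): %s\n", c.Type, c.Status, c.Reason, c.Message)
	}
}
```

Returning nil for "no objection" keeps the sketch composable with other Upgradeable checks, each of which could contribute its own reason.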

Note that we still allow retargeting to 4.y+1.z''
in the middle of the upgrade 4.y.z -> 4.y+1.z', even though the direct upgrade
4.y.z -> 4.y+1.z'' might not be supported.
This is because the upgrade 4.y.z -> 4.y+1.z' might be unable to complete
due to a bug in 4.y+1.z' that is fixed in 4.y+1.z''.
We need the retarget to land 4.y+1 on the cluster.
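
To make the allowed and blocked cases concrete, here is a small sketch that follows only the examples in this description, not the CVO code. It assumes a retarget is accepted when the new target shares its major and minor version with the target of the in-progress upgrade; `majorMinor` and `retargetAllowed` are hypothetical helpers, and 4.15.30 is just an illustrative z'' version.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// majorMinor extracts the major and minor numbers from an "X.Y.Z" string.
func majorMinor(version string) (int, int, error) {
	parts := strings.Split(version, ".")
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("malformed version %q", version)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return 0, 0, fmt.Errorf("malformed major in %q: %w", version, err)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return 0, 0, fmt.Errorf("malformed minor in %q: %w", version, err)
	}
	return major, minor, nil
}

// retargetAllowed reports whether newTarget may replace inProgressTarget
// while the upgrade is still running (assumed rule: same major and minor).
func retargetAllowed(inProgressTarget, newTarget string) (bool, error) {
	curMajor, curMinor, err := majorMinor(inProgressTarget)
	if err != nil {
		return false, err
	}
	newMajor, newMinor, err := majorMinor(newTarget)
	if err != nil {
		return false, err
	}
	return curMajor == newMajor && curMinor == newMinor, nil
}

func main() {
	cases := [][2]string{
		{"4.15.29", "4.16.1"},  // second minor hop mid-upgrade
		{"4.15.29", "4.15.30"}, // retarget within 4.y+1
		{"4.14.35", "4.15.29"}, // minor hop during a patch upgrade
	}
	for _, c := range cases {
		ok, err := retargetAllowed(c[0], c[1])
		fmt.Printf("in progress -> %s, retarget -> %s: allowed=%v err=%v\n", c[0], c[1], ok, err)
	}
}
```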

For OTA-861, the guard on retargeting to a minor-level upgrade will be added in a follow-up PR.


Update:
With the throttle on "upgradeable" disabled at shutdown, the race window between "CVO starts rolling the new version out" and "CVO gets shut down" is very short (less than one second in the test).
We do not need to add back the second guard that was dropped in 362e9ca.

The acceptedRisks for the Y-then-Z upgrade in the dropped commit will be done in another PR.

@hongkailiu
Member Author

This PR is replacing #1079

@hongkailiu
Member Author

/test unit

@DavidHurta
Contributor

/cc

@openshift-ci openshift-ci bot requested a review from DavidHurta August 29, 2024 11:59
@wking
Member

wking commented Aug 29, 2024

Exercise a 4.17 -> this-pull update, so we can see Upgradeable=False while we're mid-update.

/payload-job periodic-ci-openshift-release-master-ci-4.18-upgrade-from-stable-4.17-e2e-gcp-ovn-upgrade

Contributor

openshift-ci bot commented Aug 29, 2024

@wking: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-ci-4.18-upgrade-from-stable-4.17-e2e-gcp-ovn-upgrade

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/7d5ec080-663b-11ef-899f-2883517bb747-0

@petr-muller
Member

/uncc

David and Trevor are involved in this one, seems enough ;)

@openshift-ci openshift-ci bot removed the request for review from petr-muller September 4, 2024 11:09
@DavidHurta
Contributor

DavidHurta commented Sep 4, 2024

/title OTA-861: inhibit the 2nd minor version upgrade

@DavidHurta
Contributor

DavidHurta commented Sep 4, 2024

/retitle OTA-861: inhibit the 2nd minor version upgrade

@openshift-ci openshift-ci bot changed the title inhibit the 2nd minor version upgrade OTA-861: inhibit the 2nd minor version upgrade Sep 4, 2024
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Sep 4, 2024
@openshift-ci-robot
Contributor

openshift-ci-robot commented Sep 4, 2024

@hongkailiu: This pull request references OTA-861 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.18.0" version, but no target version was set.

In response to this:

This PR prevents the cluster from being upgraded to x.y+2.z1 while
the upgrade from x.y.z3 to x.y+1.z2 is still in progress.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@petr-muller
Member

/test all

This commit extends the guard on the 2nd Y-stream upgrade,
i.e., blocking a Y-stream upgrade if there is already a
Y-stream upgrade in progress,
to a guard on any Y-stream upgrade if there is already an
upgrade in progress, regardless of whether it is Y-stream or Z-stream.

For example, it covers the case 4.14.15 -> 4.14.35 -> 4.15.29,
where the upgrade 4.14.35 -> 4.15.29 is blocked until the upgrade
4.14.15 -> 4.14.35 completes.

Note that we still allow retargeting to 4.y+1.z''
in the middle of the upgrade 4.y.z -> 4.y+1.z', even though the direct upgrade
4.y.z -> 4.y+1.z'' might not be supported.
This is because the upgrade 4.y.z -> 4.y+1.z' might be unable to complete
due to a bug in 4.y+1.z' that is fixed in 4.y+1.z''.
We need the retarget to land 4.y+1 on the cluster.
@hongkailiu hongkailiu changed the title OTA-861: inhibit the 2nd minor version upgrade OTA-861: Block Y stream upgrade if any upgrade is in progress Oct 1, 2024
@openshift-ci-robot
Contributor

openshift-ci-robot commented Oct 1, 2024

@hongkailiu: This pull request references OTA-861 which is a valid jira issue.


@openshift-ci-robot
Contributor

openshift-ci-robot commented Oct 1, 2024

@hongkailiu: This pull request references OTA-861 which is a valid jira issue.

In response to this:

This PR adds a guard that blocks a Y-stream upgrade
if there is already an upgrade in progress, regardless of whether it is Y-stream or Z-stream.

This commit generates a message in the accepted risks
for the unblocking case 4.y.z -> 4.y+1.z' -> 4.y+1.z''.
@petr-muller
Member

openshift/release#57408 should fix the hypershift jobs

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Oct 4, 2024
@openshift-ci-robot
Contributor

openshift-ci-robot commented Oct 4, 2024

@hongkailiu: This pull request references OTA-861 which is a valid jira issue.


@hongkailiu
Member Author

/test unit

Member

@wking wking left a comment

0366e77 looks good to me too.

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 4, 2024
@openshift-ci-robot
Contributor

openshift-ci-robot commented Oct 4, 2024

@hongkailiu: This pull request references OTA-861 which is a valid jira issue.


@hongkailiu
Member Author

/test e2e-hypershift

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Oct 8, 2024
@hongkailiu
Member Author

/test unit

Member

@wking wking left a comment

/lgtm

@openshift-ci openshift-ci bot added lgtm Indicates that a PR is ready to be merged. and removed lgtm Indicates that a PR is ready to be merged. labels Nov 4, 2024
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Nov 8, 2024
Contributor

openshift-ci bot commented Nov 8, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hongkailiu, petr-muller, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@petr-muller
Member

/retest

@evakhoni
Contributor

pre-merge verified successfully in all four variants
/label qe-approved

@openshift-ci openshift-ci bot added the qe-approved Signifies that QE has signed off on this PR label Nov 24, 2024
@openshift-ci-robot
Contributor

openshift-ci-robot commented Nov 24, 2024

@hongkailiu: This pull request references OTA-861 which is a valid jira issue.


@wking
Member

wking commented Nov 25, 2024

"operators should not create watch channels very often" is unrelated:

/override ci/prow/e2e-agnostic-ovn

Contributor

openshift-ci bot commented Nov 25, 2024

@wking: Overrode contexts on behalf of wking: ci/prow/e2e-agnostic-ovn

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Contributor

openshift-ci bot commented Nov 25, 2024

@hongkailiu: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot bot merged commit b6b7345 into openshift:master Nov 25, 2024
12 checks passed
@openshift-bot
Contributor

[ART PR BUILD NOTIFIER]

Distgit: cluster-version-operator
This PR has been included in build cluster-version-operator-container-v4.19.0-202411260335.p0.gb6b7345.assembly.stream.el9.
All builds following this will include this PR.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. qe-approved Signifies that QE has signed off on this PR tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.
7 participants