Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OTA-861: inhibit the 2nd minor version upgrade #1079

Closed
wants to merge 1 commit into from

Conversation

hongkailiu
Copy link
Member

@hongkailiu hongkailiu commented Aug 20, 2024

This PR prevents the cluster to be upgraded to x.y+2.z1 while
the upgrade to x.y+1.z2 from x.y.z3 is still in progress.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Aug 20, 2024
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Aug 20, 2024

@hongkailiu: This pull request references OTA-861 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.18.0" version, but no target version was set.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested review from DavidHurta and wking August 20, 2024 21:36
@hongkailiu
Copy link
Member Author

/test e2e-aws-ovn-techpreview

@hongkailiu hongkailiu force-pushed the OTA-861 branch 2 times, most recently from fab4595 to c985205 Compare August 21, 2024 00:59
@hongkailiu
Copy link
Member Author

/test e2e-aws-ovn-techpreview

@DavidHurta
Copy link
Contributor

DavidHurta commented Aug 21, 2024

Hello @hongkailiu 👋
The pull request contains two commits that have an identical message:

$ git log --oneline
c9852051 (HEAD -> OTA-861) inhibit the 2nd minor version upgrade
08ece187 inhibit the 2nd minor version upgrade

Could you please update the commits and provide some additional information in them as well? The additional information (even a little) would be greatly appreciated 🙌

@hongkailiu hongkailiu force-pushed the OTA-861 branch 2 times, most recently from b751dc9 to 1eb2598 Compare August 22, 2024 01:39
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Aug 22, 2024

@hongkailiu: This pull request references OTA-861 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.18.0" version, but no target version was set.

In response to this:

This PR prevents the cluster to be upgraded to x.y+2.z1 while
the upgrade to x.y+1.z2 from x.y.z3 is still in progress.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@petr-muller
Copy link
Member

I'll sit this one out and leave it for David to review 😉

@DavidHurta
Copy link
Contributor

Petr, back to status API!

Copy link
Contributor

@DavidHurta DavidHurta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to discuss the wording of the precondition check 🙌

Other than that, the PR seems to be good to merge.

@DavidHurta
Copy link
Contributor

tldr; everything seems to be working as expected 🙌

I wanted to learn something new by testing this PR. I have created three dummy releases. All these releases contain this PR and only differ in their version. I had to force the upgrades, as the releases were not signed.

Upgrade from 4.17.0-0.ci.test-2024-08-26-132104-ci-ln-c7kz9p2-latest to 4.18.98:

$ oc adm upgrade --to-image quay.io/dhurta/ocp-release@sha256:bdb2e638b14cb5bfbc1e1502b983210dce43e6258827cedf65311391bc133065 --allow-explicit-upgrade --force --allow-upgrade-with-warnings 
warning: The requested upgrade image is not one of the available updates. You have used --allow-explicit-upgrade for the update to proceed anyway
warning: --force overrides cluster verification of your supplied release image and waives any update precondition failures.
Requested update to release image quay.io/dhurta/ocp-release@sha256:bdb2e638b14cb5bfbc1e1502b983210dce43e6258827cedf65311391bc133065

Checking events:

$ oc events -n openshift-cluster-version
...
4m39s                   Warning   PreconditionWarn      ClusterVersion/version                           precondition warning for payload loaded version="4.18.98" image="quay.io/dhurta/ocp-release@sha256:bdb2e638b14cb5bfbc1e1502b983210dce43e6258827cedf65311391bc133065": Precondition "ClusterVersionRecommendedUpdate" failed because of "NoChannel": Configured channel is unset, so the recommended status of updating from 4.17.0-0.ci.test-2024-08-26-150237-ci-ln-dxk8yqk-latest to 4.18.98 is unknown.

Only ClusterVersionRecommendedUpdate precondition check warning as expected.


Upgrade from 4.18.98 to 4.18.99:

$ oc adm upgrade --to-image quay.io/dhurta/ocp-release@sha256:a570a6a0957729b3f876750f774c9bb2411fd3596093b587f2de998604cdd5ee --allow-explicit-upgrade --force --allow-upgrade-with-warnings 
warning: The requested upgrade image is not one of the available updates. You have used --allow-explicit-upgrade for the update to proceed anyway
warning: --force overrides cluster verification of your supplied release image and waives any update precondition failures.
warning: --allow-upgrade-with-warnings is bypassing: the cluster is already upgrading:

  Reason: ClusterOperatorsUpdating
  Message: Working towards 4.18.98: 110 of 900 done (12% complete), waiting on etcd, kube-apiserver
Requested update to release image quay.io/dhurta/ocp-release@sha256:a570a6a0957729b3f876750f774c9bb2411fd3596093b587f2de998604cdd5ee

Checking events:

$ oc events -n openshift-cluster-version
...
3m9s                    Warning   PreconditionWarn      ClusterVersion/version                           precondition warning for payload loaded version="4.18.99" image="quay.io/dhurta/ocp-release@sha256:a570a6a0957729b3f876750f774c9bb2411fd3596093b587f2de998604cdd5ee": Precondition "ClusterVersionRecommendedUpdate" failed because of "NoChannel": Configured channel is unset, so the recommended status of updating from 4.18.98 to 4.18.99 is unknown.

The new precondition check does not fail when redirecting a partial minor upgrade to a newer patch version in the same minor version.


Upgrade from 4.18.99 to 4.19.99:

$ oc adm upgrade --to-image quay.io/dhurta/ocp-release@sha256:88a0a7c9e196b7df971c50856f5024891c8ecb2ce61a84220e2b193c3ced33f3 --allow-explicit-upgrade --force --allow-upgrade-with-warnings 
warning: The requested upgrade image is not one of the available updates. You have used --allow-explicit-upgrade for the update to proceed anyway
warning: --force overrides cluster verification of your supplied release image and waives any update precondition failures.
warning: --allow-upgrade-with-warnings is bypassing: the cluster is already upgrading:

  Reason: ClusterOperatorsUpdating
  Message: Working towards 4.18.99: 110 of 900 done (12% complete), waiting on etcd, kube-apiserver
Requested update to release image quay.io/dhurta/ocp-release@sha256:88a0a7c9e196b7df971c50856f5024891c8ecb2ce61a84220e2b193c3ced33f3

Checking events:

$ oc events -n openshift-cluster-version
...
83s                     Warning   PreconditionWarn      ClusterVersion/version                           precondition warning for payload loaded version="4.19.99" image="quay.io/dhurta/ocp-release@sha256:88a0a7c9e196b7df971c50856f5024891c8ecb2ce61a84220e2b193c3ced33f3": Forced through blocking failures: Multiple precondition checks failed:
* Precondition "ClusterVersionUpgradeable" failed because of "MinorVersionClusterUpgradeInProgress": The minor level upgrade to 4.19.99 is not allowed while there is already a minor level upgrade from 4.17.0-0.ci.test-2024-08-26-150237-ci-ln-dxk8yqk-latest to 4.18.99 in progress. Please wait until the existing minor level upgrade completes.
* Precondition "ClusterVersionRecommendedUpdate" failed because of "NoChannel": Configured channel is unset, so the recommended status of updating from 4.18.99 to 4.19.99 is unknown.

The precondition check ClusterVersionUpgradeable fails as expected because of the MinorVersionClusterUpgradeInProgress check.

Copy link
Contributor

openshift-ci bot commented Aug 27, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: hongkailiu
Once this PR has been reviewed and has the lgtm label, please ask for approval from davoska. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@hongkailiu
Copy link
Member Author

/label tide/merge-method-squash

@openshift-ci openshift-ci bot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Aug 27, 2024
This PR prevents the cluster to be upgraded to x.y+2.z1 while
the upgrade to x.y+1.z2 from x.y.z3 is still in progress.
@hongkailiu
Copy link
Member Author

hongkailiu commented Aug 27, 2024

I have created three dummy releases. All these releases contain this PR and only differ in their version.

I was wondering how this one can be tested (even after merging) without newer Y-releases.
If you have docs (or commands in your terminal history) handy, I would like to peek.

I had to force the upgrades, as the releases were not signed.

Upgrade from 4.18.99 to 4.19.99:

I feel it is more important to see what would happen without --force. How much trouble would it take to sign them? :)

I have to some questions about the CVO code:

  • What is the difference between the UpgradeableCheckes and preconditions used in the PR? The reason I chose the latter here is that it contains the information about the targeting release version that is used in oc adm upgrade.
  • In this PR ClusterVersion.Status.Desired.Version is used to represent the targeting version of the current upgrade (it is used in the existing code). Is that always accurate? In oc adm upgrade status implementation, we did it in a different way.

@DavidHurta
Copy link
Contributor

DavidHurta commented Aug 27, 2024

I was wondering how this one can be tested (even after merging) without newer Y-releases.

I was wondering this as well. I have managed to do the following; however, there may be a more efficient way.

  1. I have created a payload that contains this PR using Cluster Bot:
build https://github.com/openshift/cluster-version-operator/pull/1079
  1. From the build job build logs I got the container image registry.build05.ci.openshift.org/ci-ln-34mjg4k/release:latest
  2. Using the oc command I tried to create a new custom release:
$ oc adm release new \
--name "4.18.99" \
--from-release "registry.build05.ci.openshift.org/ci-ln-34mjg4k/release:latest" \
--to-image "quay.io/dhurta/ocp-release:latest"
error: unable to find an image within the release that matches the base image manifest "sha256:8284a69d091aa52b5bca3b4df26ee15caff4f33ab234ccf875bca369e4bac11b", please specify --to-image-base
  1. Finally, I have succesfully created a new release and pushed it using the specified base image from previous log for the --to-image-base flag:
$ oc adm release new \
--name "4.18.98" \
--from-release "registry.build05.ci.openshift.org/ci-ln-34mjg4k/release:latest" \
--to-image "quay.io/dhurta/ocp-release:4.18.98" \
--to-image-base "registry.build05.ci.openshift.org/ci-ln-34mjg4k/stable@sha256:58c7a7ebb3de36dbec2b2a2a71a581d9e369fe6f7de1c5c165e371400eb51769"

I feel it is more important to see what would happen without --force. How much trouble would it take to sign them? :)

It is more important; however, I am not sure how to accomplish this. To sign the release using a custom private key and have the CVO check the release accordingly. That part of the CVO is not known to me very well. Help from our colleagues would be great. Let's ask them. It would be great to learn this 👀 I have tested the PR best to my current capabilities and time. I trust our QE to be thorough regarding the verification.


I have to some questions about the CVO code:

I'll get back to you 👍

@hongkailiu
Copy link
Member Author

/retest-required

@hongkailiu
Copy link
Member Author

Thanks for the cmds creating a release with PR.
I will ask the testing strategy in our team channel.

Copy link
Contributor

openshift-ci bot commented Aug 27, 2024

@hongkailiu: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@hongkailiu
Copy link
Member Author

/hold

#1080

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 29, 2024
@hongkailiu
Copy link
Member Author

/close

@openshift-ci openshift-ci bot closed this Aug 30, 2024
Copy link
Contributor

openshift-ci bot commented Aug 30, 2024

@hongkailiu: Closed this PR.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants