Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enforce update order for OVN for Ctlplane/EDPM #792

Merged

Conversation

dprince
Copy link
Contributor

@dprince dprince commented May 7, 2024

Enforce update order for OVN for Ctlplane/EDPM
Jira: OSPRH-6732

@openshift-ci openshift-ci bot requested review from frenzyfriday and lewisdenny May 7, 2024 18:48
@openshift-ci openshift-ci bot added the approved label May 7, 2024
@dprince
Copy link
Contributor Author

dprince commented May 7, 2024

/hold

Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/8e0ea450916246cd80dd8146d3b20e6d

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 26m 44s
podified-multinode-edpm-deployment-crc FAILURE in 1h 08m 29s
cifmw-crc-podified-edpm-baremetal FAILURE in 1h 08m 36s
openstack-operator-tempest-multinode FAILURE in 1h 06m 35s

@lewisdenny lewisdenny removed their request for review May 8, 2024 00:36
@dprince dprince force-pushed the ovn_update_order_edpm branch from 2df42b5 to f996cec Compare May 8, 2024 13:21
@dprince dprince requested review from fmount, stuggi, abays and gibizer May 8, 2024 13:46
pkg/openstack/dataplane.go Outdated Show resolved Hide resolved
pkg/openstack/dataplane.go Outdated Show resolved Hide resolved
controllers/core/openstackversion_controller.go Outdated Show resolved Hide resolved
controllers/core/openstackversion_controller.go Outdated Show resolved Hide resolved
controllers/core/openstackversion_controller.go Outdated Show resolved Hide resolved
pkg/openstack/dataplane.go Outdated Show resolved Hide resolved
pkg/openstack/dataplane.go Outdated Show resolved Hide resolved
pkg/openstack/dataplane.go Outdated Show resolved Hide resolved
controllers/core/openstackversion_controller.go Outdated Show resolved Hide resolved
pkg/openstack/keystone.go Show resolved Hide resolved
pkg/openstack/version.go Outdated Show resolved Hide resolved
controllers/core/openstackcontrolplane_controller.go Outdated Show resolved Hide resolved
pkg/openstack/dataplane.go Outdated Show resolved Hide resolved
pkg/openstack/dataplane.go Show resolved Hide resolved
controllers/core/openstackversion_controller.go Outdated Show resolved Hide resolved
controllers/core/openstackversion_controller.go Outdated Show resolved Hide resolved
@dprince dprince force-pushed the ovn_update_order_edpm branch 3 times, most recently from 26f8e9f to 3e0df91 Compare May 15, 2024 18:29
Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/1683d7919bce48a88c1732103ab81f82

openstack-k8s-operators-content-provider FAILURE in 11m 16s
⚠️ podified-multinode-edpm-deployment-crc SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cifmw-crc-podified-edpm-baremetal SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ openstack-operator-tempest-multinode SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider

@dprince dprince force-pushed the ovn_update_order_edpm branch 2 times, most recently from cf31654 to c67650d Compare May 23, 2024 13:42
Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/21aabfd5a3c940a49daa0d26f131d5c3

openstack-k8s-operators-content-provider FAILURE in 10m 52s
⚠️ podified-multinode-edpm-deployment-crc SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cifmw-crc-podified-edpm-baremetal SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ openstack-operator-tempest-multinode SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider

@dprince
Copy link
Contributor Author

dprince commented May 23, 2024

recheck

Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/b8c58d7c412e4c1388e0bb7edcdaf75e

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 50m 37s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 17m 44s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 15m 37s
openstack-operator-tempest-multinode RETRY_LIMIT in 24m 06s

@dprince
Copy link
Contributor Author

dprince commented May 28, 2024

recheck

@dprince dprince force-pushed the ovn_update_order_edpm branch from c67650d to ba53816 Compare June 4, 2024 19:35
tests/functional/base_test.go Outdated Show resolved Hide resolved
Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/662941fc411642439586560bc4a94706

✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 08m 08s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 23m 34s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 24m 21s
openstack-operator-tempest-multinode FAILURE in 1h 52m 11s

@dprince
Copy link
Contributor Author

dprince commented Jun 5, 2024

recheck

Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/a0359f9137894b62b3af503929e991b8

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 57m 44s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 22m 46s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 20m 15s
openstack-operator-tempest-multinode FAILURE in 1h 42m 24s

@dprince dprince force-pushed the ovn_update_order_edpm branch from ba53816 to 74e6417 Compare June 8, 2024 00:07
Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/7aa36ac7e4c4487c9334f98a072cbaaf

✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 01m 10s
podified-multinode-edpm-deployment-crc FAILURE in 1h 39m 23s
cifmw-crc-podified-edpm-baremetal FAILURE in 1h 33m 43s
openstack-operator-tempest-multinode FAILURE in 1h 43m 48s

@dprince dprince force-pushed the ovn_update_order_edpm branch from 74e6417 to 98815e3 Compare June 10, 2024 19:44
@dprince dprince force-pushed the ovn_update_order_edpm branch from 98815e3 to ae9065e Compare June 11, 2024 12:42
Copy link
Contributor

@slawqo slawqo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good for me, the only thing I am not sure if I understand correctly is how all those changes in various {service}.go files are related to the PR description :)

@dprince
Copy link
Contributor Author

dprince commented Jun 11, 2024

looks good for me, the only thing I am not sure if I understand correctly is how all those changes in various {service}.go files are related to the PR description :)

in order to ensure update order we need to make sure services have been deployed (ready state, and observed generation checks).

Copy link
Contributor

@gibizer gibizer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks OK to me.

@dprince
Copy link
Contributor Author

dprince commented Jun 11, 2024

recheck

@booxter
Copy link
Contributor

booxter commented Jun 11, 2024

I haven't tried to run it, but looking at the code, the order of updates looks correct.

I couldn't figure out from just reading the code how we guarantee that only ovn-controller (and not e.g. nova-compute) is updated on DP, before CP reconcileNormal is triggered. Will it be managed using a separate Deployment that would run a single ovn DP service? (I assume updating nova-compute before CP nova services is a problem.) Perhaps someone could ELI5 to me. (Thank you.)

@bshephar
Copy link
Contributor

bshephar commented Jun 12, 2024

I haven't tried to run it, but looking at the code, the order of updates looks correct.

I couldn't figure out from just reading the code how we guarantee that only ovn-controller (and not e.g. nova-compute) is updated on DP, before CP reconcileNormal is triggered. Will it be managed using a separate Deployment that would run a single ovn DP service? (I assume updating nova-compute before CP nova services is a problem.) Perhaps someone could ELI5 to me. (Thank you.)

Also trying to follow this. So we set the condition here:
https://github.com/openstack-k8s-operators/openstack-operator/pull/792/files#diff-32500fc60d27debdcd1f64468b83c0a318d79fa55591b2e17a5cb935d9fde650R248

But that just prevents the controller from continuing the update until the condition has been satisfied. It seems that the actual update of OVN on the Dataplane would need to be a manual process as described here:

https://github.com/openstack-k8s-operators/dataplane-operator/blob/main/docs/assemblies/proc_updating-the-data-plane-ovn.adoc

So this would happen first and then pause until the condition get satisfied. To satisfy the condition, the image deployed would need to match the image the update is expecting as determined by:
https://github.com/openstack-k8s-operators/openstack-operator/pull/792/files#diff-308714db370a47145837acae5ff60352d9f352513007441b8c694f2c45c1031dR38-R53

Then the update can continue.

At least that's my 10 minute read on what's happening here. The answer is that the user will create the deployment limited to the OVN service like:

apiVersion: dataplane.openstack.org/v1beta1
kind: OpenStackDataPlaneDeployment
metadata:
  name: edpm-deployment-ipam-update
spec:
  nodeSets:
    - openstack-edpm-ipam
    - <nodeSet_name>
    - ...
    - <nodeSet_name>
  servicesOverride:
    - ovn

@bshephar bshephar self-requested a review June 12, 2024 06:48
@dprince
Copy link
Contributor Author

dprince commented Jun 12, 2024

Dataplane updates are manual for GA. There are some Jira's filed (https://issues.redhat.com/browse/OSPRH-6421) which might help us fully streamline the minor update workflow.

@dprince
Copy link
Contributor Author

dprince commented Jun 12, 2024

I do think we could also validate based on conditions set on the OpenStackVersion resource that when Dataplane resources get executed we are in the correct state. So for example if we need to just execute an OVN playbook we could have a crude validation on that. The adminstrator could always override this with Ansible, but I think a simple check like this could help us further guard the workflow in the future.

@booxter
Copy link
Contributor

booxter commented Jun 12, 2024

Thank you @bshephar @dprince this (rolling in Deployment just for ovn service) makes sense. It's a bit more leg work for a user but the main point is we have a plan.

Copy link
Contributor

@stuggi stuggi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Copy link
Contributor

openshift-ci bot commented Jun 12, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dprince, stuggi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit ee07591 into openstack-k8s-operators:main Jun 12, 2024
7 checks passed
booxter added a commit to booxter/os-diff that referenced this pull request Jun 12, 2024
There is no good reason to have this setting set to anything but
`false`. If set to `true`, this will render OVN controllers broken until
northd on control plane is fully upgraded.

Since [1], we guarantee that ovn controllers are always updated before
northd.

See more details in [2].

[1] openstack-k8s-operators/openstack-operator#792
[2] openstack-k8s-operators/edpm-ansible#651

Depends-On: openstack-k8s-operators/edpm-ansible#651
matbu pushed a commit to openstack-k8s-operators/os-diff that referenced this pull request Jun 17, 2024
There is no good reason to have this setting set to anything but
`false`. If set to `true`, this will render OVN controllers broken until
northd on control plane is fully upgraded.

Since [1], we guarantee that ovn controllers are always updated before
northd.

See more details in [2].

[1] openstack-k8s-operators/openstack-operator#792
[2] openstack-k8s-operators/edpm-ansible#651

Depends-On: openstack-k8s-operators/edpm-ansible#651
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants