Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[release-4.1] Bug 1706082: Add Spec.Configuration to MachineConfigPool, render controller writes it #856

Merged

Conversation

cgwalters
Copy link
Member

Cherry-picked from 37e8466

See #765 (comment)

MachineConfigPool needs a Spec.Configuration and Status.Configuration
[just like other objects][1] so that we can properly detect state.
Currently there's a race because the render controller may set Status.Configuration
while the pool's Status still has Updated, so one can't reliably check whether the
pool is at a given config.

With this, ownership is clear: the render controller sets the spec, and the node controller
updates the status.

[1] https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#spec-and-status)

@openshift-ci-robot
Copy link
Contributor

@cgwalters: This pull request references an invalid Bugzilla bug:

  • expected the bug to target the "4.1.z" release, but it targets "4.2.0" instead

In response to this:

Bug 1706082: Add Spec.Configuration to MachineConfigPool, render controller writes it

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 14, 2019
@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 14, 2019
@runcom
Copy link
Member

runcom commented Jun 14, 2019

This patch makes sure the Generation field for an MCP is correctly increased (it's not right now in 4.1 because we never update the Spec field of the MCP).
With this, syncRequiredMachineConfigPools correctly checks that the MCP is actually rolling the new upgrade changes.

The race we've been seeing is:

  • 18:07:56: new Operator goes up
  • start syncing
  • syncRequiredMachineConfigPools passes cause pool master is idle but didn't start updating to new rendered-mc
  • 18:09:28: new version reported
  • 18:10:06: the master pool starts rolling the new templates

@cgwalters
Copy link
Member Author

Going to wait for a CI run before I force push again to retrigger Prow's check of the Bugzilla target (which is now 4.1.z again)

@cgwalters
Copy link
Member Author

/bugzilla refresh

@openshift-ci-robot openshift-ci-robot added bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. and removed bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Jun 14, 2019
@openshift-ci-robot
Copy link
Contributor

@cgwalters: This pull request references a valid Bugzilla bug.

In response to this:

/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@runcom
Copy link
Member

runcom commented Jun 14, 2019

Looks like there is a compilation issue in e2e

@cgwalters cgwalters force-pushed the pool-spec-status-4.1 branch from 633e7a4 to cb46c52 Compare June 14, 2019 15:39
Prep for adding more tests.

(cherry picked from commit 4f9bd50)
@cgwalters
Copy link
Member Author

Oh duh, looking at the diff the test suite changes are entangled with some prep PRs like 9970f24
which itself is built on top of #610 which touches the whole codebase. Going to try redoing the e2e test changes without that last one.

Rather than poll all of the daemons, add a helper that waits
for a pool to complete a config.

One of our tests walks over the MCDs, change it to just assert
on all of the nodes.

The SSH test can also just wait for a pool and then `rsh` to
each node.

(cherry picked from commit 9970f24)
See openshift#765 (comment)

MachineConfigPool needs a `Spec.Configuration` and `Status.Configuration`
[just like other objects][1] so that we can properly detect state.
Currently there's a race because the render controller may set `Status.Configuration`
while the pool's `Status` still has `Updated`, so one can't reliably check whether the
pool is at a given config.

With this, ownership is clear: the render controller sets the spec, and the node controller
updates the status.

[1] https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#spec-and-status)
@cgwalters cgwalters force-pushed the pool-spec-status-4.1 branch from cb46c52 to 7cdc5b7 Compare June 14, 2019 17:54
@cgwalters
Copy link
Member Author

ci/prow/e2e-aws-op — Job succeeded.

🎉

@runcom
Copy link
Member

runcom commented Jun 14, 2019

/approve

@kikisdeliveryservice
Copy link
Contributor

yay!!

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jun 14, 2019
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, kikisdeliveryservice, runcom

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [cgwalters,kikisdeliveryservice,runcom]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@cgwalters
Copy link
Member Author

Anyone know who takes care of "Pending — Not mergeable. Needs cherry-pick-approved label. " ?

@kikisdeliveryservice
Copy link
Contributor

Anyone know who takes care of "Pending — Not mergeable. Needs cherry-pick-approved label. " ?

yeah i talked to @runcom and i'll get that on next week.

@kikisdeliveryservice kikisdeliveryservice changed the title Bug 1706082: Add Spec.Configuration to MachineConfigPool, render controller writes it [release-4.1] Bug 1706082: Add Spec.Configuration to MachineConfigPool, render controller writes it Jun 21, 2019
@eparis
Copy link
Member

eparis commented Jun 24, 2019

Parent BZ is still not verified by QA. Reaching out to QA might be helpful when I re-review for 4.1.4 inclusion tomorrow.

I see you have some CI testing around this. But I'm not entirely sure of the source of these changes. Is this a combination of #765 and #773 ?

What's the impact of not merging this?

@runcom
Copy link
Member

runcom commented Jun 24, 2019

What's the impact of not merging this?

we won't tell exactly that an upgrade completed because we'll report "OK" doing the comparison with the old version (this is what this fixes)

@cgwalters
Copy link
Member Author

But I'm not entirely sure of the source of these changes. Is this a combination of #765 and #773 ?

Yep, exactly.

What's the impact of not merging this?

The bug symptoms would persist; the MCO could claim it was done upgrading when it hadn't (transiently).

And/or we'd have to come up with a different fix which...seems at best equally risky. We know this code works in 4.2.

@eparis
Copy link
Member

eparis commented Jun 25, 2019

we've been in master since May 16. Even though QA has not VERIFIED I'm going to let this through. We likely have had a reasonable about of CI soak time in master.

@eparis eparis added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Jun 25, 2019
@openshift-merge-robot openshift-merge-robot merged commit d3faa93 into openshift:release-4.1 Jun 25, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants