Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OCPBUGS-28647: tuned: distinguish deferred updates #1129

Merged

Conversation

ffromani
Copy link
Contributor

@ffromani ffromani commented Aug 8, 2024

To fully support the usecase described in OCPBUGS-28647 and fix
the issue, we need to further distinguish between first-time
profile change and (in-place) profile change. This is required to
better support a GitOps flow.

The key distinction is if the recommended profile changes or not,
and there's a desire to defer application of changes only when a
profile is updated (e.g. sysctl modified), not the first time it
is applied.

Thus:

  • first-time profile change is a change which triggers a change of the recommended profile.
  • (in-place) profile update is a change which does NOT cause a switch to a different TuneD profile. This usually, but not exclusively, involve changes of the sysctls.

We change the way the annotation is used. We now require a value,
which can be either

  • always: every Tuned object annotated this way will have its application deferred
  • update: every Tuned object annotated this way will be processed as usual (and as it wasn't annotated) if it's a first-time profile change, but its (in-place) updates will be deferred.
  • a new internal value "never" is also added to be used internally to mean the deferred feature is disabled entirely. User can use this value but it will explicitly disable the feature (which is disabled already by default), thus is redundant and not recommended.

@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. labels Aug 8, 2024
@openshift-ci-robot
Copy link
Contributor

@ffromani: This pull request references Jira Issue OCPBUGS-28647, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.17.0) matches configured target version for branch (4.17.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @shajmakh

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

To fully support the usecase described in OCPBUGS-28647 and fix the issue, we need to further distinguish between first-time profile change and (in-place) profile change.

The key distinction is if the recommended profile changes or not, and there's a desire to defer application of changes only if a profile is update, not the first time it is applied.

Thus:

  • first-time profile change is a change which triggers a change of the recommended profile
  • (in-place) profile update is a change which does NOT trigger the recommended profile, and updates the setting, usually but not exclusively the sysctls.

We change the way the annotation is used. We now require a value, which can be either

  • always: every Tuned object annotated this way will have its application deferred
  • update: every Tuned object annotated this way will be processed as usual (and as it wasn't annotated) if it's a first-time profile change, but its (in-place) updates will be deferred.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested review from shajmakh, Tal-or and yanirq August 8, 2024 11:52
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 8, 2024
pkg/util/annotations.go Outdated Show resolved Hide resolved
@yanirq
Copy link
Contributor

yanirq commented Aug 8, 2024

/cc @MarSik @bartwensley

@openshift-ci openshift-ci bot requested review from bartwensley and MarSik August 8, 2024 12:39
@ffromani ffromani force-pushed the tuned-deferred-updates-on-update branch from 6d75a91 to 5d4af08 Compare August 8, 2024 12:42
@ffromani
Copy link
Contributor Author

ffromani commented Aug 8, 2024

/cc @jmencak

@openshift-ci openshift-ci bot requested a review from jmencak August 8, 2024 12:52
@ffromani
Copy link
Contributor Author

ffromani commented Aug 8, 2024

/retest

@jmencak
Copy link
Contributor

jmencak commented Aug 8, 2024

Thank you for the PR! I'll try to understand the code and the motivation tomorrow. Some initial thoughts.

The key distinction is if the recommended profile changes or not, and there's a desire to defer application of changes only if a profile is update, not the first time it is applied.

Do you mean "only if profile is updated"? Also, which profile? TuneD profile or the k8s CR? I assume the former, so we should be specific to make things clear.

* (in-place) profile update is a change which does NOT trigger the recommended profile, and updates the setting, usually but not exclusively the sysctls.

IIUC, s/NOT trigger the recommended profile/NOT cause a switch to a different TuneD profile/ ?

We change the way the annotation is used. We now require a value, which can be either

* always: every Tuned object annotated this way will have its application deferred

Is this correct? Shouldn't we say something along the lines "if at least 1 Tuned object annotated this way exists, profile applications will be deferred"? Similarly for "update".

Edit: I take this back, I was thinking about the old implementation.

Also, the code uses "never" DeferMode, should we document this in the PR description and the commit?

@ffromani
Copy link
Contributor Author

ffromani commented Aug 9, 2024

thanks @jmencak , I will clarify the commit message indeed.

@ffromani ffromani force-pushed the tuned-deferred-updates-on-update branch from 5d4af08 to aef0cf0 Compare August 9, 2024 06:20
Copy link
Contributor

@jmencak jmencak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass done, just a couple of nits. Changes make sense to me.

pkg/operator/controller.go Show resolved Hide resolved
pkg/operator/controller.go Outdated Show resolved Hide resolved
pkg/tuned/controller.go Outdated Show resolved Hide resolved
@ffromani ffromani force-pushed the tuned-deferred-updates-on-update branch from aef0cf0 to 0f41473 Compare August 9, 2024 07:17
@jmencak
Copy link
Contributor

jmencak commented Aug 9, 2024

After some manual testing I see one potential issue with

tuned.openshift.io/deferred: "update"
  1. Create the following manifest
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: openshift-profile
  namespace: openshift-cluster-node-tuning-operator
  annotations:
    tuned.openshift.io/deferred: "update"
spec:
  profile:
  - data: |
      [main]
      summary=Custom OpenShift profile
      include=openshift-node
      [sysctl]
      kernel.shmmni=8192
    name: openshift-profile
  recommend:
  - match:
    - label: profile
    priority: 20
    profile: openshift-profile
  1. Profile is applied. Now change the value of kernel.shmmni to something like kernel.shmmni=16384. Profile application is refused on the grounds of the deferred update.

  2. Delete the TuneD pod responsible setting the profile => profile is applied by the new pod.

Questions:

  • Do we feel this is an issue? Should we make the information about the deferred state permanent so that TuneD pod restarts do not apply the changes?
  • Should we get this pre-merge tested?

@ffromani
Copy link
Contributor Author

ffromani commented Aug 9, 2024

After some manual testing I see one potential issue with
[...]
2. Profile is applied. Now change the value of kernel.shmmni to something like kernel.shmmni=16384. Profile application is refused on the grounds of the deferred update.
3. Delete the TuneD pod responsible setting the profile => profile is applied by the new pod.
Questions:

* Do we feel this is an issue?  Should we make the information about the deferred state permanent so that `TuneD` pod restarts do not apply the changes?

yes, it's a bug.

* Should we get this pre-merge tested?

I guess yes

@ffromani
Copy link
Contributor Author

ffromani commented Aug 9, 2024

/retest

@ffromani ffromani force-pushed the tuned-deferred-updates-on-update branch from b24c4f2 to e9952ee Compare August 9, 2024 13:17
@jmencak
Copy link
Contributor

jmencak commented Aug 9, 2024

Infra issues
/retest

@jmencak
Copy link
Contributor

jmencak commented Aug 9, 2024

Thank you for the fix, Francesco. I can confirm the issue is no longer present.
/lgtm
but it would be nice to do some basic pre-merge testing to verify the functionality @liqcui . I'm sure Liquan will have questions on how to test this so this could also help to improve the docs/commit message.
/hold
to give other reviewers a chance to review this.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 9, 2024
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 9, 2024
@ffromani ffromani force-pushed the tuned-deferred-updates-on-update branch from a46d8ea to ddacbff Compare August 13, 2024 08:10
Copy link
Contributor

@jmencak jmencak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the changes. I went through the code (mostly thinking about the semantics) one last time. It is tricky to assess all the corner cases, but the changes look good to me. I've also done some basic testing.

The only gray zone I found so far was when switching to a different profile using
"always" annotation and then updating it with a profile with "update"
deferred annotation. We do not define the behaviour in this case, so I
guess the current behaviour (to wait for a reboot) is fine.

I tried to make the commit message a bit more consistent and clearer.
Here is what I propose:

--- old	2024-08-13 13:38:14.916800476 +0200
+++ new	2024-08-13 13:55:13.104238181 +0200
@@ -1,6 +1,6 @@
 To fully support the usecase described in OCPBUGS-28647 and fix
 the issue, we need to further distinguish between first-time
-profile change and (in-place) profile change. This is required to
+profile change and in-place profile update. This is required to
 better support a GitOps flow.
 
 The key distinction is if the recommended profile changes or not,
@@ -11,17 +11,17 @@
 Thus:
 - first-time profile change is a change which triggers a change
   of the recommended profile.
-- (in-place) profile update is a change which does NOT cause a
-  switch to a different TuneD profile. This usually, but not
-  exclusively, involve changes of the sysctls.
+- in-place profile update is a change which does NOT cause a
+  switch to a TuneD profile with a different name. This involves
+  changes to only the contents of the currently used profile.
 
 We change the way the annotation is used. We now require a value,
 which can be either
 - always: every Tuned object annotated this way will have its
-  application deferred
+  application deferred.
 - update: every Tuned object annotated this way will be processed
   as usual (and as it wasn't annotated) if it's a first-time profile
-  change, but its (in-place) updates will be deferred.
+  change, but its in-place updates will be deferred.
 - a new internal value "never" is also added to be used internally
   to mean the deferred feature is disabled entirely.
   User can use this value but it will explicitly disable the feature

It would be nice if others, especially @MarSik / @yanirq could chime in from a higher level whether the changes in this PR are sufficient for what is needed from deferred updates.

pkg/tuned/controller.go Outdated Show resolved Hide resolved
pkg/tuned/controller.go Outdated Show resolved Hide resolved
pkg/tuned/controller.go Show resolved Hide resolved
pkg/tuned/controller.go Show resolved Hide resolved
pkg/tuned/controller.go Outdated Show resolved Hide resolved
pkg/util/annotations.go Show resolved Hide resolved
@ffromani ffromani force-pushed the tuned-deferred-updates-on-update branch from ddacbff to 3baf4ec Compare August 13, 2024 13:36
@ffromani
Copy link
Contributor Author

/hold
for @MarSik and @yanirq review. (xref: #1129 (review))

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 13, 2024
@jmencak
Copy link
Contributor

jmencak commented Aug 13, 2024

Thank you for the changes.
/lgtm
Found one typo change afeter. but that can be fixed later on if you prefer.

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 13, 2024
To fully support the usecase described in OCPBUGS-28647 and fix
the issue, we need to further distinguish between first-time
profile change and in-place profile change. This is required to
better support a GitOps flow.

The key distinction is if the recommended profile changes or not,
and there's a desire to defer application of changes only when a
profile is updated (e.g. sysctl modified), not the first time it
is applied.

Thus:
- first-time profile change is a change which triggers a change
  of the recommended profile.
- in-place profile update is a change which does NOT cause a
  switch to a TuneD profile with a different name. This involves
  changes to only the contents of the currently used profile.

We change the way the annotation is used. We now require a value,
which can be either
- always: every Tuned object annotated this way will have its
  application deferred.
- update: every Tuned object annotated this way will be processed
  as usual (and as it wasn't annotated) if it's a first-time profile
  change, but its in-place updates will be deferred.
- a new internal value "never" is also added to be used internally
  to mean the deferred feature is disabled entirely.
  User can use this value but it will explicitly disable the feature
  (which is disabled already by default), thus is redundant and not
  recommended.

Signed-off-by: Francesco Romani <[email protected]>
now that we have the more powerful ginkgo labels,
we can stop using tags for newer tests.

Signed-off-by: Francesco Romani <[email protected]>
@ffromani ffromani force-pushed the tuned-deferred-updates-on-update branch from 3baf4ec to c94303e Compare August 13, 2024 14:26
@ffromani
Copy link
Contributor Author

Thank you for the changes. /lgtm Found one typo change afeter. but that can be fixed later on if you prefer.

thanks, fixed

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Aug 13, 2024
@jmencak
Copy link
Contributor

jmencak commented Aug 13, 2024

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 13, 2024
Copy link
Contributor

openshift-ci bot commented Aug 13, 2024

@ffromani: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Copy link
Contributor

@MarSik MarSik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

I like it.

Copy link
Contributor

openshift-ci bot commented Aug 14, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ffromani, MarSik

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jmencak
Copy link
Contributor

jmencak commented Aug 14, 2024

/unhold

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 14, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit 3655f22 into openshift:master Aug 14, 2024
16 checks passed
@openshift-ci-robot
Copy link
Contributor

@ffromani: Jira Issue OCPBUGS-28647: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-28647 has been moved to the MODIFIED state.

In response to this:

To fully support the usecase described in OCPBUGS-28647 and fix
the issue, we need to further distinguish between first-time
profile change and (in-place) profile change. This is required to
better support a GitOps flow.

The key distinction is if the recommended profile changes or not,
and there's a desire to defer application of changes only when a
profile is updated (e.g. sysctl modified), not the first time it
is applied.

Thus:

  • first-time profile change is a change which triggers a change of the recommended profile.
  • (in-place) profile update is a change which does NOT cause a switch to a different TuneD profile. This usually, but not exclusively, involve changes of the sysctls.

We change the way the annotation is used. We now require a value,
which can be either

  • always: every Tuned object annotated this way will have its application deferred
  • update: every Tuned object annotated this way will be processed as usual (and as it wasn't annotated) if it's a first-time profile change, but its (in-place) updates will be deferred.
  • a new internal value "never" is also added to be used internally to mean the deferred feature is disabled entirely. User can use this value but it will explicitly disable the feature (which is disabled already by default), thus is redundant and not recommended.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@ffromani ffromani deleted the tuned-deferred-updates-on-update branch August 14, 2024 08:37
@openshift-bot
Copy link
Contributor

[ART PR BUILD NOTIFIER]

Distgit: cluster-node-tuning-operator
This PR has been included in build cluster-node-tuning-operator-container-v4.18.0-202408140944.p0.g3655f22.assembly.stream.el9.
All builds following this will include this PR.

@yanirq
Copy link
Contributor

yanirq commented Aug 20, 2024

/cherry-pick release-4.17

@openshift-cherrypick-robot

@yanirq: new pull request created: #1138

In response to this:

/cherry-pick release-4.17

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

yanirq pushed a commit to yanirq/cluster-node-tuning-operator that referenced this pull request Sep 1, 2024
* OCPBUGS-28647: tuned: distinguish deferred updates

To fully support the usecase described in OCPBUGS-28647 and fix
the issue, we need to further distinguish between first-time
profile change and in-place profile change. This is required to
better support a GitOps flow.

The key distinction is if the recommended profile changes or not,
and there's a desire to defer application of changes only when a
profile is updated (e.g. sysctl modified), not the first time it
is applied.

Thus:
- first-time profile change is a change which triggers a change
  of the recommended profile.
- in-place profile update is a change which does NOT cause a
  switch to a TuneD profile with a different name. This involves
  changes to only the contents of the currently used profile.

We change the way the annotation is used. We now require a value,
which can be either
- always: every Tuned object annotated this way will have its
  application deferred.
- update: every Tuned object annotated this way will be processed
  as usual (and as it wasn't annotated) if it's a first-time profile
  change, but its in-place updates will be deferred.
- a new internal value "never" is also added to be used internally
  to mean the deferred feature is disabled entirely.
  User can use this value but it will explicitly disable the feature
  (which is disabled already by default), thus is redundant and not
  recommended.

Signed-off-by: Francesco Romani <[email protected]>

* e2e: drop tags, use labels

now that we have the more powerful ginkgo labels,
we can stop using tags for newer tests.

Signed-off-by: Francesco Romani <[email protected]>

---------

Signed-off-by: Francesco Romani <[email protected]>
openshift-merge-bot bot pushed a commit that referenced this pull request Oct 2, 2024
)

* OCPBUGS-28647: tuned: distinguish deferred updates (#1129)

* OCPBUGS-28647: tuned: distinguish deferred updates

To fully support the usecase described in OCPBUGS-28647 and fix
the issue, we need to further distinguish between first-time
profile change and in-place profile change. This is required to
better support a GitOps flow.

The key distinction is if the recommended profile changes or not,
and there's a desire to defer application of changes only when a
profile is updated (e.g. sysctl modified), not the first time it
is applied.

Thus:
- first-time profile change is a change which triggers a change
  of the recommended profile.
- in-place profile update is a change which does NOT cause a
  switch to a TuneD profile with a different name. This involves
  changes to only the contents of the currently used profile.

We change the way the annotation is used. We now require a value,
which can be either
- always: every Tuned object annotated this way will have its
  application deferred.
- update: every Tuned object annotated this way will be processed
  as usual (and as it wasn't annotated) if it's a first-time profile
  change, but its in-place updates will be deferred.
- a new internal value "never" is also added to be used internally
  to mean the deferred feature is disabled entirely.
  User can use this value but it will explicitly disable the feature
  (which is disabled already by default), thus is redundant and not
  recommended.

Signed-off-by: Francesco Romani <[email protected]>

* e2e: drop tags, use labels

now that we have the more powerful ginkgo labels,
we can stop using tags for newer tests.

Signed-off-by: Francesco Romani <[email protected]>

---------

Signed-off-by: Francesco Romani <[email protected]>

* OCPBUGS-38795: Fix defer status during recommended profile change (#1142)

* OCPBUGS-38795: Fix defer status during recommended profile change

Process recommended profile change even when the profile itself
has not changed itself. The internal profile fingerprint tracking
was not updated during recommended profile update with no internal
changes.

* Add e2e test for defer=update

* NO-JIRA: deferred updates: fix in-place update handling on reboot (#1162)

* log: make sure to have high verbosity in tests

Signed-off-by: Francesco Romani <[email protected]>

* e2e: deferred: stricter testing

add stricter check for the node condition when
validating OCPBUGS-38795

Signed-off-by: Francesco Romani <[email protected]>

* deferred: tuned: fix reload trigger when inplace update

There are conditions on which we should not set the
reload flag. This avoid regression in the test
"Profile deferred when applied should trigger changes when applied fist,
then deferred when edited, if tuned restart should be kept deferred"

Signed-off-by: Francesco Romani <[email protected]>

* deferred: tuned: clarify comments and logs

document better the logic about processing edits

Signed-off-by: Francesco Romani <[email protected]>

---------

Signed-off-by: Francesco Romani <[email protected]>

* Omit unbackported logging support

---------

Signed-off-by: Francesco Romani <[email protected]>
Co-authored-by: Francesco Romani <[email protected]>
Co-authored-by: Martin Sivák <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants