Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CNF-11815: e2e: Added node inspector for inspecting nodes configuration #1008

Merged

Conversation

rbaturov
Copy link
Contributor

@rbaturov rbaturov commented Apr 1, 2024

On hypershift there is no MCO, hence there are no machine-config-daemon pods.
A different resolution is needed for accessing the underlying node for inspecting configurations.
This commit introduces a node inspector implemented as a daemonset.
Upon execution of test suites, a pod with elevated privileges and host filesystem mounted will be deployed on every node.
Also I have added Z-deconfig suite ('Z' prefix, will guarantee that it will be the last suite run) that will be used for cleanup.
This API will be used for both hypershift and non-hypershift systems.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 1, 2024
@openshift-ci openshift-ci bot requested review from kpouget and Tal-or April 1, 2024 07:44
@rbaturov rbaturov force-pushed the add-daemonset-for-tests branch from 56d838d to 43743a9 Compare April 1, 2024 08:29
Copy link
Contributor

@ffromani ffromani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, possible improvements inside

@rbaturov rbaturov force-pushed the add-daemonset-for-tests branch from 43743a9 to 7c6d163 Compare April 2, 2024 11:23
@rbaturov rbaturov changed the title WIP: Added daemonset for inspecting nodes configuration CNF-11815: Added daemonset for inspecting nodes configuration Apr 2, 2024
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Apr 2, 2024

@rbaturov: This pull request references CNF-11815 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.

In response to this:

On hypershift there is no MCO, hence there are no machine-config-daemon pods. A different resolution is needed for accessing the underlying node for inspecting configurations. This commit adds a daemonset that will start upon test suites - on every node we will deploy a pod with escalated privileges and host fs mounted. This API will be used for both hypershift and non-hypershift systems.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 2, 2024
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 2, 2024
@rbaturov
Copy link
Contributor Author

rbaturov commented Apr 2, 2024

/hold
Depends on #1004

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 2, 2024
@rbaturov rbaturov force-pushed the add-daemonset-for-tests branch 2 times, most recently from d61ea00 to 294c4af Compare April 2, 2024 13:17
@rbaturov rbaturov changed the title CNF-11815: Added daemonset for inspecting nodes configuration CNF-11815: e2e: Added daemonset for inspecting nodes configuration Apr 2, 2024
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 14, 2024
}

// ExecCommandOnMachineConfigDaemon returns the output of the command execution on the machine-config-daemon pod that runs on the specified node
func ExecCommandOnMachineConfigDaemon(ctx context.Context, node *corev1.Node, command []string) ([]byte, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both ExecCommandOnTestingDaemonPod and ExecCommandOnNode have the same purpose.
Unless there is a very good reason to have both functions, I would merge them into one named:
ExecCommandOnNode which declare explicitly where the command is about to be running.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Tal-or
ExecCommandOnTestingDaemonPod is being used as an underlying function for ExecCommandOnNode.
The difference between the two is that ExecCommandOnNode returns a string (after some formatting) rather than a bytes slice (ExecCommandOnTestingDaemonPod).
As throughout the repo, developers have chosen to use both of them, I think we should keep both of these functions and just rename them appropriately. (Another option is to unify them to one function that returns []bytes and change all calls to target this func, following a call if needed to a util function that will do the string format that is left). But again the latter option would require some work for refactoring.
WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then let's keep them and just rename ExecCommandOnTestingDaemonPod.
We don't want to steer too much from the scope of this work

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Tal-or I'll need to rename ExecCommandOnNode to ExecCommandOnNodeToString and ExecCommandOnTestingDaemonPod which returns a []byte to ExecCommandOnNode.
This however, may cause a slight confusion for developers that got used that ExecCommandOnNode returns a string. but I think this is not a big deal.
Another option could be to change ExecCommandOnTestingDaemonPod to ExecCommandOnNodeToByteSlice. I personally think we should go with the first option, even if it might cause a bit of confusion.

Copy link
Contributor

@Tal-or Tal-or Apr 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since the name of the package is nodes I would rename ExecCommandOnTestingDaemonPod to ExecCommand and call it a day.
From the caller site it looks like:
nodes.ExecCommand() The package name provides a clear indication about where this command is about to be running.

Copy link
Contributor Author

@rbaturov rbaturov Apr 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But what will be the name of the second func?
Anyways I was thinking that we should move these functions to the node inspector package, as they rely on it running.
So it's going to be more of nodeInspector.ExecCommand...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The node inspector is an implementation detail. you can call nodeInspector.ExecCommand from nodes.ExecCommand

@rbaturov rbaturov force-pushed the add-daemonset-for-tests branch from 294c4af to 3648ece Compare April 15, 2024 07:28
@rbaturov rbaturov changed the title CNF-11815: e2e: Added daemonset for inspecting nodes configuration WIP: CNF-11815: e2e: Added daemonset for inspecting nodes configuration Apr 15, 2024
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 15, 2024
@rbaturov rbaturov force-pushed the add-daemonset-for-tests branch from 3648ece to ddddb84 Compare April 15, 2024 11:48
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 15, 2024
@rbaturov rbaturov force-pushed the add-daemonset-for-tests branch from ddddb84 to 1247887 Compare April 15, 2024 12:00
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Apr 15, 2024

@rbaturov: This pull request references CNF-11815 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.

In response to this:

On hypershift there is no MCO, hence there are no machine-config-daemon pods.
A different resolution is needed for accessing the underlying node for inspecting configurations.
This commit introduces a node inspector implemented as a daemonset.
Upon execution of test suites, a pod with elevated privileges and host filesystem mounted will be deployed on every node.
Also I have added Z-deconfig suite ('Z' prefix, will guarantee that it will be the last suite run) that will be used for cleanup.
This API will be used for both hypershift and non-hypershift systems.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@rbaturov rbaturov changed the title WIP: CNF-11815: e2e: Added daemonset for inspecting nodes configuration WIP: CNF-11815: e2e: Added node inspector for inspecting nodes configuration Apr 15, 2024
@rbaturov
Copy link
Contributor Author

/retest-required

@rbaturov rbaturov force-pushed the add-daemonset-for-tests branch from 1247887 to ee990a0 Compare April 16, 2024 08:18
@rbaturov rbaturov changed the title WIP: CNF-11815: e2e: Added node inspector for inspecting nodes configuration CNF-11815: e2e: Added node inspector for inspecting nodes configuration Apr 16, 2024
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 16, 2024
@rbaturov
Copy link
Contributor Author

@Tal-or ready for another review iteration

@rbaturov rbaturov force-pushed the add-daemonset-for-tests branch from 90cba92 to 14f169e Compare May 27, 2024 07:28
@rbaturov
Copy link
Contributor Author

/retest-required

@rbaturov rbaturov force-pushed the add-daemonset-for-tests branch from 14f169e to 390015c Compare May 30, 2024 08:28
Copy link
Contributor

@ffromani ffromani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

I like the direction and we surely want this change. A few inline comments/questions

Copy link
Contributor

openshift-ci bot commented May 30, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ffromani, rbaturov

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 30, 2024
@rbaturov rbaturov force-pushed the add-daemonset-for-tests branch from 390015c to fd8cb0c Compare June 2, 2024 10:17
@rbaturov
Copy link
Contributor Author

rbaturov commented Jun 2, 2024

@Tal-or @ffromani
Did some modifications:

  1. Calling Z_deconfig as part of a trap command in the run-test.sh script will ensure cleanup occurs, even if we run through failures mid-way.
  2. Changed the node-inspector delete function, to delete namespace (which will implicitly delete service account, cluster-role-binding), thereby reducing API calls. Clusterrole requires an additional delete request as it is not namespaced.

Makefile Outdated Show resolved Hide resolved
hack/run-test.sh Show resolved Hide resolved
test/e2e/performanceprofile/functests/utils/nodes/nodes.go Outdated Show resolved Hide resolved
@rbaturov rbaturov force-pushed the add-daemonset-for-tests branch 3 times, most recently from dc71b58 to a7b20dc Compare June 3, 2024 17:10
rbaturov added 3 commits June 4, 2024 14:54
On hypershift there is no MCO, hence there are no machine-config-daemon pods.
A different resolution is needed for accessing the underlying node for inspecting configurations.
This commit introduces a node inspector implemented as a daemonset.
Upon execution of test suites, a pod with elevated privileges and host filesystem mounted will be deployed on every node.
Also I have added Z-deconfig suite ('Z' prefix, will guarantee that it will be the last suite run) that will be used for cleanup.
This API will be used for both hypershift and non-hypershift systems.

Signed-off-by: Ronny Baturov <[email protected]>
Adding trap command in the run-test.sh script will ensure the
Z_deconfig suite will run, thereby deleting the node inspector.

Signed-off-by: Ronny Baturov <[email protected]>
Updated these functions to use the node inspector rather than the MCD.
Furthermore, I have renamed these functions to better reflect their purposes and usage.

Signed-off-by: Ronny Baturov <[email protected]>
@rbaturov rbaturov force-pushed the add-daemonset-for-tests branch from 639d431 to 41e92a5 Compare June 4, 2024 11:54
@rbaturov
Copy link
Contributor Author

rbaturov commented Jun 4, 2024

/retest-required

2 similar comments
@rbaturov
Copy link
Contributor Author

rbaturov commented Jun 4, 2024

/retest-required

@rbaturov
Copy link
Contributor Author

rbaturov commented Jun 5, 2024

/retest-required

Refactor all calls to the unified function: ExecCommand.
Added a util ToString function that will be used for the calls that
require the output in a string format.

Signed-off-by: Ronny Baturov <[email protected]>
@rbaturov rbaturov force-pushed the add-daemonset-for-tests branch from 41e92a5 to 362196e Compare June 5, 2024 09:12
@rbaturov
Copy link
Contributor Author

rbaturov commented Jun 5, 2024

/retest-required

Copy link
Contributor

openshift-ci bot commented Jun 5, 2024

@rbaturov: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@Tal-or
Copy link
Contributor

Tal-or commented Jun 5, 2024

/lgtm
Thank you for all your work on this

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 5, 2024
@ffromani
Copy link
Contributor

ffromani commented Jun 5, 2024

/label acknowledge-critical-fixes-only
test-only code and this is an improvement anyway. Plus, we depend on this work for a critical feature

@openshift-ci openshift-ci bot added the acknowledge-critical-fixes-only Indicates if the issuer of the label is OK with the policy. label Jun 5, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit b057528 into openshift:master Jun 5, 2024
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
acknowledge-critical-fixes-only Indicates if the issuer of the label is OK with the policy. approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants