Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cluster-wide operators support improvements #2479

Merged

Conversation

greyerof
Copy link
Contributor

@greyerof greyerof commented Oct 1, 2024

This commit enables the discovery of cluster-wide operator pods (controllers), but also singleNamespace/multiNamespace installation methods where the targetNamespaces were set to be different than the installation namespace. The code will look for the controller pod in the installation namespace always. Operand pods can always be tested using normal namespace or label discovery methods, but a best-effort to find them in the configured namespaces will be implemented in a follow-up PR.

As a reminder:

  • installation namespace: namespace where both the subscription and the operator group were created. This is where the operator's controller (aka operator pod) is deployed by OLM.
  • targetNamespaces: namespaced where the controller will watch for CRs. The installation namespace can appear in this list but it's not mandatory.

Also, the test cases of the operator test suite related to controller pods have been moved to the access-control test suite, as those requirements also apply to any workload. For that reason, the discovered controller pods have also been added to the normal env.testPods so all the checks can be performed on them.

To distinguish between securityContext.runAsUser and securityContext.runAsNonRoot, the original test case access-control-security-context-non-root-user-check has been renamed to access-control-security-context-non-root-user-id-check

In short, the impact of this PR is twofold:

  • All the discovered operator (controller) pods (and their containers) are now
    added to the normal list of pods/containers under test, meaning that they will be target
    of all the test cases that match the label filter (--label-filter).

  • Test cases reorg in operator and access-control test suites:
    Removed as they already existed in access-control test suite.

    • operator-automount-tokens
    • operator-run-as-user-id

    Renamed:
    access-control-security-context-non-root-user-check -> access-control-security-context-non-root-user-id-check

    Moved from operator to access-control test suite:

    • operator-read-only-file-system -> access-control-security-context-read-only-file-system
    • operator-run-as-non-root -> access-control-security-context-run-as-non-root-user-check

--

build-depends: 32446
build-depends: https://github.com/dci-labs/dallas-pipelines/pull/1248

This commit enables the discovery of cluster-wide operator pods
(controllers), but also singleNamespace/multiNamespace installation
methods where the targetNamespaces were set to be different than the
installation namespace. The code will look for the controller pod in the
installation namespace always. Operand pods can always be tested using
normal namespace or label discovery methods, but a best-effort to find
them in the configured namespaces will be implemented in a follow-up PR.

As a reminder:
- installation namespace: namespace where both the subscription and the
  operator group where created. This is where the operator's controller
  (aka operator pod) is deployed by OLM.
- targetNamespaces: namespaced where the controller will watch for CRs.
  The installation namespace can appear in this list but it's not
  mandatory.

Also, the test cases of the operator test suite related to controller
pods have been moved to the access-control test suite, as those
requirements also apply to any workload. For that reason, the discovered
controller pods have also been added to the normal env.testPods so all
the checks can be performed on them.

To distinguish between securityContext.runAsUser and
securityContext.runAsNonRoot, the original test case
access-control-security-context-non-root-user-check has been renamed to
access-control-security-context-non-root-user-id-check
@dcibot
Copy link
Collaborator

dcibot commented Oct 1, 2024

@sebrandon1
Copy link
Member

If you make the changes in the QE repo, just temporarily change the ref: from main (in .github/workflows/qe-ocp-arm-416.yaml and .github/workflows/qe-ocp-pre-main.yaml) to whatever branch you have to make the 4.16 OCP tests pass.

If the operator pods were already discovered by either label or (more
likely) namepace discovery mode, we should not add them again, just flag
them.

Also, add the operator pods' containers to the containers under test
list.
greyerof added a commit to greyerof/cnfcert-tests-verification that referenced this pull request Oct 8, 2024
The non-root check test case has been renamed in:
redhat-best-practices-for-k8s/certsuite#2479
greyerof added a commit to greyerof/cnfcert-tests-verification that referenced this pull request Oct 8, 2024
The non-root check test case has been renamed in:
redhat-best-practices-for-k8s/certsuite#2479

Also, four operator test cases have been moved to access-control test
suite. I just removed them for now as they might need more rework in
follow-up PRs.
greyerof added a commit to redhat-best-practices-for-k8s/certsuite-qe that referenced this pull request Oct 8, 2024
The non-root check test case has been renamed in:
redhat-best-practices-for-k8s/certsuite#2479

Also, four operator test cases have been moved to access-control test
suite. I just removed them for now as they might need more rework in
follow-up PRs.
@dcibot
Copy link
Collaborator

dcibot commented Oct 9, 2024

@greyerof greyerof requested review from sebrandon1 and edcdavid and removed request for sebrandon1 October 9, 2024 13:18
pkg/provider/provider.go Outdated Show resolved Hide resolved
@greyerof
Copy link
Contributor Author

@ramperher, @manurodriguez this PR impacts on the regression check in DCI jobs as some test cases are now failing on mongodb pods and/or containers. We have similar issue with the hazelcast operator that we deploy in kind clusters for our github workflows so we decided (to do in follow-up PRs) to not install it and rely exclusively on QE for operator workload tests.

@sebrandon1
Copy link
Member

I think we will have to merge this until the test case names for expected output are changed on the DCI side.

@manurodriguez
Copy link
Collaborator

@ramperher, @manurodriguez this PR impacts on the regression check in DCI jobs as some test cases are now failing on mongodb pods and/or containers. We have similar issue with the hazelcast operator that we deploy in kind clusters for our github workflows so we decided (to do in follow-up PRs) to not install it and rely exclusively on QE for operator workload tests.

@greyerof thanks for the information, I see what you mean, several tests failed

I think we will have to merge this until the test case names for expected output are changed on the DCI side.
@sebrandon1, is there anything to do in our side? some test case names?

@sebrandon1
Copy link
Member

/dci-rerun

@sebrandon1
Copy link
Member

@manurodriguez I suppose it depends if any of the expected-to-fail test cases have had their name changed in this PR. If not, then we are good. The renamed funcs are in the original comment.

@ramperher
Copy link
Collaborator

Hi, just wanted to say that I've created this to avoid installing the operator in our workload: https://softwarefactory-project.io/r/c/dci-openshift-app-agent/+/32446, testing now.

@sebrandon1 sebrandon1 merged commit bfc9411 into redhat-best-practices-for-k8s:main Oct 15, 2024
34 checks passed
sebrandon1 pushed a commit to redhat-best-practices-for-k8s/certsuite-qe that referenced this pull request Oct 15, 2024
The non-root check test case has been renamed in:
redhat-best-practices-for-k8s/certsuite#2479

Also, four operator test cases have been moved to access-control test
suite. I just removed them for now as they might need more rework in
follow-up PRs.
greyerof added a commit to greyerof/certsuite that referenced this pull request Dec 18, 2024
PR redhat-best-practices-for-k8s#2479 fixed the autodiscovery of operator's pods and also added them to
the pods-under-test list. The problem is that it was also adding every
cluster-wide operator's controller pods to that list.

With this change, the operator (controller's pod) must be running in one of the
configured/test namespaces in order to consider that CSV as part of the
operators under test.
greyerof added a commit to greyerof/certsuite that referenced this pull request Dec 18, 2024
PR redhat-best-practices-for-k8s#2479 fixed the autodiscovery of operator's pods and also added them to
the pods-under-test list. The problem is that it was also adding every
cluster-wide operator's controller pods to that list.

With this change, the operator (controller's pod) must be running in one of the
configured/test namespaces in order to consider that CSV as part of the
operators under test.
greyerof added a commit that referenced this pull request Dec 20, 2024
PR #2479 fixed the autodiscovery of operator's pods and also added them to
the pods-under-test list. The problem is that it was also adding every
cluster-wide operator's controller pods to that list.

With this change, the operator (controller's pod) must be running in one of the
configured/test namespaces in order to consider that CSV as part of the
operators under test.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants