-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[th/validating-webhook] add validating admission webhook to check DpuOperatorConfig #213
base: main
Are you sure you want to change the base?
Conversation
Hi @thom311. Thanks for your PR. I'm waiting for a openshift member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
This looks good! Very clean for implementing the webhook. The issues you had before, how did you solve them? What were the issues such that we know what can go wrong. |
The main problem is that I didn't understand how to correctly make TLS certificates for the webhook work. One approach is to uncomment the Compared to sriov-network-operator, that seems implemented mostly by hand (not using operator-sdk?). So what is done there does not directly translate and spread out over many commits. It was not easy to understand or follow. It supports both cert-manager and service.beta.openshift.io. For this branch, only service.beta.openshift.io is implemented. That seems sufficient to me, it also seems to work on Microshift. |
312e7d3
to
8b730a9
Compare
/approve |
@@ -21,7 +21,6 @@ resources: | |||
path: github.com/openshift/dpu-operator/api/v1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you fix having to manually do go get github.com/onsi/ginkgo/[email protected]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the upgrade is done to avoid some failure in make test
. Is there a problem, or what's the use of staying with an old version (and a different version from the toplevel go.mod
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
without (cd api/ && go get github.com/onsi/ginkgo/[email protected])
, a subsequent make test
gives a warning:
$ make test
GOFLAGS='' /data/src/dpu-operator/bin/controller-gen rbac:roleName=manager-role crd webhook paths="./..." output:crd:artifacts:config=config/crd/bases
for d in . dpu-api api tools ; do \
if [ "$d" = . ] ; then \
(cd $d && go mod vendor) || exit $? ; \
fi ; \
(cd $d && go mod tidy) || exit $? ; \
done
GOFLAGS='' /data/src/dpu-operator/bin/controller-gen object:headerFile="hack/boilerplate.go.txt" paths="./..."
go fmt ./...
go vet ./...
test -s /data/src/dpu-operator/bin/setup-envtest || GOBIN=/data/src/dpu-operator/bin GOFLAGS='' go install sigs.k8s.io/controller-runtime/tools/[email protected]
test -s /data/src/dpu-operator/bin/ginkgo && /data/src/dpu-operator/bin/ginkgo version | grep -q v2.20.2 || \
GOBIN=/data/src/dpu-operator/bin GOFLAGS='' go install github.com/onsi/ginkgo/v2/[email protected]
FAST_TEST=false KUBEBUILDER_ASSETS="/data/src/dpu-operator/bin/k8s/1.27.1-linux-amd64" /data/src/dpu-operator/bin/ginkgo --repeat 4 ./... -coverprofile cover.out
Ginkgo detected a version mismatch between the Ginkgo CLI and the version of Ginkgo imported by your packages:
Ginkgo CLI Version:
2.20.2
Mismatched package versions found:
2.14.0 used by v1
Ginkgo will continue to attempt to run but you may see errors (including flag
parsing errors) and should either update your go.mod or your version of the
Ginkgo CLI to match.
To install the matching version of the CLI run
go install github.com/onsi/ginkgo/v2/ginkgo
from a path that contains a go.mod file. Alternatively you can use
go run github.com/onsi/ginkgo/v2/ginkgo
from a path that contains a go.mod file to invoke the matching version of the
Ginkgo CLI.
If you are attempting to test multiple packages that each have a different
version of the Ginkgo library with a single Ginkgo CLI that is currently
unsupported.
[1732290406] Webhook Suite - 3/3 specs ••• SUCCESS! 8.971230559s PASS
[1732290406] Cnihelper Suite - 1/1 specs • SUCCESS! 984.902µs PASS
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make gingko
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you mean, to add a make target for calling go get ginko
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that is not correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's wrong with that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is no make target make gingko
in dpu-operator main
branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, that exists...
But how is that helping with the problem at hand?
make test
already depends on make ginkgo
, so that was already called. The warning I pasted seems rather clear: it's not supported to have different versions of the ginkgo library.
I don't understand what you are suggesting, or what you think the problem is. What means "can you fix having to manually do go get github.com/onsi/ginkgo/[email protected]"?
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: thom311, wizhaoredhat The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
e56eb08
to
a138b5a
Compare
/test all |
@thom311: Cannot trigger testing until a trusted user reviews the PR and leaves an In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/ok-to-test |
a138b5a
to
e23b236
Compare
08b97a3
to
9d6cf89
Compare
/* The validating webhook does not run in this setup. Otherwise we would | ||
* expect creating this CR to fail. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why doesn't it run? You commit message seems to say that there is more work needed here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reworded the commit message and this code comment.
As commit message says, that I was unable to make it work. The commit message says, that work is not currently planned, but could be done in the future (if somebody figures out how to make it work).
Probably it should be something like the following (which doesn't work):
From 1cce9259b8db7c250ae422f004313ca424f914a8 Mon Sep 17 00:00:00 2001
From: Thomas Haller <[email protected]>
Date: Mon, 25 Nov 2024 22:30:22 +0100
Subject: [PATCH 1/2] testutils: fix global variable for TestEnv
We will actually need this global variable, that gets initialized
during BeforeSuite(). Add it as "testutils.TestEnv".
---
internal/controller/suite_test.go | 2 --
internal/testutils/test_cluster.go | 6 ++++--
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/internal/controller/suite_test.go b/internal/controller/suite_test.go
index 41ef6740831f..1a1e1b87d90e 100644
--- a/internal/controller/suite_test.go
+++ b/internal/controller/suite_test.go
@@ -23,7 +23,6 @@ import (
. "github.com/onsi/gomega"
"k8s.io/client-go/rest"
- "sigs.k8s.io/controller-runtime/pkg/envtest"
logf "sigs.k8s.io/controller-runtime/pkg/log"
"sigs.k8s.io/controller-runtime/pkg/log/zap"
//+kubebuilder:scaffold:imports
@@ -33,7 +32,6 @@ import (
// http://onsi.github.io/ginkgo/ to learn more about Ginkgo.
var cfg *rest.Config
-var testEnv *envtest.Environment
func TestControllers(t *testing.T) {
RegisterFailHandler(Fail)
diff --git a/internal/testutils/test_cluster.go b/internal/testutils/test_cluster.go
index 84b7e6dd746a..5739b121d232 100644
--- a/internal/testutils/test_cluster.go
+++ b/internal/testutils/test_cluster.go
@@ -37,6 +37,8 @@ var (
setupLog = ctrl.Log.WithName("setup")
)
+var TestEnv *envtest.Environment
+
func relativeToAbs(path string) string {
_, file, _, _ := runtime.Caller(0)
file, err := filepath.Abs(file)
@@ -48,7 +50,7 @@ func bootstrapTestEnv(restConfig *rest.Config) {
var err error
trueVal := true
By("bootstrapping test environment")
- testEnv := &envtest.Environment{
+ TestEnv = &envtest.Environment{
CRDDirectoryPaths: []string{
relativeToAbs("../../config/crd/bases"),
relativeToAbs("../../test/crd"),
@@ -58,7 +60,7 @@ func bootstrapTestEnv(restConfig *rest.Config) {
Config: restConfig,
}
By("starting the test env")
- cfg, err := testEnv.Start()
+ cfg, err := TestEnv.Start()
Expect(err).NotTo(HaveOccurred())
Expect(cfg).NotTo(BeNil())
}
base-commit: cccb01b501f633015d7e48d1afb2fb66c60b66cd
--
2.47.1
From 62afdecc264dccbf2a3fc8836c4ee1940d20d89a Mon Sep 17 00:00:00 2001
From: Thomas Haller <[email protected]>
Date: Mon, 2 Dec 2024 17:53:26 +0100
Subject: [PATCH 2/2] controller/test: enable controller-test for new webhook
This doesn't actually work.
---
.../dpuoperatorconfig_controller_test.go | 25 ++++++++++++++++++-
internal/testutils/test_cluster.go | 3 +++
2 files changed, 27 insertions(+), 1 deletion(-)
diff --git a/internal/controller/dpuoperatorconfig_controller_test.go b/internal/controller/dpuoperatorconfig_controller_test.go
index 6c3a834899f4..497fc3c1b714 100755
--- a/internal/controller/dpuoperatorconfig_controller_test.go
+++ b/internal/controller/dpuoperatorconfig_controller_test.go
@@ -2,8 +2,12 @@ package controller
import (
"context"
+ "crypto/tls"
+ "fmt"
+ "net"
"os"
"sync"
+ "time"
netattdefv1 "github.com/k8snetworkplumbingwg/network-attachment-definition-client/pkg/apis/k8s.cni.cncf.io/v1"
@@ -120,16 +124,24 @@ func startDPUControllerManager(ctx context.Context, client *rest.Config, wg *syn
utilruntime.Must(configv1.AddToScheme(scheme))
utilruntime.Must(netattdefv1.AddToScheme(scheme))
+ webhookInstallOptions := &testutils.TestEnv.WebhookInstallOptions
mgr, err := ctrl.NewManager(client, ctrl.Options{
Scheme: scheme,
Metrics: server.Options{
BindAddress: ":18001",
},
- WebhookServer: webhook.NewServer(webhook.Options{Port: 9443}),
+ WebhookServer: webhook.NewServer(webhook.Options{
+ Host: webhookInstallOptions.LocalServingHost,
+ Port: webhookInstallOptions.LocalServingPort,
+ CertDir: webhookInstallOptions.LocalServingCertDir,
+ }),
LeaderElectionID: "1e46962d.openshift.io",
})
Expect(err).NotTo(HaveOccurred())
+ err = (&configv1.DpuOperatorConfig{}).SetupWebhookWithManager(mgr)
+ Expect(err).NotTo(HaveOccurred())
+
b := NewDpuOperatorConfigReconciler(mgr.GetClient(), mgr.GetScheme(), "mock-image", plugin.CreateVspImagesMap(false, setupLog))
err = b.SetupWithManager(mgr)
Expect(err).NotTo(HaveOccurred())
@@ -143,6 +155,17 @@ func startDPUControllerManager(ctx context.Context, client *rest.Config, wg *syn
}()
<-mgr.Elected()
+ // wait for the webhook server to get ready
+ dialer := &net.Dialer{Timeout: time.Second}
+ addrPort := fmt.Sprintf("%s:%d", webhookInstallOptions.LocalServingHost, webhookInstallOptions.LocalServingPort)
+ Eventually(func() error {
+ conn, err := tls.DialWithDialer(dialer, "tcp", addrPort, &tls.Config{InsecureSkipVerify: true})
+ if err != nil {
+ return err
+ }
+ return conn.Close()
+ }).Should(Succeed())
+
return mgr
}
diff --git a/internal/testutils/test_cluster.go b/internal/testutils/test_cluster.go
index 5739b121d232..eea46b5e9db3 100644
--- a/internal/testutils/test_cluster.go
+++ b/internal/testutils/test_cluster.go
@@ -58,6 +58,9 @@ func bootstrapTestEnv(restConfig *rest.Config) {
ErrorIfCRDPathMissing: true,
UseExistingCluster: &trueVal,
Config: restConfig,
+ WebhookInstallOptions: envtest.WebhookInstallOptions{
+ Paths: []string{relativeToAbs("../../config/webhook")},
+ },
}
By("starting the test env")
cfg, err := TestEnv.Start()
--
2.47.1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bn222 my point is: I tried for a long time (and failed) to add the validating webhook to dpuoperatorconfig_controller_test.go
.
If you think that is necessary, then I'd need some pointers (possibly building on the patch ^^).
But it shouldn't be necessary, because:
- there is a working unit test that has the validating webhook running. This one does test the webhook.
- a validating webhook can only reject a CR. Having that in
dpuoperatorconfig_controller_test.go
only makes a difference, when trying to configure invalid CRs. Which we can do on the other unit test. - extending the tests can be done later at any time.
cccb01b
to
7d8a9e2
Compare
7d8a9e2
to
212d62d
Compare
cancelled ci because we need to get vsp as lib in first |
212d62d
to
98de006
Compare
@vrindle you're in an excellent position to review this in detail since you've approached this differently. |
/unhold |
98de006
to
17d03e0
Compare
This file was generated by operator-sdk, but somehow it lacks a code comment that is present if you generate the file again. It's useful that we don't add unnecessary differences to the generated code. Also, the comment simply makes less sense by dropping half of the sentence.
We get a warning during deploy: # Warning: 'patchesStrategicMerge' is deprecated. Please use 'patches' instead. Run 'kustomize edit fix' to update your Kustomization automatically. For file "config/default/kustomization.yaml", the code was originally generated by operator-sdk. In the meantime, if you re-run the same command with a newer operator-sdk version, it will use "patches". Adjust this file to avoid the deprecation warning, but also make it similar to what a recent operator-sdk version would generated. The file "config/dev/kustomization.yaml" was (probably?) manually added. Also there, replace "patchesStrategicMerge". See for example https://github.com/kubernetes-sigs/kustomize/issues/ ## 5052
We have multiple go.mod files, this does not seem to agree with operator-sdk. operator-sdk will generate files and call `make generate`. However, that will then fail because we didn't call `go mod tidy` for api/ directory. Hack `make generate` to call `go mod tidy` first. We will do this now, then run operator-sdk, and then revert this change again. It will be reverted, because we don't want this behavior usually. Usually, we want to separate "generate" from "vendor". Otherwise: $ git checkout -B C origin/main && \ git reset --hard HEAD && \ git clean -fdx && \ make operator-sdk && \ ./bin/operator-sdk create webhook --group config --version v1 --kind DpuOperatorConfig --defaulting --programmatic-validation branch 'C' set up to track 'origin/main'. Reset branch 'C' Your branch is up to date with 'origin/main'. HEAD is now at e58a9c9 Merge pull request openshift#194 from thom311/th/hack-scripts-cleanup INFO[0000] Writing kustomize manifests for you to edit... ERRO[0000] Unable to find the target #- path: manager_webhook_patch.yaml to uncomment in the file config/default/kustomization.yaml. INFO[0000] Writing scaffold for you to edit... INFO[0000] api/v1/dpuoperatorconfig_webhook.go INFO[0000] api/v1/dpuoperatorconfig_webhook_test.go INFO[0000] api/v1/webhook_suite_test.go INFO[0000] Update dependencies: $ go mod tidy INFO[0000] Running make: $ make generate test -s /data/src/dpu-operator/bin/controller-gen && /data/src/dpu-operator/bin/controller-gen --version | grep -q v0.15.0 || \ GOBIN=/data/src/dpu-operator/bin GOFLAGS='' go install sigs.k8s.io/controller-tools/cmd/[email protected] GOFLAGS='' /data/src/dpu-operator/bin/controller-gen object:headerFile="hack/boilerplate.go.txt" paths="./..." Error: load packages in root "/data/src/dpu-operator/api": err: exit status 1: stderr: go: updates to go.mod needed; to update it: go mod tidy
This commit is fully generated without manual changes. $ make operator-sdk && \ ./bin/operator-sdk create webhook --group config --version v1 --kind DpuOperatorConfig --defaulting --programmatic-validation && \ make vendor && \ make manifests && \ (cd api/ && go get github.com/onsi/ginkgo/[email protected]) && \ make vendor && \ make manifests && \ make generate https://sdk.operatorframework.io/docs/building-operators/golang/webhook/
…utating For now, we only need a validating and not a mutating webhook. The parent commit generated both a validating and a mutating commit. This commit on top of that and undoes the parts that are about enabling the mutating webhook. It's done this way, so we first generate both kinds of webhooks, and then undo part of it. The benefit is that if we ever want to add the mutating webhook, we can git-revert this commit. We also clearer see the difference by doing it this way. This commit has no manual changes, it is fully generated by running the following command on the parent commit. $ make operator-sdk && \ ./bin/operator-sdk create webhook --group config --version v1 --kind DpuOperatorConfig --programmatic-validation && \ make vendor && \ make manifests && \ (cd api/ && go get github.com/onsi/ginkgo/[email protected]) && \ make vendor && \ make manifests && \ make generate
…rate`" We needed to vendor during `make generate` to get `operator-sdk` passing. Now we ran operator-sdk, and this temporary commit will be reverted. This reverts commit 9586454.
…o) for webhooks We need to have suitable SSL certifictes for the webhook. Inject them via Service Serving certificate ([1], [2]). Note that the generated code suggests to use [CERTMANAGER]. But that requires having the cert-manager operator, which is not installed by default in openshift ([3]). Note that sriov-network-operator also defaults to Service Serving Certificate, however, I think it also allows to use cert-manager. This commit does not enable that for dpu-operator. [1] https://docs.openshift.com/container-platform/4.16/security/certificates/service-serving-certificate.html [2] https://github.com/openshift/service-ca-operator/tree/master [3] https://docs.openshift.com/container-platform/4.16/security/cert_manager_operator/cert-manager-operator-install.html
It makes no sense to have more than one DpuOperatorConfig instance. Enforce uniqueness, by ensuring that the name of the config is "dpu-operator-config". Note that "sriov-network-operator" has a similar check ([1]). One difference is that the name is enforced to be "default". That doesn't seem a good name to me (because "default" makes it sound as if you could still create a non-default instance, which you cannot). Also, the name "dpu-operator-config" is already established and used. [1] https://github.com/openshift/sriov-network-operator/blob/45f22daa4ca8fb10f64e280d0aa2b7925da717d7/pkg/webhook/validate.go#L42 https://issues.redhat.com/browse/IIC-317
Theoretically, we just setup the webhook and can use the webhook in our controller test. Somehow related to WebhookInstallOptions it should work. Except, I don't get it to work. Instead, there webhook is not running in this unit test. We however have them in the "Webhook Suite", so there are unit tests about the webhook there. Instead, adjust the controller test for the (lack of) webhook changes. Note that this doesn't actually check the webhook, since it's not running in this setup. It rather tests, that the webhook is not running. If we ever make the webhook working for dpuoperatorconfig_controller_test test, we will need to adjust this. That is intentional.
17d03e0
to
afae1b7
Compare
@thom311: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
If we can't get this working in kind, we need at least e2e test. Can you add tests against a cda cluster? Definitely remove the piece of code that doesn't make sense: |
operator-sdk create webhook
to add the scaffolding for a validating admission webhookDpuOperatorConfig
dpu-operator-config
https://issues.redhat.com/browse/IIC-317
Missing: run the webhook in the kind cluster. As such, the tests don't cover this (though, there is a unit test that checks the validation code a bit). Maybe
make test
also need to run the admission webhook (if possible).