Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade k8s libraries to v0.29.2 #5843

Merged
merged 1 commit into from
Mar 20, 2024
Merged

Conversation

hjiajing
Copy link
Contributor

@hjiajing hjiajing commented Jan 4, 2024

  1. Upgrade k8s libraries to v0.29.2.
  2. Upgrade controller-runtime to v0.16.3

@hjiajing hjiajing requested review from tnqn and luolanzone January 4, 2024 14:50
@hjiajing
Copy link
Contributor Author

hjiajing commented Jan 4, 2024

Hi @tnqn , Could you please help to view this PR. It upgrades k8s libraries from 0.26.4 to 0.29.0. Major changes are about the function wait.Poll and some other Poll functions.

And the controller-runtime is also upgraded to v0.16.3, it affects the multicluster implementation. Hi @luolanzone , Could you help review this PR, thanks.

@luolanzone
Copy link
Contributor

/test-multicluster-e2e

@@ -32,11 +32,11 @@ LABEL description="A Docker image based on the golang image, which includes code

ENV GO111MODULE=on

ARG K8S_VERSION=1.26.4
ARG K8S_VERSION=1.29.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please search 1.26.4 in antrea repo, I noticed there a few other files need to be updated.
For new version of codegen image, please refer to build/images/codegen/README.md to build a new image, we need @tnqn or @antoninbas's help to upload it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated codegen's README and hack/update-codegen.sh from v1.26.4 to v1.29.0.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sure to change this to 1.29.2

and update the README

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DONE. Replace v0.26.4 and v0.29.0 to 0.29.2.

@@ -42,7 +44,6 @@ import (
antreamcscheme "antrea.io/antrea/multicluster/pkg/client/clientset/versioned/scheme"
antreascheme "antrea.io/antrea/pkg/client/clientset/versioned/scheme"
"antrea.io/antrea/pkg/signals"
//+kubebuilder:scaffold:imports
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why to move this line?
I think it was generated automatically before, please help to check if possible to remove it, or it has a specific purpose. Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is the anchor point for kubebuilder to insert code. And I did not remove it, it was formatted and moved in front of these lines. I have moved it back.

@@ -184,7 +184,7 @@ func (s *StretchedNetworkPolicyController) processNextWorkItem() bool {

if podRef, ok := obj.(types.NamespacedName); !ok {
s.queue.Forget(obj)
klog.Errorf("Expected type 'NamespacedName' in work queue but got object", "object", obj)
klog.Errorf("Expected type 'NamespacedName' in work queue but got object %v\n", obj)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to append "\n", klog.Errorf will handle it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use structured logging (klog.ErrorS)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

third_party/proxy/service.go Show resolved Hide resolved
pkg/agent/util/net_windows.go Outdated Show resolved Hide resolved
pkg/agent/controller/egress/egress_controller_test.go Outdated Show resolved Hide resolved
@hjiajing hjiajing force-pushed the upgrade-k8s.io branch 2 times, most recently from d61dc7b to fd8e53b Compare January 5, 2024 05:36
@luolanzone luolanzone mentioned this pull request Jan 5, 2024
5 tasks
@hjiajing hjiajing force-pushed the upgrade-k8s.io branch 3 times, most recently from 141f6e9 to a4678e1 Compare January 6, 2024 15:03
Copy link
Contributor

@luolanzone luolanzone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LTGM overall, two nits.

@@ -744,10 +744,10 @@ func run(o *Options) error {
// Service would fail.
if o.config.AntreaProxy.ProxyAll {
klog.InfoS("Waiting for AntreaProxy to be ready")
if err := wait.PollUntil(time.Second, func() (bool, error) {
if err := wait.PollUntilContextCancel(ctx, time.Second, true, func(ctx context.Context) (bool, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this "immediate" be true for PollUtil?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same question
let's keep the same behavior

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it should be set to false, I missed this one. Fixed.

third_party/proxy/service.go Show resolved Hide resolved
@luolanzone
Copy link
Contributor

/test-multicluster-e2e

@luolanzone luolanzone requested a review from antoninbas January 8, 2024 03:43
Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, what are the nolint:staticcheck pragmas for?

For contexts, I would follow this rule:

  • use context.Background() when appropriate, in particular in unit tests
  • use context.TODO() when we may want to add an actual context object in the future (e.g., in the future maybe the function should take a context as parameter)

In general, I don't think we should have context.Background() in a pkg function.

@@ -744,10 +744,10 @@ func run(o *Options) error {
// Service would fail.
if o.config.AntreaProxy.ProxyAll {
klog.InfoS("Waiting for AntreaProxy to be ready")
if err := wait.PollUntil(time.Second, func() (bool, error) {
if err := wait.PollUntilContextCancel(ctx, time.Second, true, func(ctx context.Context) (bool, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same question
let's keep the same behavior

Comment on lines 160 to 167
if o.SelfSignedCert {
o.options.CertDir = selfSignedCertDir
o.options.Metrics.CertDir = selfSignedCertDir
} else {
o.options.CertDir = certDir
o.options.Metrics.CertDir = certDir
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure that's correct. The historic CertDir option seems to have been used for both the metrics server and for the webhook server. Do we need to configure WebhookServer.CertDir as well for the manager?

cc @luolanzone

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes @antoninbas is right, the historic CertDir is for webhook according to the comment. @hjiajing please help to double check and refine this part with correct fields. Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used webhook.NewServer to initialize the webhook server with certDir.

@@ -256,13 +257,13 @@ func TestClusterSetMapFunc_Gateway(t *testing.T) {
}
fakeClient := fake.NewClientBuilder().WithScheme(common.TestScheme).WithObjects(clusterSet, gw1).Build()
r := NewGatewayReconciler(fakeClient, common.TestScheme, "default", []string{"10.200.1.1/16"}, nil)
requests := r.clusterSetMapFunc(clusterSet)
requests := r.clusterSetMapFunc(context.TODO(), clusterSet)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for unit tests, I usually suggest defining ctx := context.Background() at the beginning of the test function, then using ctx throughout the test definition

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed, as well as other 3 files.

}
return true, nil
})
err = wait.PollUntilContextTimeout(context.Background(), 200*time.Millisecond, 10*time.Second, true,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe context.TODO() here?
same for other instances in this file

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, changed to context.TODO().

klog.InfoS("OVS bridge local port is ready", "type", link.Type(), "attrs", link.Attrs())
return true, nil
})
wait.PollUntilContextTimeout(context.Background(), 100*time.Millisecond, 10000*time.Millisecond, true,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

currentEP, ok := ic.getEndpoint(endpointName)
if !ok {
klog.InfoS("HNSEndpoint doesn't exist in cache, exit current goroutine", "HNSEndpoint", endpointName)
pollErr := wait.PollUntilContextTimeout(context.Background(), 100*time.Millisecond, 60*time.Second, true,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@@ -1118,7 +1118,7 @@ func TestSyncEgress(t *testing.T) {
if tt.newLocalIPs != nil {
c.localIPDetector = &fakeLocalIPDetector{localIPs: tt.newLocalIPs}
}
assert.NoError(t, wait.Poll(time.Millisecond*100, time.Second, func() (done bool, err error) {
assert.NoError(t, wait.PollUntilContextTimeout(context.Background(), time.Millisecond*100, time.Second, false, func(ctx context.Context) (done bool, err error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this may be a good opportunity to update these to assert.Eventually / require.Eventually

what do you think @tnqn ?

Taking into account of course this shortcoming: stretchr/testify#1396

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense to me

@@ -184,7 +184,7 @@ func (s *StretchedNetworkPolicyController) processNextWorkItem() bool {

if podRef, ok := obj.(types.NamespacedName); !ok {
s.queue.Forget(obj)
klog.Errorf("Expected type 'NamespacedName' in work queue but got object", "object", obj)
klog.Errorf("Expected type 'NamespacedName' in work queue but got object %v\n", obj)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use structured logging (klog.ErrorS)

@hjiajing hjiajing force-pushed the upgrade-k8s.io branch 2 times, most recently from a34764f to 35fbc93 Compare January 12, 2024 02:35
@antoninbas
Copy link
Contributor

Out of curiosity, what are the nolint:staticcheck pragmas for?

@hjiajing did you answer this anywhere?

@@ -246,7 +247,7 @@ func TestNamespaceMapFunc(t *testing.T) {
mcReconciler.SetRemoteCommonArea(commonArea)

r := NewLabelIdentityReconciler(fakeClient, common.TestScheme, mcReconciler, "default")
actualReq := r.namespaceMapFunc(ns)
actualReq := r.namespaceMapFunc(context.TODO(), ns)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use context.Background() like in multicluster/controllers/multicluster/member/gateway_controller_test.go?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, changed all context.TODO() in unit tests to context.Backgrond().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure this was addressed. I still see context.TODO() in this file

@@ -376,7 +377,7 @@ func TestAntreaIPAMDriver(t *testing.T) {

podNamespace := string(k8sArgsMap[test].K8S_POD_NAMESPACE)
podName := string(k8sArgsMap[test].K8S_POD_NAME)
err = wait.Poll(time.Millisecond*200, time.Second, func() (bool, error) {
err = wait.PollUntilContextTimeout(context.TODO(), time.Millisecond*200, time.Second, false, func(ctx context.Context) (bool, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

context.Background() in unit tests

@@ -1118,7 +1118,7 @@ func TestSyncEgress(t *testing.T) {
if tt.newLocalIPs != nil {
c.localIPDetector = &fakeLocalIPDetector{localIPs: tt.newLocalIPs}
}
assert.NoError(t, wait.Poll(time.Millisecond*100, time.Second, func() (done bool, err error) {
assert.NoError(t, wait.PollUntilContextTimeout(context.TODO(), time.Millisecond*100, time.Second, false, func(ctx context.Context) (done bool, err error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you see my earlier comment: #5843 (comment)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I forget to reply it, I saw it and I'm working on this part. Thanks for the reminder.

@hjiajing
Copy link
Contributor Author

Out of curiosity, what are the nolint:staticcheck pragmas for?

@hjiajing did you answer this anywhere?

Because the config is deprecated in controller-runtime, but it is not used in our code, just some declarations. So I added a nolint:staticcheck for them.

@antoninbas
Copy link
Contributor

Out of curiosity, what are the nolint:staticcheck pragmas for?

@hjiajing did you answer this anywhere?

Because the config is deprecated in controller-runtime, but it is not used in our code, just some declarations. So I added a nolint:staticcheck for them.

I see the following recommendation in the package documentation:

Deprecated: The component config package has been deprecated and will be removed in a future release. Users should migrate to their own config implementation, please share feedback in kubernetes-sigs/controller-runtime#895.

Do we have a plan to migrate to our own config implementation? If it is straightforward, should we do it as part of this PR? cc @luolanzone

@luolanzone
Copy link
Contributor

luolanzone commented Feb 1, 2024

Out of curiosity, what are the nolint:staticcheck pragmas for?

@hjiajing did you answer this anywhere?

Because the config is deprecated in controller-runtime, but it is not used in our code, just some declarations. So I added a nolint:staticcheck for them.

I see the following recommendation in the package documentation:

Deprecated: The component config package has been deprecated and will be removed in a future release. Users should migrate to their own config implementation, please share feedback in kubernetes-sigs/controller-runtime#895.

Do we have a plan to migrate to our own config implementation? If it is straightforward, should we do it as part of this PR? cc @luolanzone

Hi @antoninbas, we have one spec ControllerManagerConfigurationSpec as an inline field in MultiClusterConfig, but I think it is not used in our MC scenarios, we probably can remove config.ControllerManagerConfigurationSpec directly. @hjiajing could you double check if it's Ok to remove directly? Thanks.
Btw, please check and resolve code conflicts.

@hjiajing hjiajing force-pushed the upgrade-k8s.io branch 2 times, most recently from f400885 to 2320478 Compare February 5, 2024 09:54
@hjiajing hjiajing force-pushed the upgrade-k8s.io branch 2 times, most recently from 437374c to e26509e Compare February 25, 2024 08:32
@hjiajing
Copy link
Contributor Author

Hi @antoninbas , I have changed assert.NoError(t, wait.xxx) to assert.Eventually, could you please take a look, thanks.

Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay.
I think we should update the PR to use 0.29.2 now, even if that means pushing a new container image for codegen. There is also a merge conflict.
Let's try to get this PR merged soon

Comment on lines 142 to 144
if *multiclusterConfig.LeaderElection.LeaderElect {
o.options.LeaderElection = true
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just took a quick look and it seems there are as lot of fields in multiclusterConfig.LeaderElection besides LeaderElection. I guess we are just ignoring these fields? Do we even need / want to support leader election here? I assume it's not something we leverage in the installation manifests.

Should we just include the fields we actually support in MultiClusterConfig?
Alternatively, is there a standard way of converting from configv1alpha1.LeaderElectionConfiguration to manager options? I took a look at the following PR, and it seems that they are copying fields one by one:
https://github.com/kubernetes-sigs/kueue/pull/889/files

cc @luolanzone

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually didn't use LeaderElection in MC. I think it's stale codes which can be removed. @hjiajing could you help to remove it? Please also remove the leaderElection in the manifests, Thanks.

@@ -246,7 +247,7 @@ func TestNamespaceMapFunc(t *testing.T) {
mcReconciler.SetRemoteCommonArea(commonArea)

r := NewLabelIdentityReconciler(fakeClient, common.TestScheme, mcReconciler, "default")
actualReq := r.namespaceMapFunc(ns)
actualReq := r.namespaceMapFunc(context.TODO(), ns)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure this was addressed. I still see context.TODO() in this file

require.NoError(t, wait.PollImmediate(10*time.Millisecond, time.Second, func() (done bool, err error) {
return c.queue.Len() == 1, nil
}))
require.NoError(t, wait.PollUntilContextTimeout(context.Background(), 10*time.Millisecond, time.Second, true,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe require.Eventually then, given that you have made the change for assert?

same comment in other places as well

Copy link
Contributor Author

@hjiajing hjiajing Mar 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure this was addressed. I still see context.TODO() in this file

Sorry I missed this file. Changed to context.Background(). And all require.NoError(t, wait.XXX) were changed to require.Eventually.

Comment on lines 237 to 241
err := wait.PollImmediate(10*time.Millisecond, 100*time.Millisecond, func() (bool, error) {
return ruleHasBeenDeleted(), nil
})
err := wait.PollUntilContextTimeout(context.Background(), 10*time.Millisecond, 100*time.Millisecond, true,
func(ctx context.Context) (bool, error) {
return ruleHasBeenDeleted(), nil
})
require.Error(t, err, "Rule ID was unexpectedly released")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment: could use require.Eventually

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you didn't push the code? I still see wait.PollUntilContextTimeout + require.Error

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I missed this file, fixed. And I search globally, this file is the only one which uses wait.PollUntilContextTimeout + require.Error. Replaced this with require.Eventually.

Comment on lines +249 to 252
err = wait.PollUntilContextTimeout(context.Background(), 10*time.Millisecond, 1*time.Second, true,
func(ctx context.Context) (bool, error) {
return ruleHasBeenDeleted(), nil
})
require.NoError(t, err, "Rule ID was not released")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same thing here, I haven't checked the rest of the code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@@ -199,7 +200,7 @@ func TestApplyServerCert(t *testing.T) {

if tt.selfSignedCert && tt.testRotate {
oldCertKeyContent := got.getCertificate()
err := wait.Poll(time.Second, 8*time.Second, func() (bool, error) {
err := wait.PollUntilContextTimeout(context.Background(), time.Second, 8*time.Second, false, func(ctx context.Context) (bool, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use assert.Eventually

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -308,6 +308,10 @@ func (r *supportBundleREST) clean(ctx context.Context, bundlePath string, durati
defaultFS.Remove(bundlePath)
}

func (r *supportBundleREST) GetSingularName() string {
return "supportBundle"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I am mistaken, this should be "supportbundle", not "supportBundle"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

require.NoError(t, wait.PollUntilContextTimeout(context.Background(), time.Millisecond*100, time.Second, true,
func(ctx context.Context) (done bool, err error) {
flowList, err = OfctlDumpTableFlowsWithoutName(ovsCtlClient, myTable.GetID())
require.Nil(t, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not from this PR, but should be require.NoError

same for other tests below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed all require.Nil(t, err) to require.NoError(t, err)

@hjiajing
Copy link
Contributor Author

@hjiajing The Validate metrics in Prometheus CI job failed twice in a row, probably worth checking locally Please push new changes in a separate commit, so they can be reviewed easily. We will squash when we merge.

OK, all feature gates enabled E2E failed, I'm triaging this.

@antoninbas
Copy link
Contributor

@hjiajing that failure may be unrelated to your PR. However, the Validate metrics in Prometheus job failure is likely to be caused by your changes. After updating the K8s libraries, you usually need to regenerate the list of exposed metrics, IIRC.

@hjiajing
Copy link
Contributor Author

@hjiajing that failure may be unrelated to your PR. However, the Validate metrics in Prometheus job failure is likely to be caused by your changes. After updating the K8s libraries, you usually need to regenerate the list of exposed metrics, IIRC.

Sure, I will regenerate this doc.

@hjiajing
Copy link
Contributor Author

Doc updated.

@hjiajing hjiajing force-pushed the upgrade-k8s.io branch 4 times, most recently from ae08465 to 8bbe7b0 Compare March 18, 2024 14:27
1. Upgrade k8s libraries to v0.29.2
2. Upgrade controller-runtime to v0.16.3

Signed-off-by: hjiajing <[email protected]>
Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tnqn
Copy link
Member

tnqn commented Mar 20, 2024

/test-all

@antoninbas antoninbas merged commit 2aafb2e into antrea-io:main Mar 20, 2024
48 of 55 checks passed
hongliangl added a commit to hongliangl/antrea that referenced this pull request Mar 22, 2024
Fix antrea-io#6129

In the failure tests, the following function is called to
verify whether a connection should be allowed or denied.
To verify a connection should be denied, it requires 5 seconds.

```
func probeClientIPFromPod(data *TestData, pod, container string, baseUrl string) (string, error) {
	url := fmt.Sprintf("%s/%s", baseUrl, "clientip")
	hostPort, _, err := data.runWgetCommandFromTestPodWithRetry(pod, data.testNamespace, container, url, 5)
	if err != nil {
		return "", err
	}
	host, _, err := net.SplitHostPort(hostPort)
	return host, err
}
```

Before antrea-io#5843, these e2e tests utilized the function PollImmediate
from k8s.io/apimachinery/pkg/util/wait, which immediately calls an
anonymous function including the above function. Since the timeout
is 5 seconds, and the ticker time is 1 second, and the anonymous
function runs immediately, the 5-second timeout is sufficient to
verify the denied state of a connection as mentioned above. However,
after antrea-io#5843, the function `Eventually` from github.com/stretchr/testify/assert
is used with the same parameters, which implies that the anonymous
function runs after the first ticker time, leaving 4 seconds. 4 seconds
are insufficient to verify the denied state of a connection.

To resolve the issue, the timeout should be adjusted to be more than
5 seconds.

Signed-off-by: Hongliang Liu <[email protected]>
hongliangl added a commit to hongliangl/antrea that referenced this pull request Mar 25, 2024
Fix antrea-io#6129

In the failure tests, the following function is called to
verify whether a connection should be allowed or denied.
To verify a connection should be denied, it requires 5 seconds.

```go
func probeClientIPFromPod(data *TestData, pod, container string, baseUrl string) (string, error) {
	url := fmt.Sprintf("%s/%s", baseUrl, "clientip")
	hostPort, _, err := data.runWgetCommandFromTestPodWithRetry(pod, data.testNamespace, container, url, 5)
	if err != nil {
		return "", err
	}
	host, _, err := net.SplitHostPort(hostPort)
	return host, err
}
```

Before antrea-io#5843, these e2e tests utilized the function `PollImmediate`
from `k8s.io/apimachinery/pkg/util/wait`, which immediately calls an
anonymous function including the above function. Since the timeout
is 5 seconds, and the ticker time is 1 second, and the anonymous
function runs immediately, the 5-second timeout is sufficient to
verify the denied state of a connection as mentioned above. However,
after antrea-io#5843, the function `Eventually` from `github.com/stretchr/testify/assert`
is used with the same parameters, which implies that the anonymous
function runs after the first ticker time, leaving 4 seconds. 4 seconds
are insufficient to verify the denied state of a connection.

To resolve the issue, `RunCommandFromPod` called in
`data.runWgetCommandFromTestPodWithRetry` is called directly in function
 `Eventually` to verify the connection state.

Signed-off-by: Hongliang Liu <[email protected]>
hongliangl added a commit to hongliangl/antrea that referenced this pull request Mar 25, 2024
Fix antrea-io#6129

In the failure tests, the following function is called to
verify whether a connection should be allowed or denied.
To verify a connection should be denied, it requires 5 seconds.

```go
func probeClientIPFromPod(data *TestData, pod, container string, baseUrl string) (string, error) {
	url := fmt.Sprintf("%s/%s", baseUrl, "clientip")
	hostPort, _, err := data.runWgetCommandFromTestPodWithRetry(pod, data.testNamespace, container, url, 5)
	if err != nil {
		return "", err
	}
	host, _, err := net.SplitHostPort(hostPort)
	return host, err
}
```

Before antrea-io#5843, these e2e tests utilized the function `PollImmediate`
from `k8s.io/apimachinery/pkg/util/wait`, which immediately calls an
anonymous function including the above function. Since the timeout
is 5 seconds, and the ticker time is 1 second, and the anonymous
function runs immediately, the 5-second timeout is sufficient to
verify the denied state of a connection as mentioned above. However,
after antrea-io#5843, the function `Eventually` from `github.com/stretchr/testify/assert`
is used with the same parameters, which implies that the anonymous
function runs after the first ticker time, leaving 4 seconds. 4 seconds
are insufficient to verify the denied state of a connection.

To resolve the issue, `RunCommandFromPod` called in
`data.runWgetCommandFromTestPodWithRetry` is called directly in function
 `Eventually` to verify the connection state.

Signed-off-by: Hongliang Liu <[email protected]>
antoninbas pushed a commit that referenced this pull request Mar 25, 2024
After switching from `wait.PollImmediate` to `assert.Eventually` in #5843,
the probe used to validate L7 NP enforcement was no longer correct.
We improve the validation logic so that each iteration of the condition
function in `assert.Eventually` only sends an HTTP probe once, instead
of using a probe with its own retry mechanism. This fixes the issue.

Fixes #6129

Signed-off-by: Hongliang Liu <[email protected]>
antoninbas pushed a commit to antoninbas/antrea that referenced this pull request Mar 26, 2024
We re-apply antrea-io#5703.
When updating K8s dependencies in antrea-io#5843, the change was reverted and we
went back to using an older version of opentelemetry which is affected
by a CVE. The CVE doesn't affect Antrea, the patch is meant to avoid
warnings from CVE scanners.

Co-authored-by: Antonin Bas <[email protected]>

Signed-off-by: Bin Liu <[email protected]>
Signed-off-by: Antonin Bas <[email protected]>
antoninbas added a commit to antoninbas/antrea that referenced this pull request Mar 26, 2024
A few issues were introduced by antrea-io#5843 because of changes in the
sigs.k8s.io/controller-runtime interface.

The biggest issue was that the call to ctrl.NewManager was not using the
Options object populated earlier in the setupManagerAndCertController
function. Instead it was creating and using a new, incomplete Options
object.

Fixes antrea-io#6149

Signed-off-by: Antonin Bas <[email protected]>
antoninbas added a commit to antoninbas/antrea that referenced this pull request Mar 26, 2024
A few issues were introduced by antrea-io#5843 because of changes in the
sigs.k8s.io/controller-runtime interface.

The biggest issue was that the call to ctrl.NewManager was not using the
Options object populated earlier in the setupManagerAndCertController
function. Instead it was creating and using a new, incomplete Options
object.

Fixes antrea-io#6149

Signed-off-by: Antonin Bas <[email protected]>
antoninbas added a commit to antoninbas/antrea that referenced this pull request Mar 26, 2024
A few issues were introduced by antrea-io#5843 because of changes in the
sigs.k8s.io/controller-runtime interface.

The biggest issue was that the call to ctrl.NewManager was not using the
Options object populated earlier in the setupManagerAndCertController
function. Instead it was creating and using a new, incomplete Options
object.

Fixes antrea-io#6149

Signed-off-by: Antonin Bas <[email protected]>
antoninbas added a commit that referenced this pull request Mar 27, 2024
A few issues were introduced by #5843 because of changes in the
sigs.k8s.io/controller-runtime interface.

The biggest issue was that the call to ctrl.NewManager was not using the
Options object populated earlier in the setupManagerAndCertController
function. Instead it was creating and using a new, incomplete Options
object.

Additionally, the decoder is no longer injected automatically, it needs to be
instantiated by us. Otherwise the admission webhook panics.
See kubernetes-sigs/controller-runtime#2695

Fixes #6149

Signed-off-by: Antonin Bas <[email protected]>
antoninbas added a commit that referenced this pull request Mar 27, 2024
We re-apply #5703.
When updating K8s dependencies in #5843, the change was reverted and we
went back to using an older version of opentelemetry which is affected
by a CVE. The CVE doesn't affect Antrea, the patch is meant to avoid
warnings from CVE scanners.

Signed-off-by: Bin Liu <[email protected]>
Signed-off-by: Antonin Bas <[email protected]>
Co-authored-by: Bin Liu <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants