-
Notifications
You must be signed in to change notification settings - Fork 376
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TestL7NetworkPolicy e2e tests failing consistently #6129
Comments
@tnqn the only recent related change I can think of is the logging fix, but that should not have impacted NP enforcement |
If it started two days ago, #4537 may be related. The current log doesn't provide any infromation as all agents were restarted before collecting the logs. We need to invert the order to know what happened. @hongliangl could you check this one? |
Will do |
Fix antrea-io#6129 In the failure tests, the following function is called to verify whether a connection should be allowed or denied. To verify a connection should be denied, it requires 5 seconds. ``` func probeClientIPFromPod(data *TestData, pod, container string, baseUrl string) (string, error) { url := fmt.Sprintf("%s/%s", baseUrl, "clientip") hostPort, _, err := data.runWgetCommandFromTestPodWithRetry(pod, data.testNamespace, container, url, 5) if err != nil { return "", err } host, _, err := net.SplitHostPort(hostPort) return host, err } ``` Before antrea-io#5843, these e2e tests utilized the function PollImmediate from k8s.io/apimachinery/pkg/util/wait, which immediately calls an anonymous function including the above function. Since the timeout is 5 seconds, and the ticker time is 1 second, and the anonymous function runs immediately, the 5-second timeout is sufficient to verify the denied state of a connection as mentioned above. However, after antrea-io#5843, the function `Eventually` from github.com/stretchr/testify/assert is used with the same parameters, which implies that the anonymous function runs after the first ticker time, leaving 4 seconds. 4 seconds are insufficient to verify the denied state of a connection. To resolve the issue, the timeout should be adjusted to be more than 5 seconds. Signed-off-by: Hongliang Liu <[email protected]>
Fix antrea-io#6129 In the failure tests, the following function is called to verify whether a connection should be allowed or denied. To verify a connection should be denied, it requires 5 seconds. ```go func probeClientIPFromPod(data *TestData, pod, container string, baseUrl string) (string, error) { url := fmt.Sprintf("%s/%s", baseUrl, "clientip") hostPort, _, err := data.runWgetCommandFromTestPodWithRetry(pod, data.testNamespace, container, url, 5) if err != nil { return "", err } host, _, err := net.SplitHostPort(hostPort) return host, err } ``` Before antrea-io#5843, these e2e tests utilized the function `PollImmediate` from `k8s.io/apimachinery/pkg/util/wait`, which immediately calls an anonymous function including the above function. Since the timeout is 5 seconds, and the ticker time is 1 second, and the anonymous function runs immediately, the 5-second timeout is sufficient to verify the denied state of a connection as mentioned above. However, after antrea-io#5843, the function `Eventually` from `github.com/stretchr/testify/assert` is used with the same parameters, which implies that the anonymous function runs after the first ticker time, leaving 4 seconds. 4 seconds are insufficient to verify the denied state of a connection. To resolve the issue, `RunCommandFromPod` called in `data.runWgetCommandFromTestPodWithRetry` is called directly in function `Eventually` to verify the connection state. Signed-off-by: Hongliang Liu <[email protected]>
Fix antrea-io#6129 In the failure tests, the following function is called to verify whether a connection should be allowed or denied. To verify a connection should be denied, it requires 5 seconds. ```go func probeClientIPFromPod(data *TestData, pod, container string, baseUrl string) (string, error) { url := fmt.Sprintf("%s/%s", baseUrl, "clientip") hostPort, _, err := data.runWgetCommandFromTestPodWithRetry(pod, data.testNamespace, container, url, 5) if err != nil { return "", err } host, _, err := net.SplitHostPort(hostPort) return host, err } ``` Before antrea-io#5843, these e2e tests utilized the function `PollImmediate` from `k8s.io/apimachinery/pkg/util/wait`, which immediately calls an anonymous function including the above function. Since the timeout is 5 seconds, and the ticker time is 1 second, and the anonymous function runs immediately, the 5-second timeout is sufficient to verify the denied state of a connection as mentioned above. However, after antrea-io#5843, the function `Eventually` from `github.com/stretchr/testify/assert` is used with the same parameters, which implies that the anonymous function runs after the first ticker time, leaving 4 seconds. 4 seconds are insufficient to verify the denied state of a connection. To resolve the issue, `RunCommandFromPod` called in `data.runWgetCommandFromTestPodWithRetry` is called directly in function `Eventually` to verify the connection state. Signed-off-by: Hongliang Liu <[email protected]>
After switching from `wait.PollImmediate` to `assert.Eventually` in #5843, the probe used to validate L7 NP enforcement was no longer correct. We improve the validation logic so that each iteration of the condition function in `assert.Eventually` only sends an HTTP probe once, instead of using a probe with its own retry mechanism. This fixes the issue. Fixes #6129 Signed-off-by: Hongliang Liu <[email protected]>
Describe the bug
I have seen the
TestL7NetworkPolicy
e2e tests fail consistently for a few recent PRs in Github CI. All the subtests are marked as failed, not a specific one.Example of a failed test: https://github.com/antrea-io/antrea/actions/runs/8365150919/job/22902786915
Expand the section below for the full output.
Full test output
The text was updated successfully, but these errors were encountered: