
Adjust SRIOV MTU test #2527

Merged


sebrandon1
Member

sebrandon1 commented Oct 22, 2024

Thanks to @ramperher, who pointed out that the prior iteration of the SRIOV MTU test (#2514) was looking in the wrong place for the MTU value.

There is a network-status annotation (k8s.v1.cni.cncf.io/network-status) that can report the mtu value for a network whose name matches the NAD. So if the existing check fails, we now search the JSON in that annotation for an mtu value.
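A minimal Go sketch of that fallback lookup, assuming the annotation value is a JSON list of objects with `name` and optional `mtu` keys (as in the annotation quoted later in this thread). `networkStatusEntry` and `mtuFromNetworkStatus` are hypothetical names for illustration, not the certsuite implementation.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// networkStatusEntry holds the subset of fields read from each element of the
// k8s.v1.cni.cncf.io/network-status annotation. MTU is a pointer so that an
// absent "mtu" key can be told apart from an explicit 0.
type networkStatusEntry struct {
	Name string `json:"name"`
	MTU  *int   `json:"mtu,omitempty"`
}

// mtuFromNetworkStatus returns the MTU reported for the network whose name
// equals nadName (e.g. "example-cnf/intel-numa0-net3"). The boolean is false
// when no matching entry exists or no matching entry carries an mtu field.
func mtuFromNetworkStatus(annotation, nadName string) (int, bool, error) {
	var entries []networkStatusEntry
	if err := json.Unmarshal([]byte(annotation), &entries); err != nil {
		return 0, false, fmt.Errorf("failed to parse network-status annotation: %w", err)
	}
	for _, e := range entries {
		if e.Name == nadName && e.MTU != nil {
			return *e.MTU, true, nil
		}
	}
	return 0, false, nil
}

func main() {
	status := `[{"name":"ovn-kubernetes","interface":"eth0"},
	            {"name":"example-cnf/intel-numa0-net3","interface":"net1","mtu":9000}]`
	mtu, ok, err := mtuFromNetworkStatus(status, "example-cnf/intel-numa0-net3")
	fmt.Println(mtu, ok, err)
}
```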

@dcibot
Collaborator

dcibot commented Oct 22, 2024

pkg/autodiscover/autodiscover.go (resolved)
pkg/provider/pods.go (outdated, resolved)
@dcibot
Collaborator

dcibot commented Oct 23, 2024

@ramperher
Collaborator

from change #2527:

Tested with a workload using SRIOV. Even though the job passed, the test is still failing: https://www.distributed-ci.io/jobs/48058ea7-5a50-4971-a4d3-72cb37287b16/tests/414d16ad-982b-4819-8349-40aee6e2ce95?testcase=networking-network-attachment-definition-sriov-mtu

{"CompliantObjectsOut":null,"NonCompliantObjectsOut":[{"ObjectType":"Pod","ObjectFieldsKeys":["Reason For Non Compliance","Namespace","Pod Name"],"ObjectFieldsValues":["Failed to check if pod uses SRIOV with MTU","example-cnf","loadbalancer-655bdff48b-9lftc"]},{"ObjectType":"Pod","ObjectFieldsKeys":["Reason For Non Compliance","Namespace","Pod Name"],"ObjectFieldsValues":["Failed to check if pod uses SRIOV with MTU","example-cnf","testpmd-app-666cfc8fb5-6hfc6"]},{"ObjectType":"Pod","ObjectFieldsKeys":["Reason For Non Compliance","Namespace","Pod Name"],"ObjectFieldsValues":["Failed to check if pod uses SRIOV with MTU","example-cnf","testpmd-app-666cfc8fb5-c9bhk"]},{"ObjectType":"Pod","ObjectFieldsKeys":["Reason For Non Compliance","Namespace","Pod Name"],"ObjectFieldsValues":["Failed to check if pod uses SRIOV with MTU","example-cnf","trexconfig-5bff5f4997-wjds5"]}]}

Some logs from certsuite execution:

[DEBUG] [Oct 23 09:54:21.024] [checksgroup.go: 158] GROUP networking - Running beforeEach for check networking-network-attachment-definition-sriov-mtu
[DEBUG] [Oct 23 09:54:21.024] [pods.go: 415] pod: loadbalancer-655bdff48b-9lftc ns: example-cnf: Reviewing network-attachment definition "intel-numa0-net3"
[DEBUG] [Oct 23 09:54:21.027] [pods.go: 374] Single plugin config type found: {CniVersion:1.0.0 Name:intel-numa0-net3 Type:0xc00fa96100 Plugins:<nil>}, type=sriov
[DEBUG] [Oct 23 09:54:21.027] [pods.go: 426] pod: loadbalancer-655bdff48b-9lftc ns: example-cnf: NAD config: {
    "cniVersion": "1.0.0",
    "name": "intel-numa0-net3",
    "type": "sriov",
    "vlan": 410,
    "spoofchk": "off",
    "trust": "on",
    "vlanQoS": 0,
    "capabilities": {
        "mac": true
    },
    "logLevel": "info",
    "ipam": {}
}
[DEBUG] [Oct 23 09:54:21.027] [pods.go: 415] pod: testpmd-app-666cfc8fb5-6hfc6 ns: example-cnf: Reviewing network-attachment definition "intel-numa0-net1"
[DEBUG] [Oct 23 09:54:21.031] [pods.go: 374] Single plugin config type found: {CniVersion:1.0.0 Name:intel-numa0-net1 Type:0xc00fb02140 Plugins:<nil>}, type=sriov
[DEBUG] [Oct 23 09:54:21.031] [pods.go: 426] pod: testpmd-app-666cfc8fb5-6hfc6 ns: example-cnf: NAD config: {
    "cniVersion": "1.0.0",
    "name": "intel-numa0-net1",
    "type": "sriov",
    "vlan": 407,
    "spoofchk": "off",
    "trust": "on",
    "vlanQoS": 0,
    "capabilities": {
        "mac": true
    },
    "logLevel": "info",
    "ipam": {}
}
[DEBUG] [Oct 23 09:54:21.031] [pods.go: 415] pod: testpmd-app-666cfc8fb5-c9bhk ns: example-cnf: Reviewing network-attachment definition "intel-numa0-net1"
[DEBUG] [Oct 23 09:54:21.195] [pods.go: 374] Single plugin config type found: {CniVersion:1.0.0 Name:intel-numa0-net1 Type:0xc00fa03010 Plugins:<nil>}, type=sriov
[DEBUG] [Oct 23 09:54:21.195] [pods.go: 426] pod: testpmd-app-666cfc8fb5-c9bhk ns: example-cnf: NAD config: {
    "cniVersion": "1.0.0",
    "name": "intel-numa0-net1",
    "type": "sriov",
    "vlan": 407,
    "spoofchk": "off",
    "trust": "on",
    "vlanQoS": 0,
    "capabilities": {
        "mac": true
    },
    "logLevel": "info",
    "ipam": {}
}
[DEBUG] [Oct 23 09:54:21.195] [pods.go: 415] pod: trexconfig-5bff5f4997-wjds5 ns: example-cnf: Reviewing network-attachment definition "intel-numa0-net3"
[DEBUG] [Oct 23 09:54:21.395] [pods.go: 374] Single plugin config type found: {CniVersion:1.0.0 Name:intel-numa0-net3 Type:0xc00fa03460 Plugins:<nil>}, type=sriov
[DEBUG] [Oct 23 09:54:21.395] [pods.go: 426] pod: trexconfig-5bff5f4997-wjds5 ns: example-cnf: NAD config: {
    "cniVersion": "1.0.0",
    "name": "intel-numa0-net3",
    "type": "sriov",
    "vlan": 410,
    "spoofchk": "off",
    "trust": "on",
    "vlanQoS": 0,
    "capabilities": {
        "mac": true
    },
    "logLevel": "info",
    "ipam": {}
}
[INFO] [Oct 23 09:54:21.395] [check.go: 270] [networking-network-attachment-definition-sriov-mtu] Running check (labels: [faredge networking-network-attachment-definition-sriov-mtu networking])
[DEBUG] [Oct 23 09:54:21.395] [pods.go: 415] pod: loadbalancer-655bdff48b-9lftc ns: example-cnf: Reviewing network-attachment definition "intel-numa0-net3"
[DEBUG] [Oct 23 09:54:21.596] [pods.go: 374] Single plugin config type found: {CniVersion:1.0.0 Name:intel-numa0-net3 Type:0xc00fb026f0 Plugins:<nil>}, type=sriov
[DEBUG] [Oct 23 09:54:21.596] [pods.go: 426] pod: loadbalancer-655bdff48b-9lftc ns: example-cnf: NAD config: {
    "cniVersion": "1.0.0",
    "name": "intel-numa0-net3",
    "type": "sriov",
    "vlan": 410,
    "spoofchk": "off",
    "trust": "on",
    "vlanQoS": 0,
    "capabilities": {
        "mac": true
    },
    "logLevel": "info",
    "ipam": {}
}
[DEBUG] [Oct 23 09:54:21.596] [pods.go: 415] pod: testpmd-app-666cfc8fb5-6hfc6 ns: example-cnf: Reviewing network-attachment definition "intel-numa0-net1"
[DEBUG] [Oct 23 09:54:21.796] [pods.go: 374] Single plugin config type found: {CniVersion:1.0.0 Name:intel-numa0-net1 Type:0xc00fa96690 Plugins:<nil>}, type=sriov
[DEBUG] [Oct 23 09:54:21.796] [pods.go: 426] pod: testpmd-app-666cfc8fb5-6hfc6 ns: example-cnf: NAD config: {
    "cniVersion": "1.0.0",
    "name": "intel-numa0-net1",
    "type": "sriov",
    "vlan": 407,
    "spoofchk": "off",
    "trust": "on",
    "vlanQoS": 0,
    "capabilities": {
        "mac": true
    },
    "logLevel": "info",
    "ipam": {}
}
[DEBUG] [Oct 23 09:54:21.796] [pods.go: 415] pod: testpmd-app-666cfc8fb5-c9bhk ns: example-cnf: Reviewing network-attachment definition "intel-numa0-net1"
[DEBUG] [Oct 23 09:54:21.995] [pods.go: 374] Single plugin config type found: {CniVersion:1.0.0 Name:intel-numa0-net1 Type:0xc00fa96ad0 Plugins:<nil>}, type=sriov
[DEBUG] [Oct 23 09:54:21.995] [pods.go: 426] pod: testpmd-app-666cfc8fb5-c9bhk ns: example-cnf: NAD config: {
    "cniVersion": "1.0.0",
    "name": "intel-numa0-net1",
    "type": "sriov",
    "vlan": 407,
    "spoofchk": "off",
    "trust": "on",
    "vlanQoS": 0,
    "capabilities": {
        "mac": true
    },
    "logLevel": "info",
    "ipam": {}
}
[DEBUG] [Oct 23 09:54:21.995] [pods.go: 415] pod: trexconfig-5bff5f4997-wjds5 ns: example-cnf: Reviewing network-attachment definition "intel-numa0-net3"
[DEBUG] [Oct 23 09:54:22.196] [pods.go: 374] Single plugin config type found: {CniVersion:1.0.0 Name:intel-numa0-net3 Type:0xc00fa96f20 Plugins:<nil>}, type=sriov
[DEBUG] [Oct 23 09:54:22.196] [pods.go: 426] pod: trexconfig-5bff5f4997-wjds5 ns: example-cnf: NAD config: {
    "cniVersion": "1.0.0",
    "name": "intel-numa0-net3",
    "type": "sriov",
    "vlan": 410,
    "spoofchk": "off",
    "trust": "on",
    "vlanQoS": 0,
    "capabilities": {
        "mac": true
    },
    "logLevel": "info",
    "ipam": {}
}
[DEBUG] [Oct 23 09:54:22.196] [pods.go: 453] pod: loadbalancer-655bdff48b-9lftc ns: example-cnf: Reviewing network-attachment definition "intel-numa0-net3"
[ERROR] [Oct 23 09:54:22.196] [suite.go: 433] [networking-network-attachment-definition-sriov-mtu] Failed to check if pod "pod: loadbalancer-655bdff48b-9lftc ns: example-cnf" uses SRIOV with MTU, err: failed to know if network-attachment intel-numa0-net3 is sriov: failed to unmarshal cni config : unexpected end of JSON input
[DEBUG] [Oct 23 09:54:22.196] [pods.go: 453] pod: testpmd-app-666cfc8fb5-6hfc6 ns: example-cnf: Reviewing network-attachment definition "intel-numa0-net1"
[ERROR] [Oct 23 09:54:22.196] [suite.go: 433] [networking-network-attachment-definition-sriov-mtu] Failed to check if pod "pod: testpmd-app-666cfc8fb5-6hfc6 ns: example-cnf" uses SRIOV with MTU, err: failed to know if network-attachment intel-numa0-net1 is sriov: failed to unmarshal cni config : unexpected end of JSON input
[DEBUG] [Oct 23 09:54:22.196] [pods.go: 453] pod: testpmd-app-666cfc8fb5-c9bhk ns: example-cnf: Reviewing network-attachment definition "intel-numa0-net1"
[ERROR] [Oct 23 09:54:22.196] [suite.go: 433] [networking-network-attachment-definition-sriov-mtu] Failed to check if pod "pod: testpmd-app-666cfc8fb5-c9bhk ns: example-cnf" uses SRIOV with MTU, err: failed to know if network-attachment intel-numa0-net1 is sriov: failed to unmarshal cni config : unexpected end of JSON input
[DEBUG] [Oct 23 09:54:22.196] [pods.go: 453] pod: trexconfig-5bff5f4997-wjds5 ns: example-cnf: Reviewing network-attachment definition "intel-numa0-net3"
[ERROR] [Oct 23 09:54:22.196] [suite.go: 433] [networking-network-attachment-definition-sriov-mtu] Failed to check if pod "pod: trexconfig-5bff5f4997-wjds5 ns: example-cnf" uses SRIOV with MTU, err: failed to know if network-attachment intel-numa0-net3 is sriov: failed to unmarshal cni config : unexpected end of JSON input
[DEBUG] [Oct 23 09:54:22.196] [checksgroup.go: 182] GROUP networking - Running afterEach for check networking-network-attachment-definition-sriov-mtu

@ramperher
Copy link
Collaborator

Also, some more info that may be interesting for you, @sebrandon1. I think that, starting in OCP 4.14, the pod network annotations include some details about the SRIOV config: here, in the k8s.v1.cni.cncf.io/network-status annotation, the mtu attribute is defined for the networks attached to virtual functions:

Annotations:         cpu-load-balancing.crio.io: disable
                     irq-load-balancing.crio.io: disable
                     k8s.ovn.org/pod-networks:
                       {"default":{"ip_addresses":["10.131.0.65/23","fd02:0:0:4::41/64"],"mac_address":"0a:58:0a:83:00:41","gateway_ips":["10.131.0.1","fd02:0:0:...
                     k8s.v1.cni.cncf.io/network-status:
                       [{
                           "name": "ovn-kubernetes",
                           "interface": "eth0",
                           "ips": [
                               "10.131.0.65",
                               "fd02:0:0:4::41"
                           ],
                           "mac": "0a:58:0a:83:00:41",
                           "default": true,
                           "dns": {}
                       },{
                           "name": "example-cnf/intel-numa0-net3",
                           "interface": "net1",
                           "mac": "40:04:0f:f1:89:01",
                           "mtu": 9000,
                           "dns": {},
                           "device-info": {
                               "type": "pci",
                               "version": "1.1.0",
                               "pci": {
                                   "pci-address": "0000:37:0a.0"
                               }
                           }
                       },{
                           "name": "example-cnf/intel-numa0-net4",
                           "interface": "net2",
                           "mac": "40:04:0f:f1:89:02",
                           "mtu": 9000,
                           "dns": {},
                           "device-info": {
                               "type": "pci",
                               "version": "1.1.0",
                               "pci": {
                                   "pci-address": "0000:37:0b.6"
                               }
                           }
                       },{
                           "name": "example-cnf/intel-numa0-net1",
                           "interface": "net3",
                           "mac": "60:04:0f:f1:89:01",
                           "mtu": 9000,
                           "dns": {},
                           "device-info": {
                               "type": "pci",
                               "version": "1.1.0",
                               "pci": {
                                   "pci-address": "0000:37:02.6"
                               }
                           }
                       },{
                           "name": "example-cnf/intel-numa0-net2",
                           "interface": "net4",
                           "mac": "60:04:0f:f1:89:02",
                           "mtu": 9000,
                           "dns": {},
                           "device-info": {
                               "type": "pci",
                               "version": "1.1.0",
                               "pci": {
                                   "pci-address": "0000:37:03.6"
                               }
                           }
                       }]
                     k8s.v1.cni.cncf.io/networks:
                       [ { "name": "intel-numa0-net3", "mac": "40:04:0f:f1:89:01", "namespace": "example-cnf" },          { "name": "intel-numa0-net4", "mac": "4...
                     openshift.io/scc: privileged
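A sketch of decoding the annotation above into Go structs, assuming the JSON layout shown (the field tags follow the visible keys; the names `networkStatus`, `parseNetworkStatus` and `nadKey` are hypothetical). Each `name` is namespace-qualified, so the NAD `intel-numa0-net3` in `example-cnf` appears as `example-cnf/intel-numa0-net3`, and a lookup has to build that key rather than compare against the bare NAD name.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// pciInfo mirrors the "pci" object inside "device-info".
type pciInfo struct {
	PCIAddress string `json:"pci-address"`
}

// deviceInfo mirrors the "device-info" object shown for the VF-backed networks.
type deviceInfo struct {
	Type    string   `json:"type"`
	Version string   `json:"version"`
	PCI     *pciInfo `json:"pci,omitempty"`
}

// networkStatus is one element of the k8s.v1.cni.cncf.io/network-status list.
// Keys not listed here (such as "dns") are ignored by encoding/json.
type networkStatus struct {
	Name       string      `json:"name"`
	Interface  string      `json:"interface"`
	IPs        []string    `json:"ips,omitempty"`
	MAC        string      `json:"mac"`
	MTU        *int        `json:"mtu,omitempty"`
	Default    bool        `json:"default,omitempty"`
	DeviceInfo *deviceInfo `json:"device-info,omitempty"`
}

// nadKey builds the namespace-qualified name used in network-status entries.
func nadKey(namespace, name string) string {
	return namespace + "/" + name
}

// parseNetworkStatus decodes the raw annotation value.
func parseNetworkStatus(annotation string) ([]networkStatus, error) {
	var entries []networkStatus
	if err := json.Unmarshal([]byte(annotation), &entries); err != nil {
		return nil, err
	}
	return entries, nil
}

func main() {
	// Trimmed copy of the annotation above: the default network plus one VF.
	const sample = `[{"name":"ovn-kubernetes","interface":"eth0","mac":"0a:58:0a:83:00:41","default":true,"dns":{}},
{"name":"example-cnf/intel-numa0-net3","interface":"net1","mac":"40:04:0f:f1:89:01","mtu":9000,"dns":{},
 "device-info":{"type":"pci","version":"1.1.0","pci":{"pci-address":"0000:37:0a.0"}}}]`

	entries, err := parseNetworkStatus(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, e := range entries {
		if e.Name == nadKey("example-cnf", "intel-numa0-net3") && e.MTU != nil {
			fmt.Printf("%s on %s: mtu=%d\n", e.Name, e.Interface, *e.MTU)
		}
	}
}
```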

Maybe the code you need here would be simpler, something like:

  • Retrieve the NADs attached to the pod (you're already doing this)
  • Check k8s.v1.cni.cncf.io/network-status pod annotation and iterate over the networks that are present
  • If the network name belongs to any NAD detected before and mtu is defined, then it's fine
    • In particular, if there's any network belonging to any NAD and not defining the mtu, then the test has to fail

What do you think?
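The steps listed above can be sketched in Go roughly as follows, assuming the caller already has the pod's NADs as namespace-qualified names and the raw network-status annotation string. `statusEntry` and `networksMissingMTU` are hypothetical names; a pod would be compliant when the returned list is empty.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// statusEntry holds the two network-status fields this check needs.
type statusEntry struct {
	Name string `json:"name"`
	MTU  *int   `json:"mtu,omitempty"`
}

// networksMissingMTU walks the pod's network-status entries and returns the
// names of those that belong to one of the pod's NADs (nadKeys, in
// "namespace/name" form) but do not report an mtu. Entries that belong to no
// NAD, such as the default ovn-kubernetes network, are ignored.
func networksMissingMTU(statusJSON string, nadKeys []string) ([]string, error) {
	wanted := make(map[string]struct{}, len(nadKeys))
	for _, k := range nadKeys {
		wanted[k] = struct{}{}
	}
	var entries []statusEntry
	if err := json.Unmarshal([]byte(statusJSON), &entries); err != nil {
		return nil, fmt.Errorf("failed to parse network-status annotation: %w", err)
	}
	var missing []string
	for _, e := range entries {
		if _, ok := wanted[e.Name]; ok && e.MTU == nil {
			missing = append(missing, e.Name)
		}
	}
	return missing, nil
}

func main() {
	// Hypothetical pod with two NAD-backed networks, one lacking an mtu.
	status := `[{"name":"ovn-kubernetes"},{"name":"example-cnf/net1","mtu":9000},{"name":"example-cnf/net2"}]`
	missing, err := networksMissingMTU(status, []string{"example-cnf/net1", "example-cnf/net2"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	if len(missing) > 0 {
		fmt.Println("non-compliant, no mtu reported for:", missing)
	} else {
		fmt.Println("compliant")
	}
}
```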

sebrandon1 force-pushed the adjust_nad_test branch 2 times, most recently from 8b680c9 to f836639 on October 23, 2024 19:37
@dcibot
Collaborator

dcibot commented Oct 23, 2024

@ramperher
Collaborator

from change #2527:

The result is now correct in a setup that is not using SRIOV. Let me check in a cluster with a workload consuming SRIOV resources.

@dcibot
Collaborator

dcibot commented Oct 24, 2024

@ramperher
Collaborator

from change #2527:

Our example-cnf workload is still failing the MTU test:

{"CompliantObjectsOut":null,"NonCompliantObjectsOut":[{"ObjectType":"Pod","ObjectFieldsKeys":["Reason For Non Compliance","Namespace","Pod Name"],"ObjectFieldsValues":["Failed to check if pod uses SRIOV with MTU","example-cnf","loadbalancer-f5697b557-w98qv"]},{"ObjectType":"Pod","ObjectFieldsKeys":["Reason For Non Compliance","Namespace","Pod Name"],"ObjectFieldsValues":["Failed to check if pod uses SRIOV with MTU","example-cnf","testpmd-app-59b59d9c7c-kb9n9"]},{"ObjectType":"Pod","ObjectFieldsKeys":["Reason For Non Compliance","Namespace","Pod Name"],"ObjectFieldsValues":["Failed to check if pod uses SRIOV with MTU","example-cnf","testpmd-app-59b59d9c7c-xjbfb"]},{"ObjectType":"Pod","ObjectFieldsKeys":["Reason For Non Compliance","Namespace","Pod Name"],"ObjectFieldsValues":["Failed to check if pod uses SRIOV with MTU","example-cnf","trexconfig-746b4f886f-v4n67"]}]}

The error is the same; it looks like it's not able to read the JSON input:

[DEBUG] [Oct 24 10:25:34.378] [pods.go: 453] pod: loadbalancer-f5697b557-w98qv ns: example-cnf: Reviewing network-attachment definition "intel-numa0-net3"
[ERROR] [Oct 24 10:25:34.378] [suite.go: 433] [networking-network-attachment-definition-sriov-mtu] Failed to check if pod "pod: loadbalancer-f5697b557-w98qv ns: example-cnf" uses SRIOV with MTU, err: failed to know if network-attachment intel-numa0-net3 is sriov: failed to unmarshal cni config : unexpected end of JSON input
[DEBUG] [Oct 24 10:25:34.378] [pods.go: 453] pod: testpmd-app-59b59d9c7c-kb9n9 ns: example-cnf: Reviewing network-attachment definition "intel-numa0-net1"
[ERROR] [Oct 24 10:25:34.378] [suite.go: 433] [networking-network-attachment-definition-sriov-mtu] Failed to check if pod "pod: testpmd-app-59b59d9c7c-kb9n9 ns: example-cnf" uses SRIOV with MTU, err: failed to know if network-attachment intel-numa0-net1 is sriov: failed to unmarshal cni config : unexpected end of JSON input
[DEBUG] [Oct 24 10:25:34.378] [pods.go: 453] pod: testpmd-app-59b59d9c7c-xjbfb ns: example-cnf: Reviewing network-attachment definition "intel-numa0-net1"
[ERROR] [Oct 24 10:25:34.378] [suite.go: 433] [networking-network-attachment-definition-sriov-mtu] Failed to check if pod "pod: testpmd-app-59b59d9c7c-xjbfb ns: example-cnf" uses SRIOV with MTU, err: failed to know if network-attachment intel-numa0-net1 is sriov: failed to unmarshal cni config : unexpected end of JSON input
[DEBUG] [Oct 24 10:25:34.378] [pods.go: 453] pod: trexconfig-746b4f886f-v4n67 ns: example-cnf: Reviewing network-attachment definition "intel-numa0-net3"
[ERROR] [Oct 24 10:25:34.378] [suite.go: 433] [networking-network-attachment-definition-sriov-mtu] Failed to check if pod "pod: trexconfig-746b4f886f-v4n67 ns: example-cnf" uses SRIOV with MTU, err: failed to know if network-attachment intel-numa0-net3 is sriov: failed to unmarshal cni config : unexpected end of JSON input
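`unexpected end of JSON input` is the error Go's `encoding/json` returns when asked to decode empty or whitespace-only input (and also truncated JSON). That suggests, although the logs do not confirm it, that the config string reaching the unmarshal call was empty, even though the same NADs printed a full config earlier in the run. A small sketch that reproduces the message and guards against empty input; `cniConfig`, `parseCNIConfig` and `errEmptyConfig` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// cniConfig is a hypothetical, minimal view of a NAD's spec.config.
type cniConfig struct {
	Type string `json:"type"`
	MTU  int    `json:"mtu,omitempty"`
}

// errEmptyConfig is a hypothetical sentinel for a NAD whose config is empty.
var errEmptyConfig = errors.New("NAD spec.config is empty")

// parseCNIConfig rejects empty or whitespace-only input before unmarshalling,
// so the caller gets a descriptive error instead of
// "unexpected end of JSON input".
func parseCNIConfig(raw string) (cniConfig, error) {
	var cfg cniConfig
	if strings.TrimSpace(raw) == "" {
		return cfg, errEmptyConfig
	}
	if err := json.Unmarshal([]byte(raw), &cfg); err != nil {
		return cfg, fmt.Errorf("failed to unmarshal cni config: %w", err)
	}
	return cfg, nil
}

func main() {
	// Decoding an empty byte slice yields the same message seen in the logs.
	var v map[string]any
	fmt.Println(json.Unmarshal([]byte(""), &v))

	_, err := parseCNIConfig("")
	fmt.Println(err, errors.Is(err, errEmptyConfig))
}
```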

pkg/autodiscover/autodiscover.go (resolved)
pkg/provider/pods.go (outdated, resolved)
pkg/provider/pods.go (outdated, resolved)
@sebrandon1
Member Author

I'll adjust to look at the network-status like you suggested. That seems to be the easiest way to detect the MTU.

@ramperher
Collaborator

> I'll adjust to look at the network-status like you suggested. That seems to be the easiest way to detect the MTU.

Cool, feel free to reach out to me when you want to test it. I think the most suitable way of running the test would be: 1) check whether the network is SRIOV type, as you're doing right now; 2) if it's not SRIOV type, skip it, but if it is, find that network in the pod's network-status annotation and check whether mtu is defined there. It should work that way.
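The two-step flow suggested here can be sketched per network as below. It assumes the NAD's spec.config is either a single plugin (top-level `"type"`, as in the logs above) or a plugin list (`"plugins"` array, per the CNI config-list format), and that network-status names are namespace-qualified. `cniConf`, `isSRIOV`, `checkNetwork` and the `result` values are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cniConf covers a single-plugin config ("type" at top level) and a plugin
// list ("plugins" array).
type cniConf struct {
	Type    string    `json:"type"`
	Plugins []cniConf `json:"plugins,omitempty"`
}

// isSRIOV reports whether the NAD config uses the "sriov" CNI type, either
// directly or as one of the plugins in a list.
func isSRIOV(config string) (bool, error) {
	var c cniConf
	if err := json.Unmarshal([]byte(config), &c); err != nil {
		return false, err
	}
	if c.Type == "sriov" {
		return true, nil
	}
	for _, p := range c.Plugins {
		if p.Type == "sriov" {
			return true, nil
		}
	}
	return false, nil
}

type statusEntry struct {
	Name string `json:"name"`
	MTU  *int   `json:"mtu,omitempty"`
}

// result is the outcome of checking one pod network.
type result string

const (
	skipped      result = "skipped"       // not an SR-IOV network
	compliant    result = "compliant"     // SR-IOV network with an mtu in network-status
	nonCompliant result = "non-compliant" // SR-IOV network without a reported mtu
)

// checkNetwork applies the two steps to one NAD attached to the pod.
func checkNetwork(nadKey, nadConfig, networkStatus string) (result, error) {
	sriov, err := isSRIOV(nadConfig)
	if err != nil {
		return "", fmt.Errorf("failed to parse config of %s: %w", nadKey, err)
	}
	if !sriov {
		return skipped, nil
	}
	var entries []statusEntry
	if err := json.Unmarshal([]byte(networkStatus), &entries); err != nil {
		return "", fmt.Errorf("failed to parse network-status: %w", err)
	}
	for _, e := range entries {
		if e.Name == nadKey && e.MTU != nil {
			return compliant, nil
		}
	}
	return nonCompliant, nil
}

func main() {
	nad := `{"cniVersion":"1.0.0","name":"intel-numa0-net3","type":"sriov","vlan":410}`
	status := `[{"name":"example-cnf/intel-numa0-net3","interface":"net1","mtu":9000}]`
	r, err := checkNetwork("example-cnf/intel-numa0-net3", nad, status)
	fmt.Println(r, err)
}
```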

@dcibot
Collaborator

dcibot commented Oct 25, 2024

@dcibot
Collaborator

dcibot commented Oct 25, 2024

@sebrandon1
Member Author

@ramperher Okay, I pushed a new change.

The logic goes something like:

  1. First, check whether the NAD referenced in the pod's "k8s.v1.cni.cncf.io/networks" annotation has a spec.Config that contains an MTU.
  2. If that fails, check the "k8s.v1.cni.cncf.io/network-status" annotation for an entry whose network name matches the NAD and has an MTU set.

I think I covered those scenarios in the unit tests by spoofing the client and using fake objects.
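A sketch of that lookup order, assuming the NAD's MTU (if any) sits in a top-level `"mtu"` key of spec.config and that any failure in step 1 (empty config, parse error, no MTU) falls through to step 2. The `main` function mimics table-driven cases with fake inputs, in the spirit of the unit tests described; `resolveMTU`, `nadConfig` and `statusEntry` are hypothetical names, not the PR's code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// nadConfig reads only an optional top-level "mtu" from spec.config.
type nadConfig struct {
	MTU int `json:"mtu,omitempty"`
}

type statusEntry struct {
	Name string `json:"name"`
	MTU  *int   `json:"mtu,omitempty"`
}

// resolveMTU tries the NAD's spec.config first and falls back to the pod's
// network-status annotation. It returns the MTU, where it was found, and
// whether one was found at all. Parse errors in either step are treated as
// "not found" for that step.
func resolveMTU(nadKey, specConfig, networkStatus string) (int, string, bool) {
	// Step 1: an explicit, positive "mtu" in the NAD config.
	var cfg nadConfig
	if specConfig != "" && json.Unmarshal([]byte(specConfig), &cfg) == nil && cfg.MTU > 0 {
		return cfg.MTU, "nad-config", true
	}
	// Step 2: the network-status entry whose name matches the NAD.
	var entries []statusEntry
	if json.Unmarshal([]byte(networkStatus), &entries) != nil {
		return 0, "", false
	}
	for _, e := range entries {
		if e.Name == nadKey && e.MTU != nil {
			return *e.MTU, "network-status", true
		}
	}
	return 0, "", false
}

func main() {
	cases := []struct{ name, config, status string }{
		{"mtu in NAD config", `{"type":"sriov","mtu":1500}`, `[]`},
		{"mtu only in network-status", `{"type":"sriov"}`, `[{"name":"example-cnf/intel-numa0-net1","mtu":9000}]`},
		{"no mtu anywhere", `{"type":"sriov"}`, `[{"name":"example-cnf/intel-numa0-net1"}]`},
	}
	for _, c := range cases {
		mtu, src, ok := resolveMTU("example-cnf/intel-numa0-net1", c.config, c.status)
		fmt.Printf("%s: mtu=%d source=%q found=%v\n", c.name, mtu, src, ok)
	}
}
```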

@dcibot
Copy link
Collaborator

dcibot commented Oct 25, 2024

@dcibot
Copy link
Collaborator

dcibot commented Oct 25, 2024

@sebrandon1
Member Author

/dci-rerun

@dcibot
Collaborator

dcibot commented Oct 25, 2024

@dcibot
Collaborator

dcibot commented Nov 11, 2024

Contributor

greyerof left a comment


Apart from my comment regarding the mtu check, I left some suggestions to remove the nolint:gocritic comments.

pkg/provider/pods.go (outdated, resolved)
pkg/autodiscover/autodiscover_resources.go (outdated, resolved)
pkg/autodiscover/autodiscover.go (outdated, resolved)
pkg/provider/pods.go (resolved)
sebrandon1 force-pushed the adjust_nad_test branch 2 times, most recently from 61ba728 to 9894033 on November 14, 2024 20:57
@dcibot
Collaborator

dcibot commented Nov 14, 2024

@dcibot
Collaborator

dcibot commented Nov 14, 2024

pkg/provider/pods.go (outdated, resolved)
Copy link
Collaborator

ramperher left a comment


I've re-validated the new refactor and it's working as expected. LGTM

sebrandon1 force-pushed the adjust_nad_test branch 2 times, most recently from 14b6927 to 2fb3602 on November 25, 2024 21:31
@dcibot
Collaborator

dcibot commented Nov 25, 2024

@ramperher
Collaborator

from change dci-labs/dallas-pipelines#1260:

Just checked and confirmed that the test is still working fine and it is

@dcibot
Collaborator

dcibot commented Nov 26, 2024

sebrandon1 merged commit 6f95071 into redhat-best-practices-for-k8s:main Nov 26, 2024
27 checks passed
sebrandon1 deleted the adjust_nad_test branch November 26, 2024 20:57
@dcibot
Collaborator

dcibot commented Nov 26, 2024

5 participants