SRIOV ExcludeTopology tests #1557
Conversation
Skipping CI for Draft Pull Request.

Force-pushed from afd9c03 to 302fc3b.
```go
}

if previousPerfProfile != nil {
	OriginalPerformanceProfile = previousPerfProfile.DeepCopy()
```
Would it make sense to use a map or something instead of insisting on the global variable? The two suites run separately, so in theory a collision won't happen, but it's not future proof: somebody could choose to call both this and FindOrOverridePerformanceProfile.
Good point. What would you use for the map key? The previous performance profile name?
I guess this is not going to work anyway when tests run in parallel, as the affected nodes could collide and lead to an unexpected cluster state.
Maybe we can get rid of this save/restore mechanism and have tests clearly create their own PerformanceProfile. WDYT?
The problem is, IIRC, that in some scenarios the performance profile is created upfront to save some running time.
And my concern is not about running tests in parallel, but about not being able to restore the right profile. Until now there was only one test doing this, in an isolated manner, so the global served that purpose only. Now we are using it for two different scenarios, and looking at the API that is not clear.
An alternative here is to check whether OriginalPerformanceProfile is already filled and panic in that case, because that would mean we are re-replacing it.
Would it make sense to return the previous perf profile and set the global/map/whatever at the calling site? That would at least make the flow explicit instead of buried inside this function, which is not great when we mutate global state.
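A minimal sketch of the suggestion above. Everything here is hypothetical: `PerformanceProfile` is a stand-in for the real performance.openshift.io type, and the function signature is invented for illustration; the point is only that the saved copy is returned to the caller instead of being written to a package-level variable as a side effect.

```go
package main

import "fmt"

// PerformanceProfile is a minimal stand-in for the real type; only the
// name matters for this sketch.
type PerformanceProfile struct{ Name string }

// findOrOverridePerformanceProfile is a hypothetical variant of the function
// under review: instead of stashing the previous profile in the
// OriginalPerformanceProfile global, it returns the saved copy and lets the
// calling site decide where (and whether) to keep it.
func findOrOverridePerformanceProfile(current *PerformanceProfile, override PerformanceProfile) *PerformanceProfile {
	var previous *PerformanceProfile
	if current != nil {
		saved := *current // stand-in for DeepCopy()
		previous = &saved
	}
	// ...the override profile would be applied to the cluster here...
	_ = override
	return previous
}

func main() {
	// The caller now owns the saved state explicitly.
	prev := findOrOverridePerformanceProfile(
		&PerformanceProfile{Name: "single-numa"},
		PerformanceProfile{Name: "test-profile"},
	)
	fmt.Println(prev.Name)
}
```

With this shape, each suite can keep its own saved profile, which also addresses the "two suites share one global" concern.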
cnf-tests/testsuites/pkg/performanceprofile/performanceprofile.go
```go
AfterAll(func() {
	By("Cleaning performance profiles")

	err := performanceprofile.CleanPerformanceProfiles()
```
Should we restore the saved perf profile?
Right, I had to fix RestorePerformanceProfile() to handle the case OriginalPerformanceProfile == nil.
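The nil guard being discussed could look like the sketch below. The signature is made up for illustration (the real function presumably works against a cluster client and package state); the only point shown is that a nil saved profile makes restore a no-op rather than an error.

```go
package main

import (
	"errors"
	"fmt"
)

// PerformanceProfile is a minimal stand-in for the real type.
type PerformanceProfile struct{ Name string }

// restorePerformanceProfile sketches the nil handling: when no profile was
// saved (the suite never replaced one), there is nothing to restore and that
// is not an error. `apply` stands in for writing the profile to the cluster.
func restorePerformanceProfile(saved *PerformanceProfile, apply func(PerformanceProfile) error) error {
	if saved == nil {
		// Nothing was overridden: restoring is a no-op.
		return nil
	}
	if apply == nil {
		return errors.New("no apply function provided")
	}
	return apply(*saved)
}

func main() {
	// No saved profile: returns nil without calling apply.
	fmt.Println(restorePerformanceProfile(nil, nil))
}
```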
```go
	})
})

func findDeviceOnNUMANode(node *corev1.Node, devices []*sriovv1.InterfaceExt, numaNode string) (*sriovv1.InterfaceExt, error) {
```
These (and the ones below) are worth being in some pkg.
The below functions rely on test-specific default values. Moving them to a generic package would make them much less readable. WDYT?
The only one I see as test-specific is createSriovNetworkAndPolicy.
Also, we already have functions to create SR-IOV networks and SR-IOV policies. I'd try to extend/modify those instead of having another variant scattered across the repo.
Force-pushed from 830b849 to d143efe.
```go
	networks.WaitStable(sriovclient)
})

AfterAll(func() {
```
In BeforeAll, a namespace is created. Should it not be deleted here?
If CI runs test suites in parallel, the same namespace may already exist, in which case either:
a) we don't delete it here, at the cost of leaving it as a leftover, or
b) we create our own namespace, so we can freely delete it here.
Good point. The problem here is that the "sriov-conformance-testing" namespace is a little bit special:
- It must exist to make the DiscoverSriov() function work (e.g. s2i.go#L163). We need to clean that up in the upstream repository.
- It's dumped by the k8sreporter (see pkg/utils/reporter.go#L97), though it wouldn't be a problem to add an entry like "numa-dpdk-tests-ns": "sriov".
- It's cleaned by the Fixture mechanism (see pkg/features/features.go#L103), as cleaning it in the AfterAll/AfterEach would make the k8sreporter not gather it.

None of the above points represents a hard limit on using our own namespace, but they require a reasonable effort.
```go
actualPod, err := client.Client.Pods(sriovnamespaces.Test).Get(context.Background(), pod.Name, metav1.GetOptions{})
g.Expect(err).ToNot(HaveOccurred())
g.Expect(actualPod.Status.Phase).To(Equal(corev1.PodFailed))
g.Expect(actualPod.Status.Message).To(ContainSubstring("Resources cannot be allocated with Topology locality"))
```
nonblocking: I think checking Status.Reason (rather than the message) is a bit simpler and cleaner. I reckon the way we report errors in kube is not great in general.
```go
func findDeviceOnNUMANode(node *corev1.Node, devices []*sriovv1.InterfaceExt, numaNode string) (*sriovv1.InterfaceExt, error) {
	for _, device := range devices {
		out, err := nodes.ExecCommandOnNode([]string{"cat", fmt.Sprintf("/sys/class/net/%s/device/numa_node", device.Name)}, node)
```
nonblocking: I've been bitten enough times already to suggest using filepath.Clean(filepath.Join(...)).
Force-pushed from 5fb61dc to b33e7e9.
I put the creation of the performance profile out of this PR's scope, so the suite expects the cluster to be configured with a single-numa-node profile. Creating it dynamically can break the cluster, and I prefer to tackle that problem in a subsequent PR. @fedepaol , @cgoncalves , @ffromani , @gregkopels PTAL
```go
	}
}

func createSriovNetworkAndPolicy(opts ...func(*sriovv1.SriovNetworkNodePolicy, *sriovv1.SriovNetwork)) {
```
What I don't like about this approach is that the modifier takes both the policy and the network.
We can split this in two (at the calling site) and have a more focused modifier for each side.
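A sketch of the split being suggested, using the functional-options pattern with one option type per object. The struct types here are minimal stand-ins for the real sriovv1 types, and the helper names are invented; only the shape of the API is the point.

```go
package main

import "fmt"

// Minimal stand-ins for the sriovv1 types, just to show the API shape.
type SriovNetworkNodePolicy struct{ NumVfs int }
type SriovNetwork struct{ ResourceName string }

// One focused option type per object, as suggested in the review.
type PolicyOpt func(*SriovNetworkNodePolicy)
type NetworkOpt func(*SriovNetwork)

func withNumVFs(n int) PolicyOpt {
	return func(p *SriovNetworkNodePolicy) { p.NumVfs = n }
}

func withResource(r string) NetworkOpt {
	return func(nw *SriovNetwork) { nw.ResourceName = r }
}

// createSriovNetworkAndPolicy takes two independent option lists, so a
// modifier can no longer accidentally touch the wrong object.
func createSriovNetworkAndPolicy(policyOpts []PolicyOpt, networkOpts []NetworkOpt) (*SriovNetworkNodePolicy, *SriovNetwork) {
	p, nw := &SriovNetworkNodePolicy{}, &SriovNetwork{}
	for _, o := range policyOpts {
		o(p)
	}
	for _, o := range networkOpts {
		o(nw)
	}
	return p, nw
}

func main() {
	p, nw := createSriovNetworkAndPolicy(
		[]PolicyOpt{withNumVFs(8)},
		[]NetworkOpt{withResource("numa0res")},
	)
	fmt.Println(p.NumVfs, nw.ResourceName)
}
```

The calling site gets slightly more verbose, but each modifier's scope is visible from its type.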
```go
createSriovNetworkAndPolicy(
	withNodeSelector(testingNode),
	withNumVFs(8), withPfNameSelector(numa0Device.Name+"#0-3"),
```
This is probably borderline nitpicking, but... how do we know the SR-IOV PF supports at least 8 VFs? I guess OCP doesn't support hardware that isn't powerful enough to provide at least 8 VFs, and we can depend on that, right?
I had a quick look at supported hardware [1] datasheets, and we can safely assume a NIC supports at least 128 VFs [2][3][4][5][6][7], though I didn't find how many VFs Pensando NICs actually support [8].
[1] https://docs.openshift.com/container-platform/4.13/networking/hardware_networks/about-sriov.html
[2] https://docs.broadcom.com/doc/957414A4142CC-DS
[3] https://docs.broadcom.com/doc/957508-P2100G-DS
[4] https://cdrdv2-public.intel.com/332464/332464_710_Series_Datasheet_v_4_1.pdf
[5] https://www.cisco.com/c/dam/en/us/products/collateral/servers-unified-computing/ucs-c-series-rack-servers/intel-e810-cqda2-ethernet-network-adapter-product-brief.pdf
[6] https://network.nvidia.com/files/doc-2020/pb-connectx-4-lx-en-card.pdf
[7] https://network.nvidia.com/files/doc-2020/pb-connectx-5-en-card.pdf
[8] https://bugzilla.redhat.com/show_bug.cgi?id=2029824
```go
	})
})

func findDeviceOnNUMANode(node *corev1.Node, devices []*sriovv1.InterfaceExt, numaNode string) (*sriovv1.InterfaceExt, error) {
```
This works. The possible issue I see is that it causes quite a lot of ExecCommandOnNode calls, which are not too cheap. Could it be worth discovering all the relevant devices (or at least the PFs; I can't imagine a device or a setup that wants a VF on a different NUMA node than the one its PF is attached to), recording their NUMA affinity once, and then making the logic on the test side?
What do you mean by "recording their NUMA affinity"? Wouldn't it involve a call to ExecCommandOnNode for each PF?
We can improve this by indexing every device by its NUMA node, but it wouldn't make that much difference: suppose we have 4 devices split between NUMA nodes 0 and 1:

| device | NUMA node |
|---|---|
| ens1f0 | 0 |
| ens1f1 | 0 |
| ens3f0 | 1 |
| ens3f1 | 1 |

With the current implementation:
- findDeviceOnNUMANode(..., 0) calls ExecCommandOnNode once and returns ens1f0
- findDeviceOnNUMANode(..., 1) calls ExecCommandOnNode 3 times and returns ens3f0

That's a total of 4 calls, the same as we would have by looping over all the devices for indexing.
Furthermore, these tests spend a lot of time waiting minutes for SR-IOV devices to get configured. I feel like we are trying to optimize a very small piece of the whole puzzle.
I'm not worried about run execution time, but we had (in PAO tests) quite a few issues with ExecCommandOnNode calls being fragile, leading to flaky tests.
But this is no biggie, we can evaluate later.
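For reference, the indexing alternative debated in this thread could be sketched as below. Everything here is hypothetical: buildNumaIndex is invented, and the exec callback stands in for nodes.ExecCommandOnNode so the sketch runs without a cluster. It still costs one exec per device, as noted above; the gain would only be that subsequent lookups are map reads.

```go
package main

import (
	"fmt"
	"strings"
)

// buildNumaIndex reads each device's NUMA node once and groups device names
// by NUMA node, so later lookups are pure map accesses on the test side.
func buildNumaIndex(devices []string, execOnNode func(cmd []string) (string, error)) (map[string][]string, error) {
	index := map[string][]string{}
	for _, dev := range devices {
		out, err := execOnNode([]string{"cat", "/sys/class/net/" + dev + "/device/numa_node"})
		if err != nil {
			return nil, err
		}
		numa := strings.TrimSpace(out)
		index[numa] = append(index[numa], dev)
	}
	return index, nil
}

func main() {
	// Fake per-device answers matching the table in the thread.
	fake := map[string]string{"ens1f0": "0", "ens1f1": "0", "ens3f0": "1", "ens3f1": "1"}
	exec := func(cmd []string) (string, error) {
		// Path layout: /sys/class/net/<dev>/device/numa_node
		dev := strings.Split(cmd[1], "/")[4]
		return fake[dev] + "\n", nil
	}
	idx, _ := buildNumaIndex([]string{"ens1f0", "ens1f1", "ens3f0", "ens3f1"}, exec)
	fmt.Println(idx["0"], idx["1"])
}
```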
Force-pushed from b11131f to 5e6d86a.
Test cases use a set of SriovNetworkNodePolicies that target at least two NICs, placed on two different NUMA nodes. Playing with the `excludeTopology` field, it is possible to create workload pods that use multiple NUMA nodes or a single one. Signed-off-by: Andrea Panattoni <[email protected]>
/retest
is failing in other PRs, hence I assume it's not related to these changes.
```go
It("Validate the creation of a pod with excludeTopology set to False and an SRIOV interface in a different NUMA node than the pod", func() {
	pod := pods.DefinePod(sriovnamespaces.Test)
	pods.RedefineWithGuaranteedQoS(pod, "1", "100m")
```
nit: can you use the same pod = ... pattern here? Or, is there a reason why it was not done?
```go
	pods.RedefineWithGuaranteedQoS(pod, "1", "100m")
	pod = pods.RedefinePodWithNetwork(pod, "test-numa-0-exclude-topology-false")

	pod, err := client.Client.Pods(sriovnamespaces.Test).
```
can we differentiate here between pod - the template for creation and pod - the created pod?
```go
	namespaces.CleanPods(sriovnamespaces.Test, sriovclient)
})

It("Validate the creation of a pod with excludeTopology set to False and an SRIOV interface in a different NUMA node than the pod", func() {
```
nit: validate the creation ... fails
Left a few non-blocking nits, to change only if you need to touch the codebase anyway.
/lgtm
/lgtm There are still some areas where we can improve/generalize, but not blocking for this work.
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: fedepaol, zeeke
/override ci/prow/e2e-gcp-ovn
@fedepaol: Overrode contexts on behalf of fedepaol: ci/prow/e2e-gcp-ovn
This PR contains a version bump of the sriov-network-operator to use the new API field SriovNetworkNodePolicy.Spec.ExcludeTopology. Tests leverage two devices on different NUMA nodes and guaranteed pods.
cc @SchSeba , @gregkopels