
SRIOV ExcludeTopology tests #1557

Merged · 1 commit · Jul 27, 2023

Conversation

@zeeke (Member) commented Jul 12, 2023

This PR contains a version bump for the sriov-network-operator to use the new API field SriovNetworkNodePolicy.Spec.ExcludeTopology.

Tests leverage two devices on different NUMA nodes and guaranteed-QoS pods.

cc @SchSeba, @gregkopels
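For readers unfamiliar with the field, here is a minimal sketch of a policy using it. This is not the PR's actual test code: the function, resource names, namespace, and selectors are illustrative, and only the ExcludeTopology field itself comes from the sriov-network-operator v1 API.

```go
import (
	sriovv1 "github.com/k8snetworkplumbingwg/sriov-network-operator/api/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// excludeTopologyPolicy builds a policy whose VFs are excluded from
// Topology Manager NUMA alignment. Names are hypothetical.
func excludeTopologyPolicy(nodeName, pfName string) *sriovv1.SriovNetworkNodePolicy {
	return &sriovv1.SriovNetworkNodePolicy{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "test-exclude-topology-true",       // hypothetical name
			Namespace: "openshift-sriov-network-operator", // operator namespace
		},
		Spec: sriovv1.SriovNetworkNodePolicySpec{
			NumVfs:       8,
			ResourceName: "testExcludeTopologyTrue", // hypothetical resource name
			NodeSelector: map[string]string{"kubernetes.io/hostname": nodeName},
			NicSelector: sriovv1.SriovNetworkNicSelector{
				PfNames: []string{pfName + "#0-3"}, // PF name with a VF-range selector
			},
			// The new field under test: VFs from this policy are advertised
			// without NUMA affinity, so guaranteed pods can consume them
			// even from a different NUMA node.
			ExcludeTopology: true,
		},
	}
}
```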

openshift-ci bot (Contributor) commented Jul 12, 2023

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci bot added the do-not-merge/work-in-progress and approved labels · Jul 12, 2023
@zeeke force-pushed the numa-sriov-tests branch from b93288b to 3691a76 · July 13, 2023 10:45
@openshift-merge-robot added the needs-rebase label · Jul 13, 2023
@zeeke force-pushed the numa-sriov-tests branch 3 times, most recently from afd9c03 to 302fc3b · July 13, 2023 11:17
@openshift-merge-robot removed the needs-rebase label · Jul 13, 2023
@zeeke changed the title from [WIP] SRIOV ExcludeTopology tests to SRIOV ExcludeTopology tests · Jul 13, 2023
@zeeke marked this pull request as ready for review · July 13, 2023 11:17
@openshift-ci bot removed the do-not-merge/work-in-progress label · Jul 13, 2023
@openshift-ci bot requested review from ijolliffe and imiller0 · July 13, 2023 11:17
}

if previousPerfProfile != nil {
OriginalPerformanceProfile = previousPerfProfile.DeepCopy()
Member:

Would it make sense to use a map or something instead of insisting on the global variable? The two suites run separately, so in theory a collision won't happen, but it's not future-proof: somebody could choose to call both this and FindOrOverridePerformanceProfile.

Member Author:

Good point. What would you use for the map key? The previous performance profile name?

I guess this is not going to work in any case when tests run in parallel, as the affected nodes could collide and lead to an unexpected cluster state.

Maybe we can get rid of this save/restore mechanism and have the tests delete and create the PerformanceProfile themselves. WDYT?

Member:

The problem is, IIRC, that in some scenarios the performance profile is created upfront to save some running time.

And my concern is not about running tests in parallel, but about not being able to restore the right one. Until now there was only one test doing this, in an isolated manner, so the global was serving that purpose only. Now we are using it for two different scenarios, and looking at the API that is not clear.

Member:

An alternative here is to check whether OriginalPerformanceProfile is already filled and panic in that case, because that would mean we are re-replacing it.
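A rough sketch of that guard, assuming the save logic shown in the diff above:

```go
// Refuse to overwrite a previously saved profile: re-replacing would
// lose the original and make a correct restore impossible.
if OriginalPerformanceProfile != nil {
	panic("OriginalPerformanceProfile is already set: refusing to re-replace a performance profile before restoring it")
}
OriginalPerformanceProfile = previousPerfProfile.DeepCopy()
```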

Member:

Would it make sense to return the previous perf profile and set the global/map/whatever at the calling site? That would at least make the flow explicit instead of buried inside this function, which is not great when we mutate global state.
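A minimal sketch of that refactor (the performancev2 types and the findPerformanceProfile helper are assumptions, not code from this PR):

```go
// OverridePerformanceProfile applies the given profile and returns the
// one it replaced (nil if none), leaving it to the caller to decide
// where, and whether, to remember it.
func OverridePerformanceProfile(p *performancev2.PerformanceProfile) (*performancev2.PerformanceProfile, error) {
	previousPerfProfile, err := findPerformanceProfile() // assumed helper
	if err != nil {
		return nil, err
	}
	var previous *performancev2.PerformanceProfile
	if previousPerfProfile != nil {
		previous = previousPerfProfile.DeepCopy()
	}
	// ... apply the new profile p to the cluster ...
	return previous, nil
}
```

The calling site then makes the global mutation explicit:

```go
prev, err := OverridePerformanceProfile(profile)
Expect(err).ToNot(HaveOccurred())
OriginalPerformanceProfile = prev
```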

AfterAll(func() {
By("Cleaning performance profiles")

err := performanceprofile.CleanPerformanceProfiles()
Member:

Should we restore the saved perf profile?

Member Author:

Right. I had to fix RestorePerformanceProfile() to handle the case where OriginalPerformanceProfile == nil.
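A sketch of that nil-handling (applyPerformanceProfile is an assumed helper, not the repo's actual API):

```go
// RestorePerformanceProfile re-applies the saved profile, and is a
// no-op when nothing was saved in the first place.
func RestorePerformanceProfile() error {
	if OriginalPerformanceProfile == nil {
		return nil // nothing was replaced, nothing to restore
	}
	err := applyPerformanceProfile(OriginalPerformanceProfile) // assumed helper
	OriginalPerformanceProfile = nil
	return err
}
```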

})
})

func findDeviceOnNUMANode(node *corev1.Node, devices []*sriovv1.InterfaceExt, numaNode string) (*sriovv1.InterfaceExt, error) {
Member:

These (and the ones below) are worth putting in some pkg.

Member Author:

The functions below rely on test-specific default values. Moving them to a generic package would make them much less readable. WDYT?

Member:

The only one I see as test-specific is createSriovNetworkAndPolicy.
Also, we already have functions to create SR-IOV networks and SR-IOV policies. I'd try to extend / modify those instead of having another variant scattered across the repo.

@zeeke force-pushed the numa-sriov-tests branch 2 times, most recently from 830b849 to d143efe · July 13, 2023 12:22
networks.WaitStable(sriovclient)
})

AfterAll(func() {
Contributor:

In BeforeAll, a namespace is created. Should it not be deleted here?

If CI runs test suites in parallel, the same namespace may already exist, in which case either
a) we don't want to delete it here, at the cost of leaving it as a leftover, or
b) the suite creates its own namespace, which we can then freely delete here

Member Author:

Good point. The problem here is that the "sriov-conformance-testing" namespace is a little bit special:

  • It must exist for the DiscoverSriov() function to work (e.g. s2i.go#L163). We need to clean that up in the upstream repository.

  • It's dumped by the k8sreporter (see pkg/utils/reporter.go#L97), though it wouldn't be a problem to add an entry like "numa-dpdk-tests-ns": "sriov".

  • It's cleaned by the Fixture mechanism (see pkg/features/features.go#L103), as cleaning it in the AfterAll/AfterEach would prevent the k8sreporter from gathering it.

None of the above points is a hard blocker for using our own namespace, but addressing them requires a reasonable effort.

@zeeke force-pushed the numa-sriov-tests branch from d143efe to 6b0039c · July 13, 2023 13:50
actualPod, err := client.Client.Pods(sriovnamespaces.Test).Get(context.Background(), pod.Name, metav1.GetOptions{})
g.Expect(err).ToNot(HaveOccurred())
g.Expect(actualPod.Status.Phase).To(Equal(corev1.PodFailed))
g.Expect(actualPod.Status.Message).To(ContainSubstring("Resources cannot be allocated with Topology locality"))
Member:

non-blocking: I think checking Status.Reason (rather than the message) is a bit simpler and cleaner. I reckon the way we report errors in kube is not great in general.
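For illustration, the Reason-based assertion would look roughly like this. In current kubelets the Topology Manager rejects pods with the reason TopologyAffinityError, but that string should be verified against the cluster version before relying on it:

```go
g.Expect(actualPod.Status.Phase).To(Equal(corev1.PodFailed))
// Reason is a short, machine-readable CamelCase token, more stable than
// substring-matching the human-oriented Message field.
g.Expect(actualPod.Status.Reason).To(Equal("TopologyAffinityError"))
```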


func findDeviceOnNUMANode(node *corev1.Node, devices []*sriovv1.InterfaceExt, numaNode string) (*sriovv1.InterfaceExt, error) {
for _, device := range devices {
out, err := nodes.ExecCommandOnNode([]string{"cat", fmt.Sprintf("/sys/class/net/%s/device/numa_node", device.Name)}, node)
Member:

non-blocking: I've been bitten enough times already to suggest using filepath.Clean(filepath.Join(...)).
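Applied to the call above, the suggestion would read something like:

```go
// filepath.Join + Clean keeps the sysfs path well-formed even if the
// device name ever contains unexpected separators.
sysfsPath := filepath.Clean(filepath.Join("/sys/class/net", device.Name, "device", "numa_node"))
out, err := nodes.ExecCommandOnNode([]string{"cat", sysfsPath}, node)
```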

@zeeke force-pushed the numa-sriov-tests branch 2 times, most recently from 5fb61dc to b33e7e9 · July 25, 2023 10:14
@zeeke (Member Author) commented Jul 25, 2023

I put the creation of the performance profile out of this PR's scope, so the suite expects the cluster to be configured with a single-numa-node profile.

Creating it dynamically can break the cluster, and I prefer to tackle that problem in a subsequent PR.

@fedepaol, @cgoncalves, @ffromani, @gregkopels PTAL

}
}

func createSriovNetworkAndPolicy(opts ...func(*sriovv1.SriovNetworkNodePolicy, *sriovv1.SriovNetwork)) {
Member:

What I don't like about this approach is that the modifier takes both the policy and the network.
We can split it in two (at the calling site) and have a more focused modifier for each side.
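A sketch of that split, with one focused option type per resource (all names hypothetical):

```go
// Separate functional options keep each modifier scoped to one object.
type policyOpt func(*sriovv1.SriovNetworkNodePolicy)
type networkOpt func(*sriovv1.SriovNetwork)

func createSriovPolicy(opts ...policyOpt)   { /* build, apply options, create */ }
func createSriovNetwork(opts ...networkOpt) { /* build, apply options, create */ }

// The calling site then states explicitly what applies to what:
//   createSriovPolicy(withNodeSelector(testingNode), withNumVFs(8))
//   createSriovNetwork(withNetworkNamespace(sriovnamespaces.Test))
```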


createSriovNetworkAndPolicy(
withNodeSelector(testingNode),
withNumVFs(8), withPfNameSelector(numa0Device.Name+"#0-3"),
Member:

This is probably borderline nitpicking, but... how do we know the SR-IOV PF supports at least 8 VFs? I guess OCP doesn't support hardware that can't provide at least 8 VFs at all, and we can depend on that, right?

Member Author (@zeeke, Jul 26, 2023):

})
})

func findDeviceOnNUMANode(node *corev1.Node, devices []*sriovv1.InterfaceExt, numaNode string) (*sriovv1.InterfaceExt, error) {
Member:

This works. The possible issue I see is that it causes quite a lot of ExecCommandOnNode calls, which are not too cheap. Could it be worth discovering all the relevant devices (or at least the PFs: I can't imagine a device or a setup that wants a VF on a different NUMA node than the one the PF is attached to) and reporting their NUMA affinity, then doing the logic on the test side?

Member Author:

What do you mean by "report their NUMA affinity"? Wouldn't it involve a call to ExecCommandOnNode for each PF?

We can improve this by indexing every device by its NUMA node, but it wouldn't make much difference. Suppose we have 4 devices split across NUMA nodes 0 and 1:

| device | NUMA node |
|--------|-----------|
| ens1f0 | 0         |
| ens1f1 | 0         |
| ens3f0 | 1         |
| ens3f1 | 1         |

With the current implementation:

  • findDeviceOnNUMANode(..., 0) makes 1 ExecCommandOnNode call and returns ens1f0
  • findDeviceOnNUMANode(..., 1) makes 3 ExecCommandOnNode calls and returns ens3f0

That's a total of 4 calls, the same as we would have by looping over all the devices to index them.

Furthermore, these tests spend a lot of time waiting minutes for SR-IOV devices to get configured. I feel like we are trying to optimize a very small piece of the whole puzzle.

Member (@ffromani, Jul 26, 2023):

I'm not worried about execution time, but we had (in PAO tests) quite a few issues with ExecCommandOnNode calls being fragile, leading to flaky tests.
But this is no biggie; we can evaluate later.
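For reference, the single-pass indexing variant discussed above might look like this sketch (same per-device ExecCommandOnNode cost, but only one remote call site to harden against flakiness):

```go
// devicesByNUMANode queries each device's NUMA node once and groups
// the devices, so tests can pick per-NUMA devices without re-querying.
func devicesByNUMANode(node *corev1.Node, devices []*sriovv1.InterfaceExt) (map[string][]*sriovv1.InterfaceExt, error) {
	byNUMA := map[string][]*sriovv1.InterfaceExt{}
	for _, device := range devices {
		out, err := nodes.ExecCommandOnNode([]string{
			"cat", filepath.Join("/sys/class/net", device.Name, "device/numa_node"),
		}, node)
		if err != nil {
			return nil, err
		}
		numa := strings.TrimSpace(out)
		byNUMA[numa] = append(byNUMA[numa], device)
	}
	return byNUMA, nil
}
```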

@zeeke force-pushed the numa-sriov-tests branch 4 times, most recently from b11131f to 5e6d86a · July 26, 2023 12:38
Test cases use a set of SriovNetworkNodePolicies that target at least
two NICs placed on two different NUMA nodes. By playing with the
`excludeTopology` field, it is possible to create workload pods that
use either multiple NUMA nodes or a single one.

Signed-off-by: Andrea Panattoni <[email protected]>
@zeeke force-pushed the numa-sriov-tests branch from 5e6d86a to ba6dad2 · July 26, 2023 12:44
@zeeke (Member Author) commented Jul 26, 2023

/retest

@zeeke (Member Author) commented Jul 27, 2023

The ci/prow/e2e-gcp-ovn test case

[rfe_id:27368][performance] Network latency parameters adjusted by the Node Tuning Operator [test_id:28467][crit:high][vendor:[email protected]][level:acceptance] Should contain configuration injected through the openshift-node-performance profile

is failing in other PRs too, hence I assume it's not related to these changes.


It("Validate the creation of a pod with excludeTopology set to False and an SRIOV interface in a different NUMA node than the pod", func() {
pod := pods.DefinePod(sriovnamespaces.Test)
pods.RedefineWithGuaranteedQoS(pod, "1", "100m")
Member:

nit: can you use the same pod = ... pattern here? Or is there a reason it wasn't done?

pods.RedefineWithGuaranteedQoS(pod, "1", "100m")
pod = pods.RedefinePodWithNetwork(pod, "test-numa-0-exclude-topology-false")

pod, err := client.Client.Pods(sriovnamespaces.Test).
Member:

Can we differentiate here between pod, the template used for creation, and pod, the created pod?

namespaces.CleanPods(sriovnamespaces.Test, sriovclient)
})

It("Validate the creation of a pod with excludeTopology set to False and an SRIOV interface in a different NUMA node than the pod", func() {
Member:

nit: the test description should read "validate the creation ... fails"

@fedepaol (Member):

Left a few non-blocking nits; change them only if you need to touch the codebase anyway.

/lgtm

@openshift-ci bot added the lgtm label · Jul 27, 2023
@ffromani (Member):

/lgtm

There are still some areas where we can improve/generalize, but nothing blocking for this work.

@fedepaol (Member):

/approve

openshift-ci bot (Contributor) commented Jul 27, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fedepaol, zeeke

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@fedepaol (Member):

/override ci/prow/e2e-gcp-ovn

openshift-ci bot (Contributor) commented Jul 27, 2023

@fedepaol: Overrode contexts on behalf of fedepaol: ci/prow/e2e-gcp-ovn

In response to this:

/override ci/prow/e2e-gcp-ovn

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
