SRIOV ExcludeTopology tests #1557
Conversation
Skipping CI for Draft Pull Request.

Force-pushed from afd9c03 to 302fc3b.
```go
}

if previousPerfProfile != nil {
	OriginalPerformanceProfile = previousPerfProfile.DeepCopy()
```
Would it make sense to use a map or something instead of insisting on the global variable? The two suites run separately, so in theory a collision won't happen, but it's not future proof: somebody could choose to call both this and FindOrOverridePerformanceProfile.
Good point. What would you use for the map key? The previous performance profile name?
I guess this is not going to work anyway when tests run in parallel, as the affected nodes could collide and lead to an unexpected cluster state.
Maybe we can get rid of this save/restore mechanism and have tests clearly create their own PerformanceProfile. WDYT?
The problem is, IIRC, that in some scenarios the performance profile is created upfront to save some running time.
And my concern is not about running tests in parallel, but about not being able to restore the right profile. Until now there was only one test doing this, in an isolated manner, so the global served that purpose only. Now we are using it for two different scenarios, and looking at the API that is not clear.
An alternative here is to check whether OriginalPerformanceProfile is already filled and panic in that case, because that would mean we are re-replacing it.
Would it make sense to return the previous perf profile and set the global/map/whatever at the calling site? That would at least make the flow explicit instead of buried inside this function, which is not great when we mutate global state.
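A minimal sketch of the suggestion above. Everything here is hypothetical: `PerformanceProfile` is a stand-in for the real performance.openshift.io type, and the function signature is invented for illustration; the point is only that the saved copy is returned to the caller instead of being written to a package-level variable as a side effect.

```go
package main

import "fmt"

// PerformanceProfile is a minimal stand-in for the real type; only the
// name matters for this sketch.
type PerformanceProfile struct{ Name string }

// findOrOverridePerformanceProfile is a hypothetical variant of the function
// under review: instead of stashing the previous profile in the
// OriginalPerformanceProfile global, it returns the saved copy and lets the
// calling site decide where (and whether) to keep it.
func findOrOverridePerformanceProfile(current *PerformanceProfile, override PerformanceProfile) *PerformanceProfile {
	var previous *PerformanceProfile
	if current != nil {
		saved := *current // stand-in for DeepCopy()
		previous = &saved
	}
	// ...the override profile would be applied to the cluster here...
	_ = override
	return previous
}

func main() {
	// The caller now owns the saved state explicitly.
	prev := findOrOverridePerformanceProfile(
		&PerformanceProfile{Name: "single-numa"},
		PerformanceProfile{Name: "test-profile"},
	)
	fmt.Println(prev.Name)
}
```

With this shape, each suite can keep its own saved profile, which also addresses the "two suites share one global" concern.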
cnf-tests/testsuites/pkg/performanceprofile/performanceprofile.go
```go
AfterAll(func() {
	By("Cleaning performance profiles")

	err := performanceprofile.CleanPerformanceProfiles()
```
Should we restore the saved perf profile?
Right, I had to fix RestorePerformanceProfile() to handle the case OriginalPerformanceProfile == nil.
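The nil guard being discussed could look like the sketch below. The signature is made up for illustration (the real function presumably works against a cluster client and package state); the only point shown is that a nil saved profile makes restore a no-op rather than an error.

```go
package main

import (
	"errors"
	"fmt"
)

// PerformanceProfile is a minimal stand-in for the real type.
type PerformanceProfile struct{ Name string }

// restorePerformanceProfile sketches the nil handling: when no profile was
// saved (the suite never replaced one), there is nothing to restore and that
// is not an error. `apply` stands in for writing the profile to the cluster.
func restorePerformanceProfile(saved *PerformanceProfile, apply func(PerformanceProfile) error) error {
	if saved == nil {
		// Nothing was overridden: restoring is a no-op.
		return nil
	}
	if apply == nil {
		return errors.New("no apply function provided")
	}
	return apply(*saved)
}

func main() {
	// No saved profile: returns nil without calling apply.
	fmt.Println(restorePerformanceProfile(nil, nil))
}
```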
```go
	})
})

func findDeviceOnNUMANode(node *corev1.Node, devices []*sriovv1.InterfaceExt, numaNode string) (*sriovv1.InterfaceExt, error) {
```
These (and the ones below) are worth being in some pkg.
The below functions rely on test-specific default values. Moving them to a generic package would make them much less readable. WDYT?
The only one I see as test-specific is createSriovNetworkAndPolicy.
Also, we already have functions to create SR-IOV networks and SR-IOV policies. I'd try to extend/modify those instead of having another variant scattered across the repo.
Force-pushed from 830b849 to d143efe.
```go
	networks.WaitStable(sriovclient)
})

AfterAll(func() {
```
In BeforeAll, a namespace is created. Should it not be deleted here?
If CI runs test suites in parallel, the same namespace may already exist, in which case either:
a) we don't delete it here, at the cost of leaving it as a leftover, or
b) we create our own namespace, so we can freely delete it here.
Good point. The problem here is that the "sriov-conformance-testing" namespace is a little bit special:
- It must exist to make the DiscoverSriov() function work (e.g. s2i.go#L163). We need to clean that up in the upstream repository.
- It's dumped by the k8sreporter (see pkg/utils/reporter.go#L97), though it wouldn't be a problem to add an entry like "numa-dpdk-tests-ns": "sriov".
- It's cleaned by the Fixture mechanism (see pkg/features/features.go#L103), as cleaning it in the AfterAll/AfterEach would make the k8sreporter not gather it.

None of the above points represents a hard limit on using our own namespace, but they require a reasonable effort.
```go
actualPod, err := client.Client.Pods(sriovnamespaces.Test).Get(context.Background(), pod.Name, metav1.GetOptions{})
g.Expect(err).ToNot(HaveOccurred())
g.Expect(actualPod.Status.Phase).To(Equal(corev1.PodFailed))
g.Expect(actualPod.Status.Message).To(ContainSubstring("Resources cannot be allocated with Topology locality"))
```
nonblocking: I think checking Status.Reason (rather than the message) is a bit simpler and cleaner. I reckon the way we report errors in kube is not great in general.
```go
func findDeviceOnNUMANode(node *corev1.Node, devices []*sriovv1.InterfaceExt, numaNode string) (*sriovv1.InterfaceExt, error) {
	for _, device := range devices {
		out, err := nodes.ExecCommandOnNode([]string{"cat", fmt.Sprintf("/sys/class/net/%s/device/numa_node", device.Name)}, node)
```
nonblocking: I've been bitten enough times already to suggest using filepath.Clean(filepath.Join(...)).
Force-pushed from 5fb61dc to b33e7e9.
I put the creation of the performance profile out of this PR's scope, so the suite expects the cluster to be configured with a single-numa-node profile. Creating it dynamically can break the cluster, and I prefer to tackle that problem in a subsequent PR. @fedepaol , @cgoncalves , @ffromani , @gregkopels PTAL
```go
	}
}

func createSriovNetworkAndPolicy(opts ...func(*sriovv1.SriovNetworkNodePolicy, *sriovv1.SriovNetwork)) {
```
What I don't like about this approach is that the modifier takes both the policy and the network.
We can split this in two (at the calling site) and have a more focused modifier for each side.
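A sketch of the split being suggested, using the functional-options pattern with one option type per object. The struct types here are minimal stand-ins for the real sriovv1 types, and the helper names are invented; only the shape of the API is the point.

```go
package main

import "fmt"

// Minimal stand-ins for the sriovv1 types, just to show the API shape.
type SriovNetworkNodePolicy struct{ NumVfs int }
type SriovNetwork struct{ ResourceName string }

// One focused option type per object, as suggested in the review.
type PolicyOpt func(*SriovNetworkNodePolicy)
type NetworkOpt func(*SriovNetwork)

func withNumVFs(n int) PolicyOpt {
	return func(p *SriovNetworkNodePolicy) { p.NumVfs = n }
}

func withResource(r string) NetworkOpt {
	return func(nw *SriovNetwork) { nw.ResourceName = r }
}

// createSriovNetworkAndPolicy takes two independent option lists, so a
// modifier can no longer accidentally touch the wrong object.
func createSriovNetworkAndPolicy(policyOpts []PolicyOpt, networkOpts []NetworkOpt) (*SriovNetworkNodePolicy, *SriovNetwork) {
	p, nw := &SriovNetworkNodePolicy{}, &SriovNetwork{}
	for _, o := range policyOpts {
		o(p)
	}
	for _, o := range networkOpts {
		o(nw)
	}
	return p, nw
}

func main() {
	p, nw := createSriovNetworkAndPolicy(
		[]PolicyOpt{withNumVFs(8)},
		[]NetworkOpt{withResource("numa0res")},
	)
	fmt.Println(p.NumVfs, nw.ResourceName)
}
```

The calling site gets slightly more verbose, but each modifier's scope is visible from its type.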
```go
createSriovNetworkAndPolicy(
	withNodeSelector(testingNode),
	withNumVFs(8), withPfNameSelector(numa0Device.Name+"#0-3"),
```
This is probably borderline nitpicking, but... how do we know the SR-IOV PF supports at least 8 VFs? I guess OCP doesn't support hardware that isn't powerful enough to provide at least 8 VFs, and we can depend on that, right?
I had a quick look at supported hardware [1] datasheets, and we can safely assume a NIC supports at least 128 VFs [2][3][4][5][6][7], though I didn't find how many VFs Pensando NICs actually support [8].
[1] https://docs.openshift.com/container-platform/4.13/networking/hardware_networks/about-sriov.html
[2] https://docs.broadcom.com/doc/957414A4142CC-DS
[3] https://docs.broadcom.com/doc/957508-P2100G-DS
[4] https://cdrdv2-public.intel.com/332464/332464_710_Series_Datasheet_v_4_1.pdf
[5] https://www.cisco.com/c/dam/en/us/products/collateral/servers-unified-computing/ucs-c-series-rack-servers/intel-e810-cqda2-ethernet-network-adapter-product-brief.pdf
[6] https://network.nvidia.com/files/doc-2020/pb-connectx-4-lx-en-card.pdf
[7] https://network.nvidia.com/files/doc-2020/pb-connectx-5-en-card.pdf
[8] https://bugzilla.redhat.com/show_bug.cgi?id=2029824
```go
	})
})

func findDeviceOnNUMANode(node *corev1.Node, devices []*sriovv1.InterfaceExt, numaNode string) (*sriovv1.InterfaceExt, error) {
```
This works. The possible issue I see is that it causes quite a lot of ExecCommandOnNode calls, which are not too cheap. Could it be worth discovering all the relevant devices (or at least the PFs; I can't imagine a device or a setup that wants a VF on a different NUMA node than the one its PF is attached to), recording their NUMA affinity once, and then making the logic on the test side?
What do you mean by "recording their NUMA affinity"? Wouldn't it involve a call to ExecCommandOnNode for each PF?
We can improve this by indexing every device by its NUMA node, but it wouldn't make that much difference: suppose we have 4 devices split between NUMA nodes 0 and 1:

| device | NUMA node |
|---|---|
| ens1f0 | 0 |
| ens1f1 | 0 |
| ens3f0 | 1 |
| ens3f1 | 1 |

With the current implementation:
- findDeviceOnNUMANode(..., 0) calls ExecCommandOnNode once and returns ens1f0
- findDeviceOnNUMANode(..., 1) calls ExecCommandOnNode 3 times and returns ens3f0

That's a total of 4 calls, the same as we would have by looping over all the devices for indexing.
Furthermore, these tests spend a lot of time waiting minutes for SR-IOV devices to get configured. I feel like we are trying to optimize a very small piece of the whole puzzle.
I'm not worried about run execution time, but we had (in PAO tests) quite a few issues with ExecCommandOnNode calls being fragile, leading to flaky tests.
But this is no biggie, we can evaluate later.
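For reference, the indexing alternative debated in this thread could be sketched as below. Everything here is hypothetical: buildNumaIndex is invented, and the exec callback stands in for nodes.ExecCommandOnNode so the sketch runs without a cluster. It still costs one exec per device, as noted above; the gain would only be that subsequent lookups are map reads.

```go
package main

import (
	"fmt"
	"strings"
)

// buildNumaIndex reads each device's NUMA node once and groups device names
// by NUMA node, so later lookups are pure map accesses on the test side.
func buildNumaIndex(devices []string, execOnNode func(cmd []string) (string, error)) (map[string][]string, error) {
	index := map[string][]string{}
	for _, dev := range devices {
		out, err := execOnNode([]string{"cat", "/sys/class/net/" + dev + "/device/numa_node"})
		if err != nil {
			return nil, err
		}
		numa := strings.TrimSpace(out)
		index[numa] = append(index[numa], dev)
	}
	return index, nil
}

func main() {
	// Fake per-device answers matching the table in the thread.
	fake := map[string]string{"ens1f0": "0", "ens1f1": "0", "ens3f0": "1", "ens3f1": "1"}
	exec := func(cmd []string) (string, error) {
		// Path layout: /sys/class/net/<dev>/device/numa_node
		dev := strings.Split(cmd[1], "/")[4]
		return fake[dev] + "\n", nil
	}
	idx, _ := buildNumaIndex([]string{"ens1f0", "ens1f1", "ens3f0", "ens3f1"}, exec)
	fmt.Println(idx["0"], idx["1"])
}
```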
Force-pushed from b11131f to 5e6d86a.
Test cases use a set of SriovNetworkNodePolicies that target at least two NICs, placed on two different NUMA nodes. Playing with the `excludeTopology` field, it is possible to create workload pods that use multiple NUMA nodes or a single one. Signed-off-by: Andrea Panattoni <[email protected]>
/retest
is failing in other PRs, hence I assume it's not related to these changes.
```go
It("Validate the creation of a pod with excludeTopology set to False and an SRIOV interface in a different NUMA node than the pod", func() {
	pod := pods.DefinePod(sriovnamespaces.Test)
	pods.RedefineWithGuaranteedQoS(pod, "1", "100m")
```
nit: can you use the same pod = ... pattern here? Or, is there a reason why it was not done?
```go
	pods.RedefineWithGuaranteedQoS(pod, "1", "100m")
	pod = pods.RedefinePodWithNetwork(pod, "test-numa-0-exclude-topology-false")

	pod, err := client.Client.Pods(sriovnamespaces.Test).
```
can we differentiate here between pod - the template for creation and pod - the created pod?
```go
	namespaces.CleanPods(sriovnamespaces.Test, sriovclient)
})

It("Validate the creation of a pod with excludeTopology set to False and an SRIOV interface in a different NUMA node than the pod", func() {
```
nit: validate the creation ... fails
Left a few non-blocking nits, to change only if you need to touch the codebase anyway.
/lgtm
/lgtm There are still some areas where we can improve/generalize, but not blocking for this work.
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: fedepaol, zeeke
/override ci/prow/e2e-gcp-ovn
@fedepaol: Overrode contexts on behalf of fedepaol: ci/prow/e2e-gcp-ovn
This PR contains a version bump of the sriov-network-operator to use the new API field SriovNetworkNodePolicy.Spec.ExcludeTopology. Tests leverage two devices on different NUMA nodes and guaranteed pods.
cc @SchSeba , @gregkopels