Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add selectors nicNames #434

Closed

Conversation

ian-howell
Copy link
Contributor

@ian-howell ian-howell commented Jun 27, 2022

This addresses #433 by adding the nicNames fields as selectors

(8/8/2022 Update): This no longer addresses macAddresses

@ian-howell ian-howell force-pushed the dev/nicname-selectors branch from 9526731 to 77dc299 Compare June 27, 2022 22:11
@coveralls
Copy link
Collaborator

coveralls commented Jul 4, 2022

Pull Request Test Coverage Report for Build 4316249828

  • 54 of 118 (45.76%) changed or added relevant lines in 5 files are covered.
  • 2 unchanged lines in 2 files lost coverage.
  • Overall coverage decreased (-1.8%) to 76.508%

Changes Missing Coverage Covered Lines Changed/Added Lines %
pkg/factory/factory.go 0 2 0.0%
pkg/netdevice/netDeviceProvider.go 35 37 94.59%
pkg/devices/gen_net.go 8 11 72.73%
pkg/utils/utils.go 0 57 0.0%
Files with Coverage Reduction New Missed Lines %
pkg/devices/gen_net.go 1 83.54%
pkg/netdevice/netDeviceProvider.go 1 85.0%
Totals Coverage Status
Change from base Build 4294051176: -1.8%
Covered Lines: 1941
Relevant Lines: 2537

💛 - Coveralls

@adrianchiris
Copy link
Contributor

@ian-howell does PfName selector satisfy your use-case based on community discussion we had on jul 18th ?

if so we can close this PR and related issue

@ian-howell ian-howell force-pushed the dev/nicname-selectors branch from 77dc299 to e62de29 Compare August 8, 2022 14:30
@ian-howell ian-howell changed the title Add selectors nicNames and macAddresses Add selectors nicNames Aug 8, 2022
@ian-howell
Copy link
Contributor Author

Hey @adrianchiris, after some investigation, I've found that the pfNames selector isn't sufficient for our use cases. I need to be able to target the device as it is named within the VM. This change seems to satisfy that.

pkg/utils/utils.go Outdated Show resolved Hide resolved
Copy link
Collaborator

@e0ne e0ne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, add unit-tests for new device selector

pkg/resources/pciDevice.go Outdated Show resolved Hide resolved
pkg/utils/utils.go Outdated Show resolved Hide resolved
@ian-howell
Copy link
Contributor Author

@e0ne Thanks for the review! I think I've resolved all of your comments. Let me know if there's anything else I need to do.

@zshi-redhat
Copy link
Collaborator

@ian-howell Is this for VF passthrough to VM or is it also applied to virtio interfaces?

@ian-howell
Copy link
Contributor Author

This is just for VF passthrough

@ian-howell
Copy link
Contributor Author

Gentle bump on this one

@SchSeba
Copy link
Collaborator

SchSeba commented Oct 24, 2022

I will review this PR sorry for the long wait

@ian-howell
Copy link
Contributor Author

I will review this PR sorry for the long wait

@SchSeba Any update on this?

@ian-howell ian-howell force-pushed the dev/nicname-selectors branch from 42b0f79 to 70fa4ce Compare December 16, 2022 17:01
Copy link
Collaborator

@SchSeba SchSeba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am really sorry for the late review..

this looks fine to me just a small comment for the documentation the code looks good

README.md Outdated Show resolved Hide resolved
@ian-howell
Copy link
Contributor Author

Thanks @SchSeba! I've addressed the documentation when you get another chance

@SchSeba
Copy link
Collaborator

SchSeba commented Dec 20, 2022

/lgtm

@ian-howell
Copy link
Contributor Author

@e0ne Could you give this a review when you get a moment? I think I just need one more approval for merge. Thanks in advance!

@ian-howell
Copy link
Contributor Author

Not trying to be pushy, but could I please get an update on this when someone finds the time?

@adrianchiris
Copy link
Contributor

adrianchiris commented Jan 10, 2023

Hi Ian, appologies for the late response.

looking at this PR you essentially add a netdevice selector right ? it will select devices based on their netdev name.

this is problematic since sriov-network-device plugin is often used with CNI which means netdevice will be moved to the container network namespace.

if at that point device plugin is restarted, resources that are already allocated to running pods will not be discovered.

in VM, can device selection be done using PCI addresses ?

@ian-howell
Copy link
Contributor Author

Hey Adrian, thanks for looking over this.

Yes, the goal is to create a netdevice selector. At the time of instantiation of the device-plugin ConfigMap, all that we have is the interface name. This is because we are creating templates of kubevirt machines, and the concrete VM definitions won't come until afterward. We could specify PCI addresses, but due to libvirt limitations, we can't guarantee that what we specified will be what appears in the VM.

We did some testing to get a better understanding of the problem you've pointed out, and sure enough, the resources are lost when the device-plugin restarts. We will continue to look for a solution for this, but if you happen to have any ideas that might help us, I would be greatly appreciative.

@ian-howell ian-howell force-pushed the dev/nicname-selectors branch 2 times, most recently from 23909e2 to c6b3e6d Compare March 2, 2023 16:30
@ian-howell ian-howell force-pushed the dev/nicname-selectors branch from c6b3e6d to 5b78495 Compare March 2, 2023 16:45
@ian-howell
Copy link
Contributor Author

@adrianchiris I have a solution which appears to work. Basically, we iterate over the child namespaces, switching as we go, and search within for an interface whose alias matches the interface as it were named in the parent namespace. I've verified that this works on our hardware.

When you get a second, I would appreciate it if you could take a second look at this and let me know what you think. Thanks!

@ian-howell
Copy link
Contributor Author

@adrianchiris Just a gentle bump on this one. Please let me know if there's anything wrong with the latest approach


for _, fInfo := range fInfos {
ifaceName := fInfo.Name()
alias, _ := GetInterfaceAlias(ifaceName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would this contain the original interface name ?

if statement below indicates that this might not be set for netdev ?

i need to give this one a try on a system to be able to provide proper feedback.

with this, we could potentially determine the link type as well when creating the device in gen_net.go

we have a community meeting tommorow so i will add this PR to the agenda to get some attention to this PR and have others provide feedback as well.

Copy link
Contributor

@adrianchiris adrianchiris Mar 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are some comments from todays community meeting:

Unsure about the approach - going over all network namesapces

  1. Need to ensure it does not incur a performance penalty
    Currently PR scans all namespaces per pci net device for which it did not find interface name.
    One way to overcome this is to scan once, store results and use that as an additional source of information when creating pci net device obj. there can be ~100 ns on a node and between 1-8 NICs each with two PFs so between 2 and 16 pci functions.

  2. Host interface name might not match interface alias (e.g if one set by udev and the other by driver)

  3. Can alias be changed by CNI or by container (given enough privileges) ?

}
linkType = la.EncapType
} else if len(netNamesFromNs) > 0 {
ifName = netNamesFromNs[0]
Copy link
Contributor

@adrianchiris adrianchiris Mar 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ifname now may contain either:

  1. empty string
  2. the actual interface name
  3. interface name alias derived when searching for device in network namespaces

just a though (and am not sure we want to go this way) , perhaps have a separate attribute ifalias in GenNetDevice which contains this info.

then nicNames selector (maybe it should be named ifAlias ?) will match on that field.

@Eoghan1232
Copy link
Collaborator

Hey @ian-howell , couple quick questions:

When you say iterate over namespaces, are these pod namespaces or host namespaces?
SRIOV NDP already operates in the host namespace.

NDP should not be iterating over pod namespaces.

What is the difference between pfName and ifaceName ? only that it is looking for an alias?
What would happen if there is more than one alias with the same name? since they could be named similar but in different namespaces.

@ian-howell
Copy link
Contributor Author

@Eoghan1232

NDP should not be iterating over pod namespaces.

I'm curious why not? I don't think what we're trying to achieve is possible without doing so.

@Eoghan1232
Copy link
Collaborator

@Eoghan1232

NDP should not be iterating over pod namespaces.

I'm curious why not? I don't think what we're trying to achieve is possible without doing so.

NDP scope should be limited to host network namespace for device discovery on the host.

if NDP starts to snoop over other net namespaces, it is kind of a breach of trust.
The network isolation is there for a reason.

This could pose further issues if NDP starts messing with pod network namespaces

Can you explain the use case a bit more?
from my understanding, you are essentially creating VM's with kubevirt, which is a K8s pod, and then passing the PF to the pod, then want NDP to manage the VF pools from the host side?
i.e. control VF's inside VM from the host.

@ian-howell
Copy link
Contributor Author

The use case is a bit more involved. We're creating a VM with kubevirt, and within that VM, we're spinning up an inner k8s cluster. So the "PF" in this case is technically a VF within the VM. The only information about that function we have to work with is its name as viewed from within the VM

What I want to do is schedule a pod to the inner k8s cluster, and pass the kubevirt pod's VF into the inner pod.

With our testing, we've found that pfNames is insufficient for this purpose. The pciAddresses selector works, but due to ordering, we can't know the PCI address of the kubevirt pod's VF until after it's running, which is too late.

@Eoghan1232
Copy link
Collaborator

Hi @ian-howell , we discussed this again in a meeting today.

would you be able to join a community meeting on the 4th April - link to the meeting doc, there is a zoom link at the top.
The meeting takes place at 14:00 UTC.

It would be easier to have a discussion over a call on this :)

@ian-howell
Copy link
Contributor Author

would you be able to join a community meeting on the 4th April - link to the meeting doc, there is a zoom link at the top.
The meeting takes place at 14:00 UTC.

I will be present.

@ian-howell
Copy link
Contributor Author

Would it be possible/acceptable to implement this function:

// Probe - does device healthcheck. Not implemented
func (rp *ResourcePoolImpl) Probe() bool {
// TO-DO: Implement this
return false
}

If I'm understanding correctly, this could be used to periodically refresh the device list, which would allow my original implementation (without the namespace-switching) to work. It also looks like this is functionality that may be useful to others as well. I believe a good implementation could partially satisfy #276 and provide boilerplate for #362

@Eoghan1232
Copy link
Collaborator

Eoghan1232 commented Apr 4, 2023

Hi @ian-howell , apologies, I gave the wrong time, it's 13:00 UTC, our doc must be outdated, due to timezone change.
the meeting is in progress if you would still like to join.

@Eoghan1232
Copy link
Collaborator

Eoghan1232 commented Apr 4, 2023

Hi @ian-howell ,

we discussed the new proposal you mentioned above to use the Probe func, to solve the use case.
We need to see what the original intention by Kuberentes was for this probe method and perhaps see what other device plugins have done.
you can also see the discuss notes in the google doc that was linked previously. link again

would you be able to join the Tuesday April 18th 13:00 UTC ? :)

@ian-howell
Copy link
Contributor Author

Hi @Eoghan1232, looks like I might have just missed the meeting. Thanks for keeping this on the agenda, and yes, I will plan on attending the meeting on the 18th.

@SchSeba
Copy link
Collaborator

SchSeba commented Apr 19, 2023

Hi @ian-howell I contact the kubevirt team.

I wanted to ask you if you can try to add apci config to the VM definition

interfaces:
      - masquerade: {}
        name: default
      - bridge: {}
        name: br10
        acpiIndex: 101

Then check `cat /sys/bus/pci/devices/<PCI>/acpi_index

if this exist on your node we can implement a new acpi selector that will always work :)

@SchSeba
Copy link
Collaborator

SchSeba commented Apr 19, 2023

second option we can use today kubevirt already support to add tags for the VM

- bridge: {}
        name: br10
        tag: blue

and they are using the same API as OpenStack so I think with a minimal change in the sriov-operator we can support also tags in the virtual plugin we already have

https://github.com/k8snetworkplumbingwg/sriov-network-operator/blob/master/pkg/utils/utils_virtual.go#L123:6

https://kubevirt.io/user-guide/virtual_machines/startup_scripts/#device-role-tagging

@adrianchiris
Copy link
Contributor

@ian-howell any updates ?

were the alternatives working for you ? (acpi index or vm tags) ?

@ian-howell
Copy link
Contributor Author

ian-howell commented Jun 2, 2023

Hi @SchSeba, @adrianchiris, sorry for the prolonged delay.

I took a look into the acpi index and have found that it fits our use case, while also being dramatically simpler than the previous solutions. I've created a new PR at #488. I'll close this PR when we've finalized the new one

@adrianchiris
Copy link
Contributor

as PR #488 has merged 🚀 🚀 this one can be closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants