
Fail to detect Mellanox VFs in switchdev mode using pfName nicSelector #383

Closed
zshi-redhat opened this issue Oct 9, 2021 · 0 comments · Fixed by #395
Labels
bug Something isn't working


zshi-redhat commented Oct 9, 2021

What happened?

On an OpenShift cluster with a Mellanox ConnectX-5 NIC configured in switchdev mode by sriov-network-operator, the VFs can be observed on the target worker node, but the device plugin fails to detect them with the pfName nicSelector, so zero resources are reported in the node status.

I noticed that when a VF is released from an SR-IOV pod, its representor name appears as the first entry in the PF's net directory on the host (because of the naming convention):

# ls /sys/bus/pci/devices/0000\:d8\:01.5/physfn/net/
58609951635cbfa  ens8f1  ens8f1_0  ens8f1_1  ens8f1_3  ens8f1_4  ens8f1_5

0000:d8:01.5: the VF PCI address
58609951635cbfa: the name of the VF representor after the VF was released from the SR-IOV pod
ens8f1: the PF name of the VF
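The ordering follows from a plain byte-wise sort: digits (0x30 to 0x39) sort before lowercase letters, so a representor whose new name starts with a digit lands ahead of `ens8f1`. Go's `ioutil.ReadDir` and `os.ReadDir` also return entries sorted by filename, so, assuming `utils.GetPfName` takes the first entry of such a listing, it picks the representor. The helper `first_net_entry` below is hypothetical and only reproduces that ordering:

```shell
# Hypothetical helper: print the first entry of a directory in byte-wise
# (C locale) order, mimicking a "take the first directory entry" lookup.
# Assumes entry names contain no newlines.
first_net_entry() {
    ls -1 "$1" | LC_ALL=C sort | head -n 1
}

# The names from the listing above, in the same byte-wise order:
printf '%s\n' ens8f1 ens8f1_0 ens8f1_1 58609951635cbfa | LC_ALL=C sort | head -n 1
# prints 58609951635cbfa
```

On the affected node this would be run as `first_net_entry /sys/bus/pci/devices/0000:d8:01.5/physfn/net`.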

utils.GetPfName returns this 58609951635cbfa name as the PF name. It does not match the pfName specified in the device plugin config, so the VFs are filtered out of the resource pool.
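One way to identify the PF netdev without relying on directory order is to read each entry's `phys_port_name` attribute. This sketch assumes the mlx5 switchdev convention in which the uplink representor (the PF netdev) reports a `phys_port_name` such as `p0` or `p1`, while VF representors report names such as `pf1vf2`; a renamed representor keeps its `phys_port_name`, so it is not mistaken for the PF. It is not necessarily how the fix in #395 works, and `pf_name_from_physfn` is a hypothetical helper:

```shell
# Hypothetical helper: given a VF's physfn/net directory, print the netdev
# whose phys_port_name looks like an uplink port ("p" followed only by digits).
# Assumption: mlx5 switchdev naming, where VF representors report "pfXvfY".
# Prints nothing and returns 1 if no entry qualifies (e.g. legacy mode).
pf_name_from_physfn() {
    for dev in "$1"/*; do
        [ -e "$dev" ] || continue                 # glob matched nothing
        # Skip entries whose phys_port_name is missing or unreadable.
        port=$(cat "$dev/phys_port_name" 2>/dev/null) || continue
        case $port in
            p*) num=${port#p} ;;
            *) continue ;;
        esac
        case $num in
            ''|*[!0-9]*) continue ;;              # e.g. "pf1vf2": a VF representor
        esac
        basename "$dev"
        return 0
    done
    return 1
}
```

Usage: `pf_name_from_physfn /sys/bus/pci/devices/0000:d8:01.5/physfn/net`.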

What did you expect to happen?

VFs in switchdev mode are detected successfully using the pfNames selector.

What are the minimal steps needed to reproduce the bug?

  • Deploy a bare-metal Kubernetes cluster in which one of the workers has a Mellanox CX-5 card
  • Deploy sriov-network-operator and apply an SR-IOV network node policy (using pfName in the nicSelector field) to configure VFs in switchdev mode on the CX-5 interface
  • Observe that the VF resource is reported in the node status
  • Create an SR-IOV pod requesting the VF resource, then delete the pod once it has been created successfully
  • Restart the SR-IOV network device plugin manually (delete the device plugin pod on the target node)
  • Observe that the SR-IOV device plugin reports zero resources
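The last two steps can be scripted roughly as below. `NS`, `DP_LABEL`, the node name, and the resource name are placeholders (assumptions, not values from this report); the operator's namespace and the resource prefix depend on the deployment. Dots in the resource name are escaped because kubectl's JSONPath treats `.` as a field separator.

```shell
# Placeholders: adjust to the actual deployment (assumptions, not from the report).
NS=${NS:-sriov-network-operator}
DP_LABEL=${DP_LABEL:-app=sriov-device-plugin}

# Restart the device plugin by deleting its pod on the target node ($1);
# the DaemonSet is expected to recreate it.
restart_device_plugin() {
    kubectl -n "$NS" delete pod -l "$DP_LABEL" --field-selector "spec.nodeName=$1"
}

# Print the allocatable count of resource $2 on node $1, or 0 if it is absent.
allocatable_count() {
    key=$(printf '%s' "$2" | sed 's/\./\\./g')   # escape dots for JSONPath
    count=$(kubectl get node "$1" -o "jsonpath={.status.allocatable.$key}")
    printf '%s\n' "${count:-0}"
}

# Example (hypothetical node and resource names):
#   restart_device_plugin worker-0
#   allocatable_count worker-0 openshift.io/mlxnics
```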

Component Versions

Component                      Version
SR-IOV Network Device Plugin   master
SR-IOV CNI Plugin              master
OS                             4.18.0-341.el8.x86_64

Config Files

Config file locations may be config dependent.

Device pool config file location (Try '/etc/pcidp/config.json')
{
  "resourceList": [
    {
      "resourceName": "mlxnics",
      "selectors": {
        "vendors": [
          "15b3"
        ],
        "pfNames": [
          "ens8f1"
        ],
        "rootDevices": [
          "0000:d8:00.1"
        ],
        "linkTypes": [
          "ether"
        ],
        "IsRdma": false,
        "NeedVhostNet": false
      },
      "SelectorObj": null
    }
  ]
}
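With this config, the `pfNames` selector only keeps a VF whose PF name (as returned by `utils.GetPfName`) is `ens8f1`, so a representor name returned in its place filters the VF out. A minimal sketch of that membership test, assuming a plain exact-name comparison (the real selector may accept extra syntax, and `matches_pf_names` is a hypothetical helper, not the device plugin's code):

```shell
# Hypothetical helper: succeed if the PF name ($1) equals one of the
# configured pfNames entries ($2 ...). Exact-name match only.
matches_pf_names() {
    pf=$1
    shift
    for want in "$@"; do
        [ "$pf" = "$want" ] && return 0
    done
    return 1
}

# With the config above, only "ens8f1" is listed:
matches_pf_names ens8f1 ens8f1 && echo kept              # prints "kept"
matches_pf_names 58609951635cbfa ens8f1 || echo filtered # prints "filtered"
```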