Fix detection of Mellanox VFs in switchdev mode #395
Hey!
I added some comments. Also, please add more detail to the commit message explaining the problem and the proposed solution.
Thanks!
Signed-off-by: Alexander Maslennikov <[email protected]>
Thanks for addressing the comments! LGTM
Thanks for working on this! LGTM
I tested with the current version, and it seems the pfName is still retrieved from sysfs rather than from devlink.
pkg/utils/utils.go
	pfEswitchMode, err := GetPfEswitchMode(pciAddr)
	if err != nil {
		// If device doesn't support eswitch mode query, fall back to the default implementation
		if strings.Contains(strings.ToLower(fmt.Sprint(err)), "no such device") {
There is an additional use case that slipped my mind.
If GetPfName is called with a PF address (i.e. not an SR-IOV VF) and the PF has no SR-IOV VFs, devlink fails with:
[root ~]# devlink dev eswitch show pci/0000:03:00.0
devlink answers: Operation not supported
In this case, we should also use sysfs.
Added a condition handling this case
@zshi-redhat when you run |
I got different results from the devlink command and from the device plugin.
I added the following debug message:
and got empty output:
@zshi-redhat this could be an issue in the netlink library. The proposed solution is to log a warning when we get an empty eswitch mode and then use the default implementation.
I think you're right: the netlink library returns an empty mode for a PF without SR-IOV configured. In that case GetPfEswitchMode does not return an error, so execution continues to the empty-string check and then falls through to the previous pfName logic.
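The control flow described above can be sketched as follows. This is an illustration under assumptions, not the PR's code: `getPfName`, `pfNameLookup`, and the two lookup callbacks are hypothetical names, and treating every non-switchdev mode as "use the default lookup" is an assumption.

```go
package main

import "log"

// pfNameLookup is a hypothetical signature for a PF-name lookup strategy.
type pfNameLookup func(pciAddr string) (string, error)

// getPfName is a hypothetical dispatcher: it picks the lookup strategy
// from the eswitch mode reported for the device.
func getPfName(pciAddr, mode string, fromSwitchdev, fromSysfs pfNameLookup) (string, error) {
	if mode == "" {
		// The query succeeded but returned nothing: warn and use the default path.
		log.Printf("warning: empty eswitch mode for %s, using default PF lookup", pciAddr)
		return fromSysfs(pciAddr)
	}
	if mode == "switchdev" {
		return fromSwitchdev(pciAddr)
	}
	// Assumption: any other mode (e.g. legacy) uses the default lookup.
	return fromSysfs(pciAddr)
}
```

Passing the lookups in as functions keeps the decision logic testable without real hardware.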
Merged per three approvals.
@almaslennikov Thanks for your persistence in addressing the comments!
Hmm, @almaslennikov, I just tried the latest master, and it doesn't seem to get the eswitch mode correctly.
Do you see the same issue in your environment?
My setup is simple: one CX-5 with two ports, one of which is switchdev enabled with SR-IOV configured. ens4f1 is the PF, with PCI address 0000:af:00.1.
devlink shows the right switchdev mode:
but I got this message (0000:af:0a.0 is the PCI address of a VF of the above PF):
@zshi-redhat the version of netlink used in this commit is broken. I've created another PR that fixes it and describes the netlink issue. Please take a look: #398
This fixes #383 by calling the function implemented in https://github.com/Mellanox/sriovnet/blob/ecc40df73c7c2d53a10fd86368583de1937c40f6/sriovnet_switchdev.go#L60, which correctly determines the name of the PF in switchdev mode for a given VF by checking phys_port_id for each device in /sys/bus/pci/devices/<vf_address>/physfn/net.
The original issue is that the current logic selects the first device in physfn/net, which might not be the PF if the device is in switchdev mode. The proposed solution is to use a function from the https://github.com/Mellanox/sriovnet library that iterates over the devices and determines the PF by validating its physical port.
Signed-off-by: Alexander Maslennikov [email protected]