Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vhost-vdpa support #467

Merged
merged 3 commits into from
Aug 23, 2023
Merged

vhost-vdpa support #467

merged 3 commits into from
Aug 23, 2023

Conversation

lmilleri
Copy link
Contributor

This PR implements the changes for vhost-vdpa support.

  • SriovNetworkNodePolicy API: VDPA device type can take now the following values: "virtio" and "vhost"
  • if vhost is specified, the vhost-vdpa driver is loaded into the kernel and bound to the vdpa device

@github-actions
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@coveralls
Copy link

coveralls commented Jul 10, 2023

Pull Request Test Coverage Report for Build 5948839312

  • 56 of 77 (72.73%) changed or added relevant lines in 2 files are covered.
  • 3 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.4%) to 24.801%

Changes Missing Coverage Covered Lines Changed/Added Lines %
pkg/plugins/generic/generic_plugin.go 50 71 70.42%
Files with Coverage Reduction New Missed Lines %
pkg/plugins/generic/generic_plugin.go 3 50.53%
Totals Coverage Status
Change from base Build 5939485088: 0.4%
Covered Lines: 2122
Relevant Lines: 8556

💛 - Coveralls

@github-actions
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@github-actions
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@lmilleri
Copy link
Contributor Author

@bn222 @SchSeba @adrianchiris @zshi-redhat Please review and approve if looks good

@@ -69,7 +71,7 @@ func (p *GenericPlugin) OnNodeStateChange(new *sriovnetworkv1.SriovNetworkNodeSt
p.DesireState = new

needDrain = needDrainNode(new.Spec.Interfaces, new.Status.Interfaces)
needReboot = needRebootNode(new, &p.LoadVfioDriver, &p.LoadVirtioVdpaDriver)
needReboot = needRebootNode(new, &p.LoadVfioDriver, &p.LoadVirtioVdpaDriver, &p.LoadVhostVdpaDriver)
Copy link
Collaborator

@bn222 bn222 Jul 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This now starts to look cumbersome. Can we start this PR with a refactor of p.LoadVfioDriver and vdpa driver? What's happening here is that the needRebootNode actually returns a bunch of things through its arguments. That function is now doing much more than "just checking if we need to reboot".

We also have an enum and a small state machine to load drivers. This could be simplified much further.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bn222 pushing the refactoring of driver loading logic

@github-actions
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@github-actions
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@github-actions
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@adrianchiris adrianchiris requested a review from SchSeba July 23, 2023 13:31
return p.DriverStateMap
}

func (p *GenericPlugin) TestDriverLoading(state *sriovnetworkv1.SriovNetworkNodeState) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: LoadDriverForTests ?

pkg/plugins/generic/generic_plugin.go Show resolved Hide resolved
pkg/plugins/generic/generic_plugin.go Show resolved Hide resolved
if needVfioDriver(state) {
*loadVfioDriver = loading
update, err := tryEnableIommuInKernelArgs()
func nodeToRebootIfVfio(state *sriovnetworkv1.SriovNetworkNodeState, driverMap DriverStateMapType) (updateNode bool) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nodeToRebootIfVfio ? maybe needRebootIfVfio ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also lets use "needReboot" as return val as thats what it means.

update, err := tryEnableIommuInKernelArgs()
func nodeToRebootIfVfio(state *sriovnetworkv1.SriovNetworkNodeState, driverMap DriverStateMapType) (updateNode bool) {
updateNode = false
for driverID, driverState := range driverMap {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

driverMap is a .... map.

no need to iterate over it. instead just index it by Vfio

*loadVfioDriver = loading
update, err := tryEnableIommuInKernelArgs()
func nodeToRebootIfVfio(state *sriovnetworkv1.SriovNetworkNodeState, driverMap DriverStateMapType) (updateNode bool) {
updateNode = false
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is by default false. so can be removed.

}
}
func needRebootNode(state *sriovnetworkv1.SriovNetworkNodeState, driverMap DriverStateMapType) (needReboot bool) {
needReboot = false
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is by default false. so can be removed.


if needReboot {
needDrain = true
}
return
}

// ////////////// for testing purposes only ///////////////////////
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move those to the end of the file ?

@github-actions
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@github-actions
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

if [ $vfid -le $maxId ] && [ $vfid -ge $minId ] && [ $vdpaType == "virtio" ]; then
vdpa_cmd="vdpa dev add name vdpa:"${VfPciAddr}" mgmtdev pci/"${VfPciAddr}
if [ $vfid -le $maxId ] && [ $vfid -ge $minId ] && [ $vdpaType != "" ]; then
vdpa_cmd="vdpa dev add name vdpa:"${VfPciAddr}" mgmtdev pci/"${VfPciAddr}" max_vqp 32"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what this means? max_vqp 32 it will always be this number? should it be configurable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're right, this will be configurable in the future by extending the API? For the time being, the idea is to enable the multiqueue support by providing by default 32 queues. We think it is the best value, considering that the virtio-net driver will use the number of cores as a maximum. The reason for enabling multiqueue is to increase the performances of the card and be ready to demonstrate it to Verizon

if err := p.HostManager.LoadKernelModule("vfio_pci"); err != nil {
glog.Errorf("generic-plugin Apply(): fail to load vfio_pci kmod: %v", err)
return err
func syncDriverState(p *GenericPlugin) error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please have this function as part of the struct no need to pass it

for _, driverState := range p.DriverStateMap {
if !driverState.DriverLoaded && driverState.NeedDriverFunc(p.DesireState, driverState) {
if err := p.HostManager.LoadKernelModule(driverState.DriverName); err != nil {
glog.Errorf("generic-plugin Apply(): fail to load %s kmod: %v", driverState.DriverName, err)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please change the function on the log generic-plugin Apply()

func syncDriverState(p *GenericPlugin) error {
for _, driverState := range p.DriverStateMap {
if !driverState.DriverLoaded && driverState.NeedDriverFunc(p.DesireState, driverState) {
if err := p.HostManager.LoadKernelModule(driverState.DriverName); err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add a V(2) log say what driver it's loading (good for debug)

@@ -104,6 +144,11 @@ func (p *GenericPlugin) Apply() error {
}
}

err := syncDriverState(p)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please just simply this like

if err := p.syncDrvierStates(); err != nil {
 return err
}

return
}

// ////////////// for testing purposes only ///////////////////////
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you have a space here :)

@github-actions
Copy link

github-actions bot commented Aug 2, 2023

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@github-actions
Copy link

github-actions bot commented Aug 4, 2023

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@github-actions
Copy link

github-actions bot commented Aug 8, 2023

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

Copy link
Collaborator

@adrianchiris adrianchiris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm from my side.

one small comment RE log verbosity

func (p *GenericPlugin) syncDriverState() error {
for _, driverState := range p.DriverStateMap {
if !driverState.DriverLoaded && driverState.NeedDriverFunc(p.DesireState, driverState) {
glog.Infof("loading driver %s", driverState.DriverName)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: use glog.V(2).Infof as @SchSeba suggested below.

@github-actions
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@adrianchiris
Copy link
Collaborator

thanks @lmilleri !

@adrianchiris
Copy link
Collaborator

@SchSeba @zeeke shall we proceeed with merging this one ?

Copy link
Member

@zeeke zeeke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only one nit comment

Overall, LGTM and good testing coverage. Thanks

pkg/plugins/generic/generic_plugin.go Outdated Show resolved Hide resolved
Signed-off-by: Leonardo Milleri <[email protected]>
Max number of queues is set to 32 by default.
This setting should help with the performance of
NVIDIA network cards.

Signed-off-by: Leonardo Milleri <[email protected]>
@github-actions
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

SriovNetworkNodePolicy API: VDPA device type can take now the following values: "virtio" and "vhost"
Vhost/vdpa: Same behaviour of virtio/vdpa apart for the drivers that are loaded in the kernel (vhost-vdpa instead of virtio-vdpa)

Signed-off-by: Leonardo Milleri <[email protected]>
@github-actions
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@zeeke zeeke merged commit ee0e017 into k8snetworkplumbingwg:master Aug 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants