Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for BF2 in connectX mode #353

Merged
merged 2 commits into from
Sep 28, 2022

Conversation

SchSeba
Copy link
Collaborator

@SchSeba SchSeba commented Aug 22, 2022

Fix the issue introduced in https://github.com/k8snetworkplumbingwg/sriov-network-operator/pull/240/files#r808624002

Allow to use BF in connectX mode also for OCP platform.
This is needed until we support the systemd configuration

Signed-off-by: Sebastian Sch [email protected]

@github-actions
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@SchSeba
Copy link
Collaborator Author

SchSeba commented Aug 22, 2022

/cc @adrianchiris @e0ne @zshi-redhat @bn222 @zshi-redhat @zeeke

please have a look

@github-actions
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@SchSeba SchSeba force-pushed the support_BF2_connectx branch from 1c010a2 to c12eef3 Compare August 22, 2022 12:07
@github-actions
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@coveralls
Copy link

coveralls commented Aug 22, 2022

Pull Request Test Coverage Report for Build 3097548519

  • 16 of 117 (13.68%) changed or added relevant lines in 5 files are covered.
  • 7 unchanged lines in 2 files lost coverage.
  • Overall coverage decreased (-0.06%) to 24.977%

Changes Missing Coverage Covered Lines Changed/Added Lines %
pkg/plugins/mellanox/mellanox_plugin.go 0 2 0.0%
pkg/utils/switchdev.go 4 6 66.67%
pkg/plugins/generic/generic_plugin.go 0 5 0.0%
pkg/utils/utils.go 12 33 36.36%
pkg/utils/utils_mlx.go 0 71 0.0%
Files with Coverage Reduction New Missed Lines %
pkg/utils/utils.go 2 7.86%
controllers/sriovnetwork_controller.go 5 69.0%
Totals Coverage Status
Change from base Build 3097501597: -0.06%
Covered Lines: 1871
Relevant Lines: 7491

💛 - Coveralls

// Create a map with all the PFs we will need to configure
// we need to create it here before we access the host file system using the chroot function
// because the SkipConfigVf needs the mstconfig package that exist only inside the sriov-config-daemon file system
vfsToConfig := map[string]bool{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please, rename this variable (and parameter) to vfsToSkip or something like that, as its entries seem to be true when the Virtual Functions must not be configured.

Another solution may be to make utils.SyncNodeState(...) receive ifSpecs []sriovnetworkv1.Interface, ifStatuses []sriovnetworkv1.InterfaceExt and filter out the to-skip interfcaes before calling it.

return bluefield2Conntexd, nil
}

return -1, fmt.Errorf("MellanoxBlueFieldMode(): unknow card status for %s", PciAddress)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SP: unknown

return 0, fmt.Errorf("failed to find %s in the mstconfig output command", internalCPUOffloadEngine)
}

// check for DPU
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a comment: These registers aren't available with some versions of mstflint.

INTERNAL_CPU_MODEL should also be 1 as well. This is true for both DPU mode and NIC mode. Maybe we should add a check here just to be safe.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wizhaoredhat if for both it will be 1 why do we need to check it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is possible that the NIC was incorrectly set to Separated Host Mode which will have a value of zero.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good done

@SchSeba SchSeba force-pushed the support_BF2_connectx branch from c12eef3 to 9a35de5 Compare September 1, 2022 10:21
@github-actions
Copy link

github-actions bot commented Sep 1, 2022

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

Copy link
Collaborator

@e0ne e0ne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still need to test this PR. With this change we introduce more platform-specific and NIC-specific code which is used across the operator instead of in s specific plugins

return false, nil
}

glog.V(2).Infof("SkipConfigVf(): skip config VF for BF2 device on DPU mode")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neet to confirm that such use-case is not needed

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the use case you refer ?

// Create a map with all the PFs we will need to configure
// we need to create it here before we access the host file system using the chroot function
// because the SkipConfigVf needs the mstconfig package that exist only inside the sriov-config-daemon file system
vfsToSkip := map[string]bool{}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it would be good to move this code into the separate function

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

pkg/plugins/generic/generic_plugin.go Show resolved Hide resolved
@SchSeba SchSeba force-pushed the support_BF2_connectx branch from 9a35de5 to 9b5717d Compare September 7, 2022 10:01
@github-actions
Copy link

github-actions bot commented Sep 7, 2022

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@SchSeba
Copy link
Collaborator Author

SchSeba commented Sep 7, 2022

I still need to test this PR. With this change we introduce more platform-specific and NIC-specific code which is used across the operator instead of in s specific plugins

I agree but I was not able to find a different solution, but I am optimistic that the rework I am doing will make stuff simpler

glog.V(2).Infof("MellanoxBlueFieldMode():card %s in ConnectX mode", PciAddress)
return bluefield2Conntexd, nil
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a suggestion. I believe it would help with debugging if who have some logs here (in the case of failure) what the actual mstconfig is.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Collaborator

@adrianchiris adrianchiris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still need to give this one a deeper look.

one thing in mind is creating an interface to provide NIC information (e,g if its DPU or non DPU)
then keeping the vendor specific logic in plugin.

pkg/utils/utils_mlx.go Outdated Show resolved Hide resolved
pkg/utils/utils_mlx.go Outdated Show resolved Hide resolved
pkg/utils/switchdev.go Outdated Show resolved Hide resolved
return false, nil
}

glog.V(2).Infof("SkipConfigVf(): skip config VF for BF2 device on DPU mode")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the use case you refer ?

pkg/utils/utils.go Outdated Show resolved Hide resolved
@SchSeba
Copy link
Collaborator Author

SchSeba commented Sep 12, 2022

@adrianchiris about your comment

what is the use case you refer ?

what do you mean?

@SchSeba SchSeba force-pushed the support_BF2_connectx branch from 9b5717d to 29997e3 Compare September 12, 2022 17:19
@SchSeba
Copy link
Collaborator Author

SchSeba commented Sep 12, 2022

@bn222 done

@github-actions
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@SchSeba SchSeba force-pushed the support_BF2_connectx branch from 29997e3 to 374d929 Compare September 12, 2022 17:21
@github-actions
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@e0ne
Copy link
Collaborator

e0ne commented Sep 12, 2022

/test-all

pkg/utils/utils.go Outdated Show resolved Hide resolved
@SchSeba SchSeba force-pushed the support_BF2_connectx branch from 374d929 to e3492d1 Compare September 15, 2022 10:08
@github-actions
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@e0ne
Copy link
Collaborator

e0ne commented Sep 16, 2022

/test-all

@SchSeba
Copy link
Collaborator Author

SchSeba commented Sep 18, 2022

@e0ne @bn222 can you make another round of review so we can merge this PR please :)

…riov-network-operator/pull/240/files#r808624002

Allow to use BF in connectX mode also for OCP platform.
This is needed until we support the systemd configuration

Signed-off-by: Sebastian Sch [email protected]
@SchSeba SchSeba force-pushed the support_BF2_connectx branch from e3492d1 to 42eb7e9 Compare September 21, 2022 11:48
@github-actions
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

Copy link
Collaborator

@e0ne e0ne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified with vanilla k8s 1.25

@bn222
Copy link
Collaborator

bn222 commented Sep 28, 2022

lgtm

@SchSeba
Copy link
Collaborator Author

SchSeba commented Sep 28, 2022

Thanks for the review folks I am going to merge this PR

@SchSeba SchSeba merged commit c630e3e into k8snetworkplumbingwg:master Sep 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants