Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Externally Created virtual functions #436

Merged
merged 13 commits into from
Oct 1, 2023

Conversation

SchSeba
Copy link
Collaborator

@SchSeba SchSeba commented May 8, 2023

this PR adds the new API to allow the operator to use virtual functions
that the user configures manually on the system

this will allow new use-cases like using a VF as the SDN primary nic.
the nic will be created before the operator starts by the user, for
example using nmstate or NetworkManager and the operator will allocate
all the remaining vfs to pods.

@github-actions
Copy link

github-actions bot commented May 8, 2023

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@coveralls
Copy link

coveralls commented May 8, 2023

Pull Request Test Coverage Report for Build 5753402663

  • 77 of 444 (17.34%) changed or added relevant lines in 13 files are covered.
  • 12 unchanged lines in 3 files lost coverage.
  • Overall coverage increased (+0.1%) to 24.537%

Changes Missing Coverage Covered Lines Changed/Added Lines %
pkg/daemon/plugin.go 0 2 0.0%
pkg/systemd/systemd.go 0 2 0.0%
pkg/webhook/validate.go 17 23 73.91%
api/v1/helper.go 0 7 0.0%
pkg/daemon/writer.go 0 9 0.0%
pkg/daemon/daemon.go 21 32 65.63%
pkg/plugins/mellanox/mellanox_plugin.go 0 12 0.0%
pkg/plugins/generic/generic_plugin.go 8 23 34.78%
pkg/utils/mock/mock_store.go 5 40 12.5%
pkg/utils/utils.go 1 54 1.85%
Files with Coverage Reduction New Missed Lines %
pkg/utils/utils.go 2 6.1%
pkg/daemon/daemon.go 3 41.03%
pkg/host/mock/mock_host.go 7 2.99%
Totals Coverage Status
Change from base Build 5740514141: 0.1%
Covered Lines: 2145
Relevant Lines: 8742

💛 - Coveralls

@SchSeba SchSeba force-pushed the externally_manage branch from a9bb702 to af05d3b Compare May 9, 2023 08:54
@github-actions
Copy link

github-actions bot commented May 9, 2023

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@SchSeba SchSeba force-pushed the externally_manage branch from af05d3b to df85137 Compare May 9, 2023 11:01
@github-actions
Copy link

github-actions bot commented May 9, 2023

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

Copy link
Member

@zeeke zeeke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, PR looks good to me. I added a few comments error messages

pkg/utils/utils.go Outdated Show resolved Hide resolved
pkg/plugins/generic/generic_plugin.go Outdated Show resolved Hide resolved
@SchSeba SchSeba force-pushed the externally_manage branch from df85137 to 5cf1a83 Compare May 10, 2023 13:24
@github-actions
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@zeeke
Copy link
Member

zeeke commented May 10, 2023

LGTM

@SchSeba SchSeba force-pushed the externally_manage branch from 5cf1a83 to aed3c7b Compare May 11, 2023 11:36
@github-actions
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@github-actions
Copy link

github-actions bot commented Jun 1, 2023

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@SchSeba SchSeba force-pushed the externally_manage branch from a2c3ae9 to 294054c Compare June 1, 2023 09:50
@github-actions
Copy link

github-actions bot commented Jun 1, 2023

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@@ -59,6 +59,8 @@ type SriovNetworkNodePolicySpec struct {
VdpaType string `json:"vdpaType,omitempty"`
// Exclude device's NUMA node when advertising this resource by SRIOV network device plugin. Default to false.
ExcludeTopology bool `json:"excludeTopology,omitempty"`
// don't create the virtual function only allocated them to the device plugin. Defaults to false.
ExternallyCreated bool `json:"externallyCreated,omitempty"`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how does ExternallyCreated relate to other parameters here ?

we need to have this documented so users will know what to expect when using it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I add documentation for it let me know if you need anything else :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason to do it per interface and not global?
Do you have use-case when some SR-IOV will be create by SR-IOV opeartor and some not?

Copy link
Collaborator

@moshe010 moshe010 Jun 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read it now in the description. You want to use it for CNI primary network and for secondary VF will be managed by sriov operator

@SchSeba SchSeba force-pushed the externally_manage branch from 294054c to 4e5a5e0 Compare June 14, 2023 14:08
@github-actions
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

1 similar comment
@github-actions
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@github-actions
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.


// Externally created: we don't support ExternallyCreated + EswitchMode
//TODO: if needed we will need to add this in the future as today EswitchMode is for HWOFFLOAD
if cr.Spec.ExternallyCreated && cr.Spec.EswitchMode == sriovnetworkv1.ESwithModeSwitchDev {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do you want to block it ?

if its externally created and device is already in switchdev mode then its OK no ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should block right now because all the systemd and bash script we do for this will be hard to manage.
and right know I am not aware of any request for this.

_, err = os.Stat(PfAppliedConfig)
if err != nil {
if os.IsNotExist(err) {
err = os.Mkdir(PfAppliedConfig, os.ModeDir)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can call MkdirAll[1] on PfAppliedConfg

[1[ https://pkg.go.dev/os#MkdirAll

Copy link
Collaborator

@adrianchiris adrianchiris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

starting to wonder if we need a design doc for this one.

it feels that we need to get an agreement on what exactly ExternallyManaged means
then outline the changes in the different flows of sriov-network-operator.

that way:

  1. we agree on what we want to achieve
  2. make sure this PR is doing what we want

return nil
}

func ClearPCIAddressFolder() error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: add docsting

return nil
}

func CreatePfAppliedStatusFromSpec(p *sriovnetworkv1.Interface) *PfStatus {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: add docsting

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not save/load sriovnetworkv1.Interface ?

i just wonder if we really need this intermediate struct.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also it would be nice if all these methods are bunched together in an interface.

essentially you are storing and loading information of a PF.
this "Store" once created can initialize its storage path.

// returns false if the file doesn't exist.
func LoadPfsStatus(pciAddress string, chroot bool) (*PfStatus, bool, error) {
if chroot {
exit, err := Chroot("/host")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need to chroot ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A: Because SyncNodeState is called with chroot

return err
}
// we only update the status to avoid any race conditions where a new policy can be applied
latestState.Status = updatedState.Status
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is all starting to get complex.

can you move all of this "prep" logic to its own function and call it here ?
please keep comments so we understand why things are there .

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree but I will like to do it in a followup PR if possible

@@ -452,6 +458,15 @@ func (dn *Daemon) nodeStateSyncHandler() error {
syncStatus: syncStatusInProgress,
lastSyncError: "",
}
// wait for writer to refresh status then pull again the latest node state
<-dn.syncCh
updatedState, err := dn.client.SriovnetworkV1().SriovNetworkNodeStates(namespace).Get(context.Background(), dn.name, metav1.GetOptions{})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just overwrite latestState here ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because we may get a new spec that we don't expect.

glog.Warningf("nodeStateSyncHandler(): Failed to fetch node state %s: %v", dn.name, err)
return err
}
// we only update the status to avoid any race conditions where a new policy can be applied
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if generation changed from when we first got it. just fail WDYT ?

im not a fan of starting to patch the object

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure we can go with that

README.md Show resolved Hide resolved
README.md Show resolved Hide resolved
@@ -270,6 +319,12 @@ func NeedUpdate(iface *sriovnetworkv1.Interface, ifaceStatus *sriovnetworkv1.Int
glog.V(2).Infof("NeedUpdate(): VF %d MTU needs update, desired=%d, current=%d", vf.VfID, group.Mtu, vf.Mtu)
return true
}

// this is needed to be sure the admin mac address is configured as expected
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why can the admin mac be "externally configured" ? that should be part of sriov configuration performed by the external service.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need this to be sure we maintain the same capability as before.

this means for nics that don't configure the admin mac address the operator configures them when externally manage is not requested.
so we need to have the same for externally manage true

return err
}

if !exist {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if entry was not found, does this nessecarily mean that VFs were not created by sriov-operator ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes because we always going to create the PF info

@adrianchiris
Copy link
Collaborator

also UT need to be added where possible.

@SchSeba SchSeba force-pushed the externally_manage branch from 13835b6 to b6f321d Compare July 3, 2023 12:14
@github-actions
Copy link

github-actions bot commented Jul 3, 2023

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@github-actions
Copy link

github-actions bot commented Sep 4, 2023

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

Copy link
Member

@zeeke zeeke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some nit. It would be great if we could make some unit test around daemon/generic-plugin logic

pkg/utils/store.go Outdated Show resolved Hide resolved
pkg/utils/store.go Show resolved Hide resolved
Signed-off-by: Sebastian Sch <[email protected]>
Signed-off-by: Sebastian Sch <[email protected]>
Signed-off-by: Sebastian Sch <[email protected]>
@github-actions
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@github-actions
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@SchSeba
Copy link
Collaborator Author

SchSeba commented Sep 12, 2023

Left some nit. It would be great if we could make some unit test around daemon/generic-plugin logic

I am working on a separate PR after we move the drain to a controller a refactor in the daemon that will use more interfaces so we can mock easily

Copy link
Member

@zeeke zeeke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few minor comments. Overall shape looks good to me.

pkg/webhook/validate.go Outdated Show resolved Hide resolved
pkg/utils/utils.go Outdated Show resolved Hide resolved
pkg/utils/store.go Show resolved Hide resolved
pkg/utils/store.go Outdated Show resolved Hide resolved
pkg/utils/store.go Outdated Show resolved Hide resolved
@github-actions
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@SchSeba
Copy link
Collaborator Author

SchSeba commented Sep 26, 2023

Hi @zeeke let me know if the changes are good and I will rebase the commit messages

Copy link
Member

@zeeke zeeke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
Thanks for addressing my comments

pkg/utils/utils.go Show resolved Hide resolved
pkg/utils/utils.go Outdated Show resolved Hide resolved
Signed-off-by: Sebastian Sch <[email protected]>
@github-actions
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

Copy link
Collaborator

@e0ne e0ne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for addressing my comments

@SchSeba
Copy link
Collaborator Author

SchSeba commented Oct 1, 2023

Thanks folks!

Merging after 2 maintainers approval

@SchSeba SchSeba merged commit d8a33a8 into k8snetworkplumbingwg:master Oct 1, 2023
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants