
OVS HWOL: use generic plugin to write switchdev device configuration #106

Closed

Conversation

@zshi-redhat (Collaborator) commented Apr 11, 2021

Added a SriovOperatorConfig.Spec.OvsHardwareOffload API field to define labels that match the MachineConfigPool NodeSelector. Nodes selected by these labels will have "bindata/manifest/switchdev-config" applied.

Removed the MCO plugin and moved the writing of the switchdev device config file into the generic plugin (shared by OpenShift and Kubernetes deployments). Removed the writing of the switchdev device config file from the K8s plugin.

For configuring OVS HWOL on OpenShift, the steps change to:

  • Configure the default SriovOperatorConfig to apply the switchdev-config services to the selected MachineConfigPools
  • Configure SriovNetworkNodePolicy with eSwitchMode: switchdev (this writes the switchdev.conf file and triggers a drain/reboot where needed)

For configuring OVS HWOL on Kubernetes, the steps are not changed.
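For reference, a minimal Go sketch of how the new spec field could be populated (field names follow the diff reviewed below; the label key is a placeholder, the field type is assumed to be a plain map[string]string as the labels.Set conversion in the sync code suggests, and note that MachineConfigPoolSelector is later renamed to NodeSelector during review):

        ovsConfig := sriovnetworkv1.OvsHardwareOffloadConfig{
                // Placeholder labels; they must match the target MachineConfigPool's NodeSelector.
                MachineConfigPoolSelector: map[string]string{"node-role.kubernetes.io/worker": ""},
        }
        spec := sriovnetworkv1.SriovOperatorConfigSpec{
                OvsHardwareOffload: []sriovnetworkv1.OvsHardwareOffloadConfig{ovsConfig},
        }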

@lgtm-com lgtm-com bot commented Apr 11, 2021

This pull request introduces 1 alert when merging 2356622 into f17625b - view on LGTM.com

new alerts:

  • 1 for Useless assignment to local variable

@zshi-redhat (Collaborator, Author):

/cc @Mmduh-483 @pliurh

@github-actions github-actions bot requested a review from pliurh April 11, 2021 08:54
@Mmduh-483 (Contributor) left a comment:

Did you test the code on a k8s cluster?

@@ -212,3 +196,25 @@ func needDrainNode(desired sriovnetworkv1.Interfaces, current sriovnetworkv1.Int
}
return
}

func needRebootNode(state *sriovnetworkv1.SriovNetworkNodeState, loadVfioDriver *uint) (needReboot bool) {
Contributor:

Can you add logs in this function explaining why a reboot is needed?

Collaborator (Author):

Will add.

Collaborator (Author):

Added.
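For illustration, a minimal sketch of how such logging could look (the concrete conditions checked by the real function are not shown in this diff, so the checks and the 'loading' constant below are assumptions; glog and utils.IsSwitchdevModeSpec are used elsewhere in this PR):

        func needRebootNode(state *sriovnetworkv1.SriovNetworkNodeState, loadVfioDriver *uint) (needReboot bool) {
                if loadVfioDriver != nil && *loadVfioDriver == loading { // 'loading' is an assumed driver-state constant
                        glog.V(0).Infof("needRebootNode(): need reboot to load the vfio-pci driver")
                        needReboot = true
                }
                if utils.IsSwitchdevModeSpec(state.Spec) {
                        glog.V(0).Infof("needRebootNode(): need reboot to configure switchdev mode")
                        needReboot = true
                }
                return
        }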

@@ -127,24 +127,6 @@ func (p *K8sPlugin) OnNodeStateChange(old, new *sriovnetworkv1.SriovNetworkNodeS
}
}

// Check switchdev config
Contributor:

The K8s plugin only checks the services now; can you reformat the condition on line 121 to be

        if !utils.IsSwitchdevModeSpec(new.Spec) {
                continue
        }

Can you also remove the word "not" from the comment on line 120?

Collaborator (Author):

@Mmduh-483 I thought we still need p.servicesStateUpdate() to set the right flags and content in the K8sPlugin struct, don't we? Otherwise p.updateTarget.needUpdate() will always be false.

Contributor:

In the current implementation:

  • p.servicesStateUpdate runs when the condition utils.IsSwitchdevModeSpec(new.Spec) is met; I suggest a simple refactoring of that if condition.

  • At the start of OnNodeStateChange, p.updateTarget is reset to all false, so p.updateTarget.needUpdate() is always false if there is no switchdev spec.

Collaborator (Author):

So your suggestion is to change:

        if utils.IsSwitchdevModeSpec(new.Spec) {
                // Check services
                err = p.servicesStateUpdate()
                if err != nil {
                        glog.Errorf("k8s-plugin OnNodeStateChange(): failed : %v", err)
                        return
                }
        }

To:

        if !utils.IsSwitchdevModeSpec(new.Spec) {
                continue
        }
        err = p.servicesStateUpdate()
        if err != nil {
                glog.Errorf("k8s-plugin OnNodeStateChange(): failed : %v", err)
                return
        }

Is that correct?

Contributor:

yes

Collaborator (Author):

Done.

glog.V(0).Infof("nodeStateSyncHandler(): plugin %s: reqDrain %v, reqReboot %v", k, d, r)
reqDrain = reqDrain || d
reqReboot = reqReboot || r
}
glog.V(0).Infof("nodeStateSyncHandler(): reqDrain %v, reqReboot %v disableDrain %v", reqDrain, reqReboot, dn.disableDrain)

for k, p := range dn.LoadedPlugins {
if k != GenericPlugin && k != McoPlugin {
if k != GenericPlugin {
if k == K8sPlugin && utils.ClusterType == utils.ClusterTypeOpenshift {
Contributor:

Is there a reason why this condition is needed for the k8s plugin with the OpenShift cluster type?
On line 663 you ensure that the plugin is not loaded.

Collaborator (Author):

Will remove; there is no need for this conditional check.

Collaborator (Author):

Removed.

@zshi-redhat (Collaborator, Author):

Did you test the code on a k8s cluster?

I didn't; I only tried it on OpenShift.

@Mmduh-483 (Contributor):

@zshi-redhat LGTM for me; can you verify it is working on a k8s cluster?

@zshi-redhat (Collaborator, Author):

@zshi-redhat LGTM for me; can you verify it is working on a k8s cluster?

@Mmduh-483 I'd like to, but I don't have an environment where k8s can be easily deployed.

// MachineConfigPool is selected when its NodeSelectors are included in MachineConfigPoolSelector
for _, pool := range dc.Spec.OvsHardwareOffload {
if selector.Matches(labels.Set(pool.MachineConfigPoolSelector)) {
mcpMap[mcp.GetName()] = true
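For context, the match above uses the standard k8s.io/apimachinery labels package; a small self-contained sketch of the same pattern (the label key and values are placeholders, not taken from the PR):

        package main

        import (
                "fmt"

                "k8s.io/apimachinery/pkg/labels"
        )

        func main() {
                // Selector built from the MachineConfigPool's Spec.NodeSelector labels (placeholder values).
                mcpNodeSelector := labels.SelectorFromSet(labels.Set{"node-role.kubernetes.io/worker": ""})

                // Labels supplied in the SriovOperatorConfig OvsHardwareOffload entry.
                configLabels := labels.Set{"node-role.kubernetes.io/worker": ""}

                // The MCP is selected when all of its NodeSelector requirements are satisfied by the configured labels.
                fmt.Println(mcpNodeSelector.Matches(configLabels)) // prints: true
        }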
Collaborator:

Shall we store this information in the status, so that users can know which MCP is matched?

Collaborator (Author):

+1

Collaborator (Author):

Added.

@@ -24,6 +24,15 @@ type SriovOperatorConfigSpec struct {
DisableDrain bool `json:"disableDrain,omitempty"`
// Flag to enable OVS hardware offload. Set to 'true' to provision switchdev-configuration.service and enable OpenvSwitch hw-offload on nodes.
EnableOvsOffload bool `json:"enableOvsOffload,omitempty"`
// OvsHardwareOffload describes the OVS HWOL configuration for selected MachineConfigPool
OvsHardwareOffload []OvsHardwareOffloadConfig `json:"ovsHardwareOffload,omitempty"`
Collaborator:

I haven't seen such an API design in the K8s API. Normally, if users need to select objects, they use a selector that matches the labels of the objects. Personally, I think this implementation is hard to understand.

Shall we ask users to label the MCP themselves, and use a selector here?

@zshi-redhat (Collaborator, Author) commented Apr 15, 2021:

I haven't seen such an API design in the K8s API. Normally, if users need to select objects, they use a selector that matches the labels of the objects. Personally, I think this implementation is hard to understand.

My previous thinking was that we might want to reuse the same API for the k8s case in the future.
The current OvsHardwareOffload.MachineConfigPoolSelector matches on node labels: for OpenShift it matches the (node) labels defined in the MCP Spec.NodeSelector, while for k8s it could match the node labels directly.

Shall we ask users to label the MCP themselves, and use a selector here?

One idea I had before was to label or annotate the MCP (with ovs-hw-offload) directly and not change the SR-IOV API; instead, we could have the SR-IOV Operator watch MCP updates for creating/deleting the HWOL MCs. WDYT?

Collaborator (Author):

I haven't seen such an API design in the K8s API. Normally, if users need to select objects, they use a selector that matches the labels of the objects. Personally, I think this implementation is hard to understand.

There is a similar example in the Tuned Operator, which was suggested by the MCO team: https://github.com/openshift/cluster-node-tuning-operator/blob/master/pkg/apis/tuned/v1/tuned_types.go#L79
It defines MachineConfigLabels in the Tuned API to match MCPs.

Collaborator:

My previous thinking was that we might want to reuse the same API for the k8s case in the future.

If we plan to reuse it for k8s, the name MachineConfigPoolSelector is kind of misleading. How about just using NodeSelector instead?

One idea I had before was to label or annotate the MCP (with ovs-hw-offload) directly and not change the SR-IOV API; instead, we could have the SR-IOV Operator watch MCP updates for creating/deleting the HWOL MCs. WDYT?

I would prefer a common API for both OCP and K8s. Otherwise, we still need a way to tell when OVS hardware offload shall be applied in the k8s case; currently it is tightly coupled with the switchdev configuration.

Collaborator (Author):

Changed to NodeSelector.
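A rough sketch of what the renamed field could look like (the struct body, JSON tag, and comment are assumptions based on this thread, not the final diff):

        type OvsHardwareOffloadConfig struct {
                // NodeSelector selects nodes by label; on OpenShift it is matched against the
                // MachineConfigPool's NodeSelector.
                NodeSelector map[string]string `json:"nodeSelector,omitempty"`
        }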

} else {
return fmt.Errorf("Failed to get MachineConfig: %v", err)
for mcpName, enable := range mcpMap {
mcName := "00-" + mcpName + "-" + OVS_HWOL_MACHINE_CONFIG_NAME_SUFFIX
Collaborator:

Do we have to create one MC for each selected pool? Have you considered whether spec.machineConfigSelector in the MCP can be used instead?

Collaborator (Author):

Yes, with the current API change we can probably use a single MC matched by all MCPs.
I was thinking of advanced cases where the HWOL configuration might differ between MCPs in the future (e.g. the tc_policy config), in which case one MC per MCP would be needed.
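For example, the naming scheme above would produce one MachineConfig per selected pool (pool names and the resolved suffix are illustrative):

        // mcName := "00-" + mcpName + "-" + OVS_HWOL_MACHINE_CONFIG_NAME_SUFFIX
        //   mcpName "worker"     -> "00-worker-<suffix>"
        //   mcpName "worker-cnf" -> "00-worker-cnf-<suffix>"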

}
}

update, remove, err := utils.WriteSwitchdevConfFile(state)
Collaborator:

Why do we need to distinguish 'remove' from 'update' here?

Collaborator (Author):

We don't; will update.

Collaborator (Author):

Done
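A minimal sketch of the simplified call site after folding 'remove' into 'update' (the variable names and the drain handling are assumptions):

        update, err := utils.WriteSwitchdevConfFile(state)
        if err != nil {
                glog.Errorf("generic-plugin OnNodeStateChange(): fail to write switchdev device config file: %v", err)
                return
        }
        if update {
                needDrain = true // request a drain (and reboot where required) when the file content changed
        }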

foundMC := &mcfgv1.MachineConfig{}
foundMCP := &mcfgv1.MachineConfigPool{}
// Offload is not supported for master MachineConfigPool
delete(mcpMap, "master")
Collaborator:

Instead of deleting it, can we check the MCP name before adding it to mcpMap, and log an error if users select the master pool?

Collaborator (Author):

Done
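A minimal sketch of the suggested check at the point where pools are collected (the function name in the log message and the surrounding loop are assumptions):

        if mcp.GetName() == "master" {
                glog.Errorf("syncOvsHardwareOffloadMachineConfigs(): OVS hardware offload is not supported for the master MachineConfigPool, skipping")
                continue
        }
        mcpMap[mcp.GetName()] = true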

One node can only be added to one custom MCP by design
of MCO. Since the current ovs-hw-offload MCs are applied
to the hardcoded MCP ovs-hw-offload-worker, other existing
custom MCPs cannot be enabled with the OVS hardware
offload configuration.

This commit allows ovs-hw-offload MCs to be applied to
multiple MCPs as specified in the default SriovOperatorConfig.

Signed-off-by: Zenghui Shi <[email protected]>