-
Notifications
You must be signed in to change notification settings - Fork 365
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
support multiple network on pod using OVN k8s CNI #2775
Conversation
ab3b549
to
05b0096
Compare
05b0096
to
634ea96
Compare
634ea96
to
5b7a9f6
Compare
a752cf6
to
f7bfedb
Compare
ada462b
to
9b96c2a
Compare
/retest |
go-controller/pkg/cni/types/types.go
Outdated
MTU int `json:"mtu,omitempty"` | ||
// captures net-attach-def name in the form of namespace/name | ||
NadName string `json:"netAttachDefName,omitempty"` | ||
// set true if it is not the default net-attach-def |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"the default net-attach-def" is unclear to me. Is that 'default network interface', other than 'secondary using net-attach-def"? If so, NadName could distinguish between default network and secondary network (i.e. NadName == "" must be secondary).
Signed-off-by: Yun Zhou <[email protected]>
To support multi OVN networks on a single Pod, we'd need to allow setting/deleting Pod annoation per OVN network. Also, ovnkube-master needs to update Pod annotation with RetryOnConflict(), because multi-networks could update the same Pod annotation and introduce conflict. Signed-off-by: Yun Zhou <[email protected]>
To support multi OVN networks on a single Pod, we'd need to allow setting/deleting node subnet annoation per OVN network. Also, ovnkube-master needs to update node annotation with RetryOnConflict(), because multi-networks could update the same node subnet annotation and introduce conflict. Signed-off-by: Yun Zhou <[email protected]>
to support multiple OVN networks on a single Pod, we'd need to set/unset smart-nic annoation on the per network basis. Signed-off-by: Yun Zhou <[email protected]>
allow OVN k8s CNI attaching multiple network interfaces to pods. Signed-off-by: Yun Zhou <[email protected]>
Signed-off-by: Yun Zhou <[email protected]>
Signed-off-by: Yun Zhou <[email protected]>
Signed-off-by: Yun Zhou <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First of, thank you for your work in this feature. The PR is huge, but, given the sheer amount of new functionality you're adding, I would have expected it to be a even bigger. Very nice work.
I have left a bunch of opinionated suggestions, but I am also asking some important questions.
Most importantly, I have the following concerns:
- scale. I am unsure your solution to indicate to which ovs bridge will be traffic be output to (when using the localnet topology) will scale. Our use cases imply tens (possibly hundreds) of net-attach-defs, meaning the length of the bindings string will be huge. I'm unsure it has a hard limit, but, it will impair performance if its length is too big.
- the localnet topology requires IP addresses. We have a use case for IPAM less localnet topologies . We need to sync on this.
- absence of multi-net policies without IPAM. I'm aware you don't consider this use case. We would also like to sync on this, since that use case is important for us at Red Hat. We could use your help / input to come up with a simple(st) way to define a way for this to happen .
- we also have a use case for pure overlays. IIUC, if we don't use the localnet port, it'll behave this way. Can you confirm that if we don't setup the
ovn-bridge-bindings
on each node's it will communicate via tunnel ?
namespace := pr.PodNamespace | ||
podName := pr.PodName | ||
if namespace == "" || podName == "" { | ||
return nil, fmt.Errorf("required CNI variable missing") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see this code duplicated below; would it make sense to export it to an helper function ?
func (pr *PodRequest) validate() error {
if pr.PodNamespace == "" || pr.PodName == "" {
return fmt.Errorf("required CNI variable missing")
}
return nil
}
mtu := pr.CNIConf.MTU | ||
if mtu == 0 { | ||
mtu = config.Default.MTU | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also export this into an helper:
func (pr *PodRequest) mtu() int { // I'm unsure about the return type, but you get the idea ...
if pr.CNIConf.MTU == 0 {
return config.Default.MTU
}
return pr.CNIConf.MTU
}
This enables you to inline this is the call below:
...
podInterfaceInfo, err := PodAnnotation2PodInfo(annotations, useOVSExternalIDs, pr.PodUID, vfNetdevName,
pr.effectiveNADName, pr.mtu(), netNameInfo)
...
namespace := pr.PodNamespace | ||
podName := pr.PodName | ||
if namespace == "" || podName == "" { | ||
return nil, fmt.Errorf("required CNI variable missing") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the helper func could be re-used here.
condString := "external-ids:sandbox=" + pr.SandboxID | ||
if pr.CNIConf.IsSecondary { | ||
condString += " external_ids:network_name=" + pr.effectiveNADName | ||
} else { | ||
condString += " external_ids:network_name{=}[]" | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would use add yet another helper here, which allows you to simply:
ovsIfNames, err := ovsFind("Interface", "name", pr.condString())
I unfortunately could not come up with a decent name for it ...
req.effectiveNetName = types.DefaultNetworkName | ||
req.effectiveNADName = types.DefaultNetworkName | ||
if conf.IsSecondary { | ||
req.effectiveNetName = conf.Name | ||
if conf.NadName == "" { | ||
return nil, fmt.Errorf("OVN Netconf %q doesn't contain network-attachment-definition name", conf.Name) | ||
} | ||
req.effectiveNADName = conf.NadName | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to add (yet another ...) helper req
receiver function here ?
|
||
// Find all the logical node switches for the non-default networks and delete the ones that belong to the | ||
// obsolete networks | ||
nodeSwitches, err := libovsdbops.FindSwitchesWithOtherConfig(mc.nbClient) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are these only the "logical node switches" - i.e. the logical switches for routed topology ? I find the comment a bit misleading - I wouldn't expect to see the localnet topology switches listed here after reading that comment.
I may have grossly misinterpreted your comment though.
err1 := oc.addLogicalPort4Nad(pod, nadName, nodeName, network) | ||
if err1 != nil { | ||
err = err1 | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if more than one failed ? You'd swallow all errors except the last one. Is that OK ? Shouldn't we at least log the error ?
if portInfo, err := oc.logicalPortCache.get(portName); err != nil { | ||
klog.Errorf(err.Error()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you explain what would this scenario represent - i.e. a port missing from the cache ? I'm having trouble understanding this one.
@@ -258,17 +256,17 @@ func SetL3GatewayConfig(nodeAnnotator kube.Annotator, cfg *L3GatewayConfig) erro | |||
func ParseNodeL3GatewayAnnotation(node *kapi.Node) (*L3GatewayConfig, error) { | |||
l3GatewayAnnotation, ok := node.Annotations[ovnNodeL3GatewayConfig] | |||
if !ok { | |||
return nil, newAnnotationNotSetError("%s annotation not found for node %q", ovnNodeL3GatewayConfig, node.Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: imho, this read better previously.
type NetNameInfo struct { | ||
// netconf's name | ||
NetName string | ||
Prefix string | ||
IsSecondary bool | ||
} | ||
|
||
type NetAttachDefInfo struct { | ||
NetNameInfo | ||
// net-attach-defs shared the same CNI Conf, key is <Namespace>/<Name> of net-attach-def. | ||
// Note that it means they share the same logical switch (subnet cidr/MTU etc), but they might | ||
// have different resource requirement (requires or not require VF, or different VF resource set) | ||
NetAttachDefs sync.Map | ||
NetCidr string | ||
MTU int | ||
TopoType string | ||
VlanId int | ||
ExcludeIPs []net.IP | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these structs should be defined in a dedicated package, instead of a utils file with a bunch of unreleated code.
// Network MTU | ||
MTU int `json:"mtu,omitempty"` | ||
// set to localnet if it needs public interface | ||
TopoType string `json:"topology,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's very confusing to have this attribute set to localnet when you just want an overlay (and relying on the ovn bridge bindings to have the underlay configured).
I think TopologyType
should allow 3 values: routed, overlay, and localnet, for instance.
- What this PR does and why is it needed
- Special notes for reviewers
- How to verify it
- Description for the changelog