Add BGP support to the Antrea Agent #5948
Comments
I see no need to restrict Pod IP advertisement to noEncap only; we can support encap mode too. For LoadBalancer IPs, we should be able to enable ECMP as well.
I edited the issue to remove the reference to noEncap. It was left over from a previous draft I was working on.
Yes, that's a good point. I guess in this case all cluster Nodes advertise the LoadBalancer IP (or at least all cluster Nodes running at least one backend Pod for the Service) with the same "cost", with kube-proxy / AntreaProxy being responsible for the last traffic hop. This would be quite different from how
Use case 2 would be extremely beneficial for some use cases we have. We are really interested in this for Pod IPs/CIDRs and also for Egress. We don't use Antrea for LB, but that would be a good option to have as well.
Use case 2 is very interesting for our setup; however, for us that is limited to Service IPs. Could there be a feature switch/selector to enable/disable what and when you should advertise? For instance, I may want to advertise some Service IPs for some Namespaces, or all of them, but no Pod IPs, etc.
@ColonelBundy I definitely wanted to have the ability to disable advertising Pod CIDRs.
Pod CIDRs can be allocated from multiple non-overlapping IPPools (Antrea IPAM, noEncap). It is evident that routes should only be advertised when the Pod CIDR is allocated from an IPPool; otherwise, advertising it may not be required. We might want to include multiple-IPPool support for BGP.
@ColonelBundy Question: I understand that you want to advertise the Pod CIDR or Service IP to another AS. Do you want the routes advertised from another AS to be distributed and installed on K8s Nodes?
For our use case we only want to advertise Service IPs. To put it simply, we're looking not to have to use MetalLB for L3 external IPs. Having the option to select which IPPool to advertise, and to which peer, would be a killer feature.
Got it. If so, I think a client in another AS should be reachable via the default route of your cluster Nodes, so that the reply packets of a connection that originates in another AS and is destined for a Service in the cluster can be forwarded back to where it originated. Is that your setup? @ColonelBundy
Yeah, that sounds good.
It would be a very powerful feature if we could solve use case 2 with something out of the box, as it would add great flexibility when using Egress and ServiceExternalIP. As mentioned, there are ways to solve it using static routes, etc., but maintaining static routes is troublesome when nodes are decommissioned, new ones are provisioned, and interfaces are moved between nodes. Using BGP would solve this and dynamically update the routes on demand. I created a post last year using a DaemonSet to install and configure FRR to get past this; probably not the prettiest way, but it gave me what I wanted: https://blog.andreasm.io/2023/02/20/antrea-egress/.
@andreasm80 that's a nice blog post, is it OK if I link to it from the https://antrea.io website?
Thanks @antoninbas. Yes, that is OK by me.
@ColonelBundy Hello, I saw the case that
Thanks |
Pretty spot on, except that we currently have no use case for advertising Pod CIDRs to an external peer. But that may change with time. Also, to clarify, a selector for which Pod CIDRs to advertise would be very handy.
Do you mean using a selector to select target K8s Nodes and advertising their Pod CIDRs?
More along the lines of which Pods. I might wish to advertise some Pods in some Namespaces to some peers.
Thanks for the suggestion. We will keep that in mind. Could you tell me why you would advertise Pods directly in some Namespaces instead of using Service IPs within those Namespaces? Has such a case been employed in a production environment?
We don't have such a case at the moment. And I do agree that advertising Service IPs should be the priority if you ever want to advertise individual Services. But then again, we don't have this specific use case right now, so feel free to dismiss this idea if it's not within scope.
Option 2 covers our use case as well. Some additional notes:

- We would be using this for the LoadBalancer IP, Service ExternalIP, and the Antrea Egress IP. BGP routers running in Antrea would peer with BGP Route Reflector servers: iBGP, single ASN.
- The ability to advertise a static list of IPs would be beneficial, instead of and/or in addition to whatever automatic method of advertising individual IPs for the Services. An example would be to assign a /26 worth of addresses to a cluster, add the /26 prefix to the list, and have BGP advertise the /26 instead of all the individual /32s: fewer prefixes in the route table and less prefix add/remove churn as Services are added and removed (see the sketch below).
- Adding the BGP-learned routes to the Node/Antrea route table, so that traffic between different clusters in the same Layer 2 subnet can go directly cluster1 Node -> cluster2 Node instead of hairpinning off some upstream router.
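A minimal sketch of the aggregate-prefix idea from the second bullet, assuming a hypothetical `routesToAdvertise` helper (the name, the fallback-to-host-routes behavior, and the addresses are illustrative, not Antrea code):

```go
package main

import (
	"fmt"
	"net/netip"
)

// routesToAdvertise returns just the aggregate prefix when it covers every
// allocated IP, and falls back to individual host routes (/32 or /128)
// when at least one allocated IP lies outside the aggregate.
func routesToAdvertise(aggregate netip.Prefix, allocated []netip.Addr) []netip.Prefix {
	for _, ip := range allocated {
		if !aggregate.Contains(ip) {
			routes := make([]netip.Prefix, 0, len(allocated))
			for _, a := range allocated {
				routes = append(routes, netip.PrefixFrom(a, a.BitLen()))
			}
			return routes
		}
	}
	return []netip.Prefix{aggregate}
}

func main() {
	aggregate := netip.MustParsePrefix("10.96.100.0/26")
	allocated := []netip.Addr{
		netip.MustParseAddr("10.96.100.5"),
		netip.MustParseAddr("10.96.100.17"),
	}
	// Prints [10.96.100.0/26]: a single prefix, so adding or removing
	// Services inside the block causes no route churn at the BGP peers.
	fmt.Println(routesToAdvertise(aggregate, allocated))
}
```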
Thanks for the feedback @notsrch
@hongliangl what do you think? I assume that we will easily be able to extend the API in such a way that users can provide custom CIDRs for us to advertise from the Node(s)? Instead of the K8s-native / dynamic way of advertising individual Service IPs, we will just advertise the full CIDR and users will need to use the same CIDR for the ExternalIPPool(s) they use for Services.
@hongliangl I believe that at the moment we were not planning to support this (learning routes).
Do you mean an API like the following field?

```go
type Advertisements struct {
	Service *ServiceAdvertisement `json:"service,omitempty"`
	Pod     *PodAdvertisement     `json:"pod,omitempty"`
	Egress  *EgressAdvertisement  `json:"egress,omitempty"`
}

type ServiceAdvertisement struct {
	// Determine whether to advertise ClusterIPs.
	ClusterIPs bool `json:"clusterIPs,omitempty"`
	// Determine whether to advertise ExternalIPs.
	ExternalIPs bool `json:"externalIPs,omitempty"`
	// Determine whether to advertise LoadBalancerIPs.
	LoadBalancerIPs bool `json:"loadBalancerIPs,omitempty"`
	// Empty for now; selectors to be added later, which are used to select specific Services.
	// Selectors []Selector `json:"selectors,omitempty"`
	ExternalIPCIDRs []string `json:"externalIPCIDRs,omitempty"`
}
```

My only concern is that the remote BGP router might learn routes that are not reachable for specific Services. For example, if a Service is configured with externalTrafficPolicy Local and there is no Endpoint on a K8s Node, the Service is not reachable on that Node. If we advertise the custom CIDRs containing the Service LoadBalancer IPs or externalIPs from the Node(s), the remote BGP router will learn and install route(s) whose next hop(s) are the Node(s). If a client behind the remote BGP router sends traffic to the Service mentioned above, and the BGP router chooses a route whose next hop is a Node on which there is no Endpoint for the Service, the client will get a rejection.
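To make the reachability concern above concrete: with `externalTrafficPolicy: Local`, a Node can only deliver the traffic if it hosts a ready Endpoint, so the advertisement decision is inherently per Node, which a statically advertised custom CIDR cannot express. A minimal sketch under that assumption follows; the `Service` type and `shouldAdvertise` helper are illustrative, not Antrea code:

```go
package main

import "fmt"

// Service captures just the fields relevant to the advertisement decision.
type Service struct {
	Name                  string
	IngressIP             string
	ExternalTrafficPolicy string // "Cluster" or "Local"
}

// shouldAdvertise decides whether this Node should announce svc's ingress IP,
// given the number of ready Endpoints scheduled on this Node.
func shouldAdvertise(svc Service, localReadyEndpoints int) bool {
	if svc.ExternalTrafficPolicy == "Local" {
		// Only Nodes with a local Endpoint can actually serve the traffic.
		return localReadyEndpoints > 0
	}
	// With the Cluster policy, kube-proxy / AntreaProxy can forward traffic
	// to a remote Endpoint, so every selected Node may advertise the IP.
	return true
}

func main() {
	svc := Service{Name: "web", IngressIP: "10.96.100.5", ExternalTrafficPolicy: "Local"}
	fmt.Println(shouldAdvertise(svc, 0)) // false: no local Endpoint, withdraw the route
	fmt.Println(shouldAdvertise(svc, 2)) // true: advertise, traffic stays local
}
```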
Yes, we don't have a plan to support that (installing the routes learned from the remote BGP router).
Yes, it would be the user's responsibility to use this correctly. If the user wants to advertise a CIDR, we should just do it, and it will be the user's responsibility to make sure that all addresses in the CIDR map to Services which can be reached from all the Nodes using that BGP policy. We cannot add any extra intelligence there without defeating the purpose of this field. Since this is something we can easily add to the API later on without breaking backwards compatibility, I don't think we should include it in the first version anyway.
Yes. That would be the expected (and desired) behavior from our point of view.
Add version v1alpha1 of BGPPolicy to crd.antrea.io

For #5948

Signed-off-by: Hongliang Liu <[email protected]>
This commit implements the controller for the BGPPolicy API, designed to advertise Service IPs, Egress IPs, and Pod IPs to BGP peers from selected Kubernetes Nodes.

According to the spec of BGPPolicy, the Node selector is used to select the Nodes to which a BGPPolicy is applied. Multiple BGPPolicies can be applied to the same Node; however, only the oldest BGPPolicy will be effective on a Node, with the others serving as alternatives. The effective one may change in the following cases:

- The current effective BGPPolicy is updated and no longer applies to the Node.
- The current effective BGPPolicy is deleted.

The BGP server instance is only created and started for the effective BGPPolicy on a Node. If the effective BGPPolicy changes, the corresponding BGP server instance will be terminated by calling the `Stop` method, and a new BGP server instance will be created and started by calling the `Start` method for the new effective BGPPolicy.

To create a BGP server instance, the ASN, router ID, and listen port must be specified. The ASN and listen port are specified in the spec of the effective BGPPolicy. For the router ID: if the Kubernetes cluster is IPv4-only or dual-stack, we use the Node's IPv4 address as the router ID, ensuring uniqueness. If the Kubernetes cluster is IPv6-only, where no Node IPv4 address is available, the router ID can be specified via the Node annotation `node.antrea.io/bgp-router-id`; if not present, a router ID will be generated by hashing the Node name and written back to the Node annotation `node.antrea.io/bgp-router-id`. Additionally, the stale BGP server instance will be terminated, and a new BGP server instance created and started, when any of the ASN, router ID, or listen port changes.

The information about the BGP peers is specified in the effective BGPPolicy. A BGP peer is uniquely identified by its peer IP address and peer ASN. To reconcile the latest BGP peers:

- Get the BGP peers to be added and add them by calling the `AddPeer` method of the BGP server instance.
- Get the BGP peers to be deleted and delete them by calling the `RemovePeer` method of the BGP server instance.
- Get the remaining BGP peers and calculate the updated BGP peers, then update them by calling the `UpdatePeer` method of the BGP server instance.

The IPs to be advertised can be calculated from the spec of the effective BGPPolicy. Currently, we advertise the IPs and CIDRs to all the BGP peers. To reconcile the latest IPs to all BGP peers:

- If the BGP server instance is newly created and started, advertise all the IPs by calling the `AdvertiseRoutes` method.
- If the BGP server instance is not newly created and started:
  - Get the IPs/CIDRs to be added and advertise them by calling the `AdvertiseRoutes` method.
  - Get the IPs/CIDRs to be removed and withdraw them by calling the `WithdrawRoutes` method.

The feature is gated by the alpha BGPPolicy FeatureGate and is only supported on Linux.

For #5948

Signed-off-by: Hongliang Liu <[email protected]>
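As a side note on the router ID fallback described above: the commit only states that the ID is generated by hashing the Node name, so the specific hash function (FNV-1a) and the dotted-quad rendering in this sketch are assumptions for illustration:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// routerIDFromNodeName hashes a Node name into a stable 32-bit BGP router ID,
// rendered in the conventional IPv4-style dotted-quad form.
func routerIDFromNodeName(nodeName string) string {
	h := fnv.New32a()
	h.Write([]byte(nodeName)) // Write on a hash.Hash never returns an error
	v := h.Sum32()
	return fmt.Sprintf("%d.%d.%d.%d", byte(v>>24), byte(v>>16), byte(v>>8), byte(v))
}

func main() {
	// The same Node name always yields the same router ID, so Agent restarts
	// do not change the BGP identity of the Node.
	fmt.Println(routerIDFromNodeName("worker-node-1"))
}
```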
With #6009 and #6203 merged, I think we can close this one and track future improvements via separate issues. For users who want to use the feature, please refer to https://github.com/antrea-io/antrea/blob/release-2.1/docs/bgp-policy.md and try out Antrea v2.1.0.
Describe the problem/challenge you have
Over the years we have had a few requests to add BGP speaker capabilities to the Antrea Agent. The purpose of this issue is to collect the use cases that we would like to cover with this capability.
Note that while it is possible to meet some of these use cases by deploying kube-router in "BGP mode" alongside Antrea, having this capability available OOTB means potentially a better integration with Antrea features, and doesn't require users to deploy yet another DaemonSet in their cluster.
I believe that there are 3 main use cases for BGP in K8s with Antrea:
1 & 3 are not very interesting IMO, because they just provide alternative implementations to what we already support, and there is no clear benefit. However, we can add some value with 2 for on-prem users who want to make K8s endpoints routable by their BGP fabric.
As a side note, Calico and kube-router support both 1 & 2, while Cilium has added support for 2.
Describe the solution you'd like
I believe that our support should focus on use case 2:
Each Antrea Agent should run a BGP speaker and advertise local IPs to a list of configured BGP peers. The AS number (ASN) for the Antrea Agent should be configurable (all Agents may use the same local ASN or not). The list of advertised local IPs should be configurable from this list:
- (`Egress` feature) Egress IPs - with this capability, it will be possible for routes to be automatically configured in the physical network for "return" Egress traffic
- (`ServiceExternalIP` feature) LoadBalancer Service IPs - on-prem users with a BGP fabric will be able to easily expose K8s Services to the rest of their network. At the moment, the `ServiceExternalIP` feature requires LoadBalancer IPs to be allocated from the Node network (or requires adding static routes to the physical network).

A note on Egress IP advertisement: similar to the `EgressSeparateSubnet` feature, but an L3 / BGP approach vs an L2 approach?

Anything else you would like to add?
While the exact API is yet to be decided, BGP peering should ideally be configurable using CRD(s).
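For illustration only, a hypothetical sketch of what CRD-based peer configuration could look like as Go API types; every type and field name below is an assumption made for this sketch, not the API that was eventually merged:

```go
// Package v1alpha1 sketches hypothetical API types for BGP peering.
package v1alpha1

// BGPPolicySpec configures a BGP speaker on the selected Nodes.
type BGPPolicySpec struct {
	// LocalASN is the AS number used by the Agents this policy applies to.
	LocalASN int32 `json:"localASN"`
	// ListenPort is the port the local BGP speaker listens on.
	ListenPort *int32 `json:"listenPort,omitempty"`
	// BGPPeers lists the remote peers to establish sessions with.
	BGPPeers []BGPPeer `json:"bgpPeers,omitempty"`
}

// BGPPeer identifies a remote BGP speaker, uniquely by address and ASN.
type BGPPeer struct {
	// Address is the peer IP address.
	Address string `json:"address"`
	// ASN is the peer AS number.
	ASN int32 `json:"asn"`
	// Port is the peer BGP port; 179 is the conventional default.
	Port *int32 `json:"port,omitempty"`
}
```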
cc @jianjuns @tnqn