-
Notifications
You must be signed in to change notification settings - Fork 377
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support shared LoadBalancerIP for multiple Services #6480
Conversation
bcb27d8
to
9fc3e4e
Compare
/test-all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, only minor comments and questions from me
Are you planning on adding an e2e test for the "normal" case in a follow-up PR?
pkg/agent/types/annotations.go
Outdated
@@ -30,6 +30,9 @@ const ( | |||
// ServiceExternalIPPoolAnnotationKey is the key of the Service annotation that specifies the Service's desired external IP pool. | |||
ServiceExternalIPPoolAnnotationKey string = "service.antrea.io/external-ip-pool" | |||
|
|||
// ServiceAllowSharedIPAnnotationKey is the key of the Service annotation that specifies whether the Service is allowed to use shared LoadBalancer IP. | |||
ServiceAllowSharedIPAnnotationKey string = "service.antrea.io/allow-shared-ip" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you think it would make sense to add lb
(for LoadBalancer) in the annotation, e.g. "service.antrea.io/allow-shared-lb-ip"
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about it and didn't add it because it would be very similiar to the MetalLB annotation but with a tiny difference, which may make it harder to remember if people is already familiar with the MetalLB one.
Another reason is we already use full name (load-balancer) in other annotation (load-balancer-mode), I didn't want to use different writing in this annotation while "allow-shared-load-balancer-ip" may be too long. But maybe "allow-shared-load-balancer-ip" is fine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed to allow-shared-load-balancer-ip, please let me know if you prefer allow-shared-lb-ip compared with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think either one is fine (allow-shared-load-balancer-ip
or allow-shared-ip
to match metalLB). In the case of metalLB I would say there is a little less ambiguity because it is a dedicated LB solution, so it makes sense that "IP" would refer to LB IP in this context.
I am not really concerned about the length of the annotation (allow-shared-load-balancer-ip
).
docs/service-loadbalancer.md
Outdated
* The Services use the `Cluster` external traffic policy, or they have | ||
identical Endpoints. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a question from me: do the Endpoints need to be identical or do they just need to be located on the same subset of Nodes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's actually the latter, I thought the distribution of Pods may be out of user's control. I could change to the latter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with you and I don't think there is a need to change it to the more "accurate" version. If anything, we should probably make it clear that we discouraged using the annotation for Services with the Local
external traffic policy.
if knownIPsByPool[ipPool].Has(ipStr) { | ||
continue | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we log something if this happens but the current service is not sharable? Or will it anyway happen in syncService
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently it will be released by syncService
, but we can log unexpected events here.
I'm also considering preserving the IP, just like before restart. Then we can say the annotation affects IP allocation, allocated IP won't be revoked by changing the annotation. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on other discussions, I think my preference would be to preserve IPs just like before restart, and let the user correct the situation. I also don't think it is a good practice for us to mutate the Service and remove a previously-allocated LB IP just because the Agent restarted. Did I understand your question correctly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we are on the same page.
// Update other Services as they may be affected by the release of this IP. | ||
c.enqueueServicesWithoutIPs(allocation.ipPool) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is that always true or only when the IP was used exclusively by the Service and actually released here? Probably not a worthy "optimization" though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for this case: a service sharing an IP with other Services changes its annotation from shareable to unsharable, in which case we didn't release its IP as I want to limit the annotation for requesting IP only, at the moment new Services can no longer get this IP. After deleting this Service, new Services can get this IP. Maybe I should add a comment for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, the new comment helps. This is because we can get in situations where Services are sharing the same IP even though not all Services have the annotation, based on user actions.
if allowSharedIP != prevIPAllocation.sharable { | ||
c.updateIPAllocation(key, prevIPAllocation.ipPool, prevIPAllocation.ip, allowSharedIP) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There will not be an error log if the annotation is removed while the IP was already shared?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I plan to limit the annotation for IP request stage only. Checking whether an annotation change is valid requires more code, but we can't do anything to prevent it, so perhaps not worth?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. I think we can start off like this. If it seems that users of this feature keep running into situations that are difficult to troubleshoot without extra logs, we can look into adding more validation / logs.
I'm working e2e test, will add it to this PR. |
4dae453
to
f653c81
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More of a theoretical question for you: if the ServiceExternalIP
feature was not itself Alpha, would you have introduced a feature gate for this? I know that the feature only becomes "active" if the annotation is used, but I remember you saying in the past that we were too liberal with introducing new functionality without a corresponding feature gate (e.g., FQDN support in Network Policies). I guess this may not apply here given that ServiceExternalIP
is still Alpha and we can consider this an integral part of the ServiceExternalIP
feature.
for _, mutator := range mutators { | ||
mutator(&service) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Idea for a future PR: it may be convenient to have a ServiceBuilder
for tests, like we have a PodBuilder
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I could follow up with a PR.
test/e2e/service_externalip_test.go
Outdated
@@ -368,6 +369,68 @@ func testServiceWithExternalIPCRUD(t *testing.T, data *TestData) { | |||
} | |||
} | |||
|
|||
func testServiceSharingLoadBalancerIP(t *testing.T, data *TestData) { | |||
pool := data.createExternalIPPool(t, "pool-", v1beta1.IPRange{CIDR: "172.30.0.0/28"}, nil, nil, nil) | |||
defer data.crdClient.CrdV1beta1().ExternalIPPools().Delete(context.TODO(), pool.Name, metav1.DeleteOptions{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do have a preference for defining ctx := context.Background()
once at the beginning of the test function, since we have not intention of replacing these "TODO"s in the future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
test/e2e/service_externalip_test.go
Outdated
require.NoError(t, data.waitForServiceLoadBalancerIP(svc2, "172.30.0.1"), "svc2 should get the shared IP assigned") | ||
require.NoError(t, data.waitForServiceLoadBalancerIP(svc3, ""), "svc3 should not get the shared IP assigned") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I would use assert
here since we could check the second assertion even if the first one fails and that could provide extra information in case of failure?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense, done
return false, err | ||
} | ||
} else { | ||
if len(service.Status.LoadBalancer.Ingress) == 0 || service.Status.LoadBalancer.Ingress[0].IP != expectedLoadBalancerIP { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we fail immediately with a hand-crafted error if expectedLoadBalancerIP != ""
and service.Status.LoadBalancer.Ingress[0].IP != expectedLoadBalancerIP
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current code can also be used to validate if changing requested IP works, though we don't have this test case yet.
I agree failing immediately under that condition can help detect some transient issue, but if that could happen, it may also happen that the IP is changed from expected IP to unexpected IP, and returning true immedialy when expected IP is seen is also not enough..
Users want to share LoadBalancerIP between multiple Services when they face external IP shortage. It's possible to do it when the Services sharing an IP meet the requirements: * The Services use different ports * The Services use the `Cluster` external traffic policy, or they have identical Endpoints. However, the ability of using any IP that is already allocated to another Service may incur a security risk that a Service can "steal" LoadBalancer traffic intended for another Service. To support the use case without introducing the security risk, we use the annotation `service.antrea.io/allow-shared-load-balancer-ip: true` on Services to restrict IPs that can be shared. Services without the annotation will continue to have their LoadBalancerIPs exclusively used. Services with the annotation can share an IP between themselves when requesting the same IP. Ideally, we should also check if the Services meet the first two requirements before assigning the IP to them. However, it's difficult to prevent Services from being changed to not meet the requirements after they get the IP assigned. Therefore, we assume that users using the feature know how to configure Services properly. Signed-off-by: Quan Tian <[email protected]>
Yes, if |
/test-all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Users want to share LoadBalancerIP between multiple Services when they face external IP shortage. It's possible to do it when the Services sharing an IP meet the requirements:
Cluster
external traffic policy, or they have identical Endpoints.However, the ability of using any IP that is already allocated to another Service may incur a security risk that a Service can "steal" LoadBalancer traffic intended for another Service.
To support the use case without introducing the security risk, we use the annotation
service.antrea.io/allow-shared-load-balancer-ip: true
on Services to restrict IPs that can be shared. Services without the annotation will continue to have their LoadBalancerIPs exclusively used. Services with the annotation can share an IP between themselves when requesting the same IP.Ideally, we should also check if the Services meet the first two requirements before assigning the IP to them. However, it's difficult to prevent Services from being changed to not meet the requirements after they get the IP assigned. Therefore, we assume that users using the feature know how to configure Services properly.
Fixes #4309