-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
EndpointSlice support (#4501) #4663
EndpointSlice support (#4501) #4663
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking really good so far and I'm excited to start playing with it. As per my comments, I think we need to get clearer on the logic for handling adds/updates/deletes of EndpointSlices and keep in mind that the total address set for a service can be made up of several EndpointsSlices.
return false | ||
} | ||
|
||
func isValidSlice(es *dv1beta1.EndpointSlice) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we always call this immediately before calling getEndpointSliceSVCName
. What do you think of building this check into getEndpointSliceSVCName
so that if the slice is not valid, we return an error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's a great idea! Changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Solid piece of work @Matei207! I need to read a bit more about endpoint slices and come back to do another pass and validate some of the assumptions that are being made here. Will try testing that today. Also left some questions about things that are not quite clear to me
return | ||
} | ||
|
||
pp.noEndpoints(false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, is this here correct? If a slice is deleted, you are calling noEndpoints(false). Correct me if I am wrong but this should send an update saying that the svc_name:port does not exist anymore. If you delete an endpoint slice does that mean that we should be sending exists=false? @adleong wdyt ? Also, does that really mean that no Endpoints need to be send. There might be other endpoints in other slices. I have the feeling I might be getting all this wrong. Have to think about that a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So from the comments of the proxy api:
// If the service does not exist then the controller must send
// `no_endpoints{exists: false}` ASAP when a client subscribes or when the
// service stops existing. If the service exists and has endpoints available
// then the controller must send `add` that lists all (or at least a large
// number) of the endpoints for the service. If and only if the service exists
// but does not have any endpoints available then the controller SHOULD send
// `no_endpoints{exists: true}` when a client subscribes. In other words, the
// `no_endpoints` message must only be sent when there are *no*endpoints for
// the service.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I think there are two problems here:
- we cannot inform the proxy that there are no endpoints simply because a slice was deleted because there might be other slices that are associated with this service
- we certainly cannot inform the proxy that this service has no associated endpoints because of the above as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the detailed comment @zaharidichev it all makes sense! I was a bit confused around the NoEndpoints
concept.
I wanted to send a NoEndpoints
message only if the port publisher has no addresses registered anymore, however. The way I thought this was:
- becase a svc:port might have loads of addresses split across different slices, go through the current addresses for that port publisher and remove them.
- if at the end we do not have any addresses, send a no endpoints message.
I tried to do this with the line above:
if len(pp.addresses.Addresses) > 0 -- return out of the function
perhaps this is not the most intuitive way to do though, any thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it will be clearer if you do something like:
svcExists := len(pp.addresses.Addresses) > 0
pp.noEndpoints(svcExists)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But all that brings an interesting for me question. Namely does the fact that a service has no endpoints (and hence noe addresses) mean that it should be non existent for the purposes of discovery? I see that in deleteEndpoints
we used to fire a NoEndpoints(false). I guess this was designed like that for a reason. Coming from some previous discussions around services that do not have any endpoints and are there just as proxies (for mc purrposes), I think it is a valid scenario to do a discovery on a service that does not have any endpoints.. or is it not? @adleong wdyt?
@Matei207 Note, this is not directly related to this PR. I am just thinking out loud. Your approach is correct for the time being.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it will be clearer if you do something like (...)
That's an amazing suggestion, thank you! It reads much better.
But all that brings an interesting for me question (...) does the fact that a service has no endpoints (and hence no addresses) mean that it should be non-existent for the purposes of discovery?
No, at least I don't think so. This is a very good question, based on the test cases and the general functionality that is provided by the destination service, I think in the case of a service that does not have any endpoints available, listeners should still be able to subscribe to the service and receive a "NoEndpoints" message. Since on subscription the client (proxy) receives a stream of updates, it's only fair that a service be available for discovery even if no endpoints (addresses) exist; my assumption is that there might be a stream of updates propagated to the client at some point in the future (?), pods can come up (and disappear) at any time.
So @zaharidichev I agree with that it is valid to do a discovery on a service that does not have any endpoints, based on what I've seen so far.
Thank you for the comments and feedback @zaharidichev @adleong. I have incorporated most of the suggested changes, for some I'm waiting to hear back from you and to see what the general consensus is (e.g Watcher firing up both the slice and endpoints informers at the same time). I will be changing the logic when adding/updating a single endpointslice to reflect all feedback, this might take a bit longer. Let me know what you think of the changes I made, as well as any other questions, concerns etc. Thanks again! |
return | ||
} | ||
|
||
pp.noEndpoints(false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it will be clearer if you do something like:
svcExists := len(pp.addresses.Addresses) > 0
pp.noEndpoints(svcExists)
return | ||
} | ||
|
||
pp.noEndpoints(false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But all that brings an interesting for me question. Namely does the fact that a service has no endpoints (and hence noe addresses) mean that it should be non existent for the purposes of discovery? I see that in deleteEndpoints
we used to fire a NoEndpoints(false). I guess this was designed like that for a reason. Coming from some previous discussions around services that do not have any endpoints and are there just as proxies (for mc purrposes), I think it is a valid scenario to do a discovery on a service that does not have any endpoints.. or is it not? @adleong wdyt?
@Matei207 Note, this is not directly related to this PR. I am just thinking out loud. Your approach is correct for the time being.
for _, p := range slicePorts { | ||
switch targetPort.Type { | ||
case intstr.Int: | ||
if *p.Port == targetPort.IntVal { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if this is nil? The docs say something intersting that I am not sure I can interpret correctly:
// ports specifies the list of network ports exposed by each endpoint in
// this slice. Each port must have a unique name. When ports is empty, it
// indicates that there are no defined ports. When a port is defined with a
// nil port value, it indicates "all ports". Each slice may include a
// maximum of 100 ports.
// +optional
// +listType=atomic
All ports?... not sure what they mean here. Any idea?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zaharidichev I looked into this and I couldn't find anything that makes sense. I'm a bit puzzled by it too, in all honesty.
I had to do some work around the nil port today, it's not related to the description but it might offer some insight. In my case, whenever an endpointslice had all of its associated pods deleted, all of the addresses (i.e endpoints) and ports automatically got deleted and set to nil (because they use pointers instead of empty lists sigh).
I suppose in this case, an EndpointSlice with a nil ports
field, targets all targetPorts of the service. So, going back on a previous explanation, an EndpointSlice
is an endpoints subset. An endpoints subset is a grouping of addresses with common ports. This means every slice (assuming none of the slices hit their endpoint capacity limit) would have a different set of ports, an EndpointSlice with nil ports then would mean ALL ports of the service? Perhaps for the rare case when a pod exposes all ports? Still, not sure why it has to be nil, but it might make sense? Lemme know what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The port number of the endpoint. If this is not specified, ports are not restricted and must be interpreted in the context of the specific consumer.
It sounds a bit underspecified. Let's not worry too much about this. We can come back and adjust this behavior if we run into a situation where it's problematic.
@adleong @zaharidichev sorry for the slight delay on this, but I ran into some bumps along the way. To start, I think (99% sure) I covered every piece of feedback that has been initially left on this PR, I think it's finally ready to be checked again. There are some changes that I've made that I will document here to make it easier to follow the differences. There are also some bits and pieces that I am pretty sure you will flag down, so I will comment directly on them to let you know why they're in there in the first place. Changes
|
Labels: newAddresses.Labels, | ||
} | ||
remove = AddressSet{ | ||
Addresses: removeAddresses, | ||
} | ||
return | ||
} | ||
|
||
func isTargetPortInSlice(targetPort namedPort, slice *discovery.EndpointSlice) bool { | ||
if slice.Ports == nil && slice.Endpoints == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sort of an edge case, let:
web
-- a service
pod-1
-- web-pod-1
pod-2
-- web-pod-2
s
-- slice for web
: {pod-1-IP
, pod-2-IP
}
When we execute
kubectl delete pod-1, pod-2
Then we expect NoEndpoints
for that slice. In actuality, if we don't include this check, we will be left with an endpoint that does not exist, but the application thinks exists.
- Pod-1 gets deleted first -- s updates itself (removes pod-1-ip) --> dest-service sends a remove for pod-1-ip
2.1. Pod-2 gets deleted second -- s updates itself (removes pod-2-ip) --> s now has NIL PORT and NIL Endpoints => isTargetPortInSlice
returns false (without this line) since the port is nil, the target port of the service is definitely not in there.
2.2. Pod-2 gets deleted second -- s updates itself (removes pod-2-ip) --> s now has NIL PORT and NIL Endpoints => isTargetPortInSlice
returns true, the slice is processed, dest-service sends a remove for pod-2-ip.
Hope it makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if we push the port handling logic down into resolveESTargetPort
we can avoid these kinds of edge cases.
|
||
func (pp *portPublisher) updateEndpointSlice(oldSlice *discovery.EndpointSlice, newSlice *discovery.EndpointSlice) { | ||
newAddressSet, oldAddressSet := pp.endpointSliceToAddresses(newSlice), pp.endpointSliceToAddresses(oldSlice) | ||
if len(newAddressSet.Addresses) == 0 && len(oldAddressSet.Addresses) == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very special case that I've come across when I was playing around. When we don't have any endpoints left and we update a slice with nil ports and nil endpoints with a slice that has nil ports and nil endpoints, the dest-service spams NoEndpoints
. This checks avoids that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With my suggestion below to not use NoEndpoints, this won't be an issue.
} | ||
|
||
pp.addresses = newAddressSet | ||
if len(pp.addresses.Addresses) == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If after updating we end up with no addresses (1 service, 1 slice, oldSlice = 1 IP address, newSlice = nil ports, nil endpoints, for example) we should send a NoEndpoints
. So far this seems to work well, looking for some feedback here if this makes sense logically?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the Endpoints code, if we get an update with 0 addresses, we know that all addresses have been deleted so we send a NoEndpoints which I think is effectively a shorthand for "remove all". In the the case of EndpointSlices the logic is more complicated because an updated slice with 0 addresses doesn't mean that there are no addresses for the whole service. I think we can do ourselves a favor here and just not use the NoEndpoints
shortcut at all and just rely on plain Adds and Removes for managing things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh! This makes so much sense then. I've been under the impression constantly when testing that when all addresses are removed the no endpoints
message should be sent. This is why I was so focused on putting it in there. Thank you for clarifying, it's more intuitive just doing it with adds and removes I think so will remove it (the no endpoints call).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is coming along nicely.
@@ -283,6 +292,66 @@ func (ew *EndpointsWatcher) deleteEndpoints(obj interface{}) { | |||
} | |||
} | |||
|
|||
func (ew *EndpointsWatcher) addEndpointSlice(obj interface{}) { | |||
es := obj.(*discovery.EndpointSlice) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should do a typesafe cast here so that we can log an error and return if for some reason obj is not an EndpointSlice. This should never happen but it's better to be defensive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(same comment for the other handler funcs)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Added typesafe casts to addEndpointSlice
, updateEndpointSlice
and addEndpoints
. The delete handlers have more logic around it and it doesn't make sense to do it here. Let me know if you want me to add it there too though!
defer sp.Unlock() | ||
sp.log.Debugf("Updating EndpointSlice for %s", sp.id) | ||
for _, port := range sp.ports { | ||
if !isTargetPortInSlice(port.targetPort, newSlice) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we can safely skip this update in this case. What if the oldSlice includes the target port but the newSlice does not? In that case we would need to emit a remove of those addresses, I think.
newAddressSet.Addresses[id] = addr | ||
} | ||
|
||
addAddress := make(map[ID]Address, len(newAddressSet.Addresses)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we use the diffAddresses
function here?
Labels: newAddresses.Labels, | ||
} | ||
remove = AddressSet{ | ||
Addresses: removeAddresses, | ||
} | ||
return | ||
} | ||
|
||
func isTargetPortInSlice(targetPort namedPort, slice *discovery.EndpointSlice) bool { | ||
if slice.Ports == nil && slice.Endpoints == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if we push the port handling logic down into resolveESTargetPort
we can avoid these kinds of edge cases.
for _, p := range slicePorts { | ||
switch targetPort.Type { | ||
case intstr.Int: | ||
if *p.Port == targetPort.IntVal { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The port number of the endpoint. If this is not specified, ports are not restricted and must be interpreted in the context of the specific consumer.
It sounds a bit underspecified. Let's not worry too much about this. We can come back and adjust this behavior if we run into a situation where it's problematic.
} | ||
|
||
func getEndpointSliceServiceID(es *discovery.EndpointSlice) (ServiceID, error) { | ||
err := isValidSlice(es) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
up to you, but since this is the only place this is called, you could inline it here and that would remove some of the error handling boilerplate.
pkg/k8s/authz.go
Outdated
return errors.New("EndpointSlice resource not found") | ||
} | ||
|
||
func checkEndpointSlicesExist(k8sclient kubernetes.Interface) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been thinking a bit more about this and I think it would be a good idea to have the EndpointSlices functionality be opt-in. This has a few advantages:
- it seems safer to let users opt-in when they know it has been enabled on their cluster rather than us trying to guess based on the existence of an EndpointSlices object
- it allows to us to release this sooner since it is explicitly opt-in and experimental
- we can change the default behavior in the future (perhaps once kube-proxy uses EndpointSlices by default)
- If we discover issues, this gives us a mechanism to fall back to Endpoints mode.
In practice, I see this as being an install flag (and helm variable) that enables EndpointSlices mode. We should also include some validation that prevents EndpointSlices mode from being enabled on a cluster without the EndpointSlices API resource type.
In the interest of getting this merged and rolling forward, I think the safest thing to do would be to comment out the body of the EndpointSliceAccess function and hard-code it to return false, effectively making all of the EndpointSlice logic unreachable. Then, in a separate PR, we can add the install flag which enables EndpointsSlice mode. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adleong it makes sense, and I think it's a good decision. I have been doubting the use of the access function lately, especially after I've read that in K8s v1.8 EndpointSlices
are going to be switched on by default (but not in-use by kube-proxy).
Realistically, as much as I have tried to be thorough with both the unit tests and the manual tests, this needs to be played around with for a while so we can see where unconventional edge cases might be; I'm all for hardcoding the function to return a false and leaving it out for now.
Will make the change and maybe open up an issue in the next few days to get the ball rolling on how to enable them. So far I like the idea of an install flag.
Edit: will also be commenting out some test cases specific to the slices, fyi.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is coming along really nicely !
if !ok { | ||
ew.log.Errorf("error processing EndpointSlice resource, got %#v expected *discovery.EndpointSlice", oldObj) | ||
return | ||
} | ||
newSlice, ok := newObj.(*discovery.EndpointSlice) | ||
if !ok { | ||
ew.log.Errorf("error processing EndpointSlice resource, got %#v expected *discovery.EndpointSlice", oldObj) | ||
return | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can refactor that logic in a separate method since you are repeating it here and in add as well. Also you are formatting with %#v
while using %T
in the add method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I looked at this based on your suggestion, but I'm not sure how well it can be extracted out into a separate function without losing context or having the same amount of lines in. I've left some thoughts below :)
- If we extract out just the type casting, we would not save much in terms of space since the new function would still have to return an error or a bool in case the obj is not a slice; if we would not do that we can't exit early from the handler method.
- Although add and update work very similarily, for
add
we usegetOrNewServicePublisher
, forupdate
we usegetServicePublisher
. This subtle difference makes it harder to extract the code. - A third alternative is to create a function which returns a
*discovery.EndpointSlice
resource and aServiceID
, this is the approach I took. If the function does not fit well, I can take it out, however.
Let me know what you think. Also, completely missed the %T
, thought I changed all of them to %#v
:o thanks for the spot!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If that is the case, it makes sense to leave it as is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, will roll back the changes in this case
|
||
id, err := getEndpointSliceServiceID(newSlice) | ||
if err != nil { | ||
ew.log.Errorf("Could not fetch service name for EndpointSlice [%s/%s]:%v", newSlice.Namespace, newSlice.Name, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure there is a need to chain errors like that. Why not simply format the error you return from getEndpointSliceServiceID
Seems like if you include the slice name and namespace it would give us all the info we need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great suggestion! I have made the changes. In addition to moving the ns/name
identifiers in getEndpointSliceServiceID
, I have also modified isValidSlice
to return a boolean instead of an error to cut down on error message chaining
Thank you, let me know if you see them (the error messages) done/structured in a different way and will make the changes :)
} | ||
|
||
func (pp *portPublisher) resolveESTargetPort(slicePorts []discovery.EndpointPort) Port { | ||
if slicePorts == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WDYT about making that a bit more explicit. You can return a nil or an error instead of treating Port(0)
as a special value. Is there any reason why you chose this approach?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update: Actually, the Port is an unsigned 32-bit integer, so we can't return a nil
. I still think an error might be too much although it would make it more explicit. I think in this case I'd prefer to keep Port(0)
, since it's at the end of the method too, I think it makes contextual sense that it is returned when the actual port cannot be resolved. Thoughts @zaharidichev?
Nothing that I can think of other than I thought it might be consistent. When there isn't a defined requested port, we return Port(0)
, so thought it would add the context we need. I think nil
is a good idea though! Error might be a bit too much in this instance, I would rather return a nil since this function is used just once in the whole file when we translate the slice addresses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I think simply returning nil makes sense here. Error is too much
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because of the type though it will have to return Port(0)
. The underlying data type for Port
is uint32
, so we can't have a nil return type, unfortunately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a more sane programming language this would return an Option[Port]
, but alas. Using Port(0)
as a special zero value that indicates that the port does not exist seems like a reasonably go-idiomatic thing to do.
Based on the [EndpointSlice PR](#4663), this is just the k8s/api support for endpointslices to shorten the first PR. * Adds CRD * Adds functions that check whether the cluster has EndpointSlice access * Adds discovery & endpointslice informers to api. Signed-off-by: Matei David <[email protected]>
Now that #4696 has merged, a merge or rebase of main should significantly shrink this diff. |
Rebased and pushed, only three files in the diff now! Thank you :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did some testing on that one using the endpoint slices feature and it works very well! Just for reference in case someone else wants to test that, I created a kind cluster using the following config:
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
featureGates:
EndpointSlice: true
EndpointSliceProxying: true
ServiceTopology: true
Also changed the EndpointSliceAccess
to return true
. I also used the dst client to look at the updates that the dst service is emitting. Tested a few scenarios with both endpoints and endpoint slices to verify the functionality. There is an interesting thing I noticed. I had a service that was scaled to three replicas. When I run kubectl scale --replicas=1 deploy/backend
to scale it back to one, I noticed the following output:
INFO[0114] Remove:
INFO[0114] - 10.244.0.65:8888
INFO[0114]
INFO[0114] Add:
INFO[0114] labels: map[namespace:default service:backend-svc]
INFO[0114] - 10.244.0.64:8888
INFO[0114] - labels: map[control_plane_ns:linkerd deployment:backend pod:backend-ffd766567-rvp9m pod_template_hash:ffd766567 serviceaccount:default]
INFO[0114] - protocol hint: H2
INFO[0114]
INFO[0114] Remove:
INFO[0114] - 10.244.0.64:8888
INFO[0114]
Seems to me there is an extra ADD being emitted right before a REMOVE for it is fired... I think I know what the problem is and it has to do with the diffAddresses
function and the way we are checking the resource version to see whether the pods are "the same". I think k8s is putting the pod in some terminating state and thereby updating its resource version. Ultimatelly we cna be a bit smarter about that but I am not sure having a duplicate ADD is such a problem. @adleong @Matei207 WDYT? Note that this happens with both endpoints and endpoints slices so it is not result of this PR and does not need to be addressed here.
If it is not introduced by this PR, I would be more than happy to have a look at it and see what can be done after the service topologies RFC is wrapped up! Let me know what you think the best approach would be in this case :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great to me! And with it being off by default, I think this is safe to merge.
Introduce support for the EndpointSlice k8s resource (k8s v1.16+) in the destination service. Through this PR, in the EndpointsWatcher, there will be a dedicated informer for EndpointSlice; the informer cannot run at the same time as the Endpoints resource informer. The main difference is that EndpointSlices have a one-to-many relationship with a service, they provide better performance benefits, dual-stack addresses and more. EndpointSlice support also implies service topology and other k8s related features. Validated and tested manually, as well as with dedicated unit tests. Closes #4501 Signed-off-by: Matei David <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work! Looking forward to more!
* Small PR that uncomments the `EndpointSliceAcess` method and cleans up left over todos in the destination service. * Based on the past three PRs related to `EndpointSlices` (#4663 #4696 #4740); they should now be functional (albeit prone to bugs) and ready to use. Signed-off-by: Matei David <[email protected]>
Issue: linkerd/linkerd2#4501
PoC: PR#4563
What
EndpointSlices
the K8s resource succeedingEndpoints
Endpoints
withEndpointSlices
on theEndpointsWatcher
side of discovery serviceHow
k8s/api.go
, as well as a method that checks whether the cluster has the EndpointSlice resource registered. (EndpointSlice sits behind a feature gate)Endpoints
resource by introducing type assertions.portPublisher: AddressSet
).EndpointSlices
can be orphaned, but only through human error. Whenever an EndpointSlice is created by K8s it will either have a label with the service name or an owner reference to that service (or both).EndpointSlices
have aReady
condition for their endpoints, endpoints that are not ready are not processed.EndpointSlices
can have an addressType of: IPv4, IPv6 and host/FQDN. After the feedback on the poc I've decided to only allow IPv4 endpoints to be processed at this time, dual-stack requires a bit more diligence and experimentation.EndpointSlices
as a type contains many pointers (e.g Port numbers as pointers).EndpointSlices
have a generated name based on the associated service, to get a service ID we need to look at the label OR owner reference.Feedback and open questions:
In the spirit of learning and improving the quality of this PR, I would very much appreciate feedback related to the code quality (and quality of tests) as well as feedback on the performance of the code. Because of the differences between the two resources, you'll notice a lot of loops spawned, if there are any corners I can cut, let me know. I am also interested to see if there are any architectural decisions that can be improved.
I do have some open questions as well:
Thank you for going through this and be as critical as possible, please!