What is the issue?
I have seen instances of Linkerd routing traffic from one node to another for services whose traffic policy is set to Local.
This appears to happen when a node is being rebooted.
How can it be reproduced?
I was able to reproduce this by creating a DaemonSet and a Service with `internalTrafficPolicy: Local` on a multi-node cluster.
Debugging with the destination-client, I found that when a pod is manually deleted, the endpoints are updated twice: an Add followed by a Remove.
The endpoints for remote pods are added first, while the local pod is terminating, and are removed once the new pod is running.
Any request sent before the remote endpoints are removed can be routed to another node. The problem is easier to reproduce when a node is restarted, because the gap between the two updates is longer.
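To make the window described above concrete, here is a minimal Go sketch that replays a stream of Add/Remove updates and reports after which updates a remote-node endpoint is visible to the client. The `Update` type, the `remoteVisibleSteps` helper, and the IPs and node names are all hypothetical and only model what the destination-client showed; they are not part of Linkerd.

```go
package main

import "fmt"

// Update is a hypothetical, simplified model of one endpoint update
// as observed with the destination-client.
type Update struct {
	Kind string // "Add" or "Remove"
	IP   string
	Node string
}

// remoteVisibleSteps replays the updates in order and returns the index of
// every update after which at least one endpoint on a node other than
// localNode is visible to the client.
func remoteVisibleSteps(localNode string, updates []Update) []int {
	visible := map[string]string{} // endpoint IP -> node name
	var steps []int
	for i, u := range updates {
		switch u.Kind {
		case "Add":
			visible[u.IP] = u.Node
		case "Remove":
			delete(visible, u.IP)
		}
		for _, node := range visible {
			if node != localNode {
				steps = append(steps, i)
				break
			}
		}
	}
	return steps
}

func main() {
	// Illustrative values only: the remote endpoint is added while the local
	// pod terminates and removed once the replacement pod is running.
	updates := []Update{
		{Kind: "Add", IP: "10.0.1.5", Node: "node-b"},
		{Kind: "Remove", IP: "10.0.1.5", Node: "node-b"},
	}
	fmt.Println(remoteVisibleSteps("node-a", updates))
}
```

A request issued between the two updates can be routed to `node-b`, which is why a slower event such as a node restart makes the problem easier to hit.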
Logs, error output, etc
N/A
output of linkerd check -o short
❯ linkerd check -o short
Status check results are √
Environment
Linkerd 2.14.10
Possible solution
No response
Additional context
No response
Would you like to work on fixing this bug?
None
Fixes: #12311
When the endpoint translator receives a `remove` call, it was updating its local traffic policy based on the address set passed to `remove`. However, since `remove` is only meant to remove addresses and not change the address metadata, the endpoints watcher was not setting local traffic policy on these calls to `remove`. As a result, a call to `remove` could temporarily turn off local traffic policy, causing non-local addresses to be sent to clients.
Since `remove` should not change address metadata, we now disregard any metadata in the call to `remove`, including any changes to the local traffic policy.
Signed-off-by: Alex Leong <[email protected]>
Co-authored-by: Oliver Gould <[email protected]>
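The behavior change described in this commit message can be illustrated with a toy model. The sketch below is not the Linkerd destination controller code: the `Address`, `AddressSet`, and `translator` types, the `ignoreRemoveMetadata` switch, and the `scenario` helper are all hypothetical names chosen for illustration. It only assumes what the message states, namely that the watcher leaves local traffic policy unset on `remove` calls.

```go
package main

import "fmt"

// Address and AddressSet are hypothetical, simplified stand-ins for the
// data the endpoints watcher passes to the translator.
type Address struct {
	IP   string
	Node string
}

type AddressSet struct {
	Addresses          map[string]Address
	LocalTrafficPolicy bool
}

// translator is a toy endpoint translator serving clients on one node.
type translator struct {
	nodeName             string
	ignoreRemoveMetadata bool               // true models the behavior after the fix
	available            AddressSet         // everything the watcher has reported
	sent                 map[string]Address // what the client currently sees
}

func newTranslator(nodeName string, ignoreRemoveMetadata bool) *translator {
	return &translator{
		nodeName:             nodeName,
		ignoreRemoveMetadata: ignoreRemoveMetadata,
		available:            AddressSet{Addresses: map[string]Address{}},
		sent:                 map[string]Address{},
	}
}

// Add merges addresses and adopts the metadata carried by the set.
func (t *translator) Add(set AddressSet) {
	for k, a := range set.Addresses {
		t.available.Addresses[k] = a
	}
	t.available.LocalTrafficPolicy = set.LocalTrafficPolicy
	t.sync()
}

// Remove deletes addresses. Before the fix it also adopted the set's
// metadata, which the watcher leaves unset on remove calls.
func (t *translator) Remove(set AddressSet) {
	for k := range set.Addresses {
		delete(t.available.Addresses, k)
	}
	if !t.ignoreRemoveMetadata {
		t.available.LocalTrafficPolicy = set.LocalTrafficPolicy
	}
	t.sync()
}

// sync recomputes the client view, keeping only same-node addresses
// while local traffic policy is on.
func (t *translator) sync() {
	filtered := map[string]Address{}
	for k, a := range t.available.Addresses {
		if t.available.LocalTrafficPolicy && a.Node != t.nodeName {
			continue
		}
		filtered[k] = a
	}
	t.sent = filtered
}

// sendsRemote reports whether the client view contains a non-local address.
func (t *translator) sendsRemote() bool {
	for _, a := range t.sent {
		if a.Node != t.nodeName {
			return true
		}
	}
	return false
}

// scenario deletes the local pod and reports whether a remote address leaks.
func scenario(ignoreRemoveMetadata bool) bool {
	t := newTranslator("node-a", ignoreRemoveMetadata)
	t.Add(AddressSet{
		Addresses: map[string]Address{
			"10.0.0.1": {IP: "10.0.0.1", Node: "node-a"},
			"10.0.1.1": {IP: "10.0.1.1", Node: "node-b"},
		},
		LocalTrafficPolicy: true,
	})
	// The watcher does not set LocalTrafficPolicy on remove calls.
	t.Remove(AddressSet{
		Addresses: map[string]Address{"10.0.0.1": {IP: "10.0.0.1", Node: "node-a"}},
	})
	return t.sendsRemote()
}

func main() {
	fmt.Println("metadata applied on remove, remote exposed:", scenario(false))
	fmt.Println("metadata ignored on remove, remote exposed:", scenario(true))
}
```

In this model the remote address reaches the client only when `Remove` adopts the metadata of the set it receives; ignoring that metadata keeps the policy that the last `Add` established.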