Linkerd does not respect local traffic policy under certain circumstance #12311

Closed
yc185050 opened this issue Mar 21, 2024 · 1 comment · Fixed by #12325

@yc185050

What is the issue?

I have seen instances of Linkerd routing traffic from one node to another for services whose internal traffic policy is set to Local.
This appears to happen when a node is being rebooted.

How can it be reproduced?

I was able to reproduce this by creating a DaemonSet and a Service with internalTrafficPolicy: Local on a multi-node cluster.

Using the destination-client to debug, I found that when a pod is manually deleted, the endpoints are updated twice: an Add followed by a Remove.

INFO[0038]                                              
INFO[0108] Add:                                         
INFO[0108] labels: map[]                                
INFO[0108] - 10.244.0.8:9376                            
INFO[0108]   - labels: map[control_plane_ns:linkerd daemonset:fluentd-elasticsearch pod:fluentd-elasticsearch-6mjjz serviceaccount:default zone:] 
INFO[0108]   - protocol hint: H2                        
INFO[0108]   - identity: dns_like_identity:{name:"default.default.serviceaccount.identity.linkerd.cluster.local"}  server_name:{name:"default.default.serviceaccount.identity.linkerd.cluster.local"} 
INFO[0108] - 10.244.2.7:9376                            
INFO[0108]   - labels: map[control_plane_ns:linkerd daemonset:fluentd-elasticsearch pod:fluentd-elasticsearch-2c59n serviceaccount:default zone:] 
INFO[0108]   - protocol hint: H2                        
INFO[0108]   - identity: dns_like_identity:{name:"default.default.serviceaccount.identity.linkerd.cluster.local"}  server_name:{name:"default.default.serviceaccount.identity.linkerd.cluster.local"} 
INFO[0108]                                              
INFO[0115] Remove:                                      
INFO[0115] - 10.244.0.8:9376                            
INFO[0115] - 10.244.2.7:9376                            
INFO[0115]   

The endpoints for the remote pods were added while the local pod was terminating, and were removed again once the new pod was running.

This causes problems if a request is sent before the remote endpoints are removed. It is easier to reproduce when the node is being restarted, because the gap between the Add and the Remove is longer.

Logs, error output, etc

N/A

output of linkerd check -o short

❯ linkerd check -o short
Status check results are √

Environment

  • Linkerd 2.14.10

Possible solution

No response

Additional context

No response

Would you like to work on fixing this bug?

None

@yc185050

#12312 fixes this issue. Is it possible to backport this to 2.15 and 2.14?

adleong added a commit that referenced this issue Mar 22, 2024
Fixes: #12311

When the endpoint translator receives a `remove` call, it was updating its local traffic policy based on the address set passed to `remove`. However, since `remove` is only meant to remove addresses and not change the address metadata, the endpoints watcher was not setting local traffic policy on these calls to `remove`. As a result, a call to `remove` could temporarily turn off local traffic policy, causing non-local addresses to be sent to clients.

Since `remove` should not change address metadata, we now disregard any metadata in the call to `remove`, including any changes to the local traffic policy.

Signed-off-by: Alex Leong <[email protected]>
Co-authored-by: Oliver Gould <[email protected]>
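
To make the commit message above concrete, here is a minimal Go sketch of the behavior it describes. This is not Linkerd's actual code: `AddressSet`, `Translator`, `NewTranslator`, and `Filtered` are hypothetical, simplified stand-ins, and the node names and the local address `10.244.1.5:9376` are made up for illustration (the two remote addresses are taken from the log in the issue). The point is only that `Remove` deletes addresses and never reads the policy flag from the set it is given.

```go
package main

import (
	"fmt"
	"sort"
)

// AddressSet is a hypothetical stand-in for the set the endpoints watcher
// hands to the translator. LocalTrafficPolicy is service-level metadata.
type AddressSet struct {
	Addresses          map[string]string // address -> node name (assumed shape)
	LocalTrafficPolicy bool
}

// Translator is a hypothetical, simplified endpoint translator for a
// proxy running on NodeName.
type Translator struct {
	NodeName           string
	available          map[string]string
	localTrafficPolicy bool
}

func NewTranslator(node string) *Translator {
	return &Translator{NodeName: node, available: map[string]string{}}
}

// Add records the addresses and adopts the metadata carried by the set.
func (t *Translator) Add(set AddressSet) {
	for addr, node := range set.Addresses {
		t.available[addr] = node
	}
	t.localTrafficPolicy = set.LocalTrafficPolicy
}

// Remove only deletes addresses. It deliberately ignores
// set.LocalTrafficPolicy, because remove calls do not carry
// reliable metadata.
func (t *Translator) Remove(set AddressSet) {
	for addr := range set.Addresses {
		delete(t.available, addr)
	}
}

// Filtered returns the addresses that would be sent to the client,
// restricted to the local node when local traffic policy is on.
func (t *Translator) Filtered() []string {
	out := []string{}
	for addr, node := range t.available {
		if !t.localTrafficPolicy || node == t.NodeName {
			out = append(out, addr)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	tr := NewTranslator("node-1")
	tr.Add(AddressSet{
		Addresses: map[string]string{
			"10.244.1.5:9376": "node-1", // hypothetical local pod
			"10.244.0.8:9376": "node-0",
			"10.244.2.7:9376": "node-2",
		},
		LocalTrafficPolicy: true,
	})

	// The local pod is deleted. The remove call carries no policy
	// metadata, so LocalTrafficPolicy is left at its zero value (false).
	tr.Remove(AddressSet{
		Addresses: map[string]string{"10.244.1.5:9376": "node-1"},
	})

	// The policy stays on, so no remote endpoints leak to the client.
	fmt.Println(tr.Filtered())
}
```

If `Remove` instead copied `set.LocalTrafficPolicy` into the translator, the flag would flip to false after the remove call and `Filtered` would return the two remote addresses, which matches the Add of `10.244.0.8:9376` and `10.244.2.7:9376` seen in the destination-client output.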
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 22, 2024