-
Notifications
You must be signed in to change notification settings - Fork 39.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clear conntrack entries for externalIP and LoadBalancer IP #73323
Clear conntrack entries for externalIP and LoadBalancer IP #73323
Conversation
When an endpoint is deleted, the conntrack entries are cleared for clusterIP but not for externalIP of the service. This change adds that step.
/assign @freehan |
Thanks @prameshj for the fix. I can try to test locally if you have a docker image pushed somewhere. Also, the fixes statement in the description has an extra slash, so referenced ticket won't get closed automatically. |
@@ -613,6 +613,12 @@ func (proxier *Proxier) deleteEndpointConnections(connectionMap []proxy.ServiceE | |||
if err != nil { | |||
klog.Errorf("Failed to delete %s endpoint connections, error: %v", epSvcPair.ServicePortName.String(), err) | |||
} | |||
for _, extIP := range svcInfo.ExternalIPStrings() { | |||
err := conntrack.ClearEntriesForNAT(proxier.exec, extIP, endpointIP, v1.ProtocolUDP) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has the potential to indiscriminantly remove all sessions matching external IP to endpointIP regardless if it's stale or not. Can this be written to be more selective especially if we know the port(s) involved in the stale session?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a check above(L608) to delete stale connections based on the NodePort. If user is not using NodePort and has specified TargetPort by name, that could have a different value for each endPoint. ServiceEndpoint currently does not expose this value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I repeatedly tested our apps using latest test build with your fix. I deleted the pods with UDP services in order to reproduce/cause stale sessions and verified that the stale sessions were removed immediately. I also verified that new UDP sessions and traffic were able to be created to the new pod instances.
In doing so I realized that my concerns about inadvertent removal of sessions for matching extIP, endpointIP, and of UDP type doesn't apply for the reason that pod instance endpointIP are unique and in deleting the instance, it's desirable to clear all sessions related to this endpointIP to prevent routing black holes. I'm able to verify that your code accomplishes this by clearing sessions from all possible extIPs to the removed pod instance endpointIP.
Thanks for verifying! There could still be a conflict if someone is using 2 different UDP-based services with the same pods as endpoints right? But since this endpoint is stale anyway, it should be ok to delete those connections as well. |
It would be great this PR can also cover the IP from LoadbalancerStatus. Just need to mirror the logic for that case |
Need to update PR title & description to reflect LB change. Also, is there a relevant ticket for LB @freehan ? |
|
If you're asking me, I think it's fine to reuse ticket. I just wasn't sure if there was an existing issue for the LB IP that we can also link to. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
/assign @thockin |
I do not think there is an existing issue for LB IP. But I would imagine the same problem will hit UDP LBs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: freehan, prameshj, thockin The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest |
…23-upstream-release-1.13 Automated cherry pick of #73323: Clear conntrack entries for externalIP
When an endpoint is deleted, the conntrack entries are cleared for
clusterIP but not for externalIP of the service. This change adds
that step.
What type of PR is this?
/kind bug
What this PR does / why we need it:
When an endpoint is deleted, the conntrack entries are cleared for clusterIP but not for externalIP of the service spec and LoadBalancer ip from service status. This PR adds that step.
Which issue(s) this PR fixes:
Fixes #65228
Special notes for your reviewer:
Does this PR introduce a user-facing change?: