IP may leak after agent restart #4326
Comments
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment, or this will be closed in 90 days |
I was taking a look at this old issue and I tried to reproduce the IP leakage.
After a few seconds (less than a minute), the new antrea-agent Pod receives a CmdDel command for the old workload Pod, and the IP is released. It seems that the kubelet garbage collector is taking care of removing the old sandbox. When I increase kubelet log verbosity, I see:
kubelet is able to list all sandboxes from the container runtime, and the container runtime I am using (containerd v1.7.1) will not remove the sandbox until the network has been deleted:
It is possible that some older versions of containerd were not that robust. @tnqn do you think we should have a more defensive posture in Antrea and add logic to proactively release IPs on agent (re)start? |
@antoninbas I have tried the following K8s versions deployed via kind and haven't managed to reproduce IP leak with the steps you described:
So I feel this case is taken care of in the normal workflow, but it's hard to say whether it's handled correctly in abnormal cases. For example, containerd previously had this issue: containerd/containerd#5569, and I remember you were investigating another leak with newer container versions in the last few weeks, for which we don't know the root cause yet? There could also be other CRIs which implement the cleanup differently. Therefore, I tend to agree that, instead of relying on kubelet and the container runtime to be bug-free regarding CNI cleanup, it's more robust to perform a conservative garbage collection on agent start. We should log such cleanup explicitly, as we don't actually expect it unless there is a bug in the container runtime or in Antrea. |
The person who reported the issue confirmed that when using a more recent version of containerd, the issue cannot be reproduced anymore (perhaps because of this fix: containerd/containerd#5904). With an older containerd version, they were able to reproduce the issue consistently.
Sounds good to me. I can take a stab at it. |
@tnqn I wrote some PoC code to see if I could come up with a viable approach: fbfa402 However, it feels very much like an anti-pattern: we have to craft the CNI arguments from "scratch" in order to invoke the IPAM plugin. It can work (both when delegating to host-local IPAM and when using AntreaIPAM), but it feels like a hack. The code I wrote uses the
Going back to the relevance of this issue, it does feel inconsistent to treat interfaces and IP addresses differently. Either we assume that the container runtime (+ kubelet) will always eventually invoke CNI DEL, in which case that part of the reconciliation logic is not really needed at all, or we assume that a resource leak is possible under some conditions, and we have to take care of all resources consistently. |
I took a look at the issue and the PR; they seem to ensure that CNI is called repeatedly until it returns success. But I remember that in our case there was no CNI invocation at all, so it looks different. Do you know in which version the issue can be reproduced and in which it cannot? Given the enhancements that have been made in containerd, it does seem that CNI cleanup is ensured now.
I was thinking of something more hacky: accessing the IPAM dir and deleting IPs that shouldn't exist. I remember we offered a script for users to do the cleanup when they encountered the IP leak: https://github.com/antrea-io/antrea/blob/main/hack/gc-host-local.sh.
Agree. When we added this logic, kubelet and the container runtime were not that robust, even though the logic is not complete. But looking at the related issues in containerd and their fixes, I'm also wondering if it's still worth making it more complex to cover this case. If we want to keep the logic for now, how about just using the simple (but hacky) approach to release known leaked IPs by bypassing the IPAM plugin, without adding more cache/storage? It will be very specific to the host-local plugin and could break if host-local changes its implementation, but it's the only case we have encountered in production, and this is a workaround anyway. If IP leaks still happen, the right direction is still patching and upgrading the container runtime. |
Yes that PR seemed like a bit of a stretch, so maybe it was some other patch.
Hmm, I was thinking it would be nice to have a unified approach which also works with AntreaIPAM. Don't you think the same issue can happen there if there is a missing CNI DEL? I.e., the IPPool CR has leaked IPs. |
I remember it has been taken care of by antrea-controller: antrea/pkg/controller/ipam/antrea_ipam_controller.go, lines 162 to 165 in e8f5d93 |
OK, thanks for the link. If AntreaIPAM has its own GC, then I think it's acceptable to have a solution which is specific to host-local. |
During CNIServer reconciliation, we perform host-local IPAM garbage collection (GC) by comparing the set of IPs allocated to local Pods and the set of IPs currently reserved by the plugin. We release any IP reserved by the plugin that is not in use by a local Pod. The purpose is to avoid leaking IP addresses when there is a bug in the container runtime, which has happened in the past.
Two key design choices were made:
* We do not invoke CNI DEL to release IPs; instead, we access the host-local data which is persisted on the Node, and modify it as needed.
* We do not rely on the interface store (as persisted to OVSDB) to determine the set of IPs that may have been leaked. In case of an Antrea bug, it could be possible (although unlikely) for an IP to still be allocated by host-local but be missing from the interface store. Instead, we list all allocated IPs from the host-local data (an allocated IP corresponds to one disk file).
This approach is essentially the same as our existing script: https://github.com/antrea-io/antrea/blob/main/hack/gc-host-local.sh
Fixes #4326
Signed-off-by: Antonin Bas <[email protected]>
Describe the bug
In the existing reconcile logic, antrea-agent removes OVS ports, OpenFlow entries, and items in the in-memory interface store if the Pod was deleted while the agent was down. However, the logic does not call IPAM to release the allocated IP, so this may lead to leaked IP addresses.
To Reproduce
Expected
Actual behavior
Versions:
Antrea: main branch
Additional context