[NPM] [Linux] another race condition when editing a NetPol or deleting then readding it #2977
Comments
huntergregory added a commit that referenced this issue on Sep 4, 2024
Signed-off-by: Hunter Gregory
This issue can actually occur for NetPols without a CIDR rule. It is most likely to occur for NetPols with a CIDR rule or MatchExpressions, but it could happen for other IPSets if they are not associated with any Pod IP.
This can also occur when editing the NetPol, not just when deleting then readding the NetPol.
Pending fix for NetPols without CIDR rules.
github-merge-queue bot pushed a commit that referenced this issue on Sep 11, 2024
* fix: [Linux] [NPM] handle #2977 for netpols without cidrs
* fix: lock and no need to track policy key
* style: remove dead code
---------
Signed-off-by: Hunter Gregory
huntergregory added a commit that referenced this issue on Sep 11, 2024
…2990)
* fix: [Linux] [NPM] handle #2977 for netpols without cidrs
* fix: lock and no need to track policy key
* style: remove dead code
---------
Signed-off-by: Hunter Gregory
github-merge-queue bot pushed a commit that referenced this issue on Sep 12, 2024
…3006) [backport] fix: [Linux] [NPM] handle #2977 for netpols without cidrs (#2990)
* fix: [Linux] [NPM] handle #2977 for netpols without cidrs
* fix: lock and no need to track policy key
* style: remove dead code
---------
Signed-off-by: Hunter Gregory
NOTE: v1.5.37 will fix this issue completely. In v1.5.36, the issue is fixed only for NetPols with CIDR rules.
Similar issue: #2963
EDIT (9/5): based on #2977 (comment)
EDIT (9/9): based on #2977 (comment)
Summary
Under this race condition, NPM applies a single NetworkPolicy incorrectly.
This race can occur when editing a NetworkPolicy with "enough" rules or when deleting such a NetPol then readding it before any other changes occur to Pods, Namespaces, or NetworkPolicies in the cluster. This race condition occurs when the kernel is slow to process changes.
Symptoms
Unexpected connectivity after editing a NetPol, or after removing a NetPol then readding the NetworkPolicy.
NPM logs include the following lines in succession:
Mitigation
Try deleting and readding the NetworkPolicy again. If that does not work, restart all NPM Pods.
To Avoid the Issue
Cause
When removing a NetPol, NPM deletes the policy's firewall rules, then flushes the IPs from the policy's CIDR and MatchExpression IPSets, then tries to delete those IPSets as well as any other policy IPSets which are empty (no associated Pod IPs). Sometimes the kernel still thinks that firewall rules reference the IPSets (hence the log line "Set cannot be destroyed: it is in use by a kernel component"), so after 5 attempts, NPM will fail to delete the IPSets. This "slow kernel" condition is temporary, but NPM will not retry the IPSet operations until the next change in Pods, Namespaces, or NetworkPolicies. If the next change is adding back the same NetworkPolicy, then NPM may apply the NetworkPolicy incorrectly, leading to unexpected connectivity. If the next change is anything else, then the IPSets will be properly deleted, and no issue occurs.
Why? Because the NetworkPolicy might use the same IPSets, and NPM thinks those IPSets still exist in the kernel as they did before (with all IPs included); however, NPM has already flushed all IPs from the IPSets, so NPM might not add back IPs that it should.
For instance, if the NetworkPolicy is added back with no edits, then NPM will think that every CIDR IPSet is properly configured in the kernel. However, all of the IPSets will be missing their IPs. Therefore, no IPs will be allowed by the policy's firewall rules.
Logs