NPL Controller shuts down if iptables-restore operation fails #2554
Labels
kind/bug
Categorizes issue or PR as related to a bug.
priority/important-soon
Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
Comments
antoninbas added a commit to antoninbas/antrea that referenced this issue on Aug 6, 2021:
Add a retry mechanism in the Controller initialization, which will keep trying to sync iptables rules until the operation is successful. On success, the NPL Controller is notified through a channel and can start its event handlers.
Fixes antrea-io#2554
Signed-off-by: Antonin Bas <[email protected]>
antoninbas added a commit to antoninbas/antrea that referenced this issue on Aug 10, 2021, with the same commit message.
antoninbas added a commit that referenced this issue on Aug 11, 2021:
Add a retry mechanism in the Controller initialization, which will keep trying to sync iptables rules until the operation is successful. On success, the NPL Controller is notified through a channel and can start its event handlers.
Fixes #2554
Signed-off-by: Antonin Bas <[email protected]>
antoninbas added a commit to antoninbas/antrea that referenced this issue on Aug 11, 2021, with the same commit message.
antoninbas added a commit that referenced this issue on Aug 11, 2021:
…ectly in NPL (#2575)
Add a retry mechanism in the Controller initialization, which will keep trying to sync iptables rules until the operation is successful. On success, the NPL Controller is notified through a channel and can start its event handlers.
Fixes #2554
Signed-off-by: Antonin Bas <[email protected]>
annakhm pushed a commit to annakhm/antrea that referenced this issue on Aug 16, 2021:
…io#2555)
Add a retry mechanism in the Controller initialization, which will keep trying to sync iptables rules until the operation is successful. On success, the NPL Controller is notified through a channel and can start its event handlers.
Fixes antrea-io#2554
Signed-off-by: Antonin Bas <[email protected]>
Describe the bug
Thanks @alokmaurya88 for reporting this issue.
When the Antrea Agent is restarted in a large-scale cluster, the iptables-restore operation used to restore all the DNAT rules previously installed by the NPL Controller (and saved as Pod annotations) may fail because of contention. In this case, the following is observed in the Antrea Agent logs:
To Reproduce
Reproducing this requires contention with other components or processes that need access to iptables, e.g. kube-proxy. The iptables lock needs to be held by another process for a long time (10s) for the operation to fail, so in the case of kube-proxy it would require a large number of Services / Endpoints.
Expected
In case of contention, the NPL Controller should either:
I plan to implement 1) as part of a bug fix patch. 2) can be considered in the future.
Actual behavior
In this scenario, the NPL Controller shuts down and is not restarted unless the Antrea Agent itself is restarted. This is not acceptable, as the desired NPL behavior is not realized.
Versions:
Antrea version: v1.2.0, v1.2.1, main