Invalid CmdAdd rollback in antrea-agent #5547
Labels
kind/bug
Categorizes issue or PR as related to a bug.
reported-by/end-user
Issues reported by end users.
antoninbas added a commit to antoninbas/antrea that referenced this issue on Oct 5, 2023:
* When performing configuration rollback after an error in CmdAdd, we do not invoke CmdDel directly. Instead, we invoke an internal version of it which does not log a "Received CmdDel request" message (the message is confusing otherwise as it implies that we received a new CNI DEL command from the container runtime), and which does not process the network config again (as it was already processed at the beginning of CmdAdd). By not processing the config a second time, we ensure that there are no duplicate CIDRs in the IPAMConfig.
* Migrate klog calls in server.go to use structured logging.
* Improve unit tests for the CNI server to validate this fix.

Fixes antrea-io#5547
Signed-off-by: Antonin Bas
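The rollback change described in this commit message can be illustrated with a minimal Go sketch. It is not the actual Antrea code: the `server` and `networkConfig` types, the `parseConfig` and `validate` helpers, the `rollbackViaPublicDel` switch, and the CIDR value are all hypothetical stand-ins chosen for illustration. The sketch only assumes what the message states, namely that the config is processed once at the beginning of CmdAdd and that processing it a second time yields a duplicate range.

```go
package main

import (
	"errors"
	"fmt"
)

// networkConfig is a hypothetical, simplified stand-in for the parsed CNI
// network config. Ranges mimics the IPAM range sets.
type networkConfig struct {
	Ranges []string
}

// server is a hypothetical CNI server, not the real Antrea type.
type server struct {
	podCIDR              string
	failAdd              bool // forces the ADD path to fail, for illustration
	rollbackViaPublicDel bool // true reproduces the old (buggy) rollback
}

// parseConfig processes the config by appending the node's Pod CIDR.
// Calling it twice on the same config produces a duplicate range.
func (s *server) parseConfig(cfg *networkConfig) {
	cfg.Ranges = append(cfg.Ranges, s.podCIDR)
}

// validate rejects a config that contains the same range twice.
func validate(cfg *networkConfig) error {
	seen := map[string]bool{}
	for _, r := range cfg.Ranges {
		if seen[r] {
			return fmt.Errorf("duplicate range %s", r)
		}
		seen[r] = true
	}
	return nil
}

// CmdDel is the public entry point: it processes the config, then deletes.
func (s *server) CmdDel(cfg *networkConfig) error {
	s.parseConfig(cfg)
	return s.cmdDel(cfg)
}

// cmdDel is the internal version: it assumes the config was already processed.
func (s *server) cmdDel(cfg *networkConfig) error {
	return validate(cfg)
}

// CmdAdd processes the config once and, on failure, rolls back.
// It returns the ADD error and the rollback error separately.
func (s *server) CmdAdd(cfg *networkConfig) (addErr, rollbackErr error) {
	s.parseConfig(cfg)
	if !s.failAdd {
		return nil, nil
	}
	addErr = errors.New("no IP addresses available")
	if s.rollbackViaPublicDel {
		rollbackErr = s.CmdDel(cfg) // processes the config a second time
	} else {
		rollbackErr = s.cmdDel(cfg) // reuses the already-processed config
	}
	return addErr, rollbackErr
}

func main() {
	for _, buggy := range []bool{true, false} {
		s := &server{podCIDR: "10.10.0.0/24", failAdd: true, rollbackViaPublicDel: buggy}
		addErr, rollbackErr := s.CmdAdd(&networkConfig{})
		fmt.Printf("public CmdDel rollback=%v: add error=%v, rollback error=%v\n",
			buggy, addErr, rollbackErr)
	}
}
```

With the public entry point used for rollback, the sketch reports a rollback error in addition to the ADD error; with the internal version, only the ADD error remains.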
antoninbas added a commit that referenced this issue on Oct 10, 2023:
…failure in CNI (#5558)
antoninbas added a commit that referenced this issue on Oct 11, 2023:
…failure in CNI (#5559)
tnqn pushed a commit that referenced this issue on Oct 16, 2023:
…failure in CNI server (#5560)
Describe the bug
This was observed in antrea-agent logs:
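The sequence of events is as follows:
* CmdAdd fails and the agent rolls back by invoking CmdDel, which processes the network config a second time. This adds a second entry to the Ranges slice in the network config [1], causing a duplicate and an error in host-local IPAM [2].

To Reproduce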
Not straightforward: one needs to create a scenario in which CmdAdd fails (e.g. by creating enough Pods on a Node to run out of local IPs).
Expected
The rollback should succeed. While the failure doesn't have a big impact on the system, it can make the logs confusing and harder to understand when troubleshooting an issue.
Actual behavior
The rollback fails and an error is logged, which can be confusing for users ("range set 0 overlaps with 1").
Versions:
Footnotes
1. https://github.com/antrea-io/antrea/blob/bbec9d04dd426bdcf1a7af0024ac93fdc61debda/pkg/agent/cniserver/server.go#L264-L276
2. https://github.com/containernetworking/plugins/blob/f95505231a8209b1b41eac2d5c2399c6ab444785/plugins/ipam/host-local/backend/allocator/config.go#L157-L165