Increase NetworkPolicy PacketIn rate limit, and make it configurable #5358
Labels: area/monitoring/auditing, area/network-policy, area/ovs/openflow, kind/feature, priority/backlog
Describe the problem/challenge you have
The PacketIn messages generated by NetworkPolicy tables are rate-limited using an OVS meter:
(antrea/pkg/agent/openflow/packetin.go, lines 79 to 86 at commit 4746217)
These messages are used by Audit Logging and by the Flow Exporter.
This was added to prevent DDoS attacks: it would be very easy for an attacker to generate a large number of packets (e.g., UDP packets) that would need to be sent to the Agent, causing high CPU usage in the Agent.
By dropping excess packets in the datapath, we protect the Agent against this.
However, it also means that we can lose many events when legitimate traffic is bursty.
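To make this trade-off concrete, here is a simplified Go model of the meter. It assumes the meter band behaves like a token bucket that drops packets once tokens run out, and that the bucket holds one second's worth of packets; both are assumptions for illustration, since the actual band and burst settings live in the code referenced above. `tokenBucket` and `simulateBurst` are hypothetical helpers, not Antrea or OVS code.

```go
package main

import "fmt"

// tokenBucket is a simplified, hypothetical model of an OVS meter with a
// "drop" band configured in packets per second. It is not Antrea or OVS code.
type tokenBucket struct {
	ratePPS  float64 // refill rate, in packets per second
	burst    float64 // bucket capacity, in packets (assumed = 1s worth)
	tokens   float64 // tokens currently available
	lastTime float64 // time of the last update, in seconds
}

func newTokenBucket(ratePPS, burst float64) *tokenBucket {
	return &tokenBucket{ratePPS: ratePPS, burst: burst, tokens: burst}
}

// allow refills the bucket for the time elapsed since the previous packet and
// reports whether a packet arriving at time t (seconds) is forwarded to the
// Agent (true) or dropped in the datapath (false).
func (b *tokenBucket) allow(t float64) bool {
	b.tokens += (t - b.lastTime) * b.ratePPS
	if b.tokens > b.burst {
		b.tokens = b.burst
	}
	b.lastTime = t
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false
}

// simulateBurst sends n packets evenly spaced over durationSec seconds and
// counts how many reach the Agent and how many are dropped.
func simulateBurst(b *tokenBucket, n int, durationSec float64) (forwarded, dropped int) {
	for i := 0; i < n; i++ {
		t := durationSec * float64(i) / float64(n)
		if b.allow(t) {
			forwarded++
		} else {
			dropped++
		}
	}
	return forwarded, dropped
}

func main() {
	// A legitimate burst: 5,000 packets that should each be logged, within 1s.
	for _, rate := range []float64{100, 1000} {
		fwd, drop := simulateBurst(newTokenBucket(rate, rate), 5000, 1.0)
		fmt.Printf("rate=%4.0f pps: forwarded=%d dropped=%d\n", rate, fwd, drop)
	}
}
```

Under these assumptions, roughly 200 of the 5,000 events survive at 100 pps versus roughly 2,000 at 1,000 pps, while an attacker is still capped at the meter rate in both cases.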
Describe the solution you'd like
The current 100 pps limit may be too conservative. I believe there would be little risk in increasing it to 1,000 pps. Of course, this needs to be validated with manual testing. Ideally, we would run experiments with different values and plot the Agent's CPU usage against the rate limit value.
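The proposed experiment could be structured as a simple sweep over candidate limits. In this sketch, `applyRate` and `measureCPU` are hypothetical hooks: in a real run they would reconfigure the rate limit and sample the antrea-agent CPU usage while a traffic generator exercises the NetworkPolicy PacketIn path. The output is CSV that can be plotted directly.

```go
package main

import (
	"fmt"
	"strings"
)

// sample is one data point: the configured PacketIn rate limit and the
// Agent CPU usage measured under load at that limit.
type sample struct {
	RatePPS    int
	CPUPercent float64
}

// sweep applies each candidate rate limit and records the CPU usage measured
// for it. applyRate and measureCPU are hypothetical hooks supplied by the
// caller (e.g., update the Agent config, restart, generate traffic, sample CPU).
func sweep(rates []int, applyRate func(int) error, measureCPU func() (float64, error)) ([]sample, error) {
	samples := make([]sample, 0, len(rates))
	for _, r := range rates {
		if err := applyRate(r); err != nil {
			return nil, fmt.Errorf("applying rate %d: %w", r, err)
		}
		cpu, err := measureCPU()
		if err != nil {
			return nil, fmt.Errorf("measuring CPU at rate %d: %w", r, err)
		}
		samples = append(samples, sample{RatePPS: r, CPUPercent: cpu})
	}
	return samples, nil
}

// toCSV renders the samples for plotting (rate on x, CPU usage on y).
func toCSV(samples []sample) string {
	var b strings.Builder
	b.WriteString("rate_pps,cpu_percent\n")
	for _, s := range samples {
		fmt.Fprintf(&b, "%d,%.1f\n", s.RatePPS, s.CPUPercent)
	}
	return b.String()
}

func main() {
	// Fake hooks that only exercise the code: the numbers they return are
	// placeholders, not measurements.
	current := 0
	apply := func(r int) error { current = r; return nil }
	measure := func() (float64, error) { return 2 + 0.01*float64(current), nil }

	samples, err := sweep([]int{100, 250, 500, 1000, 2000}, apply, measure)
	if err != nil {
		panic(err)
	}
	fmt.Print(toCSV(samples))
}
```

Injecting the hooks as functions keeps the sweep logic independent of how the Agent is reconfigured, which matters if the limit is hard-coded today and only later becomes a config option.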
Additionally, while most users would not need to modify this parameter, it could make sense to make it configurable (as part of the Agent configuration) so that it can be tuned in the field. This would also make testing different values much easier.
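A configurable limit could look roughly like the following sketch. The struct `agentConfig`, the field `NetworkPolicyPacketInRate`, its YAML key, and the upper bound `maxPacketInRate` are all hypothetical; only the 100 pps default comes from this issue.

```go
package main

import (
	"errors"
	"fmt"
)

// agentConfig is a hypothetical excerpt of the Agent configuration; the field
// name and YAML key are illustrative, not existing Antrea options.
type agentConfig struct {
	// NetworkPolicyPacketInRate is the maximum number of PacketIn messages per
	// second sent to the Agent by NetworkPolicy tables. 0 means "use default".
	NetworkPolicyPacketInRate int `yaml:"networkPolicyPacketInRate,omitempty"`
}

const (
	defaultPacketInRate = 100   // current limit, per this issue
	maxPacketInRate     = 10000 // hypothetical cap to retain some DoS protection
)

// applyDefaults fills in the default when the user left the field unset.
func (c *agentConfig) applyDefaults() {
	if c.NetworkPolicyPacketInRate == 0 {
		c.NetworkPolicyPacketInRate = defaultPacketInRate
	}
}

// validate rejects values that would disable or defeat the rate limit.
func (c *agentConfig) validate() error {
	r := c.NetworkPolicyPacketInRate
	if r <= 0 {
		return errors.New("networkPolicyPacketInRate must be positive")
	}
	if r > maxPacketInRate {
		return fmt.Errorf("networkPolicyPacketInRate must be <= %d, got %d", maxPacketInRate, r)
	}
	return nil
}

func main() {
	var cfg agentConfig // field omitted from the config file
	cfg.applyDefaults()
	fmt.Println(cfg.NetworkPolicyPacketInRate, cfg.validate())
}
```

Keeping an upper bound is a design choice: it lets operators raise the limit in the field without being able to remove the datapath protection entirely.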
Anything else you would like to add?
Traceflow packets are also rate-limited, but I don't see much value in changing the limit or making it configurable.
There is another level of rate-limiting (inside the Agent):
(antrea/pkg/agent/openflow/packetin.go, lines 88 to 93 at commit 4746217)
This limit also needs to be adjusted accordingly.
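One way to keep the two layers consistent is to derive the in-Agent settings from the meter rate, so that the Agent does not silently become the bottleneck when the meter limit is raised. This sketch assumes the in-Agent layer is a bounded queue that drops new messages when full and is drained at a fixed rate; `agentQueueParams`, the two-second queue sizing, and `boundedQueue` are hypothetical, and the draining loop is omitted for brevity.

```go
package main

import "fmt"

// agentQueueParams derives hypothetical in-Agent queue settings from the OVS
// meter rate. The Agent drains at least as fast as the meter admits packets,
// and the queue holds about two seconds' worth of messages to absorb bursts.
func agentQueueParams(meterRatePPS int) (queueRatePPS, queueSize int) {
	queueRatePPS = meterRatePPS
	queueSize = 2 * meterRatePPS
	return queueRatePPS, queueSize
}

// boundedQueue is a minimal non-blocking queue that drops new items when
// full (a hypothetical stand-in for the in-Agent rate-limiting layer).
type boundedQueue struct {
	ch      chan []byte
	dropped int
}

func newBoundedQueue(size int) *boundedQueue {
	return &boundedQueue{ch: make(chan []byte, size)}
}

// offer enqueues pkt if there is room; otherwise it counts a drop.
func (q *boundedQueue) offer(pkt []byte) bool {
	select {
	case q.ch <- pkt:
		return true
	default:
		q.dropped++
		return false
	}
}

func main() {
	for _, meterRate := range []int{100, 1000} {
		rate, size := agentQueueParams(meterRate)
		q := newBoundedQueue(size)
		// Offer a burst three times larger than the queue, with no consumer.
		for i := 0; i < 3*size; i++ {
			q.offer([]byte{byte(i)})
		}
		fmt.Printf("meter=%d pps -> queue rate=%d pps, size=%d, dropped=%d\n",
			meterRate, rate, size, q.dropped)
	}
}
```

With a single configuration value driving both layers, testing a new meter rate would not require a separate code change to the in-Agent limit.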