Increase NetworkPolicy PacketIn rate limit, and make it configurable #5358

Closed
antoninbas opened this issue Aug 4, 2023 · 1 comment
Labels
area/monitoring/auditing: Issues or PRs related to auditing.
area/network-policy: Issues or PRs related to network policies.
area/ovs/openflow: Issues or PRs related to Open vSwitch Open Flow.
kind/feature: Categorizes issue or PR as related to a new feature.
priority/backlog: Higher priority than priority/awaiting-more-evidence.

Comments

@antoninbas
Contributor

Describe the problem/challenge you have
The PacketIn messages generated by NetworkPolicy tables are rate-limited using an OVS meter:

// We use OpenFlow Meter for packetIn rate limiting on OVS side.
// Meter Entry ID.
PacketInMeterIDNP = 1
PacketInMeterIDTF = 2
// Meter Entry Rate. It is represented as number of events per second.
// Packets which exceed the rate will be dropped.
PacketInMeterRateNP = 100
PacketInMeterRateTF = 100

These messages are used by Audit Logging and by the Flow Exporter.

This was added to prevent DDoS attacks: without it, an attacker could easily generate a large number of packets (e.g. UDP packets) that would all need to be sent to the Agent, which could cause high CPU usage.
By dropping packets in the datapath, we can protect the Agent against this.

However, it also means that we can lose a lot of events in case of legitimate, bursty traffic.

Describe the solution you'd like
The 100 pps limit may be a bit too conservative. I believe that there would be little risk in increasing the limit to 1,000 pps. Of course, this needs to be tested manually. Ideally, we would run some experiments with different values, and plot CPU usage against the rate limit value.

Additionally, while this is not a parameter that most users would want to modify, it could make sense to make this configurable (part of the Agent config), so it can be tweaked in the field. It would also make testing different values much easier.
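As a sketch of what the configurable part could look like: the `AgentConfig` type, the `PacketInRate` field name, and the default value below are all assumptions made for illustration, not the actual Agent config.

```go
package main

import (
	"errors"
	"fmt"
)

// defaultPacketInRate is a hypothetical default, taken from the 1,000 pps
// value suggested in this issue.
const defaultPacketInRate = 1000

// AgentConfig is a stand-in for the Agent configuration. The PacketInRate
// field name is an assumption made for this sketch.
type AgentConfig struct {
	// PacketInRate is the NetworkPolicy PacketIn meter rate, in packets
	// per second. Zero means "use the default".
	PacketInRate int
}

// EffectivePacketInRate returns the rate that would be programmed into the
// meter, applying the default and rejecting negative values.
func (c AgentConfig) EffectivePacketInRate() (int, error) {
	switch {
	case c.PacketInRate < 0:
		return 0, errors.New("packetInRate must not be negative")
	case c.PacketInRate == 0:
		return defaultPacketInRate, nil
	default:
		return c.PacketInRate, nil
	}
}

func main() {
	for _, cfg := range []AgentConfig{{}, {PacketInRate: 500}, {PacketInRate: -1}} {
		rate, err := cfg.EffectivePacketInRate()
		fmt.Println(rate, err)
	}
}
```

Treating zero as "unset" keeps the parameter optional, so users who never touch it still get the default.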

Anything else you would like to add?

Traceflow packets are also rate-limited, but I don't see much value in changing the limit or making it configurable.

There is another level of rate-limiting (inside the Agent):

// PacketInQueueSize defines the size of PacketInQueue.
// When PacketInQueue reaches PacketInQueueSize, new packetIn will be dropped.
PacketInQueueSize = 200
// PacketInQueueRate defines the maximum frequency of getting items from PacketInQueue.
// PacketInQueueRate is represented as number of events per second.
PacketInQueueRate = 100

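These values would also need to be adjusted accordingly; otherwise the Agent-side queue would become the bottleneck once the meter rate is raised.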

@antoninbas
Contributor Author

This was addressed in #5450
