
KEP-3866 kube-proxy nftables to beta #4663

Merged
merged 1 commit into kubernetes:master on Jun 11, 2024

Conversation

danwinship
Contributor

  • One-line PR description: move KEP-3866 kube-proxy nftables to beta
  • Other comments:

Started this to try and nail down what was left to be done. Still some updates pending.

@k8s-ci-robot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory labels May 24, 2024
@k8s-ci-robot k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 24, 2024
job, but this is because it doesn't run with `minSyncPeriod: 10s` like
the `iptables` job does, and so it syncs rule changes more often.
(However, the fact that it's able to do so without the cluster falling over
is a strong indication that it _is_ more efficient.)
Member

we have some proof that it handles large scale more efficiently: https://gist.github.com/aojea/f9ca1a51e2afd03621744c95bfdab5b8

@danwinship danwinship mentioned this pull request May 25, 2024
@wojtek-t wojtek-t self-assigned this May 27, 2024
@@ -4,3 +4,5 @@
kep-number: 3866
alpha:
approver: "@wojtek-t"
beta:
Member

@danwinship - are you planning to get that into 1.31?

Contributor Author

yes... was hoping to get the mode-switching e2e job implemented, but it's not yet. But we're otherwise mostly OK on the graduation criteria. (I need to push a small update...)

@danwinship danwinship marked this pull request as ready for review June 7, 2024 13:20
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 7, 2024
@danwinship danwinship force-pushed the nftables-proxy-to-beta branch 2 times, most recently from 8c2633b to 04efa74 Compare June 7, 2024 15:33
@BenTheElder BenTheElder self-assigned this Jun 7, 2024
@BenTheElder left a comment (Member)

LGTM for Beta
[PRR Shadow]

`InClusterNetworkLatency`, but no one is really looking at the
results yet and they may need work before they are usable.
We have an [nftables scalability job]. Initial performance is fine; we
have not done a lot of further testing/improvement yet.
Member

But we do have evidence that it is more performant and should scale fine? I suspect this undersells it a bit :-)

Member

For GA we should probably take a look at how the 5,000 node jobs do.

@danwinship (Contributor Author) Jun 7, 2024

We do have some numbers but I don't have them handy and wanted to push this sooner rather than later...

For GA we should probably take a look at how the 5,000 node jobs do.

Kube-proxy performance does not directly scale in any way with node count. If the 5000 node job creates more services than the 100 node job, then that affects the metrics, but you don't actually need a large cluster to test large numbers of services; from kube-proxy's perspective, having 10,000 Services that all point to the same 3 Pods is exactly the same as having 10,000 Services that point to 30,000 unique pods.

@npinaeva has been doing some tests of that sort, and at the moment nftables kube-proxy's performance with many many services is much closer to old pre-partial-sync iptables kube-proxy's performance than current iptables kube-proxy's performance, but well, we were waiting to have better perf numbers before we got too heavily into optimization.
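To make the point above concrete, here is a hypothetical sketch (not from the PR; the script name, Service names, and selector are all invented for illustration) of generating many Services that all point at the same small backend set, which exercises kube-proxy's rule programming without a large cluster:

```shell
#!/bin/sh
# Hypothetical helper: emit N Service manifests that all select the same
# backend pods. From kube-proxy's perspective, syncing rules for these is
# equivalent to N Services with unique backends.
gen_services() {
  n=$1
  i=1
  while [ "$i" -le "$n" ]; do
    cat <<EOF
---
apiVersion: v1
kind: Service
metadata:
  name: scale-test-$i
spec:
  selector:
    app: scale-test-backend   # every Service points at the same pods
  ports:
  - port: 80
EOF
    i=$((i + 1))
  done
}

# e.g. gen_services 10000 | kubectl apply -f -
gen_services 3
```

The manifests are plain YAML on stdout, so the same generator can feed `kubectl apply` or be saved for repeatable perf runs.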

Member

What dimension of performance are we talking about here? Network programming latency is the main pain point today; dataplane performance (the linear-search evaluation) was always the main complaint, but it seems that is more a synthetic problem.

Contributor Author

She was testing dataplane performance but ran into the programming latency performance problem while trying to test that. It's not clear yet exactly what aspect of the test data was making it slow, other than that the slowness was in the actual reload operation in the kernel, not in the userspace parsing, etc. (In theory, nft should not be slower than iptables, given that modern iptables is just translating to nftables underneath. Though it's possible that, e.g., adding an element to a set is unexpectedly slow, which is something that would affect our nftables performance but not our iptables-via-iptables-nft performance.)

For dataplane performance there wasn't any observable improvement at the size she was testing, which is kind of expected because we hadn't ever had any complaints at the sizes kube-proxy could already support well. (In particular, pre-MinimizeIPTablesRestore, it was basically impossible to have so many services that the linear search in the nat table would affect dataplane performance.)

In the filter table, the limits are much smaller, and we've hit them twice (kubernetes/kubernetes#56164 / kubernetes/kubernetes#95252), but it's not a "things gradually get slower" problem, it's an "everything works fine until suddenly the entire network is completely broken" problem. We should try to reproduce that and confirm that nftables doesn't hit the same problem.
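For context on why nftables is expected to behave differently here: it can express this kind of filtering as a single verdict-map lookup instead of one linear rule per service. An illustrative ruleset sketch (this is not kube-proxy's actual ruleset; the table, map, and element values are invented):

```
# Illustrative only: one vmap lookup matches any number of services,
# so rule-evaluation cost does not grow linearly with service count
# the way a per-service rule list in the iptables filter table does.
table ip example-proxy {
	map no-endpoint-services {
		type ipv4_addr . inet_proto . inet_service : verdict
		elements = { 10.96.0.10 . tcp . 80 : drop,
			     10.96.0.11 . tcp . 443 : drop }
	}
	chain filter {
		type filter hook forward priority 0; policy accept;
		ip daddr . meta l4proto . th dport vmap @no-endpoint-services
	}
}
```

Reproducing the two linked failure modes against a ruleset of this shape would confirm (or refute) that expectation.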

users to be aware of whether they are depending on features that work
differently in the `nftables` backend, to help users decide whether
they can migrate to `nftables`, and whether they need any non-standard
configuration in order to do so.
Member

We should outline this somewhere, perhaps a feature blogpost?

Contributor Author

Yes, blog post sounds good

We do not currently have any apples-to-apples comparisons; the
`nftables` perf job uses more CPU than the corresponding `iptables`
job, but this is because it doesn't run with `minSyncPeriod: 10s` like
the `iptables` job does, and so it syncs rule changes more often.
Member

But we could test this with the same config? Why not?

Contributor Author

Because the config used by the iptables test is objectively incorrect. 😝

I suppose I should try to get that fixed
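For reference, the knob under discussion lives in the kube-proxy configuration file. A sketch of the relevant fragment (values are illustrative, not the perf jobs' actual configs):

```yaml
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "nftables"
nftables:
  # A larger minSyncPeriod batches rule changes and lowers CPU use,
  # at the cost of reacting more slowly to endpoint churn; setting it
  # to 10s here would match what the iptables perf job runs with.
  minSyncPeriod: 10s
```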

Contributor Author

@thockin
thockin commented Jun 7, 2024

Thanks!

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 7, 2024
@wojtek-t left a comment (Member)

@danwinship - just some minor comments from me from the PRR perspective. PTAL

@@ -1536,6 +1496,10 @@ We expect no non-infra related flakes in the last month as a GA graduation crite
- We have at least the start of a plan for the next steps (changing
the default mode, deprecating the old backends, etc).

- No UNRESOLVED sections in the KEP. (In particular, we have figured
Member

About Beta graduation criteria:

The nftables mode has seen at least a bit of real-world usage.

Do we have that? Can you elaborate a bit on that in the KEP?

<!--
Even if applying deprecation policies, they may still surprise some users.
-->
The new backend is not 100% compatible with the `iptables` backend.
Member

Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

I'm fine with the test not being executed now, but can you briefly describe the test that will be run prior beta graduation? [and update the details based on results later (probably after freeze)]?

Contributor Author

this was answered in the "feature enablement/disablement" section but I'll mention it here too

@@ -1720,15 +1679,15 @@ question.

###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

TBD.
`sync_proxy_rules_nftables_sync_failures_total` indicates the number
Member

please fill in the SLO question above.

I would be fine with just saying that for now we're sticking to the network programming latency (https://github.com/kubernetes/community/blob/master/sig-scalability/slos/network_programming_latency.md) and we will reevaluate that before GA.
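As a sketch of how an operator might watch the failure-count SLI quoted above (the `kubeproxy_` prefix on the exported metric name is an assumption here, not stated in the KEP text):

```promql
# Illustrative PromQL: fire if any nftables rule sync has failed
# in the last 5 minutes on any node.
rate(kubeproxy_sync_proxy_rules_nftables_sync_failures_total[5m]) > 0
```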

@danwinship danwinship force-pushed the nftables-proxy-to-beta branch from 04efa74 to c386ddb Compare June 11, 2024 11:10
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 11, 2024
@wojtek-t
Member

/lgtm
/approve PRR

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 11, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danwinship, thockin, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 11, 2024
@k8s-ci-robot k8s-ci-robot merged commit c7c9de8 into kubernetes:master Jun 11, 2024
4 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.31 milestone Jun 11, 2024
@danwinship danwinship deleted the nftables-proxy-to-beta branch June 12, 2024 06:57
@brandond

brandond commented Nov 7, 2024

@danwinship @wojtek-t shouldn't the new mode be added to the CLI flag help text now that it's on by default?
https://github.com/kubernetes/kubernetes/blob/v1.31.0/cmd/kube-proxy/app/options.go#L126

Since it's not mentioned in the CLI flag help, there is no coverage of nftables as a valid proxy-mode on Linux in the docs at https://kubernetes.io/docs/reference/command-line-tools-reference/kube-proxy/

@danwinship
Contributor Author

yup, filed kubernetes/kubernetes#128698
