KEP: kube-proxy detect "local" traffic w/o cluster CIDR #1354

satyasm · 2019-11-04T19:42:17Z

KEP to update iptables to have alternate ways of detecting cluster-originated traffic instead of using the cluster CIDR.

Initial KEP to enhance iptables rules so that we don't depend on cluster CIDR as part of the rules, allowing for implementations to have more flexibility on how they manage POD IPs.

k8s-ci-robot · 2019-11-04T19:42:24Z

Welcome @satyasm!

It looks like this is your first PR to kubernetes/enhancements 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/enhancements has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

k8s-ci-robot · 2019-11-04T19:42:25Z

Hi @satyasm. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

satyasm · 2019-11-04T19:43:34Z

@thockin @caseydavenport @MikeSpreitzer this is the KEP we briefly referred to during last week's sig-network meeting. Can you please review?

Needs ok-to-test :-)

Thanks!

satyasm · 2019-11-06T22:20:36Z

ping @thockin , @caseydavenport , @MikeSpreitzer

keps/sig-network/20191104-iptables-no-cluster-cidr.md

MrHohn · 2019-11-14T21:15:52Z

/ok-to-test

caseydavenport

Thanks @satyasm

keps/sig-network/20191104-iptables-no-cluster-cidr.md

Includes clarifications from comment discussions.

satyasm · 2019-11-27T23:56:17Z

Updated KEP with comment feedback and added more details in terms of flags etc in the design details section. Please take a look. Thanks!

satyasm · 2019-11-28T00:02:37Z

/retest

thockin

Overall I like this

keps/sig-network/20191104-iptables-no-cluster-cidr.md

thockin · 2019-12-05T01:18:01Z

keps/sig-network/20191104-iptables-no-cluster-cidr.md

+  kube-proxy considers traffic as local if originating from a bridge within the node.
+```
+
+Only one of `--cluster-cidr`, `--detect-local-with-node-cidr`, `--detect-local-with-pod-interface` or


I was thinking something like:

--detect-local={cluster-cidr | node-cidr | pod-interface-prefix | bridge} defaulting to cluster-cidr for compat

if cluster-cidr, look at the --cluster-cidr flag

if node-cidr, look at the node PodCIDR field

if interface-prefix, look at the --pod-interface-prefix flag

if bridge, use physdev

This could extend to "cidr", with a list of specific CIDRs, "mark" with a specific mark value/mask, etc. This seems less ambiguous to me, but I am not wedded to it, either.
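To make the single-flag dispatch above concrete, here is a minimal Python sketch of how one `--detect-local` value could pick the source of the "local" match. It is only an illustration: the helper name `local_match_args`, its parameters, and the exact iptables match fragments it returns (including `--physdev-is-in` for the bridge case) are assumptions, not the KEP's or kube-proxy's actual implementation.

```python
def local_match_args(mode="cluster-cidr", cluster_cidr=None,
                     node_pod_cidr=None, pod_interface_prefix=None):
    """Return iptables match arguments that select "local" traffic.

    Hypothetical helper: one mode value decides which other input is read.
    """
    def require(value, source):
        # Each mode depends on exactly one input; fail loudly if it is missing.
        if not value:
            raise ValueError(f"mode {mode!r} needs {source}")
        return value

    if mode == "cluster-cidr":
        # Default, for compatibility with the current behavior.
        return ["-s", require(cluster_cidr, "--cluster-cidr")]
    if mode == "node-cidr":
        return ["-s", require(node_pod_cidr, "the node PodCIDR field")]
    if mode == "pod-interface-prefix":
        # In iptables, a trailing '+' turns an interface name into a prefix match.
        prefix = require(pod_interface_prefix, "--pod-interface-prefix")
        return ["-i", prefix + "+"]
    if mode == "bridge":
        # Assumed physdev form: match packets that entered through a bridge port.
        return ["-m", "physdev", "--physdev-is-in"]
    raise ValueError(f"unknown --detect-local mode: {mode!r}")
```

Because every mode reads exactly one input, an unset or unknown combination surfaces as an error rather than as an ambiguous mix of flags, which is the property the single-flag form is meant to provide.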

Thanks for this. I really like this because if we default --detect-local=cluster-cidr, then the current behavior becomes the automatic default when not using any new flags. The only change I would make would be to add an optional --node-cidr flag to go along with the node-cidr choice, to handle cases where node CIDR(s) are not tracked as part of the node.podCIDR field. If --node-cidr is not specified, we default to the node.podCIDR field.

Will update the kep with the new flag format.
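The fallback described in this reply (use `--node-cidr` when given, otherwise node.podCIDR) could look roughly like the following Python sketch. The function name `resolve_node_cidrs` and its arguments are hypothetical, and the comma-separated format is assumed from the `cidr[,cidr,..]` notation used in the KEP text.

```python
import ipaddress


def resolve_node_cidrs(node_cidr_flag, node_pod_cidr):
    """Pick the CIDRs treated as node-local (hypothetical helper).

    node_cidr_flag: value of the optional --node-cidr flag, or None/"" if unset.
    node_pod_cidr:  value of the node.podCIDR field, or None if not populated.
    """
    if node_cidr_flag:
        raw = [part.strip() for part in node_cidr_flag.split(",") if part.strip()]
    elif node_pod_cidr:
        raw = [node_pod_cidr]
    else:
        raise ValueError("neither --node-cidr nor node.podCIDR is available")
    # ip_network() validates each entry and raises ValueError on malformed input.
    return [str(ipaddress.ip_network(cidr)) for cidr in raw]
```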

Not to bikeshed, but --detect-reachable, since this isn't really about locality, but reachability?

I put in my perspective on the following comment. I think "--detect-local" does capture the intent better? I don't think it's about reachability.

Have updated this implementation section with the new flags. Can this be resolved?

squeed · 2019-12-05T15:20:35Z

keps/sig-network/20191104-iptables-no-cluster-cidr.md

+some notion of node local pod traffic.
+
+The core logic in these cases is “how to determine” cluster originated traffic from non-cluster originated ones. 
+The proposal is that tracking pod traffic generated from within the node is sufficient to determine cluster originated 


I think you're making two assumptions here that don't hold for all clusters. The first is that all non-cluster-originated traffic cannot reach the cluster - this is often false, especially for Calico (cc @caseydavenport). The second one is that node-originated traffic is the complete set of traffic that does not need masquerading.

I don't think this is about reachability. The more I think about it, the more I think it's about making sure that the return path from the backend pod hits the node where the DNAT happened, so that the reverse DNAT can happen on the way back. Without this the end-to-end connection breaks. Using the cluster pod-cidr for this works today because we do the DNAT on every node boundary, so the return path will hit that node. If we don't do the DNAT on every node boundary, then we have to SNAT all traffic that did not originate from the node (even if it is part of the cluster pod-cidr), as otherwise the return path will not hit that node. I think that is what we are trying to reason through. From that perspective, it seems to make more and more sense to write the rule as "locally generated from the node - no masquerade" vs "not locally generated from the node - so masquerade". Right?
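As a rough illustration of the rule stated in this comment, the Python sketch below treats a source address as local only if it falls inside one of the node's own pod CIDRs and masquerades everything else. The function name `needs_masquerade` and the sample addresses are made up for the example; real kube-proxy expresses this as iptables rules, not as a per-packet function.

```python
import ipaddress


def needs_masquerade(src_ip, node_local_cidrs):
    """Return True if traffic from src_ip should be SNATed (hypothetical helper).

    Traffic generated on this node keeps its source address; anything else is
    masqueraded so the reply returns through the node that performed the DNAT.
    """
    src = ipaddress.ip_address(src_ip)
    for cidr in node_local_cidrs:
        net = ipaddress.ip_network(cidr)
        # Compare only within the same IP family.
        if src.version == net.version and src in net:
            return False
    return True


# Assumed example: this node owns 10.1.2.0/24 out of a larger cluster range.
node_cidrs = ["10.1.2.0/24"]
print(needs_masquerade("10.1.2.7", node_cidrs))     # pod on this node
print(needs_masquerade("10.1.3.9", node_cidrs))     # pod on another node
print(needs_masquerade("203.0.113.5", node_cidrs))  # outside the cluster
```

Note that the pod on another node is masqueraded here even though it is inside the cluster pod-cidr, which is exactly the difference from the cluster-cidr rule that the comment describes.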

@squeed based on this understanding, can we mark this as resolved?

@squeed I think your first point is wrong (that this assumes all non-cluster-originated traffic cannot reach the cluster). That is certainly possible in GCP, for example, and I don't see how this proposal changes that.

I think your second point is correct (that this assumes that node-originated traffic is the complete set of traffic that does not need masquerading). Can you expand on that?

@thockin (sorry, k/enhancements mentions got blackholed). Argh, I forgot about NAT + return path. Ignore the noise, sorry.

satyasm · 2019-12-09T19:49:09Z

updated implementation notes to the new mode flags.

keps/sig-network/20191104-iptables-no-cluster-cidr.md

satyasm

Resolved code links to stable tags.

thockin

LGTM. Some comments on flags which can be in followup discussion.

Thanks!

/lgtm
/approve

thockin · 2019-12-13T23:20:42Z

keps/sig-network/20191104-iptables-no-cluster-cidr.md

+some notion of node local pod traffic.
+
+The core logic in these cases is “how to determine” cluster originated traffic from non-cluster originated ones. 
+The proposal is that tracking pod traffic generated from within the node is sufficient to determine cluster originated 


@squeed I think your first point is wrong (that this assumes all non-cluster-originated traffic cannot reach the cluster). That is certainly possible in GCP, for example, and I don't see how this proposal changes that.

I think your second point is correct (that this assumes that node-originated traffic is the complete set of traffic that does not need masquerading). Can you expand on that?

keps/sig-network/20191104-iptables-no-cluster-cidr.md

thockin · 2019-12-13T23:31:30Z

keps/sig-network/20191104-iptables-no-cluster-cidr.md

+
+  the mode to use for detection local traffic. The default is cluster-cidr (current behavior)
+
+--cluster-cidr="cidr[,cidr,..]"


I'd like to propose that we shelve this change until we know we need it. Rather than make this more robust, let's encourage people to not use it. If people scream and shout, we can add it back.

That said, if it turns out to be trivial to implement and test, OK.

thockin · 2019-12-13T23:32:19Z

keps/sig-network/20191104-iptables-no-cluster-cidr.md

+  than one can be specified if necessary. kube-proxy considers traffic as local if source is one
+  of the CIDR values. This is only used if `--detect-local=cluster-cidr` .
+
+--node-cidr[="cidr[,cidr,..]"]


Do we really need this flag?

thockin · 2019-12-13T23:35:45Z

keps/sig-network/20191104-iptables-no-cluster-cidr.md

+  kube-proxy considers traffic as local if originating from an interface which matches one of given
+  prefixes. string argument is a comma separated list of interface prefix names, without the ending '+'.
+  This is only used if `--detect-local=pod-interface-prefix` or `--detect-local=bridge`. In the case of
+  latter, the prefix is used as option to `--physdev-in name` match instead of just `--physdev-in` in


Do we really want a prefix? I think this would be cleaner as a flag of its own.

--pod-bridge-name

Then, like above, we can document it as a trailing '+' meaning prefix.

thockin · 2019-12-13T23:35:58Z

keps/sig-network/20191104-iptables-no-cluster-cidr.md

+  of the CIDR values. If value is not specified, or flag is omitted,  defaults to node.podCIDR property on the node.
+  This is only used if `--detect-local=node-cidr` .
+
+--pod-interface-prefix="prefix[,prefix,..]"


Do we really need multiple prefixes?

What if we called this --pod-interface-name and documented the trailing '+' as a prefix?
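For reference, the trailing '+' convention mentioned here follows iptables interface matching: a name ending in '+' matches any interface whose name begins with the part before the '+', and any other name must match exactly. A small Python sketch of that semantics, with a hypothetical function name and sample interface names:

```python
def interface_matches(pattern, ifname):
    """Mimic iptables -i/-o name matching (hypothetical helper).

    "veth+" matches any interface starting with "veth";
    "cbr0" matches only the interface named exactly "cbr0".
    """
    if pattern.endswith("+"):
        return ifname.startswith(pattern[:-1])
    return ifname == pattern
```

With this convention a single `--pod-interface-name` value can express both an exact name and a prefix, so a separate prefix flag is not strictly required.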

keps/sig-network/20191104-iptables-no-cluster-cidr.md

k8s-ci-robot · 2019-12-13T23:40:07Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: satyasm, thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~keps/sig-network/OWNERS~~ [thockin]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

satyasm added 6 commits November 4, 2019 11:39

Remove knowledge of cluster CIDR in iptables.

65ac174

Initial KEP to enhance iptables rules so that we don't depend on cluster CIDR as part of the rules, allowing for implementations to have more flexibility on how they manage POD IPs.

Reformat code (remove initial indentation)

e1165ca

Simplified to node.podCIDR, added ipvs reference

ae147d6

Reify section names.

dc6deb9

updated to include interface conventions.

c599ca5

add reviewers and minor edits

b0ce8d7

k8s-ci-robot added the cncf-cla: yes label Nov 4, 2019

k8s-ci-robot added the needs-ok-to-test label Nov 4, 2019

k8s-ci-robot added the size/L label Nov 4, 2019

k8s-ci-robot requested review from caseydavenport and dcbw November 4, 2019 19:42

k8s-ci-robot added the kind/kep and sig/network labels Nov 4, 2019

aojea reviewed Nov 9, 2019

keps/sig-network/20191104-iptables-no-cluster-cidr.md (resolved)

aojea reviewed Nov 9, 2019

keps/sig-network/20191104-iptables-no-cluster-cidr.md (outdated, resolved)

bewing mentioned this pull request Nov 14, 2019

Support externalTrafficPolicy on ClusterIP kubernetes/kubernetes#85306

Closed

k8s-ci-robot added the ok-to-test label and removed the needs-ok-to-test label Nov 14, 2019

thockin changed the title ~~Iptables no cluster cidr~~ KEP: iptables detect "local" cluster cidr Nov 15, 2019

thockin changed the title ~~KEP: iptables detect "local" cluster cidr~~ KEP: kube-proxy detect "local" traffic w/o cluster CIDR Nov 15, 2019

caseydavenport reviewed Nov 15, 2019

bowei reviewed Nov 18, 2019

keps/sig-network/20191104-iptables-no-cluster-cidr.md (resolved)

danwinship reviewed Nov 18, 2019

squeed reviewed Nov 22, 2019

keps/sig-network/20191104-iptables-no-cluster-cidr.md (resolved)

Add implementation details.

6fc814b

Includes clarifications from comment discussions.

update toc

00ece50

thockin reviewed Dec 5, 2019

squeed reviewed Dec 5, 2019

thockin mentioned this pull request Dec 5, 2019

add DisableBindOptimization to LoadBalancerIngress kubernetes/kubernetes#85956

Closed

updated implementation to use mode and value flags

eef908c

satyasm added 2 commits December 9, 2019 12:17

added explicit note for hostNetwork

0dd2e49

support bridge interface naming

bbfc8c9

freehan reviewed Dec 12, 2019

keps/sig-network/20191104-iptables-no-cluster-cidr.md (outdated, resolved)

Resolve code links to stable tags

8c032f6

satyasm commented Dec 13, 2019

thockin reviewed Dec 13, 2019

k8s-ci-robot assigned thockin Dec 13, 2019

k8s-ci-robot added the lgtm label Dec 13, 2019

k8s-ci-robot added the approved label Dec 13, 2019

k8s-ci-robot merged commit 1b2b1e9 into kubernetes:master Dec 13, 2019

k8s-ci-robot added this to the v1.18 milestone Dec 13, 2019

caseydavenport mentioned this pull request Mar 19, 2020

Consider ClusterIP traffic routed to a node (from outside the cluster) as "external" for local-only kubernetes/kubernetes#79866

Closed

bharath-b23 mentioned this pull request Jun 15, 2020

Kube-proxy hides real IP kubernetes/kubernetes#10921

Closed
