Add IPv4/IPv6 dual stack KEP #2254
Conversation
/cc @kubernetes/sig-network-proposals
/area ipv6
Branch force-pushed from 9cea8c5 to 85604c7.
Branch force-pushed from 52f9e79 to e9b1254.
@leblancd thanks for making this! I've done a first pass and added some comments.
- Link Local Addresses (LLAs) on a pod will remain implicit (Kubernetes will not display nor track these addresses).
- Kubernetes needs to be configurable for up to two service CIDRs.
- Backend pods for a service can be dual stack. For the first release of dual-stack support, each IPv4/IPv6 address of a backend pod will be treated as a separate Kubernetes endpoint.
- Kube-proxy needs to support IPv4 and IPv6 services in parallel (e.g. drive iptables and ip6tables in parallel).
We should also consider the impact to the IPVS proxy. I think at this point we need to maintain both (unless we say dual-stack is iptables only?)
Yes, good point. I had added an IPVS section below, but forgot to add it to the proposal summary here.
FYI, I did some exploring into IPVS and have found one problem using it so far: the team reports that it does not currently support IPv6. It seems like we need to document that effort is needed there too.
Thanks for looking into this, it's good to know up front!
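For illustration, here is a minimal Go sketch of the "drive iptables and ip6tables in parallel" idea discussed above: one proxier instance per enabled IP family. The `Proxier` interface and type names are hypothetical stand-ins, not the actual kube-proxy (or IPVS proxier) code.

```go
package main

import "fmt"

// Proxier is a hypothetical per-family proxier abstraction.
type Proxier interface {
	SyncRules() error
}

// iptablesProxier stands in for a proxier that programs either the
// iptables (IPv4) or ip6tables (IPv6) backend, selected by family.
type iptablesProxier struct{ family string }

func (p *iptablesProxier) SyncRules() error {
	fmt.Printf("syncing %s rules\n", p.family)
	return nil
}

// runDualStackProxy drives one proxier per enabled IP family, mirroring
// the "iptables and ip6tables in parallel" requirement from the proposal.
func runDualStackProxy(families []string) error {
	proxiers := []Proxier{}
	for _, f := range families {
		proxiers = append(proxiers, &iptablesProxier{family: f})
	}
	for _, p := range proxiers {
		if err := p.SyncRules(); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	_ = runDualStackProxy([]string{"IPv4", "IPv6"})
}
```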
### Awareness of Multiple IPs per Pod

Since Kubernetes Version 1.9, Kubernetes users have had the capability to use dual-stack-capable CNI network plugins (e.g. Bridge + Host Local, Calico, etc.), using the [0.3.1 version of the CNI Networking Plugin API](https://github.com/containernetworking/cni/blob/spec-v0.3.1/SPEC.md), to configure multiple IPv4/IPv6 addresses on pods. However, Kubernetes currently captures and uses only the first IP address in the list of assigned pod IPs that a CNI plugin returns to Kubelet in the [CNI Results structure](https://github.com/containernetworking/cni/blob/spec-v0.3.1/SPEC.md#result).
I believe Kubernetes uses only the IP address it reads from eth0 within the Pod and ignores the response from CNI altogether, right? Or did that get changed and I missed it? :)
That is still currently the case... for now. CNI GET is getting closer!
Thanks for the clarification!
@squeed - If we do add the capability for the CNI plugin to pass up metadata (labels and such) to Kubernetes, does this have to be done via a CNI "get", or is there a way for kubelet to gather this information directly from the CNI results?
CNI doesn't allow for any kind of annotations (right now -- that could change) - it only has IPs and routes. Changing that is... out of scope :-)
I think this section is fine as-is; how kubelet gets the list of IPs from a running container is just an implementation detail.
@squeed - Understood, thanks. This was more out of curiosity about where we're headed with the CNI API.
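However kubelet ends up obtaining a pod's addresses (from the CNI result or by reading them off eth0, as discussed above), a dual-stack-aware component will need to pick one address per family from the list. A minimal sketch, assuming the addresses arrive as plain strings:

```go
package main

import (
	"fmt"
	"net"
)

// firstPerFamily returns the first IPv4 and first IPv6 address found in the
// list of IPs reported for a pod, which is one way a dual-stack-aware
// component could derive a single address per family.
func firstPerFamily(ips []string) (v4, v6 string) {
	for _, s := range ips {
		ip := net.ParseIP(s)
		if ip == nil {
			continue // skip unparseable entries
		}
		if ip.To4() != nil {
			if v4 == "" {
				v4 = s
			}
		} else if v6 == "" {
			v6 = s
		}
	}
	return v4, v6
}

func main() {
	v4, v6 := firstPerFamily([]string{"10.244.1.4", "fd00:10:244:1::4"})
	fmt.Println(v4, v6)
}
```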
### Support of Health/Liveness/Readiness Probes

Currently, health, liveness, and readiness probes are defined without any concern for IP addresses or families. For the first release of dual-stack support, no configuration "knobs" will be added for probe definitions. A probe for a dual-stack pod will be deemed successful if either an IPv4 or IPv6 response is received. (QUESTION: Does the current probe implementation include DNS lookups, or are IP addresses hard coded?)
> Does the current probe implementation include DNS lookups, or are IP addresses hard coded?

I believe you can configure a host for the probe, which will be resolved via DNS if it is a DNS name rather than an exact IP. See here.
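As a rough sketch of the "either family succeeds" behavior described above (not the actual kubelet prober), an HTTP probe can target an IPv4 or IPv6 pod IP through the same code path, since net.JoinHostPort brackets IPv6 literals:

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"time"
)

// probeHTTP issues a single HTTP GET against the given pod IP and port.
// net.JoinHostPort renders IPv6 literals as "[fd00::1]:8080", so the same
// logic works for either address family.
func probeHTTP(podIP string, port int, path string) error {
	client := &http.Client{Timeout: 2 * time.Second}
	url := fmt.Sprintf("http://%s%s", net.JoinHostPort(podIP, fmt.Sprint(port)), path)
	resp, err := client.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode < 200 || resp.StatusCode >= 400 {
		return fmt.Errorf("probe failed: %s", resp.Status)
	}
	return nil
}

func main() {
	// Per the proposal text, success if at least one family responds.
	for _, ip := range []string{"10.244.1.4", "fd00:10:244:1::4"} {
		fmt.Println(ip, probeHTTP(ip, 8080, "/healthz"))
	}
}
```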
### Load Balancer Operation ???

### Network Policy Considerations ???
Speaking strictly for Calico, I don't believe there are any (besides testing).
I don't think there are API impacts because the NP API selects using labels rather than addresses (though now that I think about it we should check if the NP CIDR support does validation on IPv4 / IPv6)
Cool, thanks!
#### First Release of Dual Stack: No Service Address Configuration "Knobs"
For the first release of Kubernetes dual-stack support, no new configuration "knobs" will be added for service definitions. This greatly simplifies the design and implementation, but requires imposing the following behavior:
- Service IP allocation: Kubernetes will always allocate a service IP from each service CIDR for each service that is created. (In the future, we might want to consider adding configuration options to allow a user to select e.g. whether a given service should be assigned only IPv4, only IPv6, or both IPv4 and IPv6 service IPs.)
If kube-proxy/plugins/etc are going to have to be updated to deal with multiple CIDR ranges anyway, then it would be useful to let the admin say "a.b.c.d/x should be treated as part of the service IP range, but the controller shouldn't allocate any new IPs out of that range". This would let you live-migrate the cluster from one service CIDR to another (something we (OpenShift) have had a handful of requests for).
(The same applies to the cluster CIDRs, though in that case the allocations are done by the plugin, not kube itself, so you could already implement this via plugin-specific configuration.)
Nit: the "In the future" clause in parens could be moved down as a note, since it is not part of the imposed behavior for not using a knob.
I have heard a VERY small number of similar requests, but I think it's orthogonal to multi-family. We could add multi-CIDR with metadata (e.g. allow existing, but no more allocations) without touching multi-family support.
That said, if we do add multiple CIDRs, we should at least make it possible to add metadata later (so []struct, not []string).
I agree that we should add metadata within a PodIP structure, as @thockin described earlier, at least as a placeholder for the future. Support for live migration of pod CIDRs could then be done as a followup enhancement and make use of the metadata.
A few thoughts about ExtraPodIPs:
Some time ago, before multi-network was tabled, I suggested that every IP should be labeled with the name of the network that created it. Then services could have an optional label selector. Even if we don't use this right now, it would be good to have for room to grow.
The singular --service-cluster-ip-range argument will become deprecated.

#### controller-manager Startup Configuration for Multiple Service CIDRs
A new, plural "service-cluster-ip-ranges" option for the [controller-manager startup configuration](https://kubernetes.io/docs/reference/command-line-tools-reference/kube-controller-manager/) is proposed, in addition to retaining the existing, singular "service-cluster-ip-range" option (for backwards compatibility):
complete utter bikeshed: what about making --service-cluster-ip-range accept a comma-separated list?
if we can do that transparently, that might be great for all of these flags (if a bit less obviously named)
That sounds good to me! I wasn't sure if there'd be a problem with not having a trailing 's' for a plural argument, or any backwards compatibility headaches with command line arguments (e.g. old/new manifests working with new/old images).
I'll make this change for this and for the other command line arguments in this doc.
A new, plural "service-cluster-ip-ranges" option for the [controller-manager startup configuration](https://kubernetes.io/docs/reference/command-line-tools-reference/kube-controller-manager/) is proposed, in addition to retaining the existing, singular "service-cluster-ip-range" option (for backwards compatibility):
```
--service-cluster-ip-range ipNet [Singular IP CIDR, Default: 10.0.0.0/24]
--service-cluster-ip-ranges stringSlice [Multiple IP CIDRs, comma separated list of CIDRs, Default: []]
```
what happens if the user specifies two v4 ranges?
good call - err on the side of over-specifying :)
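A minimal Go sketch of the kind of flag validation implied by "err on the side of over-specifying": parse the comma-separated list and reject more than one CIDR per family. The exact behavior is an assumption here, not settled API.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// parseServiceCIDRs parses a comma-separated CIDR list (the proposed reuse of
// --service-cluster-ip-range) and rejects more than one CIDR per IP family,
// which is one possible answer to "what happens if the user specifies two v4
// ranges?".
func parseServiceCIDRs(flagValue string) ([]*net.IPNet, error) {
	var cidrs []*net.IPNet
	seen := map[string]bool{}
	for _, s := range strings.Split(flagValue, ",") {
		_, ipnet, err := net.ParseCIDR(strings.TrimSpace(s))
		if err != nil {
			return nil, fmt.Errorf("invalid CIDR %q: %v", s, err)
		}
		family := "IPv6"
		if ipnet.IP.To4() != nil {
			family = "IPv4"
		}
		if seen[family] {
			return nil, fmt.Errorf("more than one %s CIDR specified", family)
		}
		seen[family] = true
		cidrs = append(cidrs, ipnet)
	}
	return cidrs, nil
}

func main() {
	cidrs, err := parseServiceCIDRs("10.0.0.0/24,fd00:1234::/110")
	fmt.Println(cidrs, err)
}
```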
@leblancd - Thanks for the details. Do you have any link to help understand the steps involved in creating multiple ingresses (for IPv4 and IPv6 separately), and also how to configure NAT46 in the IPv4 ingress?
Branch force-pushed from 1216d70 to fc4d40b.
@sb1975 - With the help of @aojea, we've put together an overview on how to install a dual-stack NGINX ingress controller on an (internally) IPv6-only cluster: "Installing a Dual-Stack Ingress Controller on an IPv6-Only Kubernetes Cluster". This requires that the nodes be configured with dual-stack public/global IPv4/IPv6 addresses, and it runs the ingress controller pods on the host network of each node. I haven't configured Stateless NAT46 on a Kubernetes IPv6-only cluster, but you can find some good background references on the web, e.g. Citrix has a helpful reference for configuring their NAT46 appliance here, and there's a video on configuring Stateless NAT46 on a Cisco ASA here.
- Link Local Addresses (LLAs) on a pod will remain implicit (Kubernetes will not display nor track these addresses).
- For simplicity, only a single family of service IPs per cluster will be supported (i.e. service IPs are either all IPv4 or all IPv6).
- Backend pods for a service can be dual stack.
- Endpoints for a dual-stack backend pod will be represented as a dual-stack address pair (i.e. 1 IPv4/IPv6 endpoint per backend pod, rather than 2 single-family endpoints per backend pod).
On this and the previous: If we say that Services are always single-family and you said that "Cross-family connectivity" is a non-goal, what value do we get for endpoints to be dual-stack?
I guess I could see an argument for headless Services or external Services. Is that what motivates this? Is it worth the effort? Could it be deferred?
Or is this about NodePorts being available on both families?
This is a very good question, and your point is well-taken that we probably don't get value out of having endpoints be dual-stack. Maybe you can confirm my thought process here. I had added dual-stack endpoints with the thinking that maybe, somehow, ingress controllers or load balancers might need to know about V4 and V6 addresses for endpoints, in order to provide dual-stack access from outside. Thinking about this more, I don't think this is the case. For ingress controllers and load balancers to provide dual-stack access, support of dual-stack NodePorts and dual-stack externalIPs (and ingress controllers using host-network pods) should be sufficient.
Let me know what you think, so I can modify the spec.
For headless services, I believe that we can get by with a single IP family. The IP assigned for a headless service will match the "primary" IP family. This would put headless services on par with non-headless Kube services.
Re. the "Cross-family connectivity", I should remove this from the non-goals. It's confusing and misleading. Family cross over will be supported e.g. with dual-stack ingress controller mapping to a single family endpoint inside the cluster. Cross-family connectivity won't be supported inside the cluster, but that's pretty obvious.
I think the non-goal is correct -- Kubernetes itself is NOT doing address family translation. The fact that Ingress controllers can do that is merely a side-effect of the fact that they are almost universally full proxies which accept the frontside connection and open another for the backside. I don't think it is obvious that we won't convert a v4 to a v6 connection via service VIP, and we should be clear that it is NOT part of the scope.
When I wrote this question I was thinking about Service VIP -> Backends. That has to be same-family (because of the non-goal above), so alt-family endpoints is nonsensical. I think this is also true for external IPs and load-balancer IPs -- they are received and processed the same as cluster IPs, with no family conversion.
BUT...
1. external IPs is a list, so could plausibly include both families.
2. LB status.ingress is a list, so could plausibly include both families.
3. nodePorts could reasonably be expected to work on any interface address.
4. headless services could reasonably be expected to work on any interface address.
You suggest you might be willing to step back from (4), but unless we also step back from (1), (2), AND (3), that wouldn't save any work.
I think the only simplification here is if we can say "all service-related logic is single family". And I am not sure that is very useful -- tell me if you disagree.
Assuming we have ANY service-related functionality with both families, we need dualstack endpoints :(
Or am I missing something?
...more to the point, is multi-family endpoints, nodeports, LBs, etc. something we can defer to a "phase 2" and iterate? Would it simplify this proposal or just make it not useful?
Your analysis is spot on. I agree, we would need dual-stack endpoints to support (1), (2), (3), and (4) (although I'm not really familiar with how LB status.ingress works, it sounds like it's also driven by endpoint events/state), and if we support 1 of the 4 we might as well support all 4.
And regarding the idea of NOT supporting 1-4 as a simplification, I believe that would make this proposal not very useful. What we'd have left is informational visibility to dual stack pod addresses, as far as I can tell.
I'd say that the minimum useful subset of support would have to include dual-stack endpoints, nodeports, LBs, externalIPs, and headless services, IMO.
OK, so any scope-reduction around not doing dual-stack endpoints is rendered moot, and all such comments should be ignored :)
- NodePort: Support listening on both IPv4 and IPv6 addresses
- ExternalIPs: Can be IPv4 or IPv6
- Kube-proxy IPVS mode will support dual-stack functionality similar to kube-proxy iptables mode as described above. IPVS kube-router support for dual stack, on the other hand, is considered outside of the scope of this proposal.
- For health/liveness/readiness probe support, a kubelet configuration will be added to allow a cluster administrator to select a preferred IP family to use for implementing probes on dual-stack pods.
If we have a "primary" family (e.g. the one used for Services) do we need this flag?
Do we need a per-pod, per-probe flag to request address family?
I think we can do without a global kubelet configuration for preferred IP family for probes. I'll change this to say that health/liveness/readiness probes will use the IP family of the default IP for a pod (which should match the primary IP family in most cases).
I don't think we need a per-pod, per-probe flag for IP family for the initial release. In a future release, we can consider adding a per-pod, per-probe flag to allow e.g. a user to specify that probes can be dual stack, meaning probes are sent for both IP families, and success is declared if either probe is successful, or alternatively only if both probes are successful.
// Properties: Arbitrary metadata associated with the allocated IP.
type PodIPInfo struct {
    IP         string
    Properties map[string]string
Unless we have examples of what we want to put in properties and we are willing to spec, validate, and test things like key formats, content size, etc, we should probably leave this out for now. Just a comment indicating this is left as a followup patch-set, perhaps?
@thockin - Sure, I can take this out. If the Properties map is removed, should the PodIPInfo structure be removed, and just leave PodIPs as a simple slice of strings, to simplify?
The properties map would be very useful for multi-network. And no sense in changing the data structure twice? I'd prefer to keep it if possible.
No matter what, I would keep the struct.
@squeed if we keep it, we need to have concrete use-cases for it such that we can flesh out the management of that data as I listed above. We can always ADD fields, with new validation, etc. I'd rather add it when we have real need. I am confident it's something we will want, eventually.
I see this has already generated some discussion :)
I'll add my own comments here - @leblancd you can ignore my other comment up above.
One pattern I've seen used successfully in internal interfaces is to have a mostly-strongly typed struct with a bag-of-strings at the end for "experimental" free-for-all properties. But that requires agreement that the bag-of-strings should not be relied upon, and can change at any time. I do not think we can enforce such a thing if we put this into an external facing API, so my vote is to only add fully-typed fields with validation and strong semantic meaning. Then we can argue about names all at once before they are used instead of after the fact :)
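To make the discussion concrete, here is a sketch of the pared-down shape being converged on: keep a struct (so strongly-typed fields can be added later, each with its own validation) but drop the free-form Properties map for now. Field names here are illustrative, not the final API.

```go
package main

import "fmt"

// PodIPInfo sketches the discussed structure: a struct is kept so that
// future strongly-typed fields (e.g. a network name for multi-network) can
// be added with validation, but the bag-of-strings Properties map is omitted
// until there are concrete, spec'd use cases.
type PodIPInfo struct {
	// IP is the textual form of a single pod IP address.
	IP string
}

// PodStatusFragment shows only the fields relevant to this discussion.
type PodStatusFragment struct {
	PodIP  string      // existing singular field, kept for compatibility
	PodIPs []PodIPInfo // new plural field; PodIPs[0].IP must equal PodIP
}

func main() {
	s := PodStatusFragment{
		PodIP:  "10.244.1.4",
		PodIPs: []PodIPInfo{{IP: "10.244.1.4"}, {IP: "fd00:10:244:1::4"}},
	}
	fmt.Println(s.PodIPs[0].IP == s.PodIP) // the invariant described below
}
```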
    Properties map[string]string
}

// IP addresses allocated to the pod with associated metadata. This list
We will also want to document the sync logic. I finally sent a PR against docs.
By "document the sync logic", you mean just adding references to that doc in this spec, right?
I'd spell it out in the comments. I don't expect end-users to read our API devel docs :)
##### Default Pod IP Selection
Older servers and clients that were built before the introduction of full dual stack will only be aware of and make use of the original, singular PodIP field above. It is therefore considered to be the default IP address for the pod. When the PodIP and PodIPs fields are populated, the PodIPs[0] field must match the (default) PodIP entry. If a pod has both IPv4 and IPv6 addresses allocated, then the IP address chosen as the default IP address will match the IP family of the cluster's configured service CIDR. For example, if the service CIDR is IPv4, then the IPv4 address will be used as the default address.
"When the PodIP and PodIPs fields are populated" implies no sync logic. I think we all settled on sync being a better path?
By "sync logic" you mean how the singular value from old clients (and plural value from new clients) gets fixed up (as described in the "On Compatibility" section)?
I'll delete that line. What I meant to say is covered in your API change guide update.
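A minimal sketch of the sync/compatibility rules being referenced: mirror the singular field into the plural one when only it is set, and require PodIPs[0] to match PodIP when both are set. The real apiserver defaulting and validation may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// syncPodIPs applies the compatibility rules described above to the singular
// PodIP and plural PodIPs fields and returns the normalized plural list.
func syncPodIPs(podIP string, podIPs []string) ([]string, error) {
	switch {
	case len(podIPs) == 0 && podIP != "":
		// Old client set only the singular field: mirror it.
		return []string{podIP}, nil
	case len(podIPs) > 0 && podIP == "":
		// New client set only the plural field: PodIP would default to podIPs[0].
		return podIPs, nil
	case len(podIPs) > 0 && podIPs[0] != podIP:
		return nil, errors.New("PodIPs[0] must match PodIP")
	}
	return podIPs, nil
}

func main() {
	fmt.Println(syncPodIPs("10.244.1.4", nil))
	fmt.Println(syncPodIPs("10.244.1.4", []string{"fd00:10:244:1::4", "10.244.1.4"}))
}
```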
- Because service IPs will remain single-family, pods will continue to access the CoreDNS server via a single service IP. In other words, the nameserver entries in a pod's /etc/resolv.conf will typically be a single IPv4 or single IPv6 address, depending upon the IP family of the cluster's service CIDR.
- Non-headless Kubernetes services: CoreDNS will resolve these services to either an IPv4 entry (A record) or an IPv6 entry (AAAA record), depending upon the IP family of the cluster's service CIDR.
- Headless Kubernetes services: CoreDNS will resolve these services to either an IPv4 entry (A record), an IPv6 entry (AAAA record), or both, depending on the service's endpointFamily configuration (see [Configuration of Endpoint IP Family in Service Definitions](#configuration-of-endpoint-ip-family-in-service-definitions)).
Depends on previous question about this config being per-service vs per-pod
I now think single-family headless services would work (on par with non-headless kubernetes services being single-family).
The [Kubernetes ingress feature](https://kubernetes.io/docs/concepts/services-networking/ingress/) relies on the use of an ingress controller. The two "reference" ingress controllers that are considered here are the [GCE ingress controller](https://github.com/kubernetes/ingress-gce/blob/master/README.md#glbc) and the [NGINX ingress controller](https://github.com/kubernetes/ingress-nginx/blob/master/README.md#nginx-ingress-controller).

#### GCE Ingress Controller: Out-of-Scope, Testing Deferred For Now
It is not clear whether the [GCE ingress controller](https://github.com/kubernetes/ingress-gce/blob/master/README.md#glbc) supports external, dual-stack access. Testing of dual-stack access to Kubernetes services via a GCE ingress controller is considered out-of-scope until after the initial implementation of dual-stack support for Kubernetes.
I'd say this is Google's problem to implement.
I'll just say this is out-of-scope for this effort.
The unclear parts should at least be clarified :)
I can take this one.
#### Multiple bind addresses configuration

The existing "--bind-address" option for the will be modified to support multiple IP addresses in a comma-separated list (rather than a single IP string).
Why is this a sub-heading of cloud-providers?
There are other components that support a flag like this - do we have a list?
Kube-proxy and the kubelet startup config have a similar requirement, that's a good idea to list them.
Also possibly the controller manager if we went with the full Dual Stack.
Also, that line is missing the link to the cloud controller manager, so it should read
The existing "--bind-address" option for the cloud-controller-manager will be modified ...
- name: MY_POD_IPS
  valueFrom:
    fieldRef:
      fieldPath: status.podIPs
@kubernetes/sig-cli-api-reviews We should get a consult as to whether this is right or whether the fieldpath should be something like status.podIPs[].ip
- it was supposed to be a literal syntax.
@thockin - Thank you for your thorough review! I think that eliminating the support for dual-stack endpoints makes sense, let me know if I should go ahead and remove this.
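For the status.podIPs downward API example quoted above, a container could consume MY_POD_IPS roughly as follows. The comma-separated rendering is an assumption; the exact fieldPath and formatting were still under discussion in this thread.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// podIPsFromEnv splits the MY_POD_IPS environment variable into individual
// addresses, assuming a comma-separated rendering of status.podIPs.
func podIPsFromEnv() []string {
	raw := os.Getenv("MY_POD_IPS")
	if raw == "" {
		return nil
	}
	var ips []string
	for _, s := range strings.Split(raw, ",") {
		if s = strings.TrimSpace(s); s != "" {
			ips = append(ips, s)
		}
	}
	return ips
}

func main() {
	os.Setenv("MY_POD_IPS", "10.244.1.4,fd00:10:244:1::4")
	fmt.Println(podIPsFromEnv())
}
```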
Currently, health, liveness, and readiness probes are defined without any concern for IP addresses or families. For the first release of dual-stack support, a cluster administrator will be able to select the preferred IP family to use for probes when a pod has both IPv4 and IPv6 addresses. For this selection, a new "--preferred-probe-ip-family" argument for the [kubelet startup configuration](https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/) will be added:
```
--preferred-probe-ip-family string ["ipv4", "ipv6", or "none". Default: "none", meaning use the pod's default IP]
```
See my response to your earlier comment. I think we can do without this configuration, and for probes Kubelet should use the family of the default IP for each pod. I don't think we need a per-pod or per-probe configuration in the initial release of dual-stack, maybe do this as a followup (including support of probes that work on both IP families, requiring either both V4 and V6 responses, or either V4 or V6 responses).
This feature requires the use of the [CNI Networking Plugin API version 0.3.1](https://github.com/containernetworking/cni/blob/spec-v0.3.1/SPEC.md) or later. The dual-stack feature requires no changes to this API.

The versions of CNI plugin binaries that must be used for proper dual-stack functionality (and IPv6 functionality in general) depend upon the version of Docker that is used in the cluster nodes (see [CNI issue #531](https://github.com/containernetworking/cni/issues/531) and [CNI plugins PR #113](https://github.com/containernetworking/plugins/pull/113)):
Issue 531 was a weird docker interaction that we've fixed; CNI no longer has a Docker version dependency.
@squeed (and @nyren) thanks for taking care of this! I think it's fair to say that CNI 0.7.0 (or newer) no longer has the Docker dependency, i.e. if you're using CNI 0.6.0, you'll still have the dependency? Kubernetes has a bunch of distro pointers that point to CNI 0.6.0 for plugin binaries, so those pointers should be bumped up in the near future.
@leblancd: Thanks for the response, this is very helpful, but we have an additional use case: we need an IPv4 client to reach a Kubernetes IPv6 service over non-HTTP traffic (like SNMP). Now, I understand that ingress supports only HTTP rules, so how do we enable this, please?
The kubeadm configuration options for advertiseAddress and podSubnet will need to be changed to handle a comma-separated list of CIDRs:
```
api:
  advertiseAddress: "fd00:90::2,10.90.0.2" [Multiple IP CIDRs, comma separated list of CIDRs]
```
Nit: the advertiseAddresses are addresses, not CIDRs
Yes indeedy.
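A sketch of the validation implied by the nit above: advertiseAddress entries must be plain IP addresses (at most one per family), not CIDRs. This is illustrative, not kubeadm's actual code.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// validateAdvertiseAddresses checks a comma-separated advertiseAddress value:
// entries must parse as plain IP addresses (CIDRs are rejected), and at most
// one address per IP family is allowed.
func validateAdvertiseAddresses(value string) error {
	families := map[string]int{}
	for _, s := range strings.Split(value, ",") {
		s = strings.TrimSpace(s)
		ip := net.ParseIP(s)
		if ip == nil {
			return fmt.Errorf("%q is not a valid IP address (CIDRs are not accepted)", s)
		}
		if ip.To4() != nil {
			families["IPv4"]++
		} else {
			families["IPv6"]++
		}
	}
	for f, n := range families {
		if n > 1 {
			return fmt.Errorf("more than one %s advertise address", f)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateAdvertiseAddresses("fd00:90::2,10.90.0.2")) // nil
	fmt.Println(validateAdvertiseAddresses("10.90.0.0/24"))         // error: CIDR
}
```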
@sb1975 the NGINX ingress controller supports non-HTTP traffic over TCP and UDP; however, it seems that feature is going to be removed (kubernetes/ingress-nginx#3197).
@leblancd Can you make your edits and resolve any comment threads that are old and stale? Ping me and I'll do another top-to-bottom pass. Hopefully we can merge soon and iterate finer points.
@thockin - Will get the next editing pass in by Monday. I'm working on some V6 CI changes today. Thanks!
/assign @timothysc
## Motivation

The adoption of IPv6 has increased in recent years, and customers are requesting IPv6 support in Kubernetes clusters. To this end, the support of IPv6-only clusters was added as an alpha feature in Kubernetes Version 1.9. Clusters can now be run in either IPv4-only, IPv6-only, or in a "single-pod-IP-aware" dual-stack configuration. This "single-pod-IP-aware" dual-stack support is limited by the following restrictions:
- Some CNI network plugins are capable of assigning dual-stack addresses on a pod, but Kubernetes is aware of only one address per pod.
Thanks for the detailed design! We are running some prototype dual stack configurations inside of GCE and starting to find ways to work around the fact that Kubernetes itself is unaware of the IPv6 addresses.
I'd like very much to stay close to the design in this doc, and reinforce our prototyping/testing efforts when the time comes. Please keep me in the loop :)
This proposal aims to extend the Kubernetes Pod Status API so that Kubernetes can track and make use of up to one IPv4 address and up to one IPv6 address assignment per pod.

#### Versioned API Change: PodStatus v1 core
In order to maintain backwards compatibility for the core V1 API, this proposal retains the existing (singular) "PodIP" field in the core V1 version of the [PodStatus V1 core API](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.10/#podstatus-v1-core), and adds a new array of structures that store pod IPs along with associated metadata for that IP. The metadata for each IP (refer to the "Properties" map below) will not be used by the dual-stack feature, but is added as a placeholder for future enhancements, e.g. to allow CNI network plugins to indicate to which physical network an IP is associated. Retaining the existing "PodIP" field for backwards compatibility is in accordance with the [Kubernetes API change guidelines](https://github.com/kubernetes/community/blob/master/contributors/devel/api_changes.md).
I've grown really leery of a bag-of-strings properties approach - I think that unless we commit to a naming scheme, or at least a conflict resolution mechanism if two components start using the same keys in incompatible ways I would really like to see these develop as fully specified types rather than a loose bag of strings.
What do you think?
```
--pod-cidr ipNetSlice [IP CIDRs, comma separated list of CIDRs, Default: []]
```
Only the first address of each IP family will be used; all others will be ignored.
Also, I think you mean "first CIDR" here, not the first address.
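A minimal sketch of the "first CIDR of each family" selection described above, with the reviewer's correction applied (the first CIDR, not the first address, is used; additional CIDRs of the same family are ignored):

```go
package main

import (
	"fmt"
	"net"
)

// firstCIDRPerFamily keeps only the first IPv4 and first IPv6 CIDR from the
// --pod-cidr list and ignores any further CIDRs of the same family.
func firstCIDRPerFamily(cidrs []string) (v4, v6 *net.IPNet, err error) {
	for _, s := range cidrs {
		_, ipnet, perr := net.ParseCIDR(s)
		if perr != nil {
			return nil, nil, perr
		}
		if ipnet.IP.To4() != nil {
			if v4 == nil {
				v4 = ipnet
			}
		} else if v6 == nil {
			v6 = ipnet
		}
	}
	return v4, v6, nil
}

func main() {
	v4, v6, _ := firstCIDRPerFamily([]string{"10.244.0.0/16", "fd00:10:244::/64", "10.245.0.0/16"})
	fmt.Println(v4, v6) // the second IPv4 CIDR is ignored
}
```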
    IP string `json:"ip" protobuf:"bytes,1,opt,name=ip"`
    // The IPs for this endpoint. The zeroth element (IPs[0]) must match
    // the default value set in the IP field.
    IPs []string `json:"ips" protobuf:"bytes,5,opt,name=ips"`
If the pod IPs have metadata describing them (the PodIPs struct) then isnt' that useful to surface here as well?
It seems like @squeed has a use case for labeling IPs with the network names - is it useful to endpoints controllers (like ingress?) to know that here too?
If we do have the metadata here though, it will need to be exactly the same structure as the PodIPs. I'm not sure if that raises any other API issues.
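To illustrate how a consumer such as a per-family proxier or an ingress controller might use the dual-stack endpoint fields quoted above, here is a sketch with a local stand-in type (not the real EndpointAddress):

```go
package main

import (
	"fmt"
	"net"
)

// endpointAddress mirrors the fields quoted above (singular IP plus plural
// IPs, with IPs[0] == IP); it is a local stand-in, not the real API type.
type endpointAddress struct {
	IP  string
	IPs []string
}

// ipsForFamily picks, from each endpoint, the address matching the desired
// family (wantV6 = true for IPv6), which is roughly what a per-family
// proxier or an ingress controller would do with dual-stack endpoints.
func ipsForFamily(eps []endpointAddress, wantV6 bool) []string {
	var out []string
	for _, ep := range eps {
		for _, s := range ep.IPs {
			ip := net.ParseIP(s)
			if ip == nil {
				continue
			}
			if (ip.To4() == nil) == wantV6 {
				out = append(out, s)
				break // one address per endpoint per family
			}
		}
	}
	return out
}

func main() {
	eps := []endpointAddress{{IP: "10.244.1.4", IPs: []string{"10.244.1.4", "fd00:10:244:1::4"}}}
	fmt.Println(ipsForFamily(eps, true))
}
```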
#### Configuration of Endpoint IP Family in Service Definitions
This proposal adds an option to configure an endpoint IP family for a Kubernetes service:
```
endpointFamily: <ipv4|ipv6|dual-stack> [Default: dual-stack]
```
If service addresses only come from a single address family, why does this belong in the Service definition?
Or to put it another way - shouldn't the default be the same address family as the service CIDR? If Kubernetes itself isn't going to do any 6/4 translation, could you say more about how this can be used in any other way?
@neolit123 This will affect sig-cluster-lifecycle, but it's squarely on sig-networking.
REMINDER: KEPs are moving to k/enhancements on November 30. Please attempt to merge this KEP before then to signal consensus. Any questions regarding this move should be directed to that thread and not asked on GitHub.
KEPs have moved to k/enhancements. Any questions regarding this move should be directed to that thread and not asked on GitHub.
@justaugustus: Closed this PR.
Moving #2254 to kubernetes/enhancements#648
Add a KEP for IPv4/IPv6 dual stack functionality to Kubernetes clusters. Dual-stack functionality includes the following concepts:
References:
- kubernetes/kubernetes issue #62822
- kubernetes/features issue #563