Add a configuration knob to allow Pods to use different VPC subnets #119

Closed
liwenwu-amazon opened this issue Jun 27, 2018 · 4 comments
@liwenwu-amazon (Contributor)
Today, ipamD (design) uses the primary ENI's subnet and security groups when allocating new ENIs. This means Pods running on a node use the same subnet and security groups as the node's primary ENI.

Here are a few use cases that require Pods to use different VPC subnets than the subnet used by the node's primary ENI:

  • There are limited IP addresses available in the subnet used by the node's primary ENI. This limits the number of Pods that can be created in the cluster.

xdrus commented Jun 28, 2018

Our use case is a multi-"zone" network where we run different types of workloads in specific subnets. Simplified example: publicly facing pods shouldn't be in a network with access to an internal DB. Right now we have to run an ASG in every one of our zones. Having the ability to attach ENIs from different subnets and use annotations to map pods to specific subnets could drastically increase density and node utilization.
Moreover, from a security perspective, it would be great to separate Pod subnets from the instance's network (i.e., not use the primary ENI for pods).

@liwenwu-amazon (Contributor, Author)

@xdrus Thank you for sharing your use case. I have a few questions:

  • When you mention drastically increasing density and node utilization, do you envision Pods of different "zones" running on a single node?
    • If this is the case, do you worry about security isolation between Pods from different "zones"?
  • How do you make kube-scheduler schedule Pods onto nodes where there are enough IP addresses for the Pod's subnet?


xdrus commented Jun 29, 2018

@liwenwu-amazon

  1. Yes, our security team is OK with running pods from different zones on one instance. Container isolation + host hardening + security enforced at both the host and Kubernetes level makes them happy :) We actually use this configuration in on-prem clusters with Calico.
  2. With Calico we use IP pools for different zones and annotations to assign pods to a specific pool. Then we use Calico's BGP to propagate our cluster's IP pools to the corporate network, and we manage inter-zone access on the corporate firewall plus Calico policies at the cluster level. (A minimal sketch of this setup follows after the config example below.)
    One possible approach for the AWS VPC CNI plugin might be to use native tags to select subnets and a pod-level annotation to map pods. The Lyft plugin uses tags to choose subnets, but it doesn't allow choosing a subnet per pod.
    So cni.conf could look like this (for our use case):
{
    "cniVersion": "0.3.1",
    "name": "amazon-vpc-cni-k8s",
    "plugins": [
        {
            "cniVersion": "0.3.1",
            "type": "amazon-vpc-cni-k8s-ipam",
            "interfaceIndex": 1,
            "subnetTags": {
                "zone": "external",
                "kubernetes.io/cluster/ClusterName": ""
            },
            "secGroupIds": [
                "sg-1234"
            ]
        },
        {
            "cniVersion": "0.3.1",
            "type": "amazon-vpc-cni-k8s-ipam",
            "interfaceIndex": 2,
            "subnetTags": {
                "zone": "internal",
                "kubernetes.io/cluster/ClusterName": ""
            },
            "secGroupIds": [
                "sg-5678"
            ]
        }
        ...
    ]
}

and pod spec:

annotations:
    "vpc-cni-k8s.amazonaws.com/subnet_selector": "zone in (internal)"

(e.g. using the Kubernetes label selector syntax).
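For concreteness, here is a minimal sketch of the Calico setup described in point 2 above. The pool name, CIDR, and pod are made up for illustration; the IPPool resource and the cni.projectcalico.org/ipv4pools annotation are standard Calico constructs:

# Calico IPPool for the "internal" zone (name and CIDR are illustrative)
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: zone-internal
spec:
  cidr: 10.1.0.0/16
---
# Pod pinned to that pool via the Calico annotation
apiVersion: v1
kind: Pod
metadata:
  name: internal-app
  annotations:
    cni.projectcalico.org/ipv4pools: '["zone-internal"]'
spec:
  containers:
    - name: app
      image: nginx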

That said, I don't have enough knowledge of how to solve the scheduling issue. It is not a problem with Calico, as it is not constrained by the number of IPs per node. I wish AWS supported up to 100 IPs per ENI; then it wouldn't be an issue (as the max number of pods per node is 100 anyway). Of course, in that case we would need a way to control how many secondary IPs per interface are pre-allocated.
Probably extended node resources and resource requests instead of the annotations above could do the trick, but I haven't played with them yet.
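To illustrate the idea: a rough sketch, assuming the CNI plugin (or a small node agent) advertised the number of free IPs per subnet as an extended resource in each node's status.capacity. The resource name vpc.amazonaws.com/subnet-internal-ip is purely hypothetical:

# Hypothetical: each node advertises e.g. "vpc.amazonaws.com/subnet-internal-ip: 10"
# in status.capacity, so the scheduler only places this pod on nodes that still
# have free IPs in the "internal" subnet.
apiVersion: v1
kind: Pod
metadata:
  name: internal-app
spec:
  containers:
    - name: app
      image: nginx
      resources:
        limits:
          vpc.amazonaws.com/subnet-internal-ip: 1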

@liwenwu-amazon (Contributor, Author)

This is a duplicate of #131.
