This repository has been archived by the owner on Mar 29, 2023. It is now read-only.

Cluster services/pods secondary subnets overlap causing service IP issues. #52

Closed
one1zero1one opened this issue Jul 22, 2019 · 8 comments · Fixed by #86
Labels
enhancement New feature or request

Comments

@one1zero1one

Thanks for open-sourcing the modules,

We had a couple of issues with overlapping cluster service IPs / pod IPs (service IPs becoming unreachable) that we traced to having both services and pods share the same secondary range:

  services_secondary_range_name = var.cluster_secondary_range_name

As per https://cloud.google.com/kubernetes-engine/docs/how-to/alias-ips#secondary_ranges, the ranges have to be different, either by letting GKE create them or by specifying them explicitly.

We're currently looking into tweaking the module to let GKE create the secondary ranges, so that we don't have to refactor the VPC module to support multiple secondary ranges. Wondering if anyone else has had to deal with this.
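
For reference, a minimal sketch of what the docs describe, with distinct secondary ranges declared on the subnetwork (the resource and range names here are only illustrative, not the module's actual ones):

resource "google_compute_subnetwork" "k8s" {
  name          = "k8s-subnetwork"                        # illustrative name
  network       = google_compute_network.vpc.self_link    # assumed VPC resource
  ip_cidr_range = "10.0.0.0/24"                           # primary range for nodes
  region        = "europe-west1"

  # Pods and services each get their own, non-overlapping secondary range
  secondary_ip_range {
    range_name    = "pods"
    ip_cidr_range = "10.1.0.0/17"
  }

  secondary_ip_range {
    range_name    = "services"
    ip_cidr_range = "10.2.0.0/22"
  }
}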

@one1zero1one
Author

one1zero1one commented Jul 25, 2019

The way we addressed it is essentially by changing the ip_allocation_policy used when creating the cluster, passing new variables to the module:

  node_ipv4_cidr_block     = cidrsubnet("10.0.0.0/16", 9, 509) # pick a "/25" max 124 nodes 
  cluster_ipv4_cidr_block  = "/17"                             # pick a "/17" max 32k pods 
  services_ipv4_cidr_block = "/22"                             # pick a "/22" max 1k services
  network                  = module.vpc.network
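
For context, that cidrsubnet call carves out the 509th /25 (16 + 9 new bits) of 10.0.0.0/16, which lands near the top of the range; you can check the result in terraform console:

  > cidrsubnet("10.0.0.0/16", 9, 509)
  "10.0.254.128/25"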

and the corresponding change to google_container_cluster:

  ip_allocation_policy {
    create_subnetwork = true
    subnetwork_name   = "${var.name}-subnetwork-k8s"

    node_ipv4_cidr_block     = var.node_ipv4_cidr_block
    cluster_ipv4_cidr_block  = var.cluster_ipv4_cidr_block
    services_ipv4_cidr_block = var.services_ipv4_cidr_block
  }
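
For completeness, a rough sketch of how those new module variables could be declared (the descriptions and null defaults are assumptions, not the final interface):

variable "node_ipv4_cidr_block" {
  description = "CIDR block (e.g. a /25) for the nodes when GKE creates the subnetwork"
  type        = string
  default     = null
}

variable "cluster_ipv4_cidr_block" {
  description = "CIDR block or prefix length (e.g. \"/17\") for the pods secondary range"
  type        = string
  default     = null
}

variable "services_ipv4_cidr_block" {
  description = "CIDR block or prefix length (e.g. \"/22\") for the services secondary range"
  type        = string
  default     = null
}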

So the nodes still sit at the edge of the public subnet, more or less, while Google creates the subnetwork and:

  • creates/allocates secondary ranges for the pods and services that won't overlap.

Obviously, this requires a couple of tweaks to the VPC module, such as NAT for the new subnet. To play nicely with this, we've updated the Google VPC network module to allow custom subnets, so we call it with new variables:

extra_subnet_nat = [
  {
    name = "${var.name}-devtools-subnetwork-k8s"
  },
]
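
A rough sketch of how that variable might be declared in the VPC module (the type and default here are assumptions):

variable "extra_subnet_nat" {
  description = "Extra subnetworks, created outside this module (e.g. by GKE), to include in the Cloud NAT configuration"
  type = list(object({
    name = string
  }))
  default = []
}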

and then main.tf of the VPC module picks up the subnetwork links:

data "google_compute_subnetwork" "extra_nets" {
  count  = length(var.extra_subnet_nat)
  name   = "mgmt-devtools-subnetwork-k8s" #var.extra_subnet_nat[count.index].name
  region = var.region
}

and then feeding them into google_compute_router_nat:

  subnetwork {
    name                    = google_compute_subnetwork.vpc_subnetwork_public.self_link
    source_ip_ranges_to_nat = ["ALL_IP_RANGES"]
  }

  dynamic "subnetwork" {
    for_each = var.extra_subnet_nat
    iterator = extra_subnet_nat
    content {
      name                    = data.google_compute_subnetwork.extra_nets.*.self_link[extra_subnet_nat.key]
      source_ip_ranges_to_nat = ["ALL_IP_RANGES"]
    }
  }

Now I'm going to close this issue, because technically using the public secondary range for both pods and services does work, even for VPC-native clusters, and it has the added benefit of routable cluster IPs. But in our experience it does cause issues when clusters grow.

@dijitali

dijitali commented Aug 8, 2019

@one1zero1one - we've just encountered the same issue and quickly reached IP exhaustion as a result.

Given the implications of these ranges being set too small, it'd certainly be good to be able to configure them as recommended in the documentation. I think it might be worth re-opening this issue as I expect other people will have the same requirement?

@autero1
Contributor

autero1 commented Aug 8, 2019

Hi @one1zero1one @dijitali

I'm re-opening the issue. PR very welcome!

@autero1 autero1 reopened this Aug 8, 2019
@autero1 autero1 added enhancement New feature or request help wanted labels Aug 8, 2019
@dijitali

dijitali commented Aug 8, 2019

Ta @autero1. I'm putting something together now (although @one1zero1one, if you already have something in a state to PR then lemme know!).

@dijitali

dijitali commented Aug 8, 2019

I wasn't too sure about the approach suggested above. I think it would mean relying on the gruntwork-io/terraform-google-network module to allocate some of the subnet ranges, but then using functionality in google_container_cluster.ip_allocation_policy to auto-create additional secondary_ip_ranges? That seems a bit complex, managing the IP range allocation in two places.

I'm wondering if a cleaner approach would be to have the gruntwork-io/terraform-google-network module create all of the VPC network and subnet ranges, and just reference the additional secondary_ip_range by name in this module with a new var.services_secondary_range_name variable:

resource "google_container_cluster" "cluster" {
...
  ip_allocation_policy {
    // Choose the range, but let GCP pick the IPs within the range
    cluster_secondary_range_name  = var.cluster_secondary_range_name
    services_secondary_range_name = var.services_secondary_range_name
  }
...
}

So instead google_compute_subnetwork.vpc_subnetwork_public would generate two secondary_ip_range blocks.

@autero1 - are you happy with this approach before I get started? Since we'd need to make the changes in the gruntwork-io/terraform-google-network module backwards compatible, it might use the new Terraform 0.12 dynamic nested block functionality to still create a single secondary_ip_range by default, but also support this functionality.
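
Something along these lines, assuming a new list variable for the ranges (the variable and attribute names here are illustrative, not the module's actual inputs; the default list would reproduce today's single range):

resource "google_compute_subnetwork" "vpc_subnetwork_public" {
  name          = var.subnetwork_name      # existing inputs, unchanged
  network       = var.network
  ip_cidr_range = var.subnetwork_cidr
  region        = var.region

  # New: one secondary_ip_range block per entry in the assumed list variable
  dynamic "secondary_ip_range" {
    for_each = var.secondary_ip_ranges
    content {
      range_name    = secondary_ip_range.value.range_name
      ip_cidr_range = secondary_ip_range.value.ip_cidr_range
    }
  }
}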

@one1zero1one
Author

@dijitali unfortunately the path we took is highly opinionated (as described in the comment #52 (comment)), because essentially we went for asking GKE to create the subnets for us, as it sees fit, anywhere across the VPC network, and then NAT them in the VPC module. The reason for that is the number of clusters we need, but for most shops your approach seems good.

@autero1
Copy link
Contributor

autero1 commented Aug 13, 2019

I'm wondering if a cleaner approach would be to have the gruntwork-io/terraform-google-network module create all of the VPC network and subnet ranges and just reference the additional secondary_ip_range by name in this module with a new var.services_secondary_range_name variable?:

@dijitali I think adding a new variable services_secondary_range_name is a sane approach. We can make the change backwards compatible by giving var.services_secondary_range_name a default value of null and continuing to use var.cluster_secondary_range_name for the services range when no value is provided.
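
Roughly, a sketch of that fallback (the variable declaration and conditional are illustrative, not the final implementation):

variable "services_secondary_range_name" {
  description = "Name of the secondary range for services; when null, the cluster secondary range is reused"
  type        = string
  default     = null
}

resource "google_container_cluster" "cluster" {
  # ...
  ip_allocation_policy {
    cluster_secondary_range_name  = var.cluster_secondary_range_name
    services_secondary_range_name = var.services_secondary_range_name != null ? var.services_secondary_range_name : var.cluster_secondary_range_name
  }
}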

@Eugst
Contributor

Eugst commented Mar 30, 2020

@autero1 I think I've realised your thoughts correctly.
