Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make all GCP clusters support the instance types 4, 16, and 64 CPU highmem nodes #3319

Merged

Conversation

GeorgianaElena
Copy link
Member

@GeorgianaElena GeorgianaElena commented Oct 24, 2023

Follow-up to #3304
Fixes #3256

TODO

Run terraform plan & apply for:

  • 2i2c-uk.tfvars
terraform plan -var-file=projects/callysto.tfvars
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create
  ~ update in-place

Terraform will perform the following actions:

  # google_container_node_pool.notebook["n2-highmem-16"] will be created
  + resource "google_container_node_pool" "notebook" {
      + cluster                     = "two-eye-two-see-uk-cluster"
      + id                          = (known after apply)
      + initial_node_count          = 0
      + instance_group_urls         = (known after apply)
      + location                    = "europe-west2"
      + managed_instance_group_urls = (known after apply)
      + max_pods_per_node           = (known after apply)
      + name                        = "nb-n2-highmem-16"
      + name_prefix                 = (known after apply)
      + node_count                  = (known after apply)
      + node_locations              = (known after apply)
      + operation                   = (known after apply)
      + project                     = "two-eye-two-see-uk"
      + version                     = "1.27.4-gke.900"

      + autoscaling {
          + location_policy = (known after apply)
          + max_node_count  = 100
          + min_node_count  = 0
        }

      + management {
          + auto_repair  = true
          + auto_upgrade = false
        }

      + node_config {
          + disk_size_gb      = (known after apply)
          + disk_type         = "pd-balanced"
          + guest_accelerator = (known after apply)
          + image_type        = (known after apply)
          + labels            = {
              + "hub.jupyter.org/node-purpose" = "user"
              + "k8s.dask.org/node-purpose"    = "scheduler"
            }
          + local_ssd_count   = (known after apply)
          + logging_variant   = "DEFAULT"
          + machine_type      = "n2-highmem-16"
          + metadata          = (known after apply)
          + min_cpu_platform  = (known after apply)
          + oauth_scopes      = [
              + "https://www.googleapis.com/auth/cloud-platform",
            ]
          + preemptible       = false
          + service_account   = "two-eye-two-see-uk-cluster-sa@two-eye-two-see-uk.iam.gserviceaccount.com"
          + spot              = false
          + tags              = []
          + taint             = [
              + {
                  + effect = "NO_SCHEDULE"
                  + key    = "hub.jupyter.org_dedicated"
                  + value  = "user"
                },
            ]

          + workload_metadata_config {
              + mode = "GKE_METADATA"
            }
        }
    }

  # google_container_node_pool.notebook["n2-highmem-64"] will be created
  + resource "google_container_node_pool" "notebook" {
      + cluster                     = "two-eye-two-see-uk-cluster"
      + id                          = (known after apply)
      + initial_node_count          = 0
      + instance_group_urls         = (known after apply)
      + location                    = "europe-west2"
      + managed_instance_group_urls = (known after apply)
      + max_pods_per_node           = (known after apply)
      + name                        = "nb-n2-highmem-64"
      + name_prefix                 = (known after apply)
      + node_count                  = (known after apply)
      + node_locations              = (known after apply)
      + operation                   = (known after apply)
      + project                     = "two-eye-two-see-uk"
      + version                     = "1.27.4-gke.900"

      + autoscaling {
          + location_policy = (known after apply)
          + max_node_count  = 100
          + min_node_count  = 0
        }

      + management {
          + auto_repair  = true
          + auto_upgrade = false
        }

      + node_config {
          + disk_size_gb      = (known after apply)
          + disk_type         = "pd-balanced"
          + guest_accelerator = (known after apply)
          + image_type        = (known after apply)
          + labels            = {
              + "hub.jupyter.org/node-purpose" = "user"
              + "k8s.dask.org/node-purpose"    = "scheduler"
            }
          + local_ssd_count   = (known after apply)
          + logging_variant   = "DEFAULT"
          + machine_type      = "n2-highmem-64"
          + metadata          = (known after apply)
          + min_cpu_platform  = (known after apply)
          + oauth_scopes      = [
              + "https://www.googleapis.com/auth/cloud-platform",
            ]
          + preemptible       = false
          + service_account   = "two-eye-two-see-uk-cluster-sa@two-eye-two-see-uk.iam.gserviceaccount.com"
          + spot              = false
          + tags              = []
          + taint             = [
              + {
                  + effect = "NO_SCHEDULE"
                  + key    = "hub.jupyter.org_dedicated"
                  + value  = "user"
                },
            ]

          + workload_metadata_config {
              + mode = "GKE_METADATA"
            }
        }
    }

  # google_container_node_pool.notebook["user"] will be updated in-place
  ~ resource "google_container_node_pool" "notebook" {
        id                          = "projects/two-eye-two-see-uk/locations/europe-west2/clusters/two-eye-two-see-uk-cluster/nodePools/nb-user"
        name                        = "nb-user"
        # (9 unchanged attributes hidden)

      ~ autoscaling {
          ~ max_node_count       = 20 -> 100
            # (4 unchanged attributes hidden)
        }

        # (3 unchanged blocks hidden)
    }

Plan: 2 to add, 1 to change, 0 to destroy.
  • callysto.tfvars
terraform plan -var-file=projects/callysto.tfvars
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create
  ~ update in-place

Terraform will perform the following actions:

  # google_container_node_pool.notebook["n2-highmem-16"] will be created
  + resource "google_container_node_pool" "notebook" {
      + cluster                     = "callysto-cluster"
      + id                          = (known after apply)
      + initial_node_count          = 0
      + instance_group_urls         = (known after apply)
      + location                    = "northamerica-northeast1"
      + managed_instance_group_urls = (known after apply)
      + max_pods_per_node           = (known after apply)
      + name                        = "nb-n2-highmem-16"
      + name_prefix                 = (known after apply)
      + node_count                  = (known after apply)
      + node_locations              = (known after apply)
      + operation                   = (known after apply)
      + project                     = "callysto-202316"
      + version                     = "1.27.4-gke.900"

      + autoscaling {
          + location_policy = (known after apply)
          + max_node_count  = 100
          + min_node_count  = 0
        }

      + management {
          + auto_repair  = true
          + auto_upgrade = false
        }

      + node_config {
          + disk_size_gb      = (known after apply)
          + disk_type         = "pd-balanced"
          + guest_accelerator = (known after apply)
          + image_type        = (known after apply)
          + labels            = {
              + "hub.jupyter.org/node-purpose" = "user"
              + "k8s.dask.org/node-purpose"    = "scheduler"
            }
          + local_ssd_count   = (known after apply)
          + logging_variant   = "DEFAULT"
          + machine_type      = "n2-highmem-16"
          + metadata          = (known after apply)
          + min_cpu_platform  = (known after apply)
          + oauth_scopes      = [
              + "https://www.googleapis.com/auth/cloud-platform",
            ]
          + preemptible       = false
          + service_account   = "[email protected]"
          + spot              = false
          + tags              = []
          + taint             = [
              + {
                  + effect = "NO_SCHEDULE"
                  + key    = "hub.jupyter.org_dedicated"
                  + value  = "user"
                },
            ]

          + workload_metadata_config {
              + mode = "GKE_METADATA"
            }
        }
    }

  # google_container_node_pool.notebook["n2-highmem-64"] will be created
  + resource "google_container_node_pool" "notebook" {
      + cluster                     = "callysto-cluster"
      + id                          = (known after apply)
      + initial_node_count          = 0
      + instance_group_urls         = (known after apply)
      + location                    = "northamerica-northeast1"
      + managed_instance_group_urls = (known after apply)
      + max_pods_per_node           = (known after apply)
      + name                        = "nb-n2-highmem-64"
      + name_prefix                 = (known after apply)
      + node_count                  = (known after apply)
      + node_locations              = (known after apply)
      + operation                   = (known after apply)
      + project                     = "callysto-202316"
      + version                     = "1.27.4-gke.900"

      + autoscaling {
          + location_policy = (known after apply)
          + max_node_count  = 100
          + min_node_count  = 0
        }

      + management {
          + auto_repair  = true
          + auto_upgrade = false
        }

      + node_config {
          + disk_size_gb      = (known after apply)
          + disk_type         = "pd-balanced"
          + guest_accelerator = (known after apply)
          + image_type        = (known after apply)
          + labels            = {
              + "hub.jupyter.org/node-purpose" = "user"
              + "k8s.dask.org/node-purpose"    = "scheduler"
            }
          + local_ssd_count   = (known after apply)
          + logging_variant   = "DEFAULT"
          + machine_type      = "n2-highmem-64"
          + metadata          = (known after apply)
          + min_cpu_platform  = (known after apply)
          + oauth_scopes      = [
              + "https://www.googleapis.com/auth/cloud-platform",
            ]
          + preemptible       = false
          + service_account   = "[email protected]"
          + spot              = false
          + tags              = []
          + taint             = [
              + {
                  + effect = "NO_SCHEDULE"
                  + key    = "hub.jupyter.org_dedicated"
                  + value  = "user"
                },
            ]

          + workload_metadata_config {
              + mode = "GKE_METADATA"
            }
        }
    }

  # google_container_node_pool.notebook["user"] will be updated in-place
  ~ resource "google_container_node_pool" "notebook" {
        id                          = "projects/callysto-202316/locations/northamerica-northeast1/clusters/callysto-cluster/nodePools/nb-user"
        name                        = "nb-user"
        # (9 unchanged attributes hidden)

      ~ autoscaling {
          ~ max_node_count       = 20 -> 100
            # (4 unchanged attributes hidden)
        }

        # (3 unchanged blocks hidden)
    }

Plan: 2 to add, 1 to change, 0 to destroy.
  • cloudbank.tfvars
terraform plan -var-file=projects/cloudbank.tfvars
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create
  ~ update in-place

Terraform will perform the following actions:

  # google_container_node_pool.notebook["n2-highmem-16"] will be created
  + resource "google_container_node_pool" "notebook" {
      + cluster                     = "cb-cluster"
      + id                          = (known after apply)
      + initial_node_count          = 0
      + instance_group_urls         = (known after apply)
      + location                    = "us-central1-b"
      + managed_instance_group_urls = (known after apply)
      + max_pods_per_node           = (known after apply)
      + name                        = "nb-n2-highmem-16"
      + name_prefix                 = (known after apply)
      + node_count                  = (known after apply)
      + node_locations              = (known after apply)
      + operation                   = (known after apply)
      + project                     = "cb-1003-1696"
      + version                     = "1.26.4-gke.1400"

      + autoscaling {
          + location_policy = (known after apply)
          + max_node_count  = 100
          + min_node_count  = 0
        }

      + management {
          + auto_repair  = true
          + auto_upgrade = false
        }

      + node_config {
          + disk_size_gb      = (known after apply)
          + disk_type         = "pd-balanced"
          + guest_accelerator = (known after apply)
          + image_type        = (known after apply)
          + labels            = {
              + "hub.jupyter.org/node-purpose" = "user"
              + "k8s.dask.org/node-purpose"    = "scheduler"
            }
          + local_ssd_count   = (known after apply)
          + logging_variant   = "DEFAULT"
          + machine_type      = "n2-highmem-16"
          + metadata          = (known after apply)
          + min_cpu_platform  = (known after apply)
          + oauth_scopes      = [
              + "https://www.googleapis.com/auth/cloud-platform",
            ]
          + preemptible       = false
          + service_account   = "[email protected]"
          + spot              = false
          + tags              = []
          + taint             = [
              + {
                  + effect = "NO_SCHEDULE"
                  + key    = "hub.jupyter.org_dedicated"
                  + value  = "user"
                },
            ]

          + workload_metadata_config {
              + mode = "GKE_METADATA"
            }
        }
    }

  # google_container_node_pool.notebook["n2-highmem-64"] will be created
  + resource "google_container_node_pool" "notebook" {
      + cluster                     = "cb-cluster"
      + id                          = (known after apply)
      + initial_node_count          = 0
      + instance_group_urls         = (known after apply)
      + location                    = "us-central1-b"
      + managed_instance_group_urls = (known after apply)
      + max_pods_per_node           = (known after apply)
      + name                        = "nb-n2-highmem-64"
      + name_prefix                 = (known after apply)
      + node_count                  = (known after apply)
      + node_locations              = (known after apply)
      + operation                   = (known after apply)
      + project                     = "cb-1003-1696"
      + version                     = "1.26.4-gke.1400"

      + autoscaling {
          + location_policy = (known after apply)
          + max_node_count  = 100
          + min_node_count  = 0
        }

      + management {
          + auto_repair  = true
          + auto_upgrade = false
        }

      + node_config {
          + disk_size_gb      = (known after apply)
          + disk_type         = "pd-balanced"
          + guest_accelerator = (known after apply)
          + image_type        = (known after apply)
          + labels            = {
              + "hub.jupyter.org/node-purpose" = "user"
              + "k8s.dask.org/node-purpose"    = "scheduler"
            }
          + local_ssd_count   = (known after apply)
          + logging_variant   = "DEFAULT"
          + machine_type      = "n2-highmem-64"
          + metadata          = (known after apply)
          + min_cpu_platform  = (known after apply)
          + oauth_scopes      = [
              + "https://www.googleapis.com/auth/cloud-platform",
            ]
          + preemptible       = false
          + service_account   = "[email protected]"
          + spot              = false
          + tags              = []
          + taint             = [
              + {
                  + effect = "NO_SCHEDULE"
                  + key    = "hub.jupyter.org_dedicated"
                  + value  = "user"
                },
            ]

          + workload_metadata_config {
              + mode = "GKE_METADATA"
            }
        }
    }

  # google_container_node_pool.notebook["user"] will be updated in-place
  ~ resource "google_container_node_pool" "notebook" {
        id                          = "projects/cb-1003-1696/locations/us-central1-b/clusters/cb-cluster/nodePools/nb-user"
        name                        = "nb-user"
        # (9 unchanged attributes hidden)

      ~ autoscaling {
          ~ max_node_count       = 20 -> 100
            # (4 unchanged attributes hidden)
        }

        # (3 unchanged blocks hidden)
    }

Plan: 2 to add, 1 to change, 0 to destroy.
  • hhmi.tfvars
terraform plan -var-file=projects/hhmi.tfvars
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # google_container_node_pool.notebook["n2-highmem-4"] will be created
  + resource "google_container_node_pool" "notebook" {
      + cluster                     = "hhmi-cluster"
      + id                          = (known after apply)
      + initial_node_count          = 0
      + instance_group_urls         = (known after apply)
      + location                    = "us-west2"
      + managed_instance_group_urls = (known after apply)
      + max_pods_per_node           = (known after apply)
      + name                        = "nb-n2-highmem-4"
      + name_prefix                 = (known after apply)
      + node_count                  = (known after apply)
      + node_locations              = (known after apply)
      + operation                   = (known after apply)
      + project                     = "hhmi-398911"
      + version                     = (known after apply)

      + autoscaling {
          + location_policy = (known after apply)
          + max_node_count  = 100
          + min_node_count  = 0
        }

      + management {
          + auto_repair  = true
          + auto_upgrade = false
        }

      + node_config {
          + disk_size_gb      = (known after apply)
          + disk_type         = "pd-balanced"
          + guest_accelerator = (known after apply)
          + image_type        = (known after apply)
          + labels            = {
              + "hub.jupyter.org/node-purpose" = "user"
              + "k8s.dask.org/node-purpose"    = "scheduler"
            }
          + local_ssd_count   = (known after apply)
          + logging_variant   = "DEFAULT"
          + machine_type      = "n2-highmem-4"
          + metadata          = (known after apply)
          + min_cpu_platform  = (known after apply)
          + oauth_scopes      = [
              + "https://www.googleapis.com/auth/cloud-platform",
            ]
          + preemptible       = false
          + service_account   = "[email protected]"
          + spot              = false
          + tags              = []
          + taint             = [
              + {
                  + effect = "NO_SCHEDULE"
                  + key    = "hub.jupyter.org_dedicated"
                  + value  = "user"
                },
            ]

          + workload_metadata_config {
              + mode = "GKE_METADATA"
            }
        }
    }

  # google_container_node_pool.notebook["n2-highmem-64"] will be created
  + resource "google_container_node_pool" "notebook" {
      + cluster                     = "hhmi-cluster"
      + id                          = (known after apply)
      + initial_node_count          = 0
      + instance_group_urls         = (known after apply)
      + location                    = "us-west2"
      + managed_instance_group_urls = (known after apply)
      + max_pods_per_node           = (known after apply)
      + name                        = "nb-n2-highmem-64"
      + name_prefix                 = (known after apply)
      + node_count                  = (known after apply)
      + node_locations              = (known after apply)
      + operation                   = (known after apply)
      + project                     = "hhmi-398911"
      + version                     = (known after apply)

      + autoscaling {
          + location_policy = (known after apply)
          + max_node_count  = 100
          + min_node_count  = 0
        }

      + management {
          + auto_repair  = true
          + auto_upgrade = false
        }

      + node_config {
          + disk_size_gb      = (known after apply)
          + disk_type         = "pd-balanced"
          + guest_accelerator = (known after apply)
          + image_type        = (known after apply)
          + labels            = {
              + "hub.jupyter.org/node-purpose" = "user"
              + "k8s.dask.org/node-purpose"    = "scheduler"
            }
          + local_ssd_count   = (known after apply)
          + logging_variant   = "DEFAULT"
          + machine_type      = "n2-highmem-64"
          + metadata          = (known after apply)
          + min_cpu_platform  = (known after apply)
          + oauth_scopes      = [
              + "https://www.googleapis.com/auth/cloud-platform",
            ]
          + preemptible       = false
          + service_account   = "[email protected]"
          + spot              = false
          + tags              = []
          + taint             = [
              + {
                  + effect = "NO_SCHEDULE"
                  + key    = "hub.jupyter.org_dedicated"
                  + value  = "user"
                },
            ]

          + workload_metadata_config {
              + mode = "GKE_METADATA"
            }
        }
    }

Plan: 2 to add, 0 to change, 0 to destroy.
  • linked-earth.tfvars
terraform plan -var-file=projects/linked-earth.tfvars
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # google_container_node_pool.notebook["n2-highmem-64"] will be created
  + resource "google_container_node_pool" "notebook" {
      + cluster                     = "linked-earth-cluster"
      + id                          = (known after apply)
      + initial_node_count          = 0
      + instance_group_urls         = (known after apply)
      + location                    = "us-central1"
      + managed_instance_group_urls = (known after apply)
      + max_pods_per_node           = (known after apply)
      + name                        = "nb-n2-highmem-64"
      + name_prefix                 = (known after apply)
      + node_count                  = (known after apply)
      + node_locations              = (known after apply)
      + operation                   = (known after apply)
      + project                     = "linked-earth-hubs"
      + version                     = "1.27.4-gke.900"

      + autoscaling {
          + location_policy = (known after apply)
          + max_node_count  = 100
          + min_node_count  = 0
        }

      + management {
          + auto_repair  = true
          + auto_upgrade = false
        }

      + node_config {
          + disk_size_gb      = (known after apply)
          + disk_type         = "pd-balanced"
          + guest_accelerator = (known after apply)
          + image_type        = (known after apply)
          + labels            = {
              + "hub.jupyter.org/node-purpose" = "user"
              + "k8s.dask.org/node-purpose"    = "scheduler"
            }
          + local_ssd_count   = (known after apply)
          + logging_variant   = "DEFAULT"
          + machine_type      = "n2-highmem-64"
          + metadata          = (known after apply)
          + min_cpu_platform  = (known after apply)
          + oauth_scopes      = [
              + "https://www.googleapis.com/auth/cloud-platform",
            ]
          + preemptible       = false
          + service_account   = "linked-earth-cluster-sa@linked-earth-hubs.iam.gserviceaccount.com"
          + spot              = false
          + tags              = []
          + taint             = [
              + {
                  + effect = "NO_SCHEDULE"
                  + key    = "hub.jupyter.org_dedicated"
                  + value  = "user"
                },
            ]

          + workload_metadata_config {
              + mode = "GKE_METADATA"
            }
        }
    }

Plan: 1 to add, 0 to change, 0 to destroy.
  • m2lines.tfvars
terraform plan -var-file=projects/m2lines.tfvars
  # google_container_node_pool.notebook["n2-highmem-16"] will be created
  + resource "google_container_node_pool" "notebook" {
      + cluster                     = "m2lines-cluster"
      + id                          = (known after apply)
      + initial_node_count          = 0
      + instance_group_urls         = (known after apply)
      + location                    = "us-central1"
      + managed_instance_group_urls = (known after apply)
      + max_pods_per_node           = (known after apply)
      + name                        = "nb-n2-highmem-16"
      + name_prefix                 = (known after apply)
      + node_count                  = (known after apply)
      + node_locations              = (known after apply)
      + operation                   = (known after apply)
      + project                     = "m2lines-hub"
      + version                     = "1.27.4-gke.900"

      + autoscaling {
          + location_policy = (known after apply)
          + max_node_count  = 100
          + min_node_count  = 0
        }

      + management {
          + auto_repair  = true
          + auto_upgrade = false
        }

      + node_config {
          + disk_size_gb      = (known after apply)
          + disk_type         = "pd-balanced"
          + guest_accelerator = (known after apply)
          + image_type        = (known after apply)
          + labels            = {
              + "hub.jupyter.org/node-purpose" = "user"
              + "k8s.dask.org/node-purpose"    = "scheduler"
            }
          + local_ssd_count   = (known after apply)
          + logging_variant   = "DEFAULT"
          + machine_type      = "n2-highmem-16"
          + metadata          = (known after apply)
          + min_cpu_platform  = (known after apply)
          + oauth_scopes      = [
              + "https://www.googleapis.com/auth/cloud-platform",
            ]
          + preemptible       = false
          + service_account   = "[email protected]"
          + spot              = false
          + tags              = []
          + taint             = [
              + {
                  + effect = "NO_SCHEDULE"
                  + key    = "hub.jupyter.org_dedicated"
                  + value  = "user"
                },
            ]

          + workload_metadata_config {
              + mode = "GKE_METADATA"
            }
        }
    }

  # google_container_node_pool.notebook["n2-highmem-4"] will be created
  + resource "google_container_node_pool" "notebook" {
      + cluster                     = "m2lines-cluster"
      + id                          = (known after apply)
      + initial_node_count          = 0
      + instance_group_urls         = (known after apply)
      + location                    = "us-central1"
      + managed_instance_group_urls = (known after apply)
      + max_pods_per_node           = (known after apply)
      + name                        = "nb-n2-highmem-4"
      + name_prefix                 = (known after apply)
      + node_count                  = (known after apply)
      + node_locations              = (known after apply)
      + operation                   = (known after apply)
      + project                     = "m2lines-hub"
      + version                     = "1.27.4-gke.900"

      + autoscaling {
          + location_policy = (known after apply)
          + max_node_count  = 100
          + min_node_count  = 0
        }

      + management {
          + auto_repair  = true
          + auto_upgrade = false
        }

      + node_config {
          + disk_size_gb      = (known after apply)
          + disk_type         = "pd-balanced"
          + guest_accelerator = (known after apply)
          + image_type        = (known after apply)
          + labels            = {
              + "hub.jupyter.org/node-purpose" = "user"
              + "k8s.dask.org/node-purpose"    = "scheduler"
            }
          + local_ssd_count   = (known after apply)
          + logging_variant   = "DEFAULT"
          + machine_type      = "n2-highmem-4"
          + metadata          = (known after apply)
          + min_cpu_platform  = (known after apply)
          + oauth_scopes      = [
              + "https://www.googleapis.com/auth/cloud-platform",
            ]
          + preemptible       = false
          + service_account   = "[email protected]"
          + spot              = false
          + tags              = []
          + taint             = [
              + {
                  + effect = "NO_SCHEDULE"
                  + key    = "hub.jupyter.org_dedicated"
                  + value  = "user"
                },
            ]

          + workload_metadata_config {
              + mode = "GKE_METADATA"
            }
        }
    }

  # google_container_node_pool.notebook["n2-highmem-64"] will be created
  + resource "google_container_node_pool" "notebook" {
      + cluster                     = "m2lines-cluster"
      + id                          = (known after apply)
      + initial_node_count          = 0
      + instance_group_urls         = (known after apply)
      + location                    = "us-central1"
      + managed_instance_group_urls = (known after apply)
      + max_pods_per_node           = (known after apply)
      + name                        = "nb-n2-highmem-64"
      + name_prefix                 = (known after apply)
      + node_count                  = (known after apply)
      + node_locations              = (known after apply)
      + operation                   = (known after apply)
      + project                     = "m2lines-hub"
      + version                     = "1.27.4-gke.900"

      + autoscaling {
          + location_policy = (known after apply)
          + max_node_count  = 100
          + min_node_count  = 0
        }

      + management {
          + auto_repair  = true
          + auto_upgrade = false
        }

      + node_config {
          + disk_size_gb      = (known after apply)
          + disk_type         = "pd-balanced"
          + guest_accelerator = (known after apply)
          + image_type        = (known after apply)
          + labels            = {
              + "hub.jupyter.org/node-purpose" = "user"
              + "k8s.dask.org/node-purpose"    = "scheduler"
            }
          + local_ssd_count   = (known after apply)
          + logging_variant   = "DEFAULT"
          + machine_type      = "n2-highmem-64"
          + metadata          = (known after apply)
          + min_cpu_platform  = (known after apply)
          + oauth_scopes      = [
              + "https://www.googleapis.com/auth/cloud-platform",
            ]
          + preemptible       = false
          + service_account   = "[email protected]"
          + spot              = false
          + tags              = []
          + taint             = [
              + {
                  + effect = "NO_SCHEDULE"
                  + key    = "hub.jupyter.org_dedicated"
                  + value  = "user"
                },
            ]

          + workload_metadata_config {
              + mode = "GKE_METADATA"
            }
        }
    }

Plan: 3 to add, 0 to change, 0 to destroy.
  • pilot-hubs.tfvars
terraform plan -var-file=projects/pilot-hubs.tfvars
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create
  ~ update in-place

Terraform will perform the following actions:

  # google_container_node_pool.notebook["n2-highmem-16"] will be created
  + resource "google_container_node_pool" "notebook" {
      + cluster                     = "pilot-hubs-cluster"
      + id                          = (known after apply)
      + initial_node_count          = 0
      + instance_group_urls         = (known after apply)
      + location                    = "us-central1-b"
      + managed_instance_group_urls = (known after apply)
      + max_pods_per_node           = (known after apply)
      + name                        = "nb-n2-highmem-16"
      + name_prefix                 = (known after apply)
      + node_count                  = (known after apply)
      + node_locations              = (known after apply)
      + operation                   = (known after apply)
      + project                     = "two-eye-two-see"
      + version                     = "1.26.4-gke.1400"

      + autoscaling {
          + location_policy = (known after apply)
          + max_node_count  = 100
          + min_node_count  = 0
        }

      + management {
          + auto_repair  = true
          + auto_upgrade = false
        }

      + node_config {
          + disk_size_gb      = (known after apply)
          + disk_type         = "pd-balanced"
          + guest_accelerator = (known after apply)
          + image_type        = (known after apply)
          + labels            = {
              + "hub.jupyter.org/node-purpose" = "user"
              + "k8s.dask.org/node-purpose"    = "scheduler"
            }
          + local_ssd_count   = (known after apply)
          + logging_variant   = "DEFAULT"
          + machine_type      = "n2-highmem-16"
          + metadata          = (known after apply)
          + min_cpu_platform  = (known after apply)
          + oauth_scopes      = [
              + "https://www.googleapis.com/auth/cloud-platform",
            ]
          + preemptible       = false
          + service_account   = "[email protected]"
          + spot              = false
          + tags              = []
          + taint             = [
              + {
                  + effect = "NO_SCHEDULE"
                  + key    = "hub.jupyter.org_dedicated"
                  + value  = "user"
                },
            ]

          + workload_metadata_config {
              + mode = "GKE_METADATA"
            }
        }
    }

  # google_container_node_pool.notebook["n2-highmem-64"] will be created
  + resource "google_container_node_pool" "notebook" {
      + cluster                     = "pilot-hubs-cluster"
      + id                          = (known after apply)
      + initial_node_count          = 0
      + instance_group_urls         = (known after apply)
      + location                    = "us-central1-b"
      + managed_instance_group_urls = (known after apply)
      + max_pods_per_node           = (known after apply)
      + name                        = "nb-n2-highmem-64"
      + name_prefix                 = (known after apply)
      + node_count                  = (known after apply)
      + node_locations              = (known after apply)
      + operation                   = (known after apply)
      + project                     = "two-eye-two-see"
      + version                     = "1.26.4-gke.1400"

      + autoscaling {
          + location_policy = (known after apply)
          + max_node_count  = 100
          + min_node_count  = 0
        }

      + management {
          + auto_repair  = true
          + auto_upgrade = false
        }

      + node_config {
          + disk_size_gb      = (known after apply)
          + disk_type         = "pd-balanced"
          + guest_accelerator = (known after apply)
          + image_type        = (known after apply)
          + labels            = {
              + "hub.jupyter.org/node-purpose" = "user"
              + "k8s.dask.org/node-purpose"    = "scheduler"
            }
          + local_ssd_count   = (known after apply)
          + logging_variant   = "DEFAULT"
          + machine_type      = "n2-highmem-64"
          + metadata          = (known after apply)
          + min_cpu_platform  = (known after apply)
          + oauth_scopes      = [
              + "https://www.googleapis.com/auth/cloud-platform",
            ]
          + preemptible       = false
          + service_account   = "[email protected]"
          + spot              = false
          + tags              = []
          + taint             = [
              + {
                  + effect = "NO_SCHEDULE"
                  + key    = "hub.jupyter.org_dedicated"
                  + value  = "user"
                },
            ]

          + workload_metadata_config {
              + mode = "GKE_METADATA"
            }
        }
    }

  # google_container_node_pool.notebook["user"] will be updated in-place
  ~ resource "google_container_node_pool" "notebook" {
        id                          = "projects/two-eye-two-see/locations/us-central1-b/clusters/pilot-hubs-cluster/nodePools/nb-user"
        name                        = "nb-user"
        # (9 unchanged attributes hidden)

      ~ autoscaling {
          ~ max_node_count       = 20 -> 100
            # (4 unchanged attributes hidden)
        }

        # (3 unchanged blocks hidden)
    }

Plan: 2 to add, 1 to change, 0 to destroy.
  • leap.tfvars !!!
terraform plan -var-file=projects/leap.tfvars
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create
  ~ update in-place

Terraform will perform the following actions:

  # google_container_cluster.cluster will be updated in-place
  ~ resource "google_container_cluster" "cluster" {
        id                          = "projects/leap-pangeo/locations/us-central1/clusters/leap-cluster"
        name                        = "leap-cluster"
        # (27 unchanged attributes hidden)

      ~ cluster_autoscaling {
          ~ enabled             = true -> false
            # (1 unchanged attribute hidden)

          - resource_limits {
              - maximum       = 26112 -> null
              - minimum       = 1 -> null
              - resource_type = "memory" -> null
            }
          - resource_limits {
              - maximum       = 3264 -> null
              - minimum       = 1 -> null
              - resource_type = "cpu" -> null
            }
          - resource_limits {
              - maximum       = 1024 -> null
              - minimum       = 1 -> null
              - resource_type = "nvidia-tesla-a100" -> null
            }
          - resource_limits {
              - maximum       = 1024 -> null
              - minimum       = 1 -> null
              - resource_type = "nvidia-tesla-k80" -> null
            }
          - resource_limits {
              - maximum       = 1024 -> null
              - minimum       = 1 -> null
              - resource_type = "nvidia-tesla-p100" -> null
            }
          - resource_limits {
              - maximum       = 1024 -> null
              - minimum       = 1 -> null
              - resource_type = "nvidia-tesla-p4" -> null
            }
          - resource_limits {
              - maximum       = 1024 -> null
              - minimum       = 1 -> null
              - resource_type = "nvidia-tesla-t4" -> null
            }
          - resource_limits {
              - maximum       = 1024 -> null
              - minimum       = 1 -> null
              - resource_type = "nvidia-tesla-v100" -> null
            }

            # (1 unchanged block hidden)
        }

        # (23 unchanged blocks hidden)
    }

  # google_container_node_pool.notebook["n2-highmem-4"] will be created
  + resource "google_container_node_pool" "notebook" {
      + cluster                     = "leap-cluster"
      + id                          = (known after apply)
      + initial_node_count          = 0
      + instance_group_urls         = (known after apply)
      + location                    = "us-central1"
      + managed_instance_group_urls = (known after apply)
      + max_pods_per_node           = (known after apply)
      + name                        = "nb-n2-highmem-4"
      + name_prefix                 = (known after apply)
      + node_count                  = (known after apply)
      + node_locations              = (known after apply)
      + operation                   = (known after apply)
      + project                     = "leap-pangeo"
      + version                     = "1.25.6-gke.1000"

      + autoscaling {
          + location_policy = (known after apply)
          + max_node_count  = 100
          + min_node_count  = 0
        }

      + management {
          + auto_repair  = true
          + auto_upgrade = false
        }

      + node_config {
          + disk_size_gb      = (known after apply)
          + disk_type         = "pd-balanced"
          + guest_accelerator = (known after apply)
          + image_type        = (known after apply)
          + labels            = {
              + "hub.jupyter.org/node-purpose" = "user"
              + "k8s.dask.org/node-purpose"    = "scheduler"
            }
          + local_ssd_count   = (known after apply)
          + logging_variant   = "DEFAULT"
          + machine_type      = "n2-highmem-4"
          + metadata          = (known after apply)
          + min_cpu_platform  = (known after apply)
          + oauth_scopes      = [
              + "https://www.googleapis.com/auth/cloud-platform",
            ]
          + preemptible       = false
          + service_account   = "[email protected]"
          + spot              = false
          + tags              = []
          + taint             = [
              + {
                  + effect = "NO_SCHEDULE"
                  + key    = "hub.jupyter.org_dedicated"
                  + value  = "user"
                },
            ]

          + workload_metadata_config {
              + mode = "GKE_METADATA"
            }
        }
    }

  # google_container_node_pool.notebook["n2-highmem-64"] will be created
  + resource "google_container_node_pool" "notebook" {
      + cluster                     = "leap-cluster"
      + id                          = (known after apply)
      + initial_node_count          = 0
      + instance_group_urls         = (known after apply)
      + location                    = "us-central1"
      + managed_instance_group_urls = (known after apply)
      + max_pods_per_node           = (known after apply)
      + name                        = "nb-n2-highmem-64"
      + name_prefix                 = (known after apply)
      + node_count                  = (known after apply)
      + node_locations              = (known after apply)
      + operation                   = (known after apply)
      + project                     = "leap-pangeo"
      + version                     = "1.25.6-gke.1000"

      + autoscaling {
          + location_policy = (known after apply)
          + max_node_count  = 100
          + min_node_count  = 0
        }

      + management {
          + auto_repair  = true
          + auto_upgrade = false
        }

      + node_config {
          + disk_size_gb      = (known after apply)
          + disk_type         = "pd-balanced"
          + guest_accelerator = (known after apply)
          + image_type        = (known after apply)
          + labels            = {
              + "hub.jupyter.org/node-purpose" = "user"
              + "k8s.dask.org/node-purpose"    = "scheduler"
            }
          + local_ssd_count   = (known after apply)
          + logging_variant   = "DEFAULT"
          + machine_type      = "n2-highmem-64"
          + metadata          = (known after apply)
          + min_cpu_platform  = (known after apply)
          + oauth_scopes      = [
              + "https://www.googleapis.com/auth/cloud-platform",
            ]
          + preemptible       = false
          + service_account   = "[email protected]"
          + spot              = false
          + tags              = []
          + taint             = [
              + {
                  + effect = "NO_SCHEDULE"
                  + key    = "hub.jupyter.org_dedicated"
                  + value  = "user"
                },
            ]

          + workload_metadata_config {
              + mode = "GKE_METADATA"
            }
        }
    }

Plan: 2 to add, 1 to change, 0 to destroy.
  • qcl.tfvars !!!!
terraform plan -var-file=projects/qcl.tfvars
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create
  ~ update in-place

Terraform will perform the following actions:

  # google_container_cluster.cluster will be updated in-place
  ~ resource "google_container_cluster" "cluster" {
        id                          = "projects/qcl-hub/locations/europe-west1/clusters/qcl-cluster"
      + min_master_version          = "1.25.10-gke.2700"
        name                        = "qcl-cluster"
        # (26 unchanged attributes hidden)

        # (26 unchanged blocks hidden)
    }

  # google_container_node_pool.notebook["huge"] will be updated in-place
  ~ resource "google_container_node_pool" "notebook" {
        id                          = "projects/qcl-hub/locations/europe-west1/clusters/qcl-cluster/nodePools/nb-huge"
        name                        = "nb-huge"
      ~ version                     = "1.24.11-gke.1000" -> "1.24.9-gke.3200"
        # (8 unchanged attributes hidden)

        # (5 unchanged blocks hidden)
    }

  # google_container_node_pool.notebook["large"] will be updated in-place
  ~ resource "google_container_node_pool" "notebook" {
        id                          = "projects/qcl-hub/locations/europe-west1/clusters/qcl-cluster/nodePools/nb-large"
        name                        = "nb-large"
      ~ version                     = "1.24.11-gke.1000" -> "1.24.9-gke.3200"
        # (8 unchanged attributes hidden)

        # (5 unchanged blocks hidden)
    }

  # google_container_node_pool.notebook["n2-highmem-64"] will be created
  + resource "google_container_node_pool" "notebook" {
      + cluster                     = "qcl-cluster"
      + id                          = (known after apply)
      + initial_node_count          = 0
      + instance_group_urls         = (known after apply)
      + location                    = "europe-west1"
      + managed_instance_group_urls = (known after apply)
      + max_pods_per_node           = (known after apply)
      + name                        = "nb-n2-highmem-64"
      + name_prefix                 = (known after apply)
      + node_count                  = (known after apply)
      + node_locations              = (known after apply)
      + operation                   = (known after apply)
      + project                     = "qcl-hub"
      + version                     = "1.24.9-gke.3200"

      + autoscaling {
          + location_policy = (known after apply)
          + max_node_count  = 100
          + min_node_count  = 0
        }

      + management {
          + auto_repair  = true
          + auto_upgrade = false
        }

      + node_config {
          + disk_size_gb      = (known after apply)
          + disk_type         = "pd-balanced"
          + guest_accelerator = (known after apply)
          + image_type        = (known after apply)
          + labels            = {
              + "hub.jupyter.org/node-purpose" = "user"
              + "k8s.dask.org/node-purpose"    = "scheduler"
            }
          + local_ssd_count   = (known after apply)
          + logging_variant   = "DEFAULT"
          + machine_type      = "n2-highmem-64"
          + metadata          = (known after apply)
          + min_cpu_platform  = (known after apply)
          + oauth_scopes      = [
              + "https://www.googleapis.com/auth/cloud-platform",
            ]
          + preemptible       = false
          + service_account   = "[email protected]"
          + spot              = false
          + tags              = []
          + taint             = [
              + {
                  + effect = "NO_SCHEDULE"
                  + key    = "hub.jupyter.org_dedicated"
                  + value  = "user"
                },
            ]

          + workload_metadata_config {
              + mode = "GKE_METADATA"
            }
        }
    }

Plan: 1 to add, 3 to change, 0 to destroy.

Changes to Outputs:
  ~ regular_channel_latest_k8s_versions = {
      ~ "1."    = "1.27.3-gke.1700" -> "1.27.4-gke.900"
      ~ "1.24." = "1.24.15-gke.1700" -> "1.24.16-gke.500"
      ~ "1.25." = "1.25.11-gke.1700" -> "1.25.12-gke.500"
      ~ "1.26." = "1.26.6-gke.1700" -> "1.26.7-gke.500"
      ~ "1.27." = "1.27.3-gke.1700" -> "1.27.4-gke.900"
    }
  • pangeo-hubs.tfvars
terraform plan -var-file=projects/pangeo-hubs.tfvars
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform planned the following actions, but then encountered a problem:

  # google_container_node_pool.notebook["n2-highmem-16"] will be created
  + resource "google_container_node_pool" "notebook" {
      + cluster                     = "pangeo-hubs-cluster"
      + id                          = (known after apply)
      + initial_node_count          = 0
      + instance_group_urls         = (known after apply)
      + location                    = "us-central1-b"
      + managed_instance_group_urls = (known after apply)
      + max_pods_per_node           = (known after apply)
      + name                        = "nb-n2-highmem-16"
      + name_prefix                 = (known after apply)
      + node_count                  = (known after apply)
      + node_locations              = (known after apply)
      + operation                   = (known after apply)
      + project                     = "pangeo-integration-te-3eea"
      + version                     = (known after apply)

      + autoscaling {
          + location_policy = (known after apply)
          + max_node_count  = 100
          + min_node_count  = 0
        }

      + management {
          + auto_repair  = true
          + auto_upgrade = false
        }

      + node_config {
          + disk_size_gb      = (known after apply)
          + disk_type         = "pd-balanced"
          + guest_accelerator = (known after apply)
          + image_type        = (known after apply)
          + labels            = {
              + "hub.jupyter.org/node-purpose" = "user"
              + "k8s.dask.org/node-purpose"    = "scheduler"
            }
          + local_ssd_count   = (known after apply)
          + logging_variant   = "DEFAULT"
          + machine_type      = "n2-highmem-16"
          + metadata          = (known after apply)
          + min_cpu_platform  = (known after apply)
          + oauth_scopes      = [
              + "https://www.googleapis.com/auth/cloud-platform",
            ]
          + preemptible       = false
          + service_account   = "pangeo-hubs-cluster-sa@pangeo-integration-te-3eea.iam.gserviceaccount.com"
          + spot              = false
          + tags              = []
          + taint             = [
              + {
                  + effect = "NO_SCHEDULE"
                  + key    = "hub.jupyter.org_dedicated"
                  + value  = "user"
                },
            ]

          + workload_metadata_config {
              + mode = "GKE_METADATA"
            }
        }
    }

  # google_container_node_pool.notebook["n2-highmem-4"] will be created
  + resource "google_container_node_pool" "notebook" {
      + cluster                     = "pangeo-hubs-cluster"
      + id                          = (known after apply)
      + initial_node_count          = 0
      + instance_group_urls         = (known after apply)
      + location                    = "us-central1-b"
      + managed_instance_group_urls = (known after apply)
      + max_pods_per_node           = (known after apply)
      + name                        = "nb-n2-highmem-4"
      + name_prefix                 = (known after apply)
      + node_count                  = (known after apply)
      + node_locations              = (known after apply)
      + operation                   = (known after apply)
      + project                     = "pangeo-integration-te-3eea"
      + version                     = (known after apply)

      + autoscaling {
          + location_policy = (known after apply)
          + max_node_count  = 100
          + min_node_count  = 0
        }

      + management {
          + auto_repair  = true
          + auto_upgrade = false
        }

      + node_config {
          + disk_size_gb      = (known after apply)
          + disk_type         = "pd-balanced"
          + guest_accelerator = (known after apply)
          + image_type        = (known after apply)
          + labels            = {
              + "hub.jupyter.org/node-purpose" = "user"
              + "k8s.dask.org/node-purpose"    = "scheduler"
            }
          + local_ssd_count   = (known after apply)
          + logging_variant   = "DEFAULT"
          + machine_type      = "n2-highmem-4"
          + metadata          = (known after apply)
          + min_cpu_platform  = (known after apply)
          + oauth_scopes      = [
              + "https://www.googleapis.com/auth/cloud-platform",
            ]
          + preemptible       = false
          + service_account   = "pangeo-hubs-cluster-sa@pangeo-integration-te-3eea.iam.gserviceaccount.com"
          + spot              = false
          + tags              = []
          + taint             = [
              + {
                  + effect = "NO_SCHEDULE"
                  + key    = "hub.jupyter.org_dedicated"
                  + value  = "user"
                },
            ]

          + workload_metadata_config {
              + mode = "GKE_METADATA"
            }
        }
    }

  # google_container_node_pool.notebook["n2-highmem-64"] will be created
  + resource "google_container_node_pool" "notebook" {
      + cluster                     = "pangeo-hubs-cluster"
      + id                          = (known after apply)
      + initial_node_count          = 0
      + instance_group_urls         = (known after apply)
      + location                    = "us-central1-b"
      + managed_instance_group_urls = (known after apply)
      + max_pods_per_node           = (known after apply)
      + name                        = "nb-n2-highmem-64"
      + name_prefix                 = (known after apply)
      + node_count                  = (known after apply)
      + node_locations              = (known after apply)
      + operation                   = (known after apply)
      + project                     = "pangeo-integration-te-3eea"
      + version                     = (known after apply)

      + autoscaling {
          + location_policy = (known after apply)
          + max_node_count  = 100
          + min_node_count  = 0
        }

      + management {
          + auto_repair  = true
          + auto_upgrade = false
        }

      + node_config {
          + disk_size_gb      = (known after apply)
          + disk_type         = "pd-balanced"
          + guest_accelerator = (known after apply)
          + image_type        = (known after apply)
          + labels            = {
              + "hub.jupyter.org/node-purpose" = "user"
              + "k8s.dask.org/node-purpose"    = "scheduler"
            }
          + local_ssd_count   = (known after apply)
          + logging_variant   = "DEFAULT"
          + machine_type      = "n2-highmem-64"
          + metadata          = (known after apply)
          + min_cpu_platform  = (known after apply)
          + oauth_scopes      = [
              + "https://www.googleapis.com/auth/cloud-platform",
            ]
          + preemptible       = false
          + service_account   = "pangeo-hubs-cluster-sa@pangeo-integration-te-3eea.iam.gserviceaccount.com"
          + spot              = false
          + tags              = []
          + taint             = [
              + {
                  + effect = "NO_SCHEDULE"
                  + key    = "hub.jupyter.org_dedicated"
                  + value  = "user"
                },
            ]

          + workload_metadata_config {
              + mode = "GKE_METADATA"
            }
        }
    }

Plan: 3 to add, 0 to change, 0 to destroy.
╷
│ Warning: Failed to decode resource from state
│ 
│ Error decoding "google_monitoring_alert_policy.disk_space_full_alert" from prior state: unsupported attribute "condition_prometheus_query_language"
╵
╷
│ Error: Failed to get the data key required to decrypt the SOPS file.
│ 
│ Group 0: FAILED
│   projects/two-eye-two-see/locations/global/keyRings/sops-keys/cryptoKeys/similar-hubs: FAILED
│     - | Error decrypting key: googleapi: Error 403: Permission
│       | 'cloudkms.cryptoKeyVersions.useToDecrypt' denied on resource
│       | 'projects/two-eye-two-see/locations/global/keyRings/sops-keys/cryptoKeys/similar-hubs'
│       | (or it may not exist)., forbidden
│ 
│ Recovery failed because no master key was able to decrypt the file. In
│ order for SOPS to recover the file, at least one key has to be successful,
│ but none were.
│ 
│   with data.sops_file.pagerduty_service_integration_keys,
│   on pagerduty.tf line 13, in data "sops_file" "pagerduty_service_integration_keys":
│   13: data "sops_file" "pagerduty_service_integration_keys" {
│ 

@GeorgianaElena GeorgianaElena changed the title Mak all clusters support the instance types 4, 16, and 64 CPU highmem nodes Make all clusters support the instance types 4, 16, and 64 CPU highmem nodes Oct 24, 2023
Contributor

@consideRatio consideRatio left a comment

We have quite a few references to what "large" etc means ;)

[screenshot]

Due to that, I generally favor the non-relative naming, like here:

[screenshot]

I don't mind not taking action on this now if changes have already been made, but it's a slight preference.

@GeorgianaElena
Member Author

@consideRatio, you're totally right. I initially went with more generic names like "n2-highmem-4", but then I saw that we call them "small", "medium" and "large" in the template file, so I figured that going with those names would minimize the amount of disruptive renaming we'd have to do in the future (assuming we might want all these machines available under those names).

Happy to change them back to generic names, including in the template. It did feel awkward to call that one "larger" 😅

@consideRatio
Contributor

Notes from sync chat:

  • Update template files to use absolute naming
  • Update some but not all to use absolute naming side by side with previous naming (see the sketch after this list)
  • (future, doesn't have to be by @GeorgianaElena in this PR) Create a follow-up issue to help us progressively work towards absolute naming of user nodes in GKE and AKS
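
A minimal sketch of the side-by-side approach, assuming a tfvars notebook_nodes map whose keys become the nb-* pool names seen in the plan output above (the field names are illustrative, not the exact schema):

notebook_nodes = {
  # Relative name kept so existing pools aren't disrupted
  "large" : {
    min : 0,
    max : 100,
    machine_type : "n2-highmem-16",
  },
  # Absolute name added side by side with the one above
  "n2-highmem-16" : {
    min : 0,
    max : 100,
    machine_type : "n2-highmem-16",
  },
}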

@GeorgianaElena GeorgianaElena changed the title Make all clusters support the instance types 4, 16, and 64 CPU highmem nodes Make all GCP clusters support the instance types 4, 16, and 64 CPU highmem nodes Oct 27, 2023
@GeorgianaElena GeorgianaElena marked this pull request as ready for review October 27, 2023 08:29
@GeorgianaElena GeorgianaElena requested a review from a team as a code owner October 27, 2023 08:29
Contributor

@consideRatio consideRatio left a comment

Wieee, thank you for working on this @GeorgianaElena!!

There were some style changes that dropped trailing commas, and I looked into this and concluded that terraform fmt autoformatting doesn't help with enforcing them. Since it's not enforced by autoformatting and requires manual thought, I don't think we should bother thinking about it further!
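
For reference, a hypothetical tfvars fragment showing both styles; HCL parses them identically, and terraform fmt won't normalize one into the other:

# With a trailing comma: appending an entry later touches only one line
machine_types = [
  "n2-highmem-4",
  "n2-highmem-16",
  "n2-highmem-64",
]

# Without a trailing comma: equally valid, just a different diff shape
other_machine_types = [
  "n2-highmem-4",
  "n2-highmem-16",
  "n2-highmem-64"
]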

Let's go for a merge!

@consideRatio
Contributor

consideRatio commented Oct 28, 2023

Out of scope for this PR, but related: I got an idea for how we can nudge a transition of things over time! Next to each thing we want to change in each terraform/eksctl file, we inline a comment like "FIXME: Update this to ... when given the chance".

That way, we leave an easy-to-resolve FIXME note that someone can pick up when they touch the file anyway, for example while doing k8s upgrade maintenance.
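
Something like this hypothetical fragment, with the FIXME sitting right next to the name it applies to:

notebook_nodes = {
  # FIXME: Rename this pool to its machine type ("n2-highmem-16")
  #        when given the chance, e.g. during a k8s upgrade.
  "large" : {
    min : 0,
    max : 100,
    machine_type : "n2-highmem-16",
  },
}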

@GeorgianaElena
Member Author

Thank you @consideRatio! I've just added a commit with the suggested FIXME comments and the trailing commas, and will now start terraform plan + apply for the changes.

Contributor

@consideRatio consideRatio left a comment

Wiee nice!

@GeorgianaElena
Member Author

Update

I have run terraform plan & apply for all but three clusters.

In the top comment, I pasted the terraform plan output for all of them.

The ones without a check are those I did not run terraform apply for, because of the changes it wanted to make to the current infra. Summary:

1. leap

Leap appears to have cluster_autoscaling enabled, which is not reflected in its terraform config

2. qcl

It seems it wants to make some updates to the existing node pools. I don't understand why.

3. pangeo-hubs

At the end of the plan output, it says:

│ Warning: Failed to decode resource from state
│ 
│ Error decoding "google_monitoring_alert_policy.disk_space_full_alert" from prior state: unsupported attribute "condition_prometheus_query_language"
╵
╷
│ Error: Failed to get the data key required to decrypt the SOPS file.
│ 
│ Group 0: FAILED
│   projects/two-eye-two-see/locations/global/keyRings/sops-keys/cryptoKeys/similar-hubs: FAILED
│     - | Error decrypting key: googleapi: Error 403: Permission
│       | 'cloudkms.cryptoKeyVersions.useToDecrypt' denied on resource
│       | 'projects/two-eye-two-see/locations/global/keyRings/sops-keys/cryptoKeys/similar-hubs'
│       | (or it may not exist)., forbidden
│ 
│ Recovery failed because no master key was able to decrypt the file. In
│ order for SOPS to recover the file, at least one key has to be successful,
│ but none were.
│ 
│   with data.sops_file.pagerduty_service_integration_keys,
│   on pagerduty.tf line 13, in data "sops_file" "pagerduty_service_integration_keys":
│   13: data "sops_file" "pagerduty_service_integration_keys" {
│ 

@sgibson91
Member

@GeorgianaElena for pangeo hubs, you'll have to log into the gcloud cli using your Columbia email I think

@GeorgianaElena
Member Author

GeorgianaElena commented Oct 30, 2023

@GeorgianaElena for pangeo hubs, you'll have to log into the gcloud cli using your Columbia email I think

Thanks @sgibson91. I did that, but I think the issue is that my Columbia account doesn't have permissions to access the SOPS decryption key, which is stored in the two-eye-two-see gcloud project. I'll manually add myself there and make a note in the terraform file if this fixes it.

Update

Yes, granting my Columbia account KMS encrypter/decrypter permissions in the two-eye-two-see project fixed the issue for pangeo-hubs, and I was able to terraform apply.
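
For the note in the terraform file, the grant could look roughly like this sketch; the resource name and member email are hypothetical, while the key path comes from the error output above:

# Sketch: let the Columbia account decrypt the shared SOPS key
resource "google_kms_crypto_key_iam_member" "columbia_sops_decrypt" {
  crypto_key_id = "two-eye-two-see/global/sops-keys/similar-hubs"
  role          = "roles/cloudkms.cryptoKeyEncrypterDecrypter"
  member        = "user:deployer@columbia.edu" # hypothetical email
}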

Remaining clusters with terraform apply issues: leap and qcl

@consideRatio
Contributor

LEAP:

Leap appears to have cluster_autoscaling enabled, which is not reflected in its terraform config

I think node auto-provisioning has been enabled as part of Yuvi trialing things in #3287, and that relies on adjusting the GKE-managed cluster autoscaler, which doesn't run inside the k8s cluster the way it does on EKS.
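
If so, the drift would correspond to the cluster_autoscaling block on the cluster resource. A minimal sketch of what GKE node auto-provisioning looks like in the google provider, with hypothetical names and illustrative limits:

resource "google_container_cluster" "cluster" {
  name               = "leap-cluster" # hypothetical name
  location           = "us-central1"
  initial_node_count = 1

  # Node auto-provisioning: GKE's managed autoscaler creates and deletes
  # node pools on demand within these cluster-wide resource limits.
  cluster_autoscaling {
    enabled = true

    resource_limits {
      resource_type = "cpu"
      minimum       = 0
      maximum       = 1000
    }
    resource_limits {
      resource_type = "memory"
      minimum       = 0
      maximum       = 4000
    }
  }
}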

QCL:

It seems it wants to make some updates to the existing node pools. I don't understand why.

What kind of updates? Are they related to k8s node versions? If so, I suspect it's a remnant of a k8s cluster upgrade, where the node pools weren't updated as part of the k8s api-server upgrade.

@consideRatio
Contributor

consideRatio commented Oct 30, 2023

OMG amazing summary of your actions in the PR description @GeorgianaElena, looking now!

Hmmm, so node pools are being downgraded... "1.24.11-gke.1000" -> "1.24.9-gke.3200"

@consideRatio
Contributor

@GeorgianaElena I think it's fine that we update large and huge in place because they aren't currently in use, so they can be re-created/updated without issues to get the same pinned k8s version. Apparently they run a more modern k8s version than most other nodes.

This cluster was created without pinned k8s versions for the nodes, and hasn't been upgraded to align all node versions since the pinning was introduced in the terraform config.

[screenshot]
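
The pinning in question is the version argument on each node pool; a rough sketch using the versions from the plan diff above:

resource "google_container_node_pool" "notebook" {
  name    = "nb-n2-highmem-4"
  cluster = google_container_cluster.cluster.id

  # Pin the pool's k8s version explicitly. Pools created before the pin can
  # sit at a newer patch version (e.g. 1.24.11-gke.1000), which terraform
  # then wants to "align" back down to the pinned 1.24.9-gke.3200.
  version = "1.24.9-gke.3200"

  management {
    auto_repair  = true
    auto_upgrade = false
  }

  autoscaling {
    min_node_count = 0
    max_node_count = 100
  }

  node_config {
    machine_type = "n2-highmem-4"
  }
}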

@consideRatio
Contributor

consideRatio commented Oct 30, 2023

@GeorgianaElena I went ahead with the QCL node pool changes while they remained inactive - QCL complete!

@consideRatio
Contributor

@GeorgianaElena I'm quite confident that #3287 was causing the LEAP issues. I suggest we simply merge this as-is for now though, as I don't think it's in scope for this PR to resolve it.

@GeorgianaElena
Member Author

@GeorgianaElena I went ahead with the QCL node pool changes while they remained inactive - QCL complete!

Amazing! Thank you @consideRatio <3

@GeorgianaElena I'm quite confident that #3287 was causing the LEAP issues. I suggest we simply merge this as-is for now though, as I don't think it's in scope for this PR to resolve it.

Thanks @consideRatio! Then I will merge this now since almost everything was terraform applied.

@GeorgianaElena GeorgianaElena merged commit dd8760f into 2i2c-org:master Oct 31, 2023
1 check passed
@GeorgianaElena GeorgianaElena deleted the add-default-machine-types branch October 31, 2023 07:52