
gcp, dask-worker-nodes: pangeo-hubs to use single dask worker node type #3024

Merged

4 commits merged into 2i2c-org:master on Aug 25, 2023

Conversation

consideRatio (Contributor) commented Aug 24, 2023

pangeo-hubs is the last 2i2c cluster with multiple dask worker node types, so once this terraform change is applied and merged we can fix #2687.

If we get all clusters to use a single node type with 16 CPU and 128 GB of memory (r5.4xlarge on AWS / n2-highmem-16 on GCP), we can provide good defaults for dask-gateway users when they decide how powerful their workers should be. This is planned in #2687.
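
In tfvars terms, this PR boils down to collapsing the per-size dask node pool map into a single entry. A minimal way to sanity-check that, assuming the variable is named dask_nodes (the key, autoscaling limits, and machine type below are taken from the plan output further down):

grep -A 6 'dask_nodes' terraform/gcp/projects/pangeo-hubs.tfvars
# expected, roughly:
#   dask_nodes = {
#     "worker" : { min : 0, max : 100, machine_type : "n2-highmem-16" },
#   }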

I'm not able to get this all the way through myself though as I lack access to the infrastructure.

Action plan

  • Someone else approves this PR
  • I check from time to time whether there are dask worker nodes active, and ask for help when there aren't
  • Someone else applies this terraform change
  • I merge the PR

Current activity

gke-pangeo-hubs-cluster-dask-medium-552f8a1e-6ndl   Ready    <none>   16h     v1.26.4-gke.1400
gke-pangeo-hubs-cluster-dask-medium-552f8a1e-ssll   Ready    <none>   21h     v1.26.4-gke.1400
gke-pangeo-hubs-cluster-dask-medium-552f8a1e-z6wp   Ready    <none>   3h23m   v1.26.4-gke.1400
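
A listing like the above can be reproduced by filtering nodes on the dask worker label seen in the node pool config (a sketch; assumes kubectl is pointed at the pangeo-hubs cluster):

# list only dask worker nodes via their node-purpose label
kubectl get nodes -l k8s.dask.org/node-purpose=worker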

The Grafana dashboard at https://grafana.gcp.pangeo.2i2c.cloud is down because prometheus is crashing, so I can't tell whether there is a history of always having dask worker nodes active or similar. I can get a brief response before it crashes, but it indicates no data is available anyhow...

support-prometheus-server-7c4f454847-6h9h6          2/2     Running   21 (2m19s ago)   17d
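
A sketch of how the crash loop could be investigated further (the support namespace and the prometheus-server container name are assumptions based on the pod name):

# describe the pod and pull logs from the previous, crashed container run
kubectl -n support describe pod support-prometheus-server-7c4f454847-6h9h6
kubectl -n support logs support-prometheus-server-7c4f454847-6h9h6 \
  -c prometheus-server --previous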

consideRatio (Contributor, Author) commented Aug 24, 2023

The apply steps are something like the following, I think:

# A GCP account with permissions to the GCP project columbia at
# https://console.cloud.google.com/iam-admin/iam?project=columbia
# is required, and we don't have access with our @2i2c.org accounts
gcloud auth login --update-adc

gh pr checkout 3024
cd terraform/gcp
# clear any locally cached provider/backend state from other clusters
rm -rf .terraform

terraform init -backend-config backends/pangeo-backend.hcl
terraform workspace list
terraform workspace select pangeo-hubs

terraform apply --var-file projects/pangeo-hubs.tfvars
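
A plan-only dry run can be slotted in before the apply to review the diff without changing anything (a sketch, using the same var file):

terraform plan --var-file projects/pangeo-hubs.tfvars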

GeorgianaElena (Member)

terraform plan output below:

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create
  - destroy
-/+ destroy and then create replacement

Terraform will perform the following actions:

  # google_container_node_pool.core must be replaced
-/+ resource "google_container_node_pool" "core" {
      ~ id                          = "projects/pangeo-integration-te-3eea/locations/us-central1-b/clusters/pangeo-hubs-cluster/nodePools/core-pool" -> (known after apply)
      ~ instance_group_urls         = [
          - "https://www.googleapis.com/compute/v1/projects/pangeo-integration-te-3eea/zones/us-central1-b/instanceGroupManagers/gke-pangeo-hubs-cluster-core-pool-c8492309-grp",
        ] -> (known after apply)
      ~ managed_instance_group_urls = [
          - "https://www.googleapis.com/compute/v1/projects/pangeo-integration-te-3eea/zones/us-central1-b/instanceGroups/gke-pangeo-hubs-cluster-core-pool-c8492309-grp",
        ] -> (known after apply)
      ~ max_pods_per_node           = 110 -> (known after apply)
        name                        = "core-pool"
      + name_prefix                 = (known after apply)
      ~ node_count                  = 2 -> (known after apply)
      ~ node_locations              = [
          - "us-central1-b",
        ] -> (known after apply)
      + operation                   = (known after apply)
      ~ version                     = "1.26.4-gke.1400" -> (known after apply)
        # (4 unchanged attributes hidden)

      ~ autoscaling {
          ~ location_policy      = "BALANCED" -> (known after apply)
          - total_max_node_count = 0 -> null
          - total_min_node_count = 0 -> null
            # (2 unchanged attributes hidden)
        }

      - network_config {
          - create_pod_range     = false -> null
          - enable_private_nodes = false -> null
          - pod_ipv4_cidr_block  = "10.8.0.0/14" -> null
          - pod_range            = "gke-pangeo-hubs-cluster-pods-14554e9f" -> null
        }

      ~ node_config {
          ~ disk_type         = "pd-balanced" -> (known after apply)
          ~ guest_accelerator = [] -> (known after apply)
          ~ image_type        = "COS_CONTAINERD" -> (known after apply)
          ~ local_ssd_count   = 0 -> (known after apply)
          ~ machine_type      = "n2-highmem-8" -> "n2-highmem-4" # forces replacement
          ~ metadata          = {
              - "disable-legacy-endpoints" = "true"
            } -> (known after apply)
          + min_cpu_platform  = (known after apply)
          - resource_labels   = {} -> null
            tags              = []
          ~ taint             = [] -> (known after apply)
            # (7 unchanged attributes hidden)

          - shielded_instance_config {
              - enable_integrity_monitoring = true -> null
              - enable_secure_boot          = false -> null
            }

          - workload_metadata_config {
              - mode = "GKE_METADATA" -> null
            }
        }

      - upgrade_settings {
          - max_surge       = 1 -> null
          - max_unavailable = 0 -> null
          - strategy        = "SURGE" -> null
        }

        # (1 unchanged block hidden)
    }

  # google_container_node_pool.dask_worker["large"] will be destroyed
  # (because key ["large"] is not in for_each map)
  - resource "google_container_node_pool" "dask_worker" {
      - cluster                     = "pangeo-hubs-cluster" -> null
      - id                          = "projects/pangeo-integration-te-3eea/locations/us-central1-b/clusters/pangeo-hubs-cluster/nodePools/dask-large" -> null
      - initial_node_count          = 0 -> null
      - instance_group_urls         = [
          - "https://www.googleapis.com/compute/v1/projects/pangeo-integration-te-3eea/zones/us-central1-b/instanceGroupManagers/gke-pangeo-hubs-cluster-dask-large-0a156e10-grp",
        ] -> null
      - location                    = "us-central1-b" -> null
      - managed_instance_group_urls = [
          - "https://www.googleapis.com/compute/v1/projects/pangeo-integration-te-3eea/zones/us-central1-b/instanceGroups/gke-pangeo-hubs-cluster-dask-large-0a156e10-grp",
        ] -> null
      - max_pods_per_node           = 110 -> null
      - name                        = "dask-large" -> null
      - node_count                  = 0 -> null
      - node_locations              = [
          - "us-central1-b",
        ] -> null
      - project                     = "pangeo-integration-te-3eea" -> null
      - version                     = "1.26.4-gke.1400" -> null

      - autoscaling {
          - location_policy      = "ANY" -> null
          - max_node_count       = 100 -> null
          - min_node_count       = 0 -> null
          - total_max_node_count = 0 -> null
          - total_min_node_count = 0 -> null
        }

      - management {
          - auto_repair  = true -> null
          - auto_upgrade = false -> null
        }

      - network_config {
          - create_pod_range     = false -> null
          - enable_private_nodes = false -> null
          - pod_ipv4_cidr_block  = "10.8.0.0/14" -> null
          - pod_range            = "gke-pangeo-hubs-cluster-pods-14554e9f" -> null
        }

      - node_config {
          - disk_size_gb      = 100 -> null
          - disk_type         = "pd-balanced" -> null
          - guest_accelerator = [] -> null
          - image_type        = "COS_CONTAINERD" -> null
          - labels            = {
              - "k8s.dask.org/node-purpose" = "worker"
            } -> null
          - local_ssd_count   = 0 -> null
          - logging_variant   = "DEFAULT" -> null
          - machine_type      = "n1-standard-16" -> null
          - metadata          = {
              - "disable-legacy-endpoints" = "true"
            } -> null
          - oauth_scopes      = [
              - "https://www.googleapis.com/auth/cloud-platform",
            ] -> null
          - preemptible       = true -> null
          - resource_labels   = {} -> null
          - service_account   = "pangeo-hubs-cluster-sa@pangeo-integration-te-3eea.iam.gserviceaccount.com" -> null
          - spot              = false -> null
          - tags              = [] -> null
          - taint             = [
              - {
                  - effect = "NO_SCHEDULE"
                  - key    = "k8s.dask.org_dedicated"
                  - value  = "worker"
                },
            ] -> null

          - shielded_instance_config {
              - enable_integrity_monitoring = true -> null
              - enable_secure_boot          = false -> null
            }

          - workload_metadata_config {
              - mode = "GKE_METADATA" -> null
            }
        }

      - upgrade_settings {
          - max_surge       = 1 -> null
          - max_unavailable = 0 -> null
          - strategy        = "SURGE" -> null
        }
    }

  # google_container_node_pool.dask_worker["medium"] will be destroyed
  # (because key ["medium"] is not in for_each map)
  - resource "google_container_node_pool" "dask_worker" {
      - cluster                     = "pangeo-hubs-cluster" -> null
      - id                          = "projects/pangeo-integration-te-3eea/locations/us-central1-b/clusters/pangeo-hubs-cluster/nodePools/dask-medium" -> null
      - initial_node_count          = 0 -> null
      - instance_group_urls         = [
          - "https://www.googleapis.com/compute/v1/projects/pangeo-integration-te-3eea/zones/us-central1-b/instanceGroupManagers/gke-pangeo-hubs-cluster-dask-medium-552f8a1e-grp",
        ] -> null
      - location                    = "us-central1-b" -> null
      - managed_instance_group_urls = [
          - "https://www.googleapis.com/compute/v1/projects/pangeo-integration-te-3eea/zones/us-central1-b/instanceGroups/gke-pangeo-hubs-cluster-dask-medium-552f8a1e-grp",
        ] -> null
      - max_pods_per_node           = 110 -> null
      - name                        = "dask-medium" -> null
      - node_count                  = 0 -> null
      - node_locations              = [
          - "us-central1-b",
        ] -> null
      - project                     = "pangeo-integration-te-3eea" -> null
      - version                     = "1.26.4-gke.1400" -> null

      - autoscaling {
          - location_policy      = "ANY" -> null
          - max_node_count       = 100 -> null
          - min_node_count       = 0 -> null
          - total_max_node_count = 0 -> null
          - total_min_node_count = 0 -> null
        }

      - management {
          - auto_repair  = true -> null
          - auto_upgrade = false -> null
        }

      - network_config {
          - create_pod_range     = false -> null
          - enable_private_nodes = false -> null
          - pod_ipv4_cidr_block  = "10.8.0.0/14" -> null
          - pod_range            = "gke-pangeo-hubs-cluster-pods-14554e9f" -> null
        }

      - node_config {
          - disk_size_gb      = 100 -> null
          - disk_type         = "pd-balanced" -> null
          - guest_accelerator = [] -> null
          - image_type        = "COS_CONTAINERD" -> null
          - labels            = {
              - "k8s.dask.org/node-purpose" = "worker"
            } -> null
          - local_ssd_count   = 0 -> null
          - logging_variant   = "DEFAULT" -> null
          - machine_type      = "n1-standard-8" -> null
          - metadata          = {
              - "disable-legacy-endpoints" = "true"
            } -> null
          - oauth_scopes      = [
              - "https://www.googleapis.com/auth/cloud-platform",
            ] -> null
          - preemptible       = true -> null
          - resource_labels   = {} -> null
          - service_account   = "pangeo-hubs-cluster-sa@pangeo-integration-te-3eea.iam.gserviceaccount.com" -> null
          - spot              = false -> null
          - tags              = [] -> null
          - taint             = [
              - {
                  - effect = "NO_SCHEDULE"
                  - key    = "k8s.dask.org_dedicated"
                  - value  = "worker"
                },
            ] -> null

          - shielded_instance_config {
              - enable_integrity_monitoring = true -> null
              - enable_secure_boot          = false -> null
            }

          - workload_metadata_config {
              - mode = "GKE_METADATA" -> null
            }
        }

      - upgrade_settings {
          - max_surge       = 1 -> null
          - max_unavailable = 0 -> null
          - strategy        = "SURGE" -> null
        }
    }

  # google_container_node_pool.dask_worker["small"] will be destroyed
  # (because key ["small"] is not in for_each map)
  - resource "google_container_node_pool" "dask_worker" {
      - cluster                     = "pangeo-hubs-cluster" -> null
      - id                          = "projects/pangeo-integration-te-3eea/locations/us-central1-b/clusters/pangeo-hubs-cluster/nodePools/dask-small" -> null
      - initial_node_count          = 0 -> null
      - instance_group_urls         = [
          - "https://www.googleapis.com/compute/v1/projects/pangeo-integration-te-3eea/zones/us-central1-b/instanceGroupManagers/gke-pangeo-hubs-cluster-dask-small-ab203ba0-grp",
        ] -> null
      - location                    = "us-central1-b" -> null
      - managed_instance_group_urls = [
          - "https://www.googleapis.com/compute/v1/projects/pangeo-integration-te-3eea/zones/us-central1-b/instanceGroups/gke-pangeo-hubs-cluster-dask-small-ab203ba0-grp",
        ] -> null
      - max_pods_per_node           = 110 -> null
      - name                        = "dask-small" -> null
      - node_count                  = 0 -> null
      - node_locations              = [
          - "us-central1-b",
        ] -> null
      - project                     = "pangeo-integration-te-3eea" -> null
      - version                     = "1.26.4-gke.1400" -> null

      - autoscaling {
          - location_policy      = "ANY" -> null
          - max_node_count       = 100 -> null
          - min_node_count       = 0 -> null
          - total_max_node_count = 0 -> null
          - total_min_node_count = 0 -> null
        }

      - management {
          - auto_repair  = true -> null
          - auto_upgrade = false -> null
        }

      - network_config {
          - create_pod_range     = false -> null
          - enable_private_nodes = false -> null
          - pod_ipv4_cidr_block  = "10.8.0.0/14" -> null
          - pod_range            = "gke-pangeo-hubs-cluster-pods-14554e9f" -> null
        }

      - node_config {
          - disk_size_gb      = 100 -> null
          - disk_type         = "pd-balanced" -> null
          - guest_accelerator = [] -> null
          - image_type        = "COS_CONTAINERD" -> null
          - labels            = {
              - "k8s.dask.org/node-purpose" = "worker"
            } -> null
          - local_ssd_count   = 0 -> null
          - logging_variant   = "DEFAULT" -> null
          - machine_type      = "n1-standard-4" -> null
          - metadata          = {
              - "disable-legacy-endpoints" = "true"
            } -> null
          - oauth_scopes      = [
              - "https://www.googleapis.com/auth/cloud-platform",
            ] -> null
          - preemptible       = true -> null
          - resource_labels   = {} -> null
          - service_account   = "pangeo-hubs-cluster-sa@pangeo-integration-te-3eea.iam.gserviceaccount.com" -> null
          - spot              = false -> null
          - tags              = [] -> null
          - taint             = [
              - {
                  - effect = "NO_SCHEDULE"
                  - key    = "k8s.dask.org_dedicated"
                  - value  = "worker"
                },
            ] -> null

          - shielded_instance_config {
              - enable_integrity_monitoring = true -> null
              - enable_secure_boot          = false -> null
            }

          - workload_metadata_config {
              - mode = "GKE_METADATA" -> null
            }
        }

      - upgrade_settings {
          - max_surge       = 1 -> null
          - max_unavailable = 0 -> null
          - strategy        = "SURGE" -> null
        }
    }

  # google_container_node_pool.dask_worker["worker"] will be created
  + resource "google_container_node_pool" "dask_worker" {
      + cluster                     = "pangeo-hubs-cluster"
      + id                          = (known after apply)
      + initial_node_count          = 0
      + instance_group_urls         = (known after apply)
      + location                    = "us-central1-b"
      + managed_instance_group_urls = (known after apply)
      + max_pods_per_node           = (known after apply)
      + name                        = "dask-worker"
      + name_prefix                 = (known after apply)
      + node_count                  = (known after apply)
      + node_locations              = (known after apply)
      + operation                   = (known after apply)
      + project                     = "pangeo-integration-te-3eea"
      + version                     = (known after apply)

      + autoscaling {
          + location_policy = (known after apply)
          + max_node_count  = 100
          + min_node_count  = 0
        }

      + management {
          + auto_repair  = true
          + auto_upgrade = false
        }

      + node_config {
          + disk_size_gb      = (known after apply)
          + disk_type         = "pd-balanced"
          + guest_accelerator = (known after apply)
          + image_type        = (known after apply)
          + labels            = {
              + "k8s.dask.org/node-purpose" = "worker"
            }
          + local_ssd_count   = (known after apply)
          + logging_variant   = "DEFAULT"
          + machine_type      = "n2-highmem-16"
          + metadata          = (known after apply)
          + min_cpu_platform  = (known after apply)
          + oauth_scopes      = [
              + "https://www.googleapis.com/auth/cloud-platform",
            ]
          + preemptible       = true
          + service_account   = "pangeo-hubs-cluster-sa@pangeo-integration-te-3eea.iam.gserviceaccount.com"
          + spot              = false
          + tags              = []
          + taint             = [
              + {
                  + effect = "NO_SCHEDULE"
                  + key    = "k8s.dask.org_dedicated"
                  + value  = "worker"
                },
            ]

          + workload_metadata_config {
              + mode = "GKE_METADATA"
            }
        }
    }

Plan: 2 to add, 0 to change, 4 to destroy.

Changes to Outputs:
  ~ regular_channel_latest_k8s_versions = {
      ~ "1."    = "1.27.2-gke.1200" -> "1.27.3-gke.1700"
      - "1.22." = "1.22.17-gke.11400"
      - "1.23." = "1.23.17-gke.5600"
      ~ "1.24." = "1.24.13-gke.2500" -> "1.24.15-gke.1700"
      ~ "1.25." = "1.25.9-gke.2300" -> "1.25.11-gke.1700"
      + "1.26." = "1.26.6-gke.1700"
      + "1.27." = "1.27.3-gke.1700"
    }

consideRatio (Contributor, Author)

Thank you @GeorgianaElena for working on this!!!

Hmmm, I don't get why the core node pool is replaced. Only the dask worker node pools are meant to be. If you can get a plan where only the dask worker node pools are destroyed, this can be applied in my mind.

consideRatio (Contributor, Author) commented Aug 25, 2023

If you check out the master branch, does terraform apply cause a replacement of the core node pool as well? It could be that we have some state mismatch unrelated to this PR's change.

Ah... that is the case!

I see the core node pool's machine type is n2-highmem-8 but our config says n2-highmem-4. Maybe you can try setting it to n2-highmem-8 instead then?
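
One way to confirm what the live core pool actually runs before editing the config (a sketch; requires gcloud auth against the pangeo project, and cluster/zone/pool names are taken from the plan output above):

# read the live machine type of the core pool straight from GKE
gcloud container node-pools describe core-pool \
  --cluster=pangeo-hubs-cluster --zone=us-central1-b \
  --format='value(config.machineType)'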

GeorgianaElena (Member)

The plan looks to be updating only the dask node pools now, as expected:

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create
  - destroy

Terraform will perform the following actions:

  # google_container_node_pool.dask_worker["large"] will be destroyed
  # (because key ["large"] is not in for_each map)
  - resource "google_container_node_pool" "dask_worker" {
      - cluster                     = "pangeo-hubs-cluster" -> null
      - id                          = "projects/pangeo-integration-te-3eea/locations/us-central1-b/clusters/pangeo-hubs-cluster/nodePools/dask-large" -> null
      - initial_node_count          = 0 -> null
      - instance_group_urls         = [
          - "https://www.googleapis.com/compute/v1/projects/pangeo-integration-te-3eea/zones/us-central1-b/instanceGroupManagers/gke-pangeo-hubs-cluster-dask-large-0a156e10-grp",
        ] -> null
      - location                    = "us-central1-b" -> null
      - managed_instance_group_urls = [
          - "https://www.googleapis.com/compute/v1/projects/pangeo-integration-te-3eea/zones/us-central1-b/instanceGroups/gke-pangeo-hubs-cluster-dask-large-0a156e10-grp",
        ] -> null
      - max_pods_per_node           = 110 -> null
      - name                        = "dask-large" -> null
      - node_count                  = 0 -> null
      - node_locations              = [
          - "us-central1-b",
        ] -> null
      - project                     = "pangeo-integration-te-3eea" -> null
      - version                     = "1.26.4-gke.1400" -> null

      - autoscaling {
          - location_policy      = "ANY" -> null
          - max_node_count       = 100 -> null
          - min_node_count       = 0 -> null
          - total_max_node_count = 0 -> null
          - total_min_node_count = 0 -> null
        }

      - management {
          - auto_repair  = true -> null
          - auto_upgrade = false -> null
        }

      - network_config {
          - create_pod_range     = false -> null
          - enable_private_nodes = false -> null
          - pod_ipv4_cidr_block  = "10.8.0.0/14" -> null
          - pod_range            = "gke-pangeo-hubs-cluster-pods-14554e9f" -> null
        }

      - node_config {
          - disk_size_gb      = 100 -> null
          - disk_type         = "pd-balanced" -> null
          - guest_accelerator = [] -> null
          - image_type        = "COS_CONTAINERD" -> null
          - labels            = {
              - "k8s.dask.org/node-purpose" = "worker"
            } -> null
          - local_ssd_count   = 0 -> null
          - logging_variant   = "DEFAULT" -> null
          - machine_type      = "n1-standard-16" -> null
          - metadata          = {
              - "disable-legacy-endpoints" = "true"
            } -> null
          - oauth_scopes      = [
              - "https://www.googleapis.com/auth/cloud-platform",
            ] -> null
          - preemptible       = true -> null
          - resource_labels   = {} -> null
          - service_account   = "pangeo-hubs-cluster-sa@pangeo-integration-te-3eea.iam.gserviceaccount.com" -> null
          - spot              = false -> null
          - tags              = [] -> null
          - taint             = [
              - {
                  - effect = "NO_SCHEDULE"
                  - key    = "k8s.dask.org_dedicated"
                  - value  = "worker"
                },
            ] -> null

          - shielded_instance_config {
              - enable_integrity_monitoring = true -> null
              - enable_secure_boot          = false -> null
            }

          - workload_metadata_config {
              - mode = "GKE_METADATA" -> null
            }
        }

      - upgrade_settings {
          - max_surge       = 1 -> null
          - max_unavailable = 0 -> null
          - strategy        = "SURGE" -> null
        }
    }

  # google_container_node_pool.dask_worker["medium"] will be destroyed
  # (because key ["medium"] is not in for_each map)
  - resource "google_container_node_pool" "dask_worker" {
      - cluster                     = "pangeo-hubs-cluster" -> null
      - id                          = "projects/pangeo-integration-te-3eea/locations/us-central1-b/clusters/pangeo-hubs-cluster/nodePools/dask-medium" -> null
      - initial_node_count          = 0 -> null
      - instance_group_urls         = [
          - "https://www.googleapis.com/compute/v1/projects/pangeo-integration-te-3eea/zones/us-central1-b/instanceGroupManagers/gke-pangeo-hubs-cluster-dask-medium-552f8a1e-grp",
        ] -> null
      - location                    = "us-central1-b" -> null
      - managed_instance_group_urls = [
          - "https://www.googleapis.com/compute/v1/projects/pangeo-integration-te-3eea/zones/us-central1-b/instanceGroups/gke-pangeo-hubs-cluster-dask-medium-552f8a1e-grp",
        ] -> null
      - max_pods_per_node           = 110 -> null
      - name                        = "dask-medium" -> null
      - node_count                  = 0 -> null
      - node_locations              = [
          - "us-central1-b",
        ] -> null
      - project                     = "pangeo-integration-te-3eea" -> null
      - version                     = "1.26.4-gke.1400" -> null

      - autoscaling {
          - location_policy      = "ANY" -> null
          - max_node_count       = 100 -> null
          - min_node_count       = 0 -> null
          - total_max_node_count = 0 -> null
          - total_min_node_count = 0 -> null
        }

      - management {
          - auto_repair  = true -> null
          - auto_upgrade = false -> null
        }

      - network_config {
          - create_pod_range     = false -> null
          - enable_private_nodes = false -> null
          - pod_ipv4_cidr_block  = "10.8.0.0/14" -> null
          - pod_range            = "gke-pangeo-hubs-cluster-pods-14554e9f" -> null
        }

      - node_config {
          - disk_size_gb      = 100 -> null
          - disk_type         = "pd-balanced" -> null
          - guest_accelerator = [] -> null
          - image_type        = "COS_CONTAINERD" -> null
          - labels            = {
              - "k8s.dask.org/node-purpose" = "worker"
            } -> null
          - local_ssd_count   = 0 -> null
          - logging_variant   = "DEFAULT" -> null
          - machine_type      = "n1-standard-8" -> null
          - metadata          = {
              - "disable-legacy-endpoints" = "true"
            } -> null
          - oauth_scopes      = [
              - "https://www.googleapis.com/auth/cloud-platform",
            ] -> null
          - preemptible       = true -> null
          - resource_labels   = {} -> null
          - service_account   = "pangeo-hubs-cluster-sa@pangeo-integration-te-3eea.iam.gserviceaccount.com" -> null
          - spot              = false -> null
          - tags              = [] -> null
          - taint             = [
              - {
                  - effect = "NO_SCHEDULE"
                  - key    = "k8s.dask.org_dedicated"
                  - value  = "worker"
                },
            ] -> null

          - shielded_instance_config {
              - enable_integrity_monitoring = true -> null
              - enable_secure_boot          = false -> null
            }

          - workload_metadata_config {
              - mode = "GKE_METADATA" -> null
            }
        }

      - upgrade_settings {
          - max_surge       = 1 -> null
          - max_unavailable = 0 -> null
          - strategy        = "SURGE" -> null
        }
    }

  # google_container_node_pool.dask_worker["small"] will be destroyed
  # (because key ["small"] is not in for_each map)
  - resource "google_container_node_pool" "dask_worker" {
      - cluster                     = "pangeo-hubs-cluster" -> null
      - id                          = "projects/pangeo-integration-te-3eea/locations/us-central1-b/clusters/pangeo-hubs-cluster/nodePools/dask-small" -> null
      - initial_node_count          = 0 -> null
      - instance_group_urls         = [
          - "https://www.googleapis.com/compute/v1/projects/pangeo-integration-te-3eea/zones/us-central1-b/instanceGroupManagers/gke-pangeo-hubs-cluster-dask-small-ab203ba0-grp",
        ] -> null
      - location                    = "us-central1-b" -> null
      - managed_instance_group_urls = [
          - "https://www.googleapis.com/compute/v1/projects/pangeo-integration-te-3eea/zones/us-central1-b/instanceGroups/gke-pangeo-hubs-cluster-dask-small-ab203ba0-grp",
        ] -> null
      - max_pods_per_node           = 110 -> null
      - name                        = "dask-small" -> null
      - node_count                  = 0 -> null
      - node_locations              = [
          - "us-central1-b",
        ] -> null
      - project                     = "pangeo-integration-te-3eea" -> null
      - version                     = "1.26.4-gke.1400" -> null

      - autoscaling {
          - location_policy      = "ANY" -> null
          - max_node_count       = 100 -> null
          - min_node_count       = 0 -> null
          - total_max_node_count = 0 -> null
          - total_min_node_count = 0 -> null
        }

      - management {
          - auto_repair  = true -> null
          - auto_upgrade = false -> null
        }

      - network_config {
          - create_pod_range     = false -> null
          - enable_private_nodes = false -> null
          - pod_ipv4_cidr_block  = "10.8.0.0/14" -> null
          - pod_range            = "gke-pangeo-hubs-cluster-pods-14554e9f" -> null
        }

      - node_config {
          - disk_size_gb      = 100 -> null
          - disk_type         = "pd-balanced" -> null
          - guest_accelerator = [] -> null
          - image_type        = "COS_CONTAINERD" -> null
          - labels            = {
              - "k8s.dask.org/node-purpose" = "worker"
            } -> null
          - local_ssd_count   = 0 -> null
          - logging_variant   = "DEFAULT" -> null
          - machine_type      = "n1-standard-4" -> null
          - metadata          = {
              - "disable-legacy-endpoints" = "true"
            } -> null
          - oauth_scopes      = [
              - "https://www.googleapis.com/auth/cloud-platform",
            ] -> null
          - preemptible       = true -> null
          - resource_labels   = {} -> null
          - service_account   = "pangeo-hubs-cluster-sa@pangeo-integration-te-3eea.iam.gserviceaccount.com" -> null
          - spot              = false -> null
          - tags              = [] -> null
          - taint             = [
              - {
                  - effect = "NO_SCHEDULE"
                  - key    = "k8s.dask.org_dedicated"
                  - value  = "worker"
                },
            ] -> null

          - shielded_instance_config {
              - enable_integrity_monitoring = true -> null
              - enable_secure_boot          = false -> null
            }

          - workload_metadata_config {
              - mode = "GKE_METADATA" -> null
            }
        }

      - upgrade_settings {
          - max_surge       = 1 -> null
          - max_unavailable = 0 -> null
          - strategy        = "SURGE" -> null
        }
    }

  # google_container_node_pool.dask_worker["worker"] will be created
  + resource "google_container_node_pool" "dask_worker" {
      + cluster                     = "pangeo-hubs-cluster"
      + id                          = (known after apply)
      + initial_node_count          = 0
      + instance_group_urls         = (known after apply)
      + location                    = "us-central1-b"
      + managed_instance_group_urls = (known after apply)
      + max_pods_per_node           = (known after apply)
      + name                        = "dask-worker"
      + name_prefix                 = (known after apply)
      + node_count                  = (known after apply)
      + node_locations              = (known after apply)
      + operation                   = (known after apply)
      + project                     = "pangeo-integration-te-3eea"
      + version                     = (known after apply)

      + autoscaling {
          + location_policy = (known after apply)
          + max_node_count  = 100
          + min_node_count  = 0
        }

      + management {
          + auto_repair  = true
          + auto_upgrade = false
        }

      + node_config {
          + disk_size_gb      = (known after apply)
          + disk_type         = "pd-balanced"
          + guest_accelerator = (known after apply)
          + image_type        = (known after apply)
          + labels            = {
              + "k8s.dask.org/node-purpose" = "worker"
            }
          + local_ssd_count   = (known after apply)
          + logging_variant   = "DEFAULT"
          + machine_type      = "n2-highmem-16"
          + metadata          = (known after apply)
          + min_cpu_platform  = (known after apply)
          + oauth_scopes      = [
              + "https://www.googleapis.com/auth/cloud-platform",
            ]
          + preemptible       = true
          + service_account   = "pangeo-hubs-cluster-sa@pangeo-integration-te-3eea.iam.gserviceaccount.com"
          + spot              = false
          + tags              = []
          + taint             = [
              + {
                  + effect = "NO_SCHEDULE"
                  + key    = "k8s.dask.org_dedicated"
                  + value  = "worker"
                },
            ]

          + workload_metadata_config {
              + mode = "GKE_METADATA"
            }
        }
    }

Plan: 1 to add, 0 to change, 3 to destroy.

Changes to Outputs:
  ~ regular_channel_latest_k8s_versions = {
      ~ "1."    = "1.27.2-gke.1200" -> "1.27.3-gke.1700"
      - "1.22." = "1.22.17-gke.11400"
      - "1.23." = "1.23.17-gke.5600"
      ~ "1.24." = "1.24.13-gke.2500" -> "1.24.15-gke.1700"
      ~ "1.25." = "1.25.9-gke.2300" -> "1.25.11-gke.1700"
      + "1.26." = "1.26.6-gke.1700"
      + "1.27." = "1.27.3-gke.1700"
    }

github-actions (bot)

Merging this PR will trigger the following deployment actions.

Support and Staging deployments

| Cloud Provider | Cluster Name | Upgrade Support? | Reason for Support Redeploy | Upgrade Staging? | Reason for Staging Redeploy |
| --- | --- | --- | --- | --- | --- |
| aws | catalystproject-africa | No | | Yes | Core infrastructure has been modified |
| aws | smithsonian | No | | Yes | Core infrastructure has been modified |
| kubeconfig | utoronto | No | | Yes | Core infrastructure has been modified |
| aws | openscapes | No | | Yes | Core infrastructure has been modified |
| aws | carbonplan | No | | Yes | Core infrastructure has been modified |
| aws | ubc-eoas | No | | Yes | Core infrastructure has been modified |
| gcp | qcl | No | | Yes | Core infrastructure has been modified |
| gcp | linked-earth | No | | Yes | Core infrastructure has been modified |
| aws | jupyter-meets-the-earth | No | | Yes | Core infrastructure has been modified |
| gcp | awi-ciroh | No | | Yes | Core infrastructure has been modified |
| aws | 2i2c-aws-us | No | | Yes | Core infrastructure has been modified |
| gcp | 2i2c-uk | No | | Yes | Core infrastructure has been modified |
| gcp | m2lines | No | | Yes | Core infrastructure has been modified |
| aws | gridsst | No | | Yes | Core infrastructure has been modified |
| gcp | 2i2c | No | | Yes | Core infrastructure has been modified |
| aws | nasa-cryo | No | | Yes | Core infrastructure has been modified |
| gcp | leap | No | | Yes | Core infrastructure has been modified |
| gcp | catalystproject-latam | No | | Yes | Core infrastructure has been modified |
| aws | victor | No | | Yes | Core infrastructure has been modified |
| gcp | callysto | No | | Yes | Core infrastructure has been modified |
| gcp | meom-ige | No | | Yes | Core infrastructure has been modified |
| gcp | pangeo-hubs | No | | Yes | Core infrastructure has been modified |
| aws | nasa-veda | No | | Yes | Core infrastructure has been modified |
| gcp | cloudbank | No | | Yes | Core infrastructure has been modified |
| aws | nasa-ghg | No | | Yes | Core infrastructure has been modified |

Production deployments

| Cloud Provider | Cluster Name | Hub Name | Reason for Redeploy |
| --- | --- | --- | --- |
| aws | smithsonian | prod | Core infrastructure has been modified |
| kubeconfig | utoronto | prod | Core infrastructure has been modified |
| kubeconfig | utoronto | r-prod | Core infrastructure has been modified |
| aws | openscapes | prod | Core infrastructure has been modified |
| aws | carbonplan | prod | Core infrastructure has been modified |
| aws | ubc-eoas | prod | Core infrastructure has been modified |
| gcp | qcl | prod | Core infrastructure has been modified |
| gcp | linked-earth | prod | Core infrastructure has been modified |
| aws | jupyter-meets-the-earth | prod | Core infrastructure has been modified |
| gcp | awi-ciroh | prod | Core infrastructure has been modified |
| aws | 2i2c-aws-us | researchdelight | Core infrastructure has been modified |
| aws | 2i2c-aws-us | ncar-cisl | Core infrastructure has been modified |
| aws | 2i2c-aws-us | go-bgc | Core infrastructure has been modified |
| aws | 2i2c-aws-us | itcoocean | Core infrastructure has been modified |
| gcp | 2i2c-uk | lis | Core infrastructure has been modified |
| gcp | m2lines | prod | Core infrastructure has been modified |
| aws | gridsst | prod | Core infrastructure has been modified |
| gcp | 2i2c | hackanexoplanet | Core infrastructure has been modified |
| gcp | 2i2c | imagebuilding-demo | Core infrastructure has been modified |
| gcp | 2i2c | demo | Core infrastructure has been modified |
| gcp | 2i2c | ohw | Core infrastructure has been modified |
| gcp | 2i2c | pfw | Core infrastructure has been modified |
| gcp | 2i2c | aup | Core infrastructure has been modified |
| gcp | 2i2c | temple | Core infrastructure has been modified |
| gcp | 2i2c | ucmerced | Core infrastructure has been modified |
| gcp | 2i2c | cosmicds | Core infrastructure has been modified |
| gcp | 2i2c | climatematch | Core infrastructure has been modified |
| gcp | 2i2c | neurohackademy | Core infrastructure has been modified |
| gcp | 2i2c | mtu | Core infrastructure has been modified |
| aws | nasa-cryo | prod | Core infrastructure has been modified |
| gcp | leap | prod | Core infrastructure has been modified |
| gcp | catalystproject-latam | unitefa-conicet | Core infrastructure has been modified |
| aws | victor | prod | Core infrastructure has been modified |
| gcp | callysto | prod | Core infrastructure has been modified |
| gcp | meom-ige | prod | Core infrastructure has been modified |
| gcp | pangeo-hubs | prod | Core infrastructure has been modified |
| gcp | pangeo-hubs | coessing | Core infrastructure has been modified |
| aws | nasa-veda | prod | Core infrastructure has been modified |
| gcp | cloudbank | bcc | Core infrastructure has been modified |
| gcp | cloudbank | ccsf | Core infrastructure has been modified |
| gcp | cloudbank | csm | Core infrastructure has been modified |
| gcp | cloudbank | dvc | Core infrastructure has been modified |
| gcp | cloudbank | elcamino | Core infrastructure has been modified |
| gcp | cloudbank | evc | Core infrastructure has been modified |
| gcp | cloudbank | glendale | Core infrastructure has been modified |
| gcp | cloudbank | howard | Core infrastructure has been modified |
| gcp | cloudbank | miracosta | Core infrastructure has been modified |
| gcp | cloudbank | skyline | Core infrastructure has been modified |
| gcp | cloudbank | demo | Core infrastructure has been modified |
| gcp | cloudbank | fresno | Core infrastructure has been modified |
| gcp | cloudbank | humboldt | Core infrastructure has been modified |
| gcp | cloudbank | laney | Core infrastructure has been modified |
| gcp | cloudbank | sbcc | Core infrastructure has been modified |
| gcp | cloudbank | lacc | Core infrastructure has been modified |
| gcp | cloudbank | lamission | Core infrastructure has been modified |
| gcp | cloudbank | mills | Core infrastructure has been modified |
| gcp | cloudbank | mission | Core infrastructure has been modified |
| gcp | cloudbank | norco | Core infrastructure has been modified |
| gcp | cloudbank | palomar | Core infrastructure has been modified |
| gcp | cloudbank | pasadena | Core infrastructure has been modified |
| gcp | cloudbank | sjcc | Core infrastructure has been modified |
| gcp | cloudbank | sacramento | Core infrastructure has been modified |
| gcp | cloudbank | srjc | Core infrastructure has been modified |
| gcp | cloudbank | saddleback | Core infrastructure has been modified |
| gcp | cloudbank | santiago | Core infrastructure has been modified |
| gcp | cloudbank | sjsu | Core infrastructure has been modified |
| gcp | cloudbank | tuskegee | Core infrastructure has been modified |
| gcp | cloudbank | wlac | Core infrastructure has been modified |
| gcp | cloudbank | csulb | Core infrastructure has been modified |
| aws | nasa-ghg | prod | Core infrastructure has been modified |

GeorgianaElena (Member)

@consideRatio, I still don't see any dask nodes running. Ready to do a terraform apply?

consideRatio (Contributor, Author)

Wieee yes!!!

GeorgianaElena (Member)

🚀 🚀 🚀

(screenshot: 2023-08-25 at 12:00:13)

GeorgianaElena self-assigned this on Aug 25, 2023
GeorgianaElena (Member)

Feel free to merge whenever you are ready @consideRatio 🚀

consideRatio merged commit c0d2e5c into 2i2c-org:master on Aug 25, 2023
consideRatio (Contributor, Author)

Thank you @GeorgianaElena!!!!!! Massive help!

github-actions (bot)

🎉🎉🎉🎉

Monitor the deployment of the hubs here 👉 https://github.com/2i2c-org/infrastructure/actions/runs/5973999496

Linked issue: Transition to using only a single node pool for dask-gateway workers (16 CPU, highmem)