
Cannot remove topology blocks once added #445

Closed · shashwat-sec opened this issue Feb 24, 2022 · 9 comments
Labels: bug (Something isn't working), theme:topology

shashwat-sec commented Feb 24, 2022

Readiness Checklist

  • I am running the latest version
  • I checked the documentation and found no answer
  • I checked to make sure that this issue has not already been filed
  • I am reporting the issue to the correct repository (for multi-repository projects)

Current Behavior

elasticsearch {
  dynamic "topology" {
    for_each = [for i in var.dr ? ["hot_content"] : ["coordinating", "hot_content", "master"] : {
      topology = i
    }]
    content {
      id         = topology.value.topology
      zone_count = var.zone_count
      size       = topology.value.topology == "hot_content" ? var.data_size : topology.value.topology == "coordinating" ? var.coordinating_size : topology.value.topology == "master" ? var.master_size : null
    }
  }
}

I am trying to create a scaled-down Elastic cluster with just hot_content in a single zone when var.dr is set to true; otherwise it should create a full multi-zone cluster.
Creating the scaled-down version works fine. Scaling up by changing var.dr to false also works fine.
But when I try to scale the cluster down again by setting var.dr to true, it tries to delete the hot_content block and modify the coordinating block into hot_content.

Plan output:

~ elasticsearch {
            # (7 unchanged attributes hidden)

          ~ topology {
              ~ id                        = "coordinating" -> "hot_content"
                # (6 unchanged attributes hidden)
            }
          - topology {
              - config                    = [] -> null
              - id                        = "hot_content" -> null
              - instance_configuration_id = "gcp.data.highio.1" -> null
              - node_roles                = [
                  - "data_content",
                  - "data_hot",
                  - "remote_cluster_client",
                  - "transform",
                ] -> null
              - size                      = "1g" -> null
              - size_resource             = "memory" -> null
              - zone_count                = 1 -> null

              - autoscaling {
                  - max_size          = "128g" -> null
                  - max_size_resource = "memory" -> null
                }
            }
          - topology {
              - config                    = [] -> null
              - id                        = "master" -> null
              - instance_configuration_id = "gcp.master.1" -> null
              - node_roles                = [
                  - "master",
                  - "remote_cluster_client",
                ] -> null
              - size                      = "4g" -> null
              - size_resource             = "memory" -> null
              - zone_count                = 1 -> null
            }

            # (1 unchanged block hidden)
        }

Applying this plan errors out:

Error: failed updating deployment: 2 errors occurred:
│       * api error: clusters.cluster_invalid_plan: Cluster must contain at least a master topology element and a data topology element. 'master' node type is missing,'data' node type is missing,'master' node type exists in more than one topology element (resources.elasticsearch[0].cluster_topology)
│       * api error: deployments.elasticsearch.node_roles_error: Invalid node_roles configuration: The data roles in the plan must be the same as the data roles in the template [id = hot_content] (resources.elasticsearch[0])

I also tested keeping the coordinating block as-is and deleting just the master block.
The plan output looks fine:

 ~ elasticsearch {
            # (7 unchanged attributes hidden)

          - topology {
              - config                    = [] -> null
              - id                        = "master" -> null
              - instance_configuration_id = "gcp.master.1" -> null
              - node_roles                = [
                  - "master",
                  - "remote_cluster_client",
                ] -> null
              - size                      = "4g" -> null
              - size_resource             = "memory" -> null
              - zone_count                = 1 -> null
            }

            # (3 unchanged blocks hidden)
        }

But applying this plan also errors out:

Error: failed updating deployment: 1 error occurred:
│       * api error: clusters.cluster_invalid_plan: Cluster must contain at least a master topology element and a data topology element. 'master' node type is missing,'master' node type exists in more than one topology element (resources.elasticsearch[0].cluster_topology)

Expected Behavior

Ideally, the provider should create/delete topology elements as we add/remove those topology blocks from the elasticsearch block. This scaling up and down works fine through the console, so the provider should follow similar behaviour.


Steps to Reproduce

The Terraform code and steps to reproduce are included above.


Your Environment

  • Version used: 0.3.0
  • Running against Elastic Cloud SaaS or Elastic Cloud Enterprise and version: Elastic Cloud SaaS
  • Environment name and version (e.g. Go 1.9):
  • Server type and version:
  • Operating System and version:
  • Link to your project:
shashwat-sec added the bug and Team:Delivery labels on Feb 24, 2022
tobio (Member) commented Feb 27, 2022

This is a Terraform issue, not one specific to this provider. The order in which these resources are declared in the rendered Terraform definition files matters.

You can likely update your resource definition to include the hot_content tier first and the 'optional' elements second, i.e.:

for_each = [for i in var.dr ? ["hot_content"] : ["coordinating", "hot_content", "master"]: {

becomes

for_each = [for i in var.dr ? ["hot_content"] : ["hot_content", "coordinating", "master"]: {
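In context, the reordered dynamic block would look like this (a sketch reusing the expressions from the original report, with only the list order changed):

dynamic "topology" {
  # hot_content always comes first; the 'optional' tiers follow.
  for_each = [for i in var.dr ? ["hot_content"] : ["hot_content", "coordinating", "master"] : {
    topology = i
  }]
  content {
    id         = topology.value.topology
    zone_count = var.zone_count
    size       = topology.value.topology == "hot_content" ? var.data_size : topology.value.topology == "coordinating" ? var.coordinating_size : topology.value.topology == "master" ? var.master_size : null
  }
}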

shashwat-sec (Author) commented

@tobio I tried that as well. But somehow, it is expecting coordinating to be at the top.

tobio (Member) commented Mar 16, 2022

@shashwat-sec sorry about the bad suggestion there. Digging into the code, it looks like this behaviour is deeply tied to how these resources are managed.

We can look at fixing this behaviour; however, that will take some time. An option that works right now is to include all the expected topology elements but set the size to 0 for the elements you don't want present in the deployment. Something like:

elasticsearch {
  topology {
    id = "hot_content"
    zone_count = var.zone_count
    size = var.data_size
  }

  topology {
    id = "coordinating"
    zone_count = var.zone_count
    size = var.dr ? 0 : var.coordinating_size
  }

  topology {
    id = "master"
    zone_count = var.zone_count
    size = var.dr ? 0 : var.master_size
  }
} 
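With all three elements always declared, toggling var.dr only changes sizes rather than adding or removing blocks, so Terraform never has to reorder or delete topology elements from the list.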

IanMoroney commented

@tobio, I'm actually experiencing the same issue, but it appears to be worse than described.
In addition to the plan changing depending on whether you declare these topologies and set them to 0, I also have two identical clusters described in the same way, and the topology order for each of them is different.

The staging plan wants to change the first declared topology from hot_content to cold, and the prod deployment wants to set the first topology back to hot_content, so I can't even describe the Terraform resource the same way for my staging and prod clusters.

This urgently needs fixing, either so that ordering isn't an issue, or so that the provider expects the same order every time and keeps it consistent.

tobio (Member) commented Jun 21, 2022

@IanMoroney can you include your resource/module definition and the values of any vars defined? Is autoscale=true in one of the prod/staging deployments?

this urgently needs fixing so that the ordering isn't an issue, or expect the same order every time, and to make it consistent.

Agreed, this behaviour is very frustrating. There's some ongoing investigation into solving this problem, but unfortunately there's no quick win to make ordering a non-issue. The provider should already expect the same order every time; send through the definition and we can work out what's going on from there.

IanMoroney commented

It is possible that you may not be able to replicate my exact scenario, as my two ES deployments run wildly different versions.

main.tf


resource "ec_deployment" "search" {

  name = "${var.environment}-search"

  region                 = "azure-northeurope"
  version                = var.elasticsearch_version
  deployment_template_id = "azure-io-optimized"

  elasticsearch {
    topology {
      id            = "hot_content"
      size          = var.elasticsearch_size
      size_resource = "memory"
      zone_count    = 2
    }
  }

  kibana {}

}

variable "environment" {
  type        = string
  description = "The environment where resources are being provisioned. Mainly used as a name prefix."
}


variable "elasticsearch_version" {
  type        = string
  description = "The version of elasticsearch to provision the cluster."
}
variable "elasticsearch_size" {
  type        = string
  description = "The size of elasticsearch deployment."
}

In the staging environment (on the EC deployment itself), autoscaling is enabled. It is not currently defined in the Terraform, and maybe that's contributing to the confusion.
Prod doesn't have autoscaling enabled, which is likely why its plan shows no changes.

staging.tfvars

environment                     = "staging"
elasticsearch_version           = "7.16.1"
elasticsearch_size              = "1g"

prod.tfvars

environment                     = "prod"
elasticsearch_version           = "7.9.3"
elasticsearch_size              = "29g"

staging plan file:

Terraform will perform the following actions:

  # ec_deployment.b2c_search[0] will be updated in-place
  ~ resource "ec_deployment" "b2c_search" {
        id                     = "33f9f08f2a235c1cc51c6468dd549a7a"
        name                   = "staging-b2c-search"
        tags                   = {}
        # (6 unchanged attributes hidden)

      ~ elasticsearch {
            # (7 unchanged attributes hidden)

          ~ topology {
              ~ id                        = "cold" -> "hot_content"
              ~ size                      = "0g" -> "2g"
              ~ zone_count                = 1 -> 2
                # (4 unchanged attributes hidden)

                # (1 unchanged block hidden)
            }
          - topology {
              - config                    = [] -> null
              - id                        = "frozen" -> null
              - instance_configuration_id = "azure.es.datafrozen.lsv2" -> null
              - node_roles                = [
                  - "data_frozen",
                ] -> null
              - size                      = "0g" -> null
              - size_resource             = "memory" -> null
              - zone_count                = 1 -> null

              - autoscaling {
                  - max_size          = "120g" -> null
                  - max_size_resource = "memory" -> null
                }
            }
          - topology {
              - config                    = [] -> null
              - id                        = "hot_content" -> null
              - instance_configuration_id = "azure.data.highio.l32sv2" -> null
              - node_roles                = [
                  - "data_content",
                  - "data_hot",
                  - "ingest",
                  - "master",
                  - "remote_cluster_client",
                  - "transform",
                ] -> null
              - size                      = "2g" -> null
              - size_resource             = "memory" -> null
              - zone_count                = 2 -> null

              - autoscaling {
                  - max_size             = "29g" -> null
                  - max_size_resource    = "memory" -> null
                  - policy_override_json = jsonencode(
                        {
                          - proactive_storage = {
                              - forecast_window = "30 m"
                            }
                        }
                    ) -> null
                }
            }
          - topology {
              - config                    = [] -> null
              - id                        = "ml" -> null
              - instance_configuration_id = "azure.ml.d64sv3" -> null
              - node_roles                = [
                  - "ml",
                  - "remote_cluster_client",
                ] -> null
              - size                      = "0g" -> null
              - size_resource             = "memory" -> null
              - zone_count                = 1 -> null

              - autoscaling {
                  - max_size          = "60g" -> null
                  - max_size_resource = "memory" -> null
                  - min_size          = "0g" -> null
                  - min_size_resource = "memory" -> null
                }
            }
          - topology {
              - config                    = [] -> null
              - id                        = "warm" -> null
              - instance_configuration_id = "azure.data.highstorage.e16sv3" -> null
              - node_roles                = [
                  - "data_warm",
                  - "remote_cluster_client",
                ] -> null
              - size                      = "0g" -> null
              - size_resource             = "memory" -> null
              - zone_count                = 2 -> null

              - autoscaling {
                  - max_size          = "116g" -> null
                  - max_size_resource = "memory" -> null
                }
            }

            # (1 unchanged block hidden)
        }


        # (2 unchanged blocks hidden)
    }

prod plan file:

no changes need to be made

IanMoroney commented

So it does appear that autoscaling is the culprit here for me.
When autoscaling is enabled, it adds these phantom topologies which don't actually exist yet. If autoscaling is disabled, they all go away and the plan is happy.

The ordering issue is still apparent though when autoscaling is enabled, so I wonder if autoscaling forces a topology order?

tobio (Member) commented Jun 22, 2022

@IanMoroney that's correct. When autoscaling is enabled, all topology elements that may be autoscaled into existence must be defined. We've recently merged a docs change with an updated example for exactly this scenario. Reading over those docs again, we should detail exactly which elements have a non-zero max_size by default (that's cold, frozen, hot_content, ml, and warm).
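For illustration, a minimal sketch of that shape, assuming the pre-1.0 ec_deployment schema (where autoscale is a string attribute on the elasticsearch block); the literal sizes are placeholders, and only the tier ids come from the list above:

elasticsearch {
  # Assumption: autoscale is a string attribute in the 0.x schema.
  autoscale = "true"

  # With autoscaling on, declare every tier that can be autoscaled into
  # existence, even the ones that should start at zero size.
  topology {
    id   = "hot_content"
    size = "2g"
  }
  topology {
    id   = "cold"
    size = "0g"
  }
  topology {
    id   = "frozen"
    size = "0g"
  }
  topology {
    id   = "ml"
    size = "0g"
  }
  topology {
    id   = "warm"
    size = "0g"
  }
}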

Lmk if you've got any feedback on those docs as well; there's always room for improvement.

dimuon (Contributor) commented Mar 1, 2023

Closed by #567

dimuon closed this as completed on Mar 1, 2023