
Always update in in-place #207

Closed · koboltmarky opened this issue May 20, 2020 · 38 comments

Comments

@koboltmarky

Each terraform apply results in an in-place update.
The following attributes are marked as updated:

~ cluster_cidr = "10.42.0.0/16" -> (known after apply)
~ cluster_dns_server = "10.43.0.10" -> (known after apply)
~ cluster_domain = "lalala.local" -> (known after apply)
~ kube_config_yaml = (sensitive value)
~ rke_cluster_yaml = (sensitive value)
~ rke_state = (sensitive value)

Used versions:

Terraform v0.12.24

  • provider.rancher2 v1.8.3
  • provider.rke v1.0.0
@remche
Contributor

remche commented May 25, 2020

I don't have the problem with cluster_*, but I do have it with some other blocks:

  • in bastion block
      ~ bastion_host {
             ...
          ~ ssh_agent_auth = true -> false
             ...
        }
  • in cloud_provider block
      ~ cloud_provider {
            name = "openstack"
          ~ openstack_cloud_provider {
              - block_storage {
                  - ignore_volume_az  = false -> null
                  - trust_device_path = false -> null
                }
               ...
              - load_balancer {
                  - create_monitor         = false -> null
                  - manage_security_groups = false -> null
                  - monitor_max_retries    = 0 -> null
                  - use_octavia            = false -> null
                }
              - metadata {}
              - route {}
            }

@matthewmelvin

Seeing the same issue with release 1.0.0

Terraform plan shows changes pending.

  # rke_cluster.cluster will be updated in-place
  ~ resource "rke_cluster" "cluster" {
        addon_job_timeout         = 60
...
        client_key                = (sensitive value)
      ~ cluster_cidr              = "10.42.0.0/16" -> (known after apply)
      ~ cluster_dns_server        = "10.43.0.10" -> (known after apply)
      ~ cluster_domain            = "cluster.local" -> (known after apply)
        cluster_name              = "rke-test-cluster"
...
        kube_admin_user           = "kube-admin"
      ~ kube_config_yaml          = (sensitive value)
        kubernetes_version        = "v1.16.9-rancher1-1"
        prefix_path               = "/"
      ~ rke_cluster_yaml          = (sensitive value)
      ~ rke_state                 = (sensitive value)
        running_system_images     = [
...
      ~ network {
            mtu     = 0
          ~ options = {
                "calico_cloud_provider"         = "none"
              - "calico_flex_volume_plugin_dir" = "/usr/libexec/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds" -> null
            }

Terraform apply runs cleanly, but a subsequent plan still shows the same changes pending.

@rawmind0
Contributor

@remche, @matthewmelvin, could you please provide your RKE tf files?

@matthewmelvin

###########################################################################
# rke resources to launch kubernetes once nodes are up

resource rke_cluster "cluster" {
  depends_on = [
    proxmox_vm_qemu.nodes,
    null_resource.metallb_secret,
    null_resource.qemu_nodes
  ]

  cluster_name = var.cluster_name

  ssh_key_path = var.kubernetes_ssh["seckey"]

  prefix_path = "/"

  kubernetes_version = var.k8s_version

  dynamic nodes {
    for_each = var.nodes
    content {
      address = nodes.value.ipaddr
      hostname_override = nodes.value.name
      role = nodes.value.roles
      user = var.kubernetes_ssh["user"]
    }
  }

  services {

    kubelet {
      extra_args = {
        node-status-update-frequency = "4s"
      }
      cluster_domain = "cluster.local"
      cluster_dns_server = "10.43.0.10"
    }
  
    kube_api {
      extra_args = {
        default-not-ready-toleration-seconds = "30"
        default-unreachable-toleration-seconds = "30"
        event-ttl = "24h"
      }
      extra_binds = [
        "/var/lib/rancher/rke/log:/var/lib/rancher/rke/log"
      ]
      service_cluster_ip_range = "10.43.0.0/16"
      service_node_port_range = "30000-32767"
      audit_log {
        enabled = var.enable_audit_policy
        configuration {
          path = "/var/lib/rancher/rke/log/kube-apiserver-audit.log"
          format = "legacy"
          policy = jsonencode(yamldecode(file(var.audit_policy_file)))
        }
      }
    }
  
    kube_controller {
      extra_args = {
        node-monitor-period = var.controller_args["node_monitor_period"]
        node-monitor-grace-period = var.controller_args["node_monitor_grace_period"]
        pod-eviction-timeout = var.controller_args["pod_eviction_timeout"]
      }
      cluster_cidr = "10.42.0.0/16"
      service_cluster_ip_range = "10.43.0.0/16"
    }
  
    etcd {
      extra_args = {
        election-timeout = "5000"
        heartbeat-interval = "500"
      }
      snapshot = true
      retention = "72h"
      creation = "12h"
    }
  }

  network {
    plugin = "calico"
    options = {
      calico_cloud_provider = "none"
    }
  }

  dns {
    provider = "kube-dns"
    upstream_nameservers = []
    reverse_cidrs = []
    node_selector = {}
  }

  ingress {
    provider = "none"
  }

  authentication {
    sans = local.authentication_sans
    strategy = "x509"
  }
  authorization {
    mode = "rbac"
  }

  monitoring {
    provider = "metrics-server"
  }

  addon_job_timeout = var.addon_job_timeout

  addons_include = local.addons_include

  addons = <<EOL
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: kube-system
  name: terraform-id
data:
  id: "${null_resource.qemu_nodes.id}"
---
${local.addons}
EOL

}

@rawmind0
Contributor

@matthewmelvin, removing your network.options section should address your issue.

The Calico cloud provider should be defined as shown below, but in this case it isn't needed since it's just the default value. Anyway, if you want to define it:

...
  network {
    plugin = "calico"
    calico_network_provider {
      cloud_provider = "none"
    }
  }
...

@matthewmelvin

Changing the network block does remove...

      ~ network {
            mtu     = 0
          ~ options = {
                "calico_cloud_provider"         = "none"
              - "calico_flex_volume_plugin_dir" = "/usr/libexec/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds" -> null
            }

... from the list of changes that are always pending. 👍

But I'm still left with the same changes as koboltmarky originally reported...

  ~ resource "rke_cluster" "cluster" {
      ~ cluster_cidr              = "10.42.0.0/16" -> (known after apply)
      ~ cluster_dns_server        = "10.43.0.10" -> (known after apply)
      ~ cluster_domain            = "cluster.local" -> (known after apply)
      ~ kube_config_yaml          = (sensitive value)
      ~ rke_cluster_yaml          = (sensitive value)
      ~ rke_state                 = (sensitive value)

... even after multiple applies.

@rawmind0
Contributor

@remche, your issue should be addressed with PR #215

@matthewmelvin, are you still seeing false diff issues? I've tested with your tf file and I'm not getting diffs anymore.

@koboltmarky, besides these arguments,

~ cluster_cidr = "10.42.0.0/16" -> (known after apply)
~ cluster_dns_server = "10.43.0.10" -> (known after apply)
~ cluster_domain = "lalala.local" -> (known after apply)
~ kube_config_yaml = (sensitive value)
~ rke_cluster_yaml = (sensitive value)
~ rke_state = (sensitive value)

is tf marking any other argument for update? These arguments are set as newComputed if the dns or services arguments have changed: https://github.com/rancher/terraform-provider-rke/blob/master/rke/resource_rke_cluster.go#L48
Could you please provide your tf file?
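
For reference, the computed cluster_cidr, cluster_dns_server and cluster_domain attributes appear to mirror the values set under services (kube_controller and kubelet) in the configs in this thread. A minimal sketch of pinning them explicitly, using the values quoted above purely for illustration — context for where these attributes come from, not a confirmed fix:

  services {
    kubelet {
      # values taken from the plan output above; adjust to your cluster
      cluster_domain     = "cluster.local"
      cluster_dns_server = "10.43.0.10"
    }
    kube_controller {
      cluster_cidr             = "10.42.0.0/16"
      service_cluster_ip_range = "10.43.0.0/16"
    }
  }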

@remche
Contributor

remche commented Jun 29, 2020

@rawmind0 I could not check because of #216 :(

@rawmind0
Contributor

@remche, issue #216 should be fixed.

@koboltmarky
Author

koboltmarky commented Jun 30, 2020

@rawmind0 I checked it again; no other arguments are marked for update.

rke.tf

@remche
Contributor

remche commented Jun 30, 2020

@rawmind0 I can confirm that #216 is fixed.

But I still get a few false updates when reapplying:

...
      ~ kube_config_yaml          = (sensitive value)
...
      ~ rke_cluster_yaml          = (sensitive value)
      ~ rke_state                 = (sensitive value)
...

and nothing else to update in the plan.

@rawmind0
Contributor

rawmind0 commented Jul 1, 2020

@koboltmarky, @remche, that's weird. I've tested with both of your configs and it works fine for me; nothing is updated once the cluster is created. I'm not able to reproduce your issue now. Are you both using the latest provider release, v1.0.1? Have you applied the tf plan at least once with the new provider release?

...
Plan: 1 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

rke_cluster.cluster: Creating...
rke_cluster.cluster: Still creating... [10s elapsed]
rke_cluster.cluster: Still creating... [20s elapsed]
rke_cluster.cluster: Still creating... [30s elapsed]
rke_cluster.cluster: Still creating... [40s elapsed]
rke_cluster.cluster: Still creating... [50s elapsed]
rke_cluster.cluster: Still creating... [1m0s elapsed]
rke_cluster.cluster: Still creating... [1m10s elapsed]
rke_cluster.cluster: Still creating... [1m20s elapsed]
rke_cluster.cluster: Still creating... [1m30s elapsed]
rke_cluster.cluster: Still creating... [1m40s elapsed]
rke_cluster.cluster: Still creating... [1m50s elapsed]
rke_cluster.cluster: Still creating... [2m0s elapsed]
rke_cluster.cluster: Still creating... [2m10s elapsed]
rke_cluster.cluster: Still creating... [2m20s elapsed]
rke_cluster.cluster: Creation complete after 2m26s [id=33f78372-bcf2-46ad-a8eb-ea36350d6098]

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

# terraform apply
rke_cluster.cluster: Refreshing state... [id=33f78372-bcf2-46ad-a8eb-ea36350d6098]

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

@remche
Contributor

remche commented Jul 1, 2020

I was trying with a version built from source. I can confirm I get the same behaviour with 1.0.1. Debug log attached.

debug.log

@rawmind0
Contributor

rawmind0 commented Jul 1, 2020

I'm creating a new cluster with your config but not getting the same result; I'm not able to reproduce the issue. Have you tried creating a new cluster with the same config? Do you get the same issue in that case?

@remche
Contributor

remche commented Jul 1, 2020

Yes, I recreated the cluster from scratch (new VMs). I did a new run with both apply stages, if that makes sense.
debug.log

@koboltmarky
Author

koboltmarky commented Jul 1, 2020

I have destroyed and recreated different clusters many times, with the same behavior every time. With v1.0.1 it is the same.

@remche
Contributor

remche commented Jul 1, 2020

@rawmind0 I discovered that when I disable the cloud_provider block, it does not trigger an update.

@rawmind0
Contributor

rawmind0 commented Jul 1, 2020

I have destroyed and recreated different clusters many times, with the same behavior every time. With v1.0.1 it is the same.

@koboltmarky, using this file, adapted from yours, the rke cluster deploys fine with no updates... Could you please take a look and check if something is different in yours?

provider "rke" {
  debug = true
  log_file = "rke_debug.log"
}

resource rke_cluster "cluster" {
  cluster_name = "test"
  nodes {
    address = "x.x.x.x"
    internal_address = "x.x.x.x"
    user    = "ubuntu"
    role    = ["controlplane", "worker", "etcd"]
    hostname_override = "x.x.x.x"
    ssh_key_path = SSH_KEY_PATH
  }
  authentication {
    strategy = "x509"

    sans = [
      "sans.test.local",
    ]
  }

  private_registries {
    url        = "docker.test.local"
    user       = "test"
    password   = "testing"
  }

  services {

    etcd {
      gid = GID
      uid = UID

      extra_args = {
        "data-dir"            = "/var/lib/rancher/etcd/data/"
        "wal-dir"             = "/var/lib/rancher/etcd/wal/wal_dir"
        "election-timeout"    = "5000"
        "heartbeat-interval"  = "500"
        "listen-metrics-urls" = "http://0.0.0.0:2381"
      }

      extra_binds = [
        "/var/lib/etcd/data:/var/lib/rancher/etcd/data",
        "/var/lib/etcd/wal:/var/lib/rancher/etcd/wal",
      ]

    }

    kubelet {

      extra_args = {
        "feature-gates"           = "VolumeSnapshotDataSource=true,CSIDriverRegistry=true,RotateKubeletServerCertificate=true",
        "protect-kernel-defaults" = true,
        "tls-cipher-suites"       = "TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_128_GCM_SHA256",
      }
      cluster_domain = "rancher.local"

    }

    kube_controller {

      extra_args = {
        "feature-gates" = "RotateKubeletServerCertificate=true"
      }

    }

    kube_api {
      extra_args = {
        "feature-gates" = "VolumeSnapshotDataSource=true,CSIDriverRegistry=true",
      }

      secrets_encryption_config {
        enabled = true
      }

      event_rate_limit {
        enabled = true
      }

      pod_security_policy = true

      audit_log {
        enabled = true
        configuration {
          max_age    = 5
          max_backup = 5
          max_size   = 100
          path       = "-"
          format     = "json"
        }
      }
    }
  }
  network {
    plugin = "canal"
  }

  addons = <<EOL
---
apiVersion: v1
kind: Namespace
metadata:
  name: ingress-nginx
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: default-psp-role
  namespace: ingress-nginx
rules:
- apiGroups:
  - extensions
  resourceNames:
  - default-psp
  resources:
  - podsecuritypolicies
  verbs:
  - use
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: default-psp-rolebinding
  namespace: ingress-nginx
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: default-psp-role
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:serviceaccounts
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:authenticated
---
apiVersion: v1
kind: Namespace
metadata:
  name: cattle-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: default-psp-role
  namespace: cattle-system
rules:
- apiGroups:
  - extensions
  resourceNames:
  - default-psp
  resources:
  - podsecuritypolicies
  verbs:
  - use
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: default-psp-rolebinding
  namespace: cattle-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: default-psp-role
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:serviceaccounts
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:authenticated
---
apiVersion: v1
kind: Namespace
metadata:
  name: cert-manager
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: default-psp-role
  namespace: cert-manager
rules:
- apiGroups:
  - extensions
  resourceNames:
  - default-psp
  resources:
  - podsecuritypolicies
  verbs:
  - use
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: default-psp-rolebinding
  namespace: cert-manager
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: default-psp-role
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:serviceaccounts
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:authenticated
---
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted
spec:
  requiredDropCapabilities:
  - NET_RAW
  privileged: false
  allowPrivilegeEscalation: false
  defaultAllowPrivilegeEscalation: false
  fsGroup:
    rule: RunAsAny
  runAsUser:
    rule: MustRunAsNonRoot
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  volumes:
  - emptyDir
  - secret
  - persistentVolumeClaim
  - downwardAPI
  - configMap
  - projected
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: psp:restricted
rules:
- apiGroups:
  - extensions
  resourceNames:
  - restricted
  resources:
  - podsecuritypolicies
  verbs:
  - use
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: psp:restricted
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: psp:restricted
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:serviceaccounts
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:authenticated
EOL

}

@rawmind0
Contributor

rawmind0 commented Jul 1, 2020

@rawmind0 I discovered that when I disable the cloud_provider block, it does not trigger an update.

@remche, testing yours with the openstack cloud_provider config...

@koboltmarky
Author

@rawmind0

some small differences:

  • three nodes instead of one node
  • I don't use internal address for the nodes
  • two sans addresses
  • ssh_key_path is global, outside the node object
  • I use the is_default variable of the private_registries object
  • I use a policy file for audit_log

@rawmind0
Contributor

rawmind0 commented Jul 2, 2020

@koboltmarky, about differences

  • three nodes instead of one node
  • I don't use internal address for the nodes

I don't think the number of nodes or the use of internal addresses makes any difference.

  • two sans addresses

Updated and tested

  • ssh_key_path is global, outside the node object

Updated and tested

  • I use the is_default variable of the private_registries object

Updated and tested

  • i use a policy file for audit_log

Updated and tested

The RKE cluster is still deployed fine and I get no diff on the next terraform apply. I'm not able to reproduce your issue. Could you please test with the tf I provided in #207 (comment)? Are you using tf modules or just the tf provider?

@rawmind0
Contributor

rawmind0 commented Jul 2, 2020

@remche, could you please provide the cloud_provider block set in your tf file and the cloud_provider block saved in the tfstate (once created)?

@remche
Contributor

remche commented Jul 2, 2020

The tf block is here: https://github.com/remche/terraform-openstack-rke/blob/5b4dfd8075171e4d589267f17cb5337b48e7165e/modules/rke/main.tf#L114-L128

Resulting tfstate block:

...
        "cloud_provider": [
          {
            "aws_cloud_config": [],
            "aws_cloud_provider": [],
            "azure_cloud_config": [],
            "azure_cloud_provider": [],
            "custom_cloud_config": "",
            "custom_cloud_provider": "",
            "name": "openstack",
            "openstack_cloud_config": [],
            "openstack_cloud_provider": [
              {
                "block_storage": [
                  {
                    "bs_version": "",
                    "ignore_volume_az": false,
                    "trust_device_path": false
                  }
                ],
                "global": [
                  {
                    "auth_url": "https://url.domain.tld:5000/v3",
                    "ca_file": "",
                    "domain_id": "default",
                    "domain_name": "",
                    "password": "xxxxxx",
                    "region": "",
                    "tenant_id": "xxxxxxxxxxxxxxxxxxxxxxxxx",
                    "tenant_name": "",
                    "trust_id": "",
                    "user_id": "",
                    "username": "xxxxxx"
                  }
                ],
                "load_balancer": [
                  {
                    "create_monitor": false,
                    "floating_network_id": "",
                    "lb_method": "",
                    "lb_provider": "",
                    "lb_version": "",
                    "manage_security_groups": false,
                    "monitor_delay": "",
                    "monitor_max_retries": 0,
                    "monitor_timeout": "",
                    "subnet_id": "",
                    "use_octavia": false
                  }
                ],
                "metadata": [
                  {
                    "request_timeout": null,
                    "search_order": null
                  }
                ],
                "route": [
                  {
                    "router_id": null
                  }
                ]
              }
            ],
            "vsphere_cloud_config": [],
            "vsphere_cloud_provider": []
          }
        ],

It's weird that the cloud_provider block itself is not marked for update, but if I discard it (-var="cloud_provider=false") no update is triggered...
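
One way to make the block optional along the lines of that -var="cloud_provider=false" flag is a dynamic block gated by a boolean variable. A minimal sketch of the pattern (whether the linked module does exactly this isn't shown here; the variable name follows the flag above and the OpenStack values are placeholders, not taken from any real config):

variable "cloud_provider" {
  type    = bool
  default = true
}

# inside the rke_cluster resource:
dynamic "cloud_provider" {
  # render the block only when var.cloud_provider is true
  for_each = var.cloud_provider ? [1] : []
  content {
    name = "openstack"
    openstack_cloud_provider {
      global {
        # placeholder credentials, as in the sanitized tfstate above
        auth_url  = "https://url.domain.tld:5000/v3"
        domain_id = "default"
        tenant_id = "xxxxxx"
        username  = "xxxxxx"
        password  = "xxxxxx"
      }
    }
  }
}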

@koboltmarky
Author

@rawmind0 I'm using a tf module:

module "rancher-admin-cluster" {
  source = "./modules/rancher-admin-cluster"

  providers = {
    rancher2 = rancher2.bootstrap
  }

  kubernetes_version       = "v1.18.3-rancher2-2"
  node_username            = "rancher"
  ssh_key_file_name        = "~/.ssh/id_rsa"
  rancher_version          = "v2.4.5"
  rancher_admin_url        = "xxxxxx"
  rancher_admin_ip         = "xxxxx"
  docker_registry_url      = "xxxxxxx"
  docker_registry_user     = var.docker_registry_user
  docker_registry_password = var.docker_registry_password
  cluster_domain           = "xxxxxx"
  cert_manager_version     = "0.15.0"
  rancher_admin_password   = var.rancher_admin_password
}

@matthewmelvin

@matthewmelvin, are you still seeing false diff issues? I've tested with your tf file and I'm not getting diffs anymore.

Unfortunately I am still seeing the same persistent diff with the latest version.

terraform@974f669cbe4b:~/templates/clu05$ terraform version 
Terraform v0.12.28
+ provider.local v1.4.0
+ provider.null v2.1.2
+ provider.proxmox (unversioned)
+ provider.rke v1.0.1
terraform@974f669cbe4b:~/templates/clu05$ terraform plan -no-color | grep -E '^([[:space:]]*[#~+]|-/)' 
  ~ update in-place
  # rke_cluster.cluster will be updated in-place
  ~ resource "rke_cluster" "cluster" {
      ~ cluster_cidr              = "10.42.0.0/16" -> (known after apply)
      ~ cluster_dns_server        = "10.43.0.10" -> (known after apply)
      ~ cluster_domain            = "cluster.local" -> (known after apply)
      ~ kube_config_yaml          = (sensitive value)
      ~ rke_cluster_yaml          = (sensitive value)
      ~ rke_state                 = (sensitive value)
terraform@974f669cbe4b:~/templates/clu05$

issue-207-test-1594177150.txt
issue-207-rke-resource-1594177150.txt
issue-207-full-plan-1594177150.txt

@dje4om
Contributor

dje4om commented Aug 7, 2020

Same behaviour in my case; here is some information, in case it helps:

  • freshly imported rke config/state from an existing cluster (rke 1.1.3 / kubernetes 1.18.3)
  • using the cluster_yaml argument pointing to the existing yaml
  • everything is fine, but exactly the same config elements are still in the "update in-place" state

I was able to remove the perpetual update on the three cluster_* params by adding this config:

  services {
    kube_controller {
      cluster_cidr = "10.42.0.0/16"
    }
  }

but I still have:

~ kube_config_yaml  = (sensitive value)
~ rke_cluster_yaml  = (sensitive value)
~ rke_state         = (sensitive value)

Interesting fact: in this first case, when I add this configuration, no more update-in-place occurs! (I'm trying to move from cluster_yaml file params to regular params, and these params match my current configuration):

  network {
    plugin = "calico"
  }

Second case: another fresh cluster deployed from terraform with regular arguments (not cluster_yaml) did not show this behaviour when no change was expected, but if I add a new config, the same configuration items are listed for update.

Added configuration (no change detected on the previous plan):

  services {
    kube_api {
      secrets_encryption_config {
        enabled = true
      }
    }
  }

Plan:

  ~ update in-place
  # rke_cluster.kube_admin will be updated in-place
  ~ resource "rke_cluster" "kube_admin" {
      ~ cluster_cidr              = "10.42.0.0/16" -> (known after apply)
      ~ cluster_dns_server        = "10.43.0.10" -> (known after apply)
      ~ cluster_domain            = "cluster.local" -> (known after apply)
      ~ kube_config_yaml          = (sensitive value)
      ~ rke_cluster_yaml          = (sensitive value)
      ~ rke_state                 = (sensitive value)
      ~ services {
          ~ kube_api {
              + secrets_encryption_config {
                  + enabled = true

Both clusters are using:

  • Terraform 0.12.29
  • RKE Provider 1.0.1
  • Kubernetes 1.18.3

@rawmind0
Contributor

Updated the rke_cluster diff function in PR #237. Could you please check whether it addresses your issues?

@remche
Contributor

remche commented Aug 17, 2020

I still get a few false updates when reapplying 😢

...
      ~ kube_config_yaml          = (sensitive value)
...
      ~ rke_cluster_yaml          = (sensitive value)
      ~ rke_state                 = (sensitive value)
...

@rawmind0
Contributor

@remche, unfortunately I don't have an openstack installation to test with, but I've added some provider debug info in PR #239.

Configure the provider with debug (not terraform debug) and a log file, like:

provider "rke" {
  debug = true
  log_file = "rke_debug.log"
}

and you'll get debug messages in the log file like:

...
time="2020-08-19T13:02:50Z" level=info msg="rke_cluster changed arguments: map[<argument_name>:true]"
time="2020-08-19T13:02:50Z" level=debug msg="<argument_name> old: 60 new: 120"
...

Could you please test it and take a look at the debug messages to see what is changing on your apply?

@rawmind0
Contributor

The main issue is caused by hashicorp/terraform-plugin-sdk#98. The workaround is to recheck the arguments in the resource's CustomizeDiff function: https://github.com/rancher/terraform-provider-rke/pull/239/files#diff-cb166cd4de81ceb9b4ff73ab04260cbeR484

@remche
Contributor

remche commented Aug 19, 2020

time="2020-08-19T17:44:54+02:00" level=info msg="rke_cluster changed arguments: map[cloud_provider:true nodes:true]"

But note that with #239 I get a lot of things marked as update-in-place:

resource "rke_cluster" "cluster" {
65:        addons                    = <<~EOT
335:      ~ api_server_url            = "https://xxx.xxx.xxx.xxx:6443" -> (known after apply)
344:      ~ control_plane_hosts       = [
353:      ~ etcd_hosts                = [
361:      ~ inactive_hosts            = [] -> (known after apply)
364:      ~ kube_config_yaml          = (sensitive value)
367:      ~ rke_cluster_yaml          = (sensitive value)
368:      ~ rke_state                 = (sensitive value)
405:        ssh_key_path              = "~/.ssh/id_rsa"
407:      ~ worker_hosts              = [
440:            ssh_key_path   = "~/.ssh/id_rsa"
502:                "canal_flex_volume_plugin_dir" = "/usr/libexec/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds"
507:      ~ nodes {
522:      ~ nodes {
535:      ~ nodes {
548:      ~ nodes {
561:      ~ nodes {

Thx for the hard work on this !

@rawmind0
Contributor

@remche, could you please also provide the debug lines with the argument changes? The most interesting one is the change on the cloud_provider argument.

...
time="2020-08-19T13:02:50Z" level=debug msg="cloud_provider old: <value> new: <value>"
time="2020-08-19T13:02:50Z" level=debug msg="nodes old: <value> new: <value>"
...

@rawmind0
Contributor

On nodes, what is trying to change? Is it trying to shift the node list order?

@remche
Contributor

remche commented Aug 19, 2020

Here is the sanitized output. I'm not sure how to interpret it; the node list order seems the same.
rke_debug-clean.log

@rawmind0
Contributor

Thanks @remche, I think I found the cause of your issue by taking a look at your logs.

The cloud_provider diff is on the openstack load_balancer computed arguments, which have default values:

old - monitor_delay: monitor_max_retries:0 monitor_timeout:
new - monitor_delay:60s monitor_max_retries:5 monitor_timeout:30s

I've updated PR #239 to fix the cloud_provider diff. Could you please rebase and test it?

The nodes diff is on the ssh_agent_auth computed field. Did the plan show any other diff on nodes? Anyway, on the first terraform apply some nodes diffs are expected, because all nodes arguments have been changed to computed = false. The diff should be empty on the next terraform apply.
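
If picking up the fixed provider isn't an option right away, one possible stopgap (untested here, and made unnecessary by PR #239) might be to set those computed load_balancer defaults explicitly so the config matches what the provider reports back. A sketch only, using the attribute names from the tfstate above and the default values from the debug log:

  cloud_provider {
    name = "openstack"
    openstack_cloud_provider {
      # global, block_storage, etc. unchanged from the existing config
      load_balancer {
        # defaults reported back by the provider, per the debug log above
        create_monitor         = false
        manage_security_groups = false
        use_octavia            = false
        monitor_delay          = "60s"
        monitor_max_retries    = 5
        monitor_timeout        = "30s"
      }
    }
  }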

@remche
Contributor

remche commented Aug 20, 2020

No changes. Infrastructure is up-to-date.

This means that Terraform did not detect any differences between your
configuration and real physical resources that exist. As a result, no
actions need to be performed.

🥳

@rawmind0
Contributor

Cool @remche, thanks for the debug info.

PR #239 has been merged to master. Closing the issue; if anyone else is seeing false diffs, please open a new issue.
