Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement multiple version in instance group manager #1499

Merged
merged 7 commits into from
Jun 4, 2018

Conversation

migibert
Copy link
Contributor

Hi there,

Here is an attempt to implement canary releases ( #1252 ). This is the first time I write golang and make a terraform contribution, I opened the PR to obtain feedback and advices so please let me know how I can improve this code!

In addition I used make fmt to format the code but left some lines bigger than 80 characters, do I need to split them ?

I tested the feature against a project with the following configuration:

resource "google_compute_health_check" "mikael-hackathon-healthcheck" {
  name                = "mikael-hackathon-healthcheck"
  check_interval_sec  = 1
  timeout_sec         = 1
  healthy_threshold   = 2
  unhealthy_threshold = 10

  http_health_check {
    request_path = "/"
    port         = "80"
  }
}

resource "google_compute_instance_template" "mikael-hackaton-template" {
  name_prefix = "mikael-hackaton-"
  description = "This template is used to create app server instances."

  tags = ["loadbalanced", "internal-web", "hackaton"]

  labels = {
    environment = "hackaton"
  }

  instance_description = "Hackaton demo rolling upgrade"
  machine_type         = "n1-standard-1"
  can_ip_forward       = false

  scheduling {
    automatic_restart   = true
    on_host_maintenance = "MIGRATE"
  }

  disk {
    source_image = "debian-cloud/debian-9"
    disk_type    = "pd-standard"
    disk_size_gb = 20
    auto_delete  = true
    boot         = true
  }

  network_interface {
    network       = "default"
    access_config = {}
  }

  service_account {
    email  = "${google_service_account.mikael-hackaton.email}"
    scopes = ["cloud-platform"]
  }

  lifecycle {
    create_before_destroy = true
  }

  metadata_startup_script = "apt-get update && apt-get install -y apache2 && echo I am stable version at $(hostname) > /var/www/html/index.html"
}

resource "google_compute_instance_template" "mikael-hackaton-template-canary" {
  name_prefix = "mikael-hackaton-canary"
  description = "This template is used to create app server instances."

  tags = ["loadbalanced", "internal-web", "hackaton"]

  labels = {
    environment = "hackaton"
  }

  instance_description = "Hackaton demo rolling upgrade"
  machine_type         = "n1-standard-1"
  can_ip_forward       = false

  scheduling {
    automatic_restart   = true
    on_host_maintenance = "MIGRATE"
  }

  disk {
    source_image = "debian-cloud/debian-9"
    disk_type    = "pd-standard"
    disk_size_gb = 20
    auto_delete  = true
    boot         = true
  }

  network_interface {
    network       = "default"
    access_config = {}
  }

  service_account {
    email  = "${google_service_account.mikael-hackaton.email}"
    scopes = ["cloud-platform"]
  }

  lifecycle {
    create_before_destroy = true
  }

  metadata_startup_script = "apt-get update && apt-get install -y apache2 && echo I am a canary at $(hostname) > /var/www/html/index.html"
}

resource "google_compute_target_pool" "mikael-hackaton-target-pool" {
  name = "mikael-hackaton-target-pool"
}

resource "google_compute_instance_group_manager" "mikael-hackaton-manager" {
  name = "mikael-hackaton-manager"
  base_instance_name = "mikael-hackaton"
  #instance_template = "${google_compute_instance_template.mikael-hackaton-template.self_link}"
  update_strategy   = "ROLLING_UPDATE"
  zone              = "${var.zone}"
  target_pools = ["${google_compute_target_pool.mikael-hackaton-target-pool.self_link}"]
  target_size  = 5

  version {
    name = "primary"
    instance_template = "${google_compute_instance_template.mikael-hackaton-template.self_link}"
  }

  version {
    name = "canary"
    instance_template = "${google_compute_instance_template.mikael-hackaton-template-canary.self_link}"
    target_size_fixed = 1
  }

  named_port {
    name = "http"
    port = 80
  }

  auto_healing_policies {
    health_check      = "${google_compute_health_check.mikael-hackathon-healthcheck.self_link}"
    initial_delay_sec = 10
  }

  rolling_update_policy {
    type                    = "PROACTIVE"
    minimal_action          = "REPLACE"
    max_surge_percent       = 100
    max_unavailable_percent = 50
    min_ready_sec           = 5
  }
}

@danawillow danawillow self-requested a review May 24, 2018 23:28
Copy link
Contributor

@danawillow danawillow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @migibert, this is a great first contribution, thanks so much! I have a few comments, but it looks really good overall.

To answer your question about line lengths: we don't set any sort of limit so feel free to make your lines >80 characters.

DiffSuppressFunc: compareSelfLinkRelativePaths,
},

"target_size_fixed": &schema.Schema{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two comments here:

  1. to make this easier to codegen later, mind matching the API here? (by having another nested object target_size with a fixed field and a percent field)
  2. You can make sure that setting fixed and percent at the same time fail during the plan phase by setting ConflictsWith on each of them- take a look around the repo for examples

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was not able to set ConflictsWith because it is a property of an element inside a list.
I saw a sample when the list is at the root level:
ConflictsWith: []string{"rolling_update_policy.0.max_surge_percent"}

But I don't see how I can make it work for target_size (which is a list of 1 element) inside version (which is a list too).
I tried (naively):

  • ConflictsWith: []string{"percent"} and ConflictsWith: []string{"fixed"}
  • ConflictsWith: []string{"target_size.0.percent"} and ConflictsWith: []string{"target_size.0.fixed"}

My feeling is that I need to find the right indice to edit the correct version with something like that:
ConflictsWith: []string{"version.<X>.target_size.0.percent"}. Is that what I need to do ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, you're right- that isn't possible at the moment.

versionMap["name"] = version.Name
versionMap["instance_template"] = ConvertSelfLinkToV1(version.InstanceTemplate)
if value := version.TargetSize.Percent; value > 0 {
versionMap["target_size_percent"] = version.TargetSize.Percent
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you want to make this versionMap["target_size_percent"] = value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, my bad!

return err
}

if d.Get("update_strategy").(string) == "RESTART" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same code as in the instance_template update section, right? Can it be factored out into a separate function that both of them call?

Also, can you add a comment or something that makes it clear that yes, this does need to be called? From https://cloud.google.com/compute/docs/instance-groups/updating-managed-instance-groups#starting_a_canary_update it looks like all that needs to be done is that first Patch call updating the version.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just made the test again, I am not sure why it is not working, I see in Cloud Console that update parameters are well updated, I see the correct target number of instances in versions, but it looks like the update part is not triggered...
Can someone help me to understand what is happening ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Template Current size Target size
mikael-hackaton-20180528155225477300000001 4 out of 5 (80%) 0 instances
mikael-hackaton-20180528155755327200000001 0 out of 5 (0%) 4 instances
mikael-hackaton-canary20180528155225478600000002 1 out of 5 (20%) 1 instance (fixed)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which test isn't working? At this commit, all the TestAccRegionInstanceGroupManager tests seem to be passing for me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests are not failing but I wasn't really satisfied with all this code to perform an update.

However, I just realized that API updates were treated as opportunistic updates by default (and I expected a proactive update). That's why this code portion is needed. I just refactored it and commented to make it clear !

DiffSuppressFunc: compareSelfLinkRelativePaths,
},

"version": &schema.Schema{
Type: schema.TypeList,
Optional: true,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this and instance_template within it are also going to need to be Computed. When I run tests that don't set a version, it seems to want to fill one in:

=== RUN   TestAccInstanceGroupManager_autoHealingPolicies
--- FAIL: TestAccInstanceGroupManager_autoHealingPolicies (245.15s)
    testing.go:513: Step 0 error: After applying this step, the plan was not empty:
        
        DIFF:
        
        UPDATE: google_compute_instance_group_manager.igm-basic
          version.#:                   "1" => "0"
          version.0.instance_template: "https://www.googleapis.com/compute/v1/projects/hc-terraform-testing/global/instanceTemplates/igm-test-bhvjqh6a61" => ""
        
        STATE:
        
        google_compute_http_health_check.zero:
          ID = igm-test-71h0sxbiwk
          provider = provider.google
          check_interval_sec = 1
          creation_timestamp = 2018-05-24T16:45:48.830-07:00
          description = 
          healthy_threshold = 2
          host = 
          name = igm-test-71h0sxbiwk
          port = 80
          project = hc-terraform-testing
          request_path = /
          self_link = https://www.googleapis.com/compute/v1/projects/hc-terraform-testing/global/httpHealthChecks/igm-test-71h0sxbiwk
          timeout_sec = 1
          unhealthy_threshold = 2
        google_compute_instance_group_manager.igm-basic:
          ID = igm-test-hu2dq1417r
          provider = provider.google
          auto_healing_policies.# = 1
          auto_healing_policies.0.health_check = https://www.googleapis.com/compute/beta/projects/hc-terraform-testing/global/httpHealthChecks/igm-test-71h0sxbiwk
          auto_healing_policies.0.initial_delay_sec = 10
          base_instance_name = igm-basic
          description = Terraform test instance group manager
          fingerprint = vF4q-P5d_aI=
          instance_group = https://www.googleapis.com/compute/v1/projects/hc-terraform-testing/zones/us-central1-c/instanceGroups/igm-test-hu2dq1417r
          instance_template = https://www.googleapis.com/compute/v1/projects/hc-terraform-testing/global/instanceTemplates/igm-test-bhvjqh6a61
          name = igm-test-hu2dq1417r
          named_port.# = 0
          project = hc-terraform-testing
          self_link = https://www.googleapis.com/compute/v1/projects/hc-terraform-testing/zones/us-central1-c/instanceGroupManagers/igm-test-hu2dq1417r
          target_pools.# = 1
          target_pools.1124437731 = https://www.googleapis.com/compute/beta/projects/hc-terraform-testing/regions/us-central1/targetPools/igm-test-u3zeiq2g3x
          target_size = 2
          update_strategy = RESTART
          version.# = 1
          version.0.instance_template = https://www.googleapis.com/compute/v1/projects/hc-terraform-testing/global/instanceTemplates/igm-test-bhvjqh6a61
          version.0.name = 
          version.0.target_size_fixed = 0
          version.0.target_size_percent = 0
          wait_for_instances = false
          zone = us-central1-c
        
          Dependencies:
            google_compute_http_health_check.zero
            google_compute_instance_template.igm-basic
            google_compute_target_pool.igm-basic
        google_compute_instance_template.igm-basic:
          ID = igm-test-bhvjqh6a61
          provider = provider.google
          can_ip_forward = false
          description = 
          disk.# = 1
          disk.0.auto_delete = true
          disk.0.boot = true
          disk.0.device_name = persistent-disk-0
          disk.0.disk_name = 
          disk.0.disk_size_gb = 0
          disk.0.disk_type = pd-standard
          disk.0.interface = SCSI
          disk.0.mode = READ_WRITE
          disk.0.source = 
          disk.0.source_image = debian-cloud/debian-8-jessie-v20160803
          disk.0.type = PERSISTENT
          instance_description = 
          machine_type = n1-standard-1
          metadata.% = 1
          metadata.foo = bar
          metadata_fingerprint = EEf91SmZULE=
          min_cpu_platform = 
          name = igm-test-bhvjqh6a61
          network_interface.# = 1
          network_interface.0.access_config.# = 0
          network_interface.0.address = 
          network_interface.0.alias_ip_range.# = 0
          network_interface.0.network = https://www.googleapis.com/compute/beta/projects/hc-terraform-testing/global/networks/default
          network_interface.0.network_ip = 
          network_interface.0.subnetwork = 
          network_interface.0.subnetwork_project = 
          project = hc-terraform-testing
          scheduling.# = 1
          scheduling.0.automatic_restart = true
          scheduling.0.on_host_maintenance = MIGRATE
          scheduling.0.preemptible = false
          self_link = https://www.googleapis.com/compute/beta/projects/hc-terraform-testing/global/instanceTemplates/igm-test-bhvjqh6a61
          service_account.# = 1
          service_account.0.email = default
          service_account.0.scopes.# = 3
          service_account.0.scopes.1632638332 = https://www.googleapis.com/auth/devstorage.read_only
          service_account.0.scopes.2428168921 = https://www.googleapis.com/auth/userinfo.email
          service_account.0.scopes.2862113455 = https://www.googleapis.com/auth/compute.readonly
          tags.# = 2
          tags.1996459178 = bar
          tags.2356372769 = foo
          tags_fingerprint = 
        google_compute_target_pool.igm-basic:
          ID = igm-test-u3zeiq2g3x
          provider = provider.google
          backup_pool = 
          description = Resource created for Terraform acceptance testing
          failover_ratio = 0
          health_checks.# = 0
          instances.# = 0
          name = igm-test-u3zeiq2g3x
          project = hc-terraform-testing
          region = us-central1
          self_link = https://www.googleapis.com/compute/v1/projects/hc-terraform-testing/regions/us-central1/targetPools/igm-test-u3zeiq2g3x
          session_affinity = CLIENT_IP_PROTO
FAIL

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made version optional and computed, but I cannot make instance_template Computed and Required, shouldn't we leave it Required ? (it is required within the version)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah, that should be fine then. I'll run the test again and check it out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just ran tests and still hit the perpetual diff issue. Making version Optional & Computed is apparently not enough.

Here is the error:

UPDATE: google_compute_instance_group_manager.igm-basic
  version.#: "" => "<computed>"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to make tests pass by removing "Computed" flag, and now I see only one test failing (the one I wrote), because of the ImportStateVerify flag in my TestStep.

Do I need to add something to make version fields importable ?

--- FAIL: TestAccInstanceGroupManager_versions (137.02s)
        testing.go:513: Step 1 error: ImportStateVerify attributes not equivalent. Difference is shown below. Top is actual, bottom is expected.
                
                (map[string]string) {
                }
                
                
                (map[string]string) (len=9) {
                 (string) (len=9) "version.#": (string) (len=1) "2",
                 (string) (len=27) "version.0.instance_template": (string) (len=122) "https://www.googleapis.com/compute/beta/projects/myproject/global/instanceTemplates/igm-test-q1y1jv
92jt",
                 (string) (len=14) "version.0.name": (string) (len=7) "primary",
                 (string) (len=23) "version.0.target_size.#": (string) (len=1) "0",
                 (string) (len=27) "version.1.instance_template": (string) (len=122) "https://www.googleapis.com/compute/beta/projects/myproject/global/instanceTemplates/igm-test-c4hgbmva8w",
                 (string) (len=14) "version.1.name": (string) (len=6) "canary",
                 (string) (len=23) "version.1.target_size.#": (string) (len=1) "1",
                 (string) (len=29) "version.1.target_size.0.fixed": (string) (len=1) "1",
                 (string) (len=31) "version.1.target_size.0.percent": (string) (len=1) "0"
                }               

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did some debugging on my end and was able to get it working- the d.Set call for versions was failing, which is why the import test failed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh thank you, I should have catch this one, this is all my bad! Sorry for that!

}
```

Beware that exactly one version must not specify a target size (through target_size_fixed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mind moving this note down to the versions section?

return result
}

func flattendFixedOrPercent(fixedOrPercent *computeBeta.FixedOrPercent) map[string]interface{} {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/flattend/flattened

target_size_percent = 20
}
```
## Example Usage with multiple Versions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry if I wasn't clear- you're more than welcome to leave this example up at the top- I just wanted the caveat about target sizes moved (which you did)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, should be fixed now!

migibert and others added 4 commits May 30, 2018 12:16
…he IGM, a single patch will by default be considered as an OPPORTUNISTIC update and we need to read the type of update from rolling_update_policy settings
manager := &computeBeta.InstanceGroupManager{
UpdatePolicy: expandUpdatePolicy(d.Get("rolling_update_policy").([]interface{})),
}
op, err := config.clientComputeBeta.InstanceGroupManagers.Patch(project, zone, d.Id(), manager).Do()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not totally sure this will work. For rolling updates, wouldn't we want the UpdatePolicy in the same patch call that we're actually making the update in?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It works because the first Patch (the one triggered if "version" field changes) updates versions and their parameters (target size and instance template) but does not trigger the update (as it is considered as an opportunistic update). The second Patch (called through performUpdate method) will trigger a proactive update.
I agree this is not ideal because there is a short window where the first opportunistic update could be triggered before the second proactive update.
Will take a look and see if I can refactor this part!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like TestAccInstanceGroupManager_rollingUpdatePolicy is failing now :(

@migibert migibert force-pushed the support_canary_releases branch from 14672f1 to 1060031 Compare June 4, 2018 12:51
@migibert
Copy link
Contributor Author

migibert commented Jun 4, 2018

Tests should pass now :)

Copy link
Contributor

@danawillow danawillow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks for your patience! Looks good!

@danawillow danawillow merged commit d60381a into hashicorp:master Jun 4, 2018
chrisst pushed a commit to chrisst/terraform-provider-google that referenced this pull request Nov 9, 2018
Hi there,

Here is an attempt to implement canary releases ( hashicorp#1252 ). This is the first time I write golang and make a terraform contribution, I opened the PR to obtain feedback and advices so please let me know how I can improve this code!

In addition I used `make fmt` to format the code but left some lines bigger than 80 characters, do I need to split them ?

I tested the feature against a project with the following configuration:
```
resource "google_compute_health_check" "mikael-hackathon-healthcheck" {
  name                = "mikael-hackathon-healthcheck"
  check_interval_sec  = 1
  timeout_sec         = 1
  healthy_threshold   = 2
  unhealthy_threshold = 10

  http_health_check {
    request_path = "/"
    port         = "80"
  }
}

resource "google_compute_instance_template" "mikael-hackaton-template" {
  name_prefix = "mikael-hackaton-"
  description = "This template is used to create app server instances."

  tags = ["loadbalanced", "internal-web", "hackaton"]

  labels = {
    environment = "hackaton"
  }

  instance_description = "Hackaton demo rolling upgrade"
  machine_type         = "n1-standard-1"
  can_ip_forward       = false

  scheduling {
    automatic_restart   = true
    on_host_maintenance = "MIGRATE"
  }

  disk {
    source_image = "debian-cloud/debian-9"
    disk_type    = "pd-standard"
    disk_size_gb = 20
    auto_delete  = true
    boot         = true
  }

  network_interface {
    network       = "default"
    access_config = {}
  }

  service_account {
    email  = "${google_service_account.mikael-hackaton.email}"
    scopes = ["cloud-platform"]
  }

  lifecycle {
    create_before_destroy = true
  }

  metadata_startup_script = "apt-get update && apt-get install -y apache2 && echo I am stable version at $(hostname) > /var/www/html/index.html"
}

resource "google_compute_instance_template" "mikael-hackaton-template-canary" {
  name_prefix = "mikael-hackaton-canary"
  description = "This template is used to create app server instances."

  tags = ["loadbalanced", "internal-web", "hackaton"]

  labels = {
    environment = "hackaton"
  }

  instance_description = "Hackaton demo rolling upgrade"
  machine_type         = "n1-standard-1"
  can_ip_forward       = false

  scheduling {
    automatic_restart   = true
    on_host_maintenance = "MIGRATE"
  }

  disk {
    source_image = "debian-cloud/debian-9"
    disk_type    = "pd-standard"
    disk_size_gb = 20
    auto_delete  = true
    boot         = true
  }

  network_interface {
    network       = "default"
    access_config = {}
  }

  service_account {
    email  = "${google_service_account.mikael-hackaton.email}"
    scopes = ["cloud-platform"]
  }

  lifecycle {
    create_before_destroy = true
  }

  metadata_startup_script = "apt-get update && apt-get install -y apache2 && echo I am a canary at $(hostname) > /var/www/html/index.html"
}

resource "google_compute_target_pool" "mikael-hackaton-target-pool" {
  name = "mikael-hackaton-target-pool"
}

resource "google_compute_instance_group_manager" "mikael-hackaton-manager" {
  name = "mikael-hackaton-manager"
  base_instance_name = "mikael-hackaton"
  #instance_template = "${google_compute_instance_template.mikael-hackaton-template.self_link}"
  update_strategy   = "ROLLING_UPDATE"
  zone              = "${var.zone}"
  target_pools = ["${google_compute_target_pool.mikael-hackaton-target-pool.self_link}"]
  target_size  = 5

  version {
    name = "primary"
    instance_template = "${google_compute_instance_template.mikael-hackaton-template.self_link}"
  }

  version {
    name = "canary"
    instance_template = "${google_compute_instance_template.mikael-hackaton-template-canary.self_link}"
    target_size_fixed = 1
  }

  named_port {
    name = "http"
    port = 80
  }

  auto_healing_policies {
    health_check      = "${google_compute_health_check.mikael-hackathon-healthcheck.self_link}"
    initial_delay_sec = 10
  }

  rolling_update_policy {
    type                    = "PROACTIVE"
    minimal_action          = "REPLACE"
    max_surge_percent       = 100
    max_unavailable_percent = 50
    min_ready_sec           = 5
  }
}
```
@ghost
Copy link

ghost commented Nov 18, 2018

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!

@ghost ghost locked and limited conversation to collaborators Nov 18, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants