Configurable local exec command for waiting until cluster is healthy #701

sanjeevgiri · 2020-01-20T14:52:29Z

PR o'clock

Description

Currently, running this module on Windows fails because of the local-exec command that waits until the cluster is healthy. The command in use ("until curl -k -s %CLUSTER_HEALTH_ENDPOINT% >/dev/null; do sleep 4; done") only works on *nix systems. We could keep this command as the default; however, it would be great if non-*nix users had the option to specify a custom command that achieves the same result. (Even better would be a Terraform-specific resource that lets us wait until an HTTP URL becomes available :).)

This PR attempts to add the ability to define OS-specific commands to wait for a healthy cluster to be available.
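A minimal sketch of how such an input could be declared. It assumes the variable name wait_for_cluster_cmd used later in this thread and assumes the cluster endpoint reaches the command through an ENDPOINT environment variable (the approach the reviewers converge on later in the thread). The description text is illustrative, not copied from the module.

```hcl
# Sketch: variable name taken from this PR; description text is illustrative.
variable "wait_for_cluster_cmd" {
  description = "Command run by local-exec to wait until the EKS cluster is healthy. The cluster endpoint is available to it as the ENDPOINT environment variable."
  type        = string

  # *nix default. "$ENDPOINT" (no braces) is not Terraform interpolation,
  # so it passes through unchanged and the shell expands it at runtime.
  default = "until curl -k -s $ENDPOINT/healthz >/dev/null; do sleep 4; done"
}
```

Windows users could then override the default with a command suited to their shell, without changing the module's code.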

Checklist

Change added to CHANGELOG.md. All changes must be added and breaking changes highlighted
CI tests are passing
README.md has been updated after any changes to variables and outputs. See https://github.com/terraform-aws-modules/terraform-aws-eks/#doc-generation

#680

sanjeevgiri · 2020-01-20T16:42:48Z

@maganuk @barryib @dpiddockcmp @RothAndrew, would you be kind enough to look at this and share some insight on how I've approached it? I am open to suggestions and discussions that lead to a meaningful solution. I am fairly new to Terraform, so your reviews would be greatly appreciated.

maganuk · 2020-01-20T16:50:13Z

Hey, I don't know what's changed in v8, but v7 used to work just fine without waiting for the endpoint to be active. Maybe that's because I was doing my own AWS auth, and there is an option to skip AWS auth. For such cases, can we not make this check optional? For this, could you use count with a variable like Wait_For_Endpoint_Ready? The default can be true. Your approach seems fine. There was a similar thing done in one of the previous versions of this module.

sanjeevgiri · 2020-01-20T17:10:08Z

Hey, I don't know what's changed in v8, but v7 used to work just fine without waiting for the endpoint to be active. Maybe that's because I was doing my own AWS auth, and there is an option to skip AWS auth. For such cases, can we not make this check optional?

It looks like aws_auth.tf used a local-exec to update the aws-auth config map when manage_aws_auth was set. That local-exec was removed in revision 9363662; @stijndehaes can elaborate a little on that.

The healthy-cluster check was added by @shaunc as part of #639. The two changes do seem to be related, though.

@stijndehaes @shaunc, could you shed some light on this? Thanks in advance.

shaunc · 2020-01-20T19:46:02Z

We did try a Terraform-specific approach (like using the http resource for the health check) but couldn't get it to work. I think the maintainers decided that Windows probably had other things broken as well, but a PR that allowed customizing the command would certainly seem reasonable to me (I am not a maintainer). I started using the module after the switch from local_exec to kubernetes, so I can't speak to that, except that the maintainers do want to shift to Terraform-supported resources where possible. (We just couldn't figure out how to wait for the cluster using only Terraform.)

maganuk · 2020-01-20T19:56:02Z

Since we’re only using this for Manage_AWS_Auth, can we make the local_exec conditional based on the value of Manage_AWS_Auth?

shaunc · 2020-01-20T20:02:11Z

You'll have to get a maintainer to chime in. IMO terraform-built resources are generally supposed to be "ready to use" when apply completes, so they can be used as part of larger builds. From that perspective, waiting for the cluster to be up should be the "normal case", irrespective of what it is used for internally. But having some option to change (or possibly skip) the health check doesn't seem unreasonable.

max-rocket-internet · 2020-01-22T15:07:20Z

This approach with %CLUSTER_HEALTH_ENDPOINT% and so on is quite complicated. Could we just use environment on the local-exec and put the endpoint in there? Then maybe something like this:

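provisioner "local-exec" {
  command = <<EOT
  ${var.wait_for_cluster_cmd == "" ? "until curl -k -s $${ENDPOINT}/healthz >/dev/null; do sleep 4; done" : var.wait_for_cluster_cmd}
EOT
  # The shell reads ENDPOINT from this environment at runtime.
  environment = {
    ENDPOINT = aws_eks_cluster.this[0].endpoint
  }
}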

Since we’re only using this for Manage_AWS_Auth, can we make the local_exec conditional based on the value of Manage_AWS_Auth?

That's also a good idea.

EDIT: maybe don't even need the ugly <<EOT 😃
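The $${ENDPOINT} in the snippet above relies on HCL's escape for template interpolation: $${ yields a literal ${, so Terraform leaves the variable untouched and the shell expands it from the environment. As the EDIT suggests, a plain quoted string behaves the same as the heredoc. A small standalone illustration; the local and output names are hypothetical.

```hcl
# Hypothetical names, for illustration only.
locals {
  # "$${ENDPOINT}" renders as the literal text "${ENDPOINT}"; the shell,
  # not Terraform, substitutes the value from the process environment.
  default_wait_cmd = "until curl -k -s $${ENDPOINT}/healthz >/dev/null; do sleep 4; done"
}

output "default_wait_cmd" {
  # Rendered value:
  # until curl -k -s ${ENDPOINT}/healthz >/dev/null; do sleep 4; done
  value = local.default_wait_cmd
}
```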

sanjeevgiri · 2020-01-22T15:59:51Z

@max-rocket-internet let me explore your advice. :) Thanks for chiming in.

I am not entirely sure whether dynamically setting ENDPOINT as an environment variable is possible. I checked the docs for local-exec, and it does have an environment parameter; perhaps that can be leveraged. The catch is that the environment variable needs to be tied to a URL that is generated dynamically.
For conditionally executing the local-exec: (I may be too much of a newbie here :) ), is it possible to use count within the local-exec block or would it need to be externalized into a different resource? I will research myself while I keep you guys confused with my newbie questions heh.

sanjeevgiri · 2020-01-22T16:04:57Z

For conditionally executing the local-exec: (I may be too much of a newbie here :) ), is it possible to use count within the local-exec block or would it need to be externalized into a different resource? I will research myself while I keep you guys confused with my newbie questions heh.

On further reading, it looks like count can be used with local-exec by setting it, as a meta-argument, on the resource that contains the provisioner. I will try that.

sanjeevgiri · 2020-01-22T22:16:11Z

cluster.tf

-    command = <<EOT
-    until curl -k -s ${aws_eks_cluster.this[0].endpoint}/healthz >/dev/null; do sleep 4; done
-  EOT
+    command = var.manage_aws_auth ? var.wait_for_cluster_cmd : ""


@max-rocket-internet, would this perform a no-op if manage_aws_auth is disabled? I may have to use count, but I was unsure how to wire the dependency on the kubernetes_config_map resource.

max-rocket-internet · 2020-01-23

Like it, but have you tested this in both situations? Are there no errors or issues with having a command of ""?

I get this error if manage_aws_auth = false:

Error: local-exec provisioner command must be a non-empty string

So I think you need to do this:

resource "null_resource" "wait_for_cluster" {
  count = var.manage_aws_auth ? 1 : 0

  depends_on = [
    aws_eks_cluster.this[0]
  ]

  provisioner "local-exec" {
    command = var.wait_for_cluster_cmd
    environment = {
      ENDPOINT = aws_eks_cluster.this[0].endpoint
    }
  }
}

sanjeevgiri · 2020-01-23

This is what I wanted to do in my initial attempt, but I was not sure how to include it as a dependency in the aws_auth module. I will try this and test without aws_auth. Thanks again.
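On the dependency question, one way to wire it (a sketch under assumptions, not the module's actual aws_auth.tf) is to have the aws-auth ConfigMap depend on the whole null_resource. Referencing null_resource.wait_for_cluster without an index is valid whether its count is 0 or 1. The metadata and data contents below are illustrative placeholders.

```hcl
# Illustrative sketch; not the module's actual aws_auth.tf.
resource "kubernetes_config_map" "aws_auth" {
  count = var.manage_aws_auth ? 1 : 0

  # Depend on the whole resource (no index), so this works for count = 0 or 1.
  depends_on = [null_resource.wait_for_cluster]

  metadata {
    name      = "aws-auth"
    namespace = "kube-system"
  }

  data = {
    # Placeholder: the real module renders its role/user mappings here.
    mapRoles = yamlencode([])
  }
}
```

With manage_aws_auth = false both resources get count = 0, so neither the health check nor the ConfigMap is created.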

sanjeevgiri · 2020-01-22T22:16:46Z

cluster.tf

-  EOT
+    command = var.manage_aws_auth ? var.wait_for_cluster_cmd : ""
+    environment = {
+      ENDPOINT = aws_eks_cluster.this[0].endpoint


I believe this is close to what you had in mind? Not sure :), but it does seem to work. Thanks.

max-rocket-internet · 2020-01-23

Yes. This is much cleaner IMO 😃

variables.tf

sanjeevgiri · 2020-01-23T22:07:55Z

@max-rocket-internet I believe I have addressed all your concerns. I also tested the changes without managed AWS auth. I am not sure what is happening with the docs linter.

sanjeevgiri · 2020-01-27T10:16:51Z

Does anyone have any further requests for me? I believe I have addressed the changes requested so far. Can someone help me with the docs linter issue? It seems to be failing for everyone.

max-rocket-internet · 2020-01-27T15:28:07Z

I tested the basic example with manage_aws_auth set to both true and false, and it works well for me.
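For module consumers, the two settings exercised above could be used roughly as follows. This is a sketch: the registry source is the module's, but the cluster name, VPC and subnet IDs are placeholders, and the wget-based command is only an example of overriding the default (say, on an image that ships wget but not curl). It assumes a module release from around this PR; later major versions renamed or removed some of these inputs.

```hcl
# Placeholder names and IDs; replace with your own values.
# Assumption: pin a module version that still has these inputs (v8 era).
module "eks" {
  source       = "terraform-aws-modules/eks/aws"
  cluster_name = "example"
  vpc_id       = "vpc-00000000"
  subnets      = ["subnet-00000000", "subnet-11111111"]

  # Set to false to skip managing aws-auth; the wait step is then skipped too.
  manage_aws_auth = true

  # Optional override of the health-check command; ENDPOINT is supplied
  # by the module through the local-exec environment.
  wait_for_cluster_cmd = "until wget --no-check-certificate -O - -q $ENDPOINT/healthz >/dev/null; do sleep 4; done"
}
```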

max-rocket-internet

Thanks @sanjeevgiri, well done!

github-actions · 2022-11-18T02:31:09Z

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

Configurable local exec command for waiting until cluster is healthy

19aff79

sanjeevgiri closed this Jan 20, 2020

sanjeevgiri added 2 commits January 20, 2020 10:04

readme

63f5188

line feeds

9712229

sanjeevgiri reopened this Jan 20, 2020

sanjeevgiri added 3 commits January 20, 2020 10:30

format

6d99112

fix readme

d65f318

fix readme

c9e4959

sanjeevgiri requested review from max-rocket-internet, barryib and brandonjbjelland January 20, 2020 16:44

Configurable local exec command for waiting until cluster is healthy (#1)

ed981a6

* Configurable local exec command for waiting until cluster is healthy * readme * line feeds * format * fix readme * fix readme

sanjeevgiri added 4 commits January 20, 2020 14:27

change log

4bf26e1

Configurable local exec wait 4 cluster op (#2)

bc04038

* Configurable local exec command for waiting until cluster is healthy * readme * line feeds * format * fix readme * fix readme * change log

changelog (#3)

4edaf2e

Changelog (#4)

32ed440

* changelog * changelog

simplify wait_for_cluster command

2be25f3

sanjeevgiri commented Jan 22, 2020

readme

963e3c2

max-rocket-internet reviewed Jan 23, 2020

variables.tf (outdated, resolved)

no op for manage auth false

810466a

formatting

1ee83f2

sanjeevgiri added 4 commits January 23, 2020 17:15

docs? not sure

4fa9aff

linter

878526f

specify dependency to wait for cluster more accurately

523a0a9

merge

efe9174

max-rocket-internet approved these changes Jan 27, 2020

max-rocket-internet merged commit 905d9f0 into terraform-aws-modules:master Jan 27, 2020

avoidik mentioned this pull request Mar 14, 2020

feat: Add interpreter option to wait_for_cluster_cmd #795

Merged

github-actions bot locked as resolved and limited conversation to collaborators Nov 18, 2022
