Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Configurable local exec command for waiting until cluster is healthy #701

Conversation

sanjeevgiri
Copy link
Contributor

@sanjeevgiri sanjeevgiri commented Jan 20, 2020

PR o'clock

Description

Currently executing this module for windows fails due to local-exec command that waits until the cluster is healthy. The command being used until curl -k -s %CLUSTER_HEALTH_ENDPOINT% >/dev/null; do sleep 4; done only works in *nix systems. We could default to using this command, however, it would be great if non *nix users would have to option to specify custom command that would achieve the same. (Even better would be terraform specific resource that would allow us to wait until a http url becomes available :)).

This PR attempts to add the ability to define os-specific commands to wait for a healthy cluster to be available.

Checklist

#680

@sanjeevgiri sanjeevgiri reopened this Jan 20, 2020
@sanjeevgiri
Copy link
Contributor Author

sanjeevgiri commented Jan 20, 2020

@maganuk @barryib @dpiddockcmp @RothAndrew . Would you guys be kind enough to look at his and shed some insight on how I've approached it? I am open to suggestions and discussions that drive to a meaningful solution. I am fairly new new to terraform, so your reviews would be greatly appreciated.

)

* Configurable local exec command for waiting until cluster is healthy

* readme

* line feeds

* format

* fix readme

* fix readme
@maganuk
Copy link

maganuk commented Jan 20, 2020

Hey, I don't know whats changed in v8 but v7 use to work just fine without waiting for the endpoint to be active. Maybe because I was doing my own Aws auth, and there is an option to skip Aws auth. For such cases, can we not make this check optional? For this, could you add a count variable and have a variable like Wait_For_Endpoint_Ready. Default can be true. Your approach seems fine. There was a similar thing done in one of the previous versions of this module.

@sanjeevgiri
Copy link
Contributor Author

sanjeevgiri commented Jan 20, 2020

Hey, I don't know whats changed in v8 but v7 use to work just fine without waiting for the endpoint to be active. Maybe because I was doing my own Aws auth, and there is an option to skip Aws auth. For such cases, can we not make this check optional?

It looks like aws_auth.tf had local-exec to update kube config map if manage_aws_auth was set. This local-exec has been removed in this revision 9363662. @stijndehaes can elaborate a little on that.

The check for healthy cluster was added by @shaunc as a part of #639. They seem to be related though.

@stijndehaes @shaunc should be able to shed some light on this? Thanks in advance.

* Configurable local exec command for waiting until cluster is healthy

* readme

* line feeds

* format

* fix readme

* fix readme

* change log
* changelog

* changelog
@shaunc
Copy link
Contributor

shaunc commented Jan 20, 2020

We did try terraform specific (like using http resource for health check) -- couldn't get to work. I think the maintainers decided that windows probably had other things broken as well; but certainly PR that allowed customizing command would seem reasonable to me (I am not a maintainer). I started using after the switch from local_exec to kubernetes, so I can't speak to that except that the maintainers do want to shift to terraform-supported if possible. (Just we couldn't figure out how to wait for cluster using only terraform.)

@maganuk
Copy link

maganuk commented Jan 20, 2020 via email

@shaunc
Copy link
Contributor

shaunc commented Jan 20, 2020

You'll have to get a maintainer to chime in. IMO terraform-built resources are generally supposed to be "ready to use" when apply completes, so they can be used as part of larger builds. From that perspective, waiting for the cluster to be up should be the "normal case", irrespective of what it is used for internally. But having some option to change (or possibly skip) the health check doesn't seem unreasonable.

@max-rocket-internet
Copy link
Contributor

max-rocket-internet commented Jan 22, 2020

This approach with the %CLUSTER_HEALTH_ENDPOINT% and so on is quite complicated. Could we just use environment on the local-exec and put the endpoint in there? then maybe something like this:

provisioner "local-exec" {
  command = <<EOT
  ${var.wait_for_cluster_cmd == "" ? "until curl -k -s $${ENDPOINT}/healthz >/dev/null; do sleep 4; done" : var.wait_for_cluster_cmd}
EOT
}

Since we’re only using this for Manage_AWS_Auth, can we make the local_exec conditional based on the value of Manage_AWS_Auth?

That's also a good idea.

EDIT: maybe don't even need the ugly <<EOT 😃

@sanjeevgiri
Copy link
Contributor Author

@max-rocket-internet let me explore your advice. :) Thanks for chiming in.

  • I am not entirely sure if using dynamically setting ENDPOINT as an environment variable is possible. I checked the docs for local-exec, and it seems like it has a environment parameter. Perhaps that can be leveraged to some extent. The catch is that the environment variable needs to be tied to url that is dynamically generated.

  • For conditionally executing the local-exec: (I may be too much of a newbie here :) ), is it possible to use count within the local-exec block or would it need to be externalized into a different resource? I will research myself while I keep you guys confused with my newbie questions heh.

@sanjeevgiri
Copy link
Contributor Author

sanjeevgiri commented Jan 22, 2020

  • For conditionally executing the local-exec: (I may be too much of a newbie here :) ), is it possible to use count within the local-exec block or would it need to be externalized into a different resource? I will research myself while I keep you guys confused with my newbie questions heh.

On further reading, looks like count can be used with local-exec (meta-arguments, I will try that)

cluster.tf Outdated
command = <<EOT
until curl -k -s ${aws_eks_cluster.this[0].endpoint}/healthz >/dev/null; do sleep 4; done
EOT
command = var.manage_aws_auth ? var.wait_for_cluster_cmd : ""
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@max-rocket-internet this would optionally perform a no-op if manage_aws_auth is disabled? I may have to use count, but was unsure on wiring dependencies on the kubernetes_config_map resource.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like it but have you tested this in both situations? No errors or issues with having a command of ""

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get this error if manage_aws_auth = false:

Error: local-exec provisioner command must be a non-empty string

So I think you need to do this:

resource "null_resource" "wait_for_cluster" {
  count      = var.manage_aws_auth ? 1 : 0

  depends_on = [
    aws_eks_cluster.this[0]
  ]

  provisioner "local-exec" {
    command = var.wait_for_cluster_cmd
    environment = {
      ENDPOINT = aws_eks_cluster.this[0].endpoint
    }
  }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what I wanted to in my initial attempt, but I was not sure how this would be included as a dependency in the aws_auth module. I will try this and test without aws_auth. Thanks again.

EOT
command = var.manage_aws_auth ? var.wait_for_cluster_cmd : ""
environment = {
ENDPOINT = aws_eks_cluster.this[0].endpoint
Copy link
Contributor Author

@sanjeevgiri sanjeevgiri Jan 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is close to what you were expressing? Not sure :) but it does seem to work, thanks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. This is much cleaner IMO 😃

variables.tf Outdated Show resolved Hide resolved
@sanjeevgiri
Copy link
Contributor Author

sanjeevgiri commented Jan 23, 2020

@max-rocket-internet I believe I have addressed all your concerns. I tested the changes without aws managed authentication too. I am not sure what is happening with docs linter?

@sanjeevgiri
Copy link
Contributor Author

Anyone has any requests for me? I believe I have addressed the changes requested so far. Can someone help me with doc linter issue. It seems like its failing for everyone.

@max-rocket-internet
Copy link
Contributor

I tested with basic example with manage_aws_auth set to true and false and works well for me.

Copy link
Contributor

@max-rocket-internet max-rocket-internet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sanjeevgiri, well done!

@max-rocket-internet max-rocket-internet merged commit 905d9f0 into terraform-aws-modules:master Jan 27, 2020
@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 18, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants