
depends_on should defer interpolation #22036

Closed
mutt13y opened this issue Jul 11, 2019 · 14 comments
Comments

@mutt13y

mutt13y commented Jul 11, 2019

Current Terraform Version

Terraform v0.12.3

Use-cases

Interpolation functions should not run until the depends_on resource has completed, because that resource may change the environment.

Attempted Solutions


resource "null_resource" "lambda" {
  provisioner "local-exec" {
    command = "cd lambda; zip -u git_hook.zip git_hook.py"
  }
}

resource "aws_lambda_function" "git_hook" {
  filename      = "lambda/git_hook.zip"
  function_name = "git_hook_sqs"
  role          = aws_iam_role.iam_for_lambda.arn
  handler       = "lambda_handler"

  source_code_hash = filebase64sha256("lambda/git_hook.zip")

  runtime = "python3.7"

  environment {
    variables = {
      foo = "bar"
    }
  }

  depends_on = [null_resource.lambda]
}

This immediately fails with

Call to function "filebase64sha256" failed: no file exists at lambda/git_hook.zip

Proposal

depends_on should be transitive

current plan


Terraform will perform the following actions:                                           
                                                                                        
  # aws_lambda_function.git_hook will be created                                        
  + resource "aws_lambda_function" "git_hook" {                                         
      + arn                            = (known after apply)                            
      + filename                       = "lambda/git_hook.zip"                          
      + function_name                  = "git_hook_sqs"                                 
      + handler                        = "lambda_handler"                               
      + id                             = (known after apply)                            
      + invoke_arn                     = (known after apply)                            
      + last_modified                  = (known after apply)                            
      + memory_size                    = 128                                            
      + publish                        = false                                          
      + qualified_arn                  = (known after apply)                            
      + reserved_concurrent_executions = -1                                             
      + role                           = "arn:aws:iam::121613305665:role/iam_for_lambda"
      + runtime                        = "python3.7"                                    
      + source_code_hash               = "7QPosc5Dyd/EDhlFOdc25BY0ToF++NO78ARgkQnsK4s=" 
      + source_code_size               = (known after apply)                            
      + timeout                        = 3                                              
      + version                        = (known after apply)                            
                                                                                        
      + environment {                                                                   
          + variables = {                                                               
              + "foo" = "bar"                                                           
            }                                                                           
        }                                                                               
                                                                                        
      + tracing_config {                                                                
          + mode = (known after apply)                                                  
        }                                                                               
    }                                                                                   
                                                                                        
Plan: 1 to add, 0 to change, 0 to destroy.                                              

should be

Terraform will perform the following actions:                                           
                                                                                        
  # aws_lambda_function.git_hook will be created                                        
  + resource "aws_lambda_function" "git_hook" {                                         
      + arn                            = (known after apply)                            
      + filename                       = "lambda/git_hook.zip"                          
      + function_name                  = "git_hook_sqs"                                 
      + handler                        = "lambda_handler"                               
      + id                             = (known after apply)                            
      + invoke_arn                     = (known after apply)                            
      + last_modified                  = (known after apply)                            
      + memory_size                    = 128                                            
      + publish                        = false                                          
      + qualified_arn                  = (known after apply)                            
      + reserved_concurrent_executions = -1                                             
      + role                           = "arn:aws:iam::121613305665:role/iam_for_lambda"
      + runtime                        = "python3.7"                                    
      + source_code_hash               = (known after apply)  
      + source_code_size               = (known after apply)                            
      + timeout                        = 3                                              
      + version                        = (known after apply)                            
                                                                                        
      + environment {                                                                   
          + variables = {                                                               
              + "foo" = "bar"                                                           
            }                                                                           
        }                                                                               
                                                                                        
      + tracing_config {                                                                
          + mode = (known after apply)                                                  
        }                                                                               
    }                                                                                   
                                                                                        
Plan: 1 to add, 0 to change, 0 to destroy.                                              

References

@apparentlymart
Contributor

Hi @mutt13y,

The various functions with file in their names that read files from disk are intended for use with files that are delivered as part of the configuration, such as being checked in to version control alongside the .tf files that reference them. They are not for files that are generated during the terraform apply step.

We generally recommend against using Terraform to generate temporary artifacts locally, since that isn't really what it is for. We offer the facilities to do so because we're pragmatic and want to enable users to do some things that are slightly outside of Terraform's scope when needed, but the experience when doing so won't necessarily be smooth.

If generating the zip file as part of the terraform apply is important to your use case (rather than generating the artifact as a separate build step prior to running Terraform, which we'd recommend for most cases), I'd suggest also generating a hash of the file at the same time using the same mechanism (shell commands), rather than trying to mix work done by an external program with work done by Terraform.
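One way to do that, sketched below, is to compute the same digest Terraform's filebase64sha256() would, entirely in shell, inside the same local-exec step that builds the artifact (openssl and base64 assumed available; a demo file stands in for the real zip):

```shell
# Build a demo artifact (stand-in for "zip -u git_hook.zip git_hook.py").
printf 'demo artifact' > artifact.zip

# base64-encoded SHA-256 of the file's bytes, equivalent to
# Terraform's filebase64sha256("artifact.zip").
openssl dgst -sha256 -binary artifact.zip | base64
```

The digest could then be written to a side file and fed back to Terraform however fits the surrounding workflow.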

@mutt13y
Author

mutt13y commented Jul 12, 2019

Hi @apparentlymart,
I take your point; I am just wondering what the use case is for the local-exec provisioner if it is not executed first (or last).
If there is a command that needs to be executed locally, I think in most cases you would want to run it either before the apply or after it.
So perhaps an option on local-exec to control when it runs could be added?

Stuart

@apparentlymart
Contributor

Provisioners in general are a sort of "last resort" feature for doing small fixups after an object is created that don't otherwise fit into Terraform's declarative model. For example, in some environments it's impractical to customize machine images so that compute instances can immediately start their work on boot, and so provisioners can fill that gap by allowing last-moment initialization to happen on the remote host. As an example for local-exec in particular, it is sometimes used to run the official CLI tool of whatever remote system they are working with in order to run some non-declarative side-effects that are needed to get some object up and running fully.

Where possible though, Terraform prefers to think of infrastructure objects as a sort of "appliance" that just starts immediately doing its job as soon as it's created. For managed services that sort of behavior tends to come for free. For services you deploy yourself into a generic virtual machine that will generally require a custom machine image and a feature like EC2's user_data to pass in custom settings to that image.

They can also be used for things that I might claim Terraform shouldn't be used for, such as generating artifacts for deployment, because that's just the nature of general features like that. The Terraform team is generally pragmatic about folks using these features to get things done even if it wasn't something the feature was intended for, but that doesn't mean that these unintended uses will come without friction.

Another feature in Terraform that exists to be pragmatic about this sort of unintended use case is the local_file data source, which offers a way to read a file from disk while obeying the usual lifecycle rules for data resources. Since data resources can participate in the dependency graph, that can be used for certain dynamic file creation use-cases. However, it doesn't currently have a mechanism for reading a hash of a file rather than reading the file itself, so in order to work for your use-case here it would need some new features.

I think it would still be better to separate artifact creation from provisioning though, because that has other benefits: you can build and test those artifacts using a CI system as is normally done for code to be deployed, you can keep a historical trail of older artifacts to roll back to in the event of a problem, and so on. There's a more specific suggestion for one way to set this up in the guide Serverless Applications with AWS Lambda and API Gateway. Even if your use-case doesn't include an API portion, the part about deploying the lambda function could still be relevant and useful.

@MikeBlomm

MikeBlomm commented Sep 3, 2019

I would also like to see this kind of behaviour added. We currently use the null_resource and local-exec provisioner to copy some files from a GCS bucket to the local machine. The content of these files which are being copied are used in a later step to create some kubernetes secrets.

Although we specify that the kubernetes secret resource is dependent on the local-exec command, it doesn't wait for the local-exec to finish. This results in an error that the file to create the kubernetes secret resource does not exist.

We don't necessarily create artifacts during the Terraform run, but we are very reliant on certain remote files which need to be pulled in during the run.
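For what it's worth, one way that pattern could be wired up today is via the hashicorp/local provider's local_file data source, which participates in the dependency graph. A hedged sketch (bucket, file, and resource names here are hypothetical, not from the thread):

```hcl
resource "null_resource" "fetch_secret" {
  provisioner "local-exec" {
    # Hypothetical bucket and object names.
    command = "gsutil cp gs://my-bucket/secret.txt ./secret.txt"
  }
}

data "local_file" "secret" {
  filename = "./secret.txt"

  # Defers the file read until the local-exec has run.
  depends_on = [null_resource.fetch_secret]
}

resource "kubernetes_secret" "example" {
  metadata {
    name = "example"
  }

  data = {
    value = data.local_file.secret.content
  }
}
```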

@mutt13y
Author

mutt13y commented Sep 3, 2019

I ended up writing a Makefile; you could use Concourse as a better alternative.
I think that if you need something done before or after the apply, it is reasonable to use some other tooling.
Would it make more sense to use Vault for your secrets? I am sure the Vault provider has proper dependencies.

I do end up wondering what the actual use case for local-exec is if we can't control when it runs.

@MikeBlomm

MikeBlomm commented Sep 3, 2019

Well, as I read the comment above, it is only intended for small fixups, not for actual resource creation.

We looked at using Vault, but it currently is overkill to set up and maintain a whole client/server application just for our secrets.

@subos2008

subos2008 commented Sep 20, 2019

There's going to be a lot of people wanting this use case when working with lambda, exactly as the OP is doing.

The original request:

interpolation functions should not run until the depends_on resource has completed

seems pretty fair and straightforward to me.

ED: Although, I notice source_code_hash seems to be optional for aws_lambda_function. The documentation says it's:

(Optional) Used to trigger updates.

I have no idea what "Used to trigger updates" is supposed to mean but my terraform does apply without it.

ED: also for the OP's use case https://www.terraform.io/docs/providers/archive/d/archive_file.html would be better

@perry-mitchell

perry-mitchell commented Apr 9, 2020

I personally think that the expectation that developers should run additional build steps in an infrastructure repository, besides terraform apply, is a bit ugly, perhaps unpolished. I'd prefer to keep Terraform as the only application that performs tasks in the repo before pushing changes to infrastructure.

The syntax of depends_on, and the examples showing how it's used, definitely lead one to believe it could be used for exactly this, which is what the OP (and I) are after. I feel like this is an ugly gotcha in Terraform that I'd have to explain away to colleagues I'm trying to sell it to.

@ebalakumar

I managed to make this work with an empty zip file.

@eskp

eskp commented Jun 12, 2020

https://www.terraform.io/docs/providers/archive/d/archive_file.html#output_base64sha256

@zachwhaley
Contributor

I'd like to add an example use case to this old issue for posterity's sake.

We have a reusable module for deploying Lambda functions that can take either a list of source files, a directory of source files, or an S3 bucket and key to download the prepackaged source files. The module uses depends_on and data.local_file to provide the lambda resource with any of the options above for source files.

It works, but the source_code_hash variable is the trickiest part due to the issue at hand.

Snippet:

locals {
  package = "${path.module}/package.zip"
}

data "archive_file" "dir" {
  count = var.source_dir != null ? 1 : 0

  type        = "zip"
  source_dir  = var.source_dir
  output_path = local.package
}

data "archive_file" "files" {
  count = length(var.source_files) > 0 ? 1 : 0

  type        = "zip"
  output_path = local.package

  dynamic "source" {
    for_each = toset(var.source_files)

    content {
      content  = file(source.value)
      filename = basename(source.value)
    }
  }
}

data "external" "download_package" {
  count = var.download_package ? 1 : 0

  program = [
    "aws", "s3api", "get-object", "--bucket=${var.package_s3_bucket}", "--key=${var.package_s3_key}", local.package,
    # The query is used to provide external with a string only JSON output which it requires to work.
    "--query={LastModified:LastModified}", "--output=json",
  ]
}

data "local_file" "package" {
  depends_on = [
    data.archive_file.dir,
    data.archive_file.files,
    data.external.download_package,
  ]

  filename = var.local_package != null ? var.local_package : local.package
}

resource "aws_lambda_function" "this" {
  depends_on = [
    data.local_file.package,
  ]

  filename         = data.local_file.package.filename
  source_code_hash = var.ignore_source_code_changes ? null : filebase64sha256(data.local_file.package.filename)
  ...
}

The above works IF the package zip has already been generated and needs no update, but does not work otherwise.

The workaround below is sort of working, but causes the lambda to redeploy with every plan.

resource "aws_lambda_function" "this" {
  depends_on = [
    data.local_file.package,
  ]

  filename         = data.local_file.package.filename
  source_code_hash = var.ignore_source_code_changes ? null : base64sha256(data.local_file.package.content_base64)
  ...
}

@apparentlymart
Contributor

apparentlymart commented Aug 31, 2022

Hi @zachwhaley! Thanks for sharing that use-case and partial workaround.

I think the reason your workaround causes the configuration to be non-converging is because base64sha256 means "calculate a SHA256 hash of this string and then base64 encode it", rather than "decode this base64-encoded string and then generate a SHA256 hash of the result".

The result is therefore syntactically valid (it's a base64-encoded SHA256 hash) but it's not semantically valid: it's a hash of the wrong source content, so it can never match.

Terraform doesn't currently have a function for generating a SHA256 hash of some binary data given as a base64 string, so I don't have a suggestion about how to adapt what you described here to make it work, but just wanted to note that the workaround here is failing for a reason other than what this issue is proposing.
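The distinction can be illustrated outside Terraform. The following Python sketch (illustrating the semantics only; it is not Terraform code) shows that hashing the base64 string yields a different digest than hashing the decoded file bytes:

```python
import base64
import hashlib

content = b"example file contents"
content_b64 = base64.b64encode(content).decode()  # analogous to content_base64

# What base64sha256(content_base64) effectively does: hash the base64
# *string*, then base64-encode that hash.
wrong = base64.b64encode(hashlib.sha256(content_b64.encode()).digest()).decode()

# What the lambda's source_code_hash needs: a hash of the *decoded* bytes.
right = base64.b64encode(hashlib.sha256(content).digest()).decode()

print(wrong == right)  # False: the two digests never match
```

Both results are 44-character base64 strings, so the provider accepts either; only the second one ever matches what AWS computes from the uploaded zip.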


The local_file data source is intended as a way to make Terraform model reading a file as a side-effect that respects dependencies rather than pretending that it's a pure function (as all of the file-reading functions do, as a compromise for convenience in the common case of reading static files included in the configuration).

Your comment here makes me realize that the local_file data source hasn't kept up with all of the later additions of file-reading functions in Terraform, and that one way to make this work would be to restore feature parity between the hashicorp/local provider and the built-in file-reading functions, so that you can get the needed hash directly from the provider rather than relying on Terraform's built-in functions to calculate it after reading the file content into memory as a string.

The hashicorp/local provider has its own repository and so it's not my place to dictate here what features it ought to have, but I'll start a conversation with the team which maintains that provider to see what they think about this observation and whether it would be reasonable to expand the provider's scope to match all of the direct-file-reading functions we have in today's Terraform language.

@apparentlymart
Contributor

Returning to this much later:

I notice that my hashicorp/local proposal was partially accepted upstream and so the provider's local_file data source now exports a similar set of checksums as are supported by Terraform's builtin file checksum functions.

Resources (which includes "data resources", declared by data blocks) are how we model side-effects with dependencies in the Terraform language, and so using this data source is the recommended solution for situations where the file you want to read is being created by some other resource in your configuration, so that you can use the usual techniques to describe the dependencies.

Functions like file are technically "naughty" in that they have a side-effect despite Terraform's assumption that functions don't have side-effects. This naughtiness is a pragmatic concession to make it easier to deal with the much more common case of files being distributed as part of a module's source code, but it does mean that these functions are not appropriate for situations where the file is not already present on disk before Terraform begins its work.

Skipping expression evaluation until all dependencies have been resolved would amount to planning and applying only one resource at a time, because we cannot plan changes for a resource without evaluating its configuration. Therefore I don't think the direct request of this issue is viable, but the following adaptation of the original example should get the desired result using the hashicorp/local provider's features:

resource "null_resource" "lambda" {
  provisioner "local-exec" {
    command = "cd lambda; zip -u git_hook.zip git_hook.py"
  }
}

data "local_file" "lambda" {
  filename = "lambda/git_hook.zip"

  depends_on = [null_resource.lambda]
}

resource "aws_lambda_function" "git_hook" {
  filename      = "lambda/git_hook.zip"
  function_name = "git_hook_sqs"
  role          = aws_iam_role.iam_for_lambda.arn
  handler       = "lambda_handler"

  source_code_hash = data.local_file.lambda.content_base64sha256

  runtime = "python3.7"

  environment {
    variables = {
      foo = "bar"
    }
  }
}

The direct dependency between data.local_file.lambda and null_resource.lambda means that if null_resource.lambda has any changes pending (e.g. if it's planned for creation) then Terraform will wait until the apply step to read from data.local_file.lambda, to allow the provisioner to execute first.

Those who are interested in creating .zip files for AWS Lambda might also investigate using the hashicorp/archive provider's archive_file data source. This one is also a little "naughty" in that it's presented as a data source but yet it writes a file to disk, but that does then mean that the archive can potentially be created during the plan phase rather than only during the apply phase. That data source also exposes a base64sha256 checksum of the resulting file, avoiding the need for a separate local_file data resource to obtain that checksum.
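Under that approach, the original example could collapse to something like the following sketch (attribute names as documented for the hashicorp/archive data source; the rest carried over from the example above, with the environment block omitted for brevity):

```hcl
data "archive_file" "lambda" {
  type        = "zip"
  source_file = "lambda/git_hook.py"
  output_path = "lambda/git_hook.zip"
}

resource "aws_lambda_function" "git_hook" {
  filename      = data.archive_file.lambda.output_path
  function_name = "git_hook_sqs"
  role          = aws_iam_role.iam_for_lambda.arn
  handler       = "lambda_handler"
  runtime       = "python3.7"

  # Checksum exported by the data source, so no separate
  # local_file data resource or file-reading function is needed.
  source_code_hash = data.archive_file.lambda.output_base64sha256
}
```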


I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 30, 2024