Stacks #3313

Open
2 tasks
yhakbar opened this issue Aug 1, 2024 · 39 comments
Labels
accepted Accepted RFC rfc Request For Comments

Comments

@yhakbar
Collaborator

yhakbar commented Aug 1, 2024

Summary

To reduce code repetition and make it easier to manage Terragrunt codebases, this proposal introduces a layer of abstraction above terragrunt.hcl files called Stacks. Stacks are defined using files named terragrunt.stack.hcl.

Users will interact with Stacks using commands prefixed with terragrunt stack, which will allow them to create, manage, and destroy Stacks.

Motivation

Many Terragrunt users experience repetition across the terragrunt.hcl files in their repositories.

One reason for this might be that, while Terragrunt configurations provide an abstraction for DRY (Don't Repeat Yourself) OpenTofu/Terraform modules, the ability to abstract the Terragrunt configuration itself is somewhat limited.

Users typically use a collection of terragrunt.hcl files, each of which manages an OpenTofu/Terraform module with its own state file. Repeatedly provisioning the same module across multiple environments, or multiple times within the same environment, currently requires replicating the same terragrunt.hcl file for each instantiation of that module.

Users have experienced complications keeping many terragrunt.hcl files in sync, and have expressed a desire for a more streamlined way to propagate updates across them.

In addition, Terragrunt code re-use has been largely limited to Terragrunt configurations found on local filesystems. Expanding tooling so that Terragrunt configurations can be shared across repositories would be beneficial, both for the scalability of Terragrunt codebases, and to expand the ways in which Gruntwork customers can leverage configurations maintained by Gruntwork.

Proposal

Introduce a new terragrunt.stack.hcl configuration file that can be used by Terragrunt to manage a Stack.

terragrunt.stack.hcl

The terragrunt.stack.hcl file will have configurations that entirely focus on generating a stack of terragrunt.hcl files. These terragrunt.hcl files will use the same syntax as current Terragrunt configurations, and use existing tooling to integrate into the stack.

An example terragrunt.stack.hcl file might look like this:

locals {
    version = "v0.0.1"
    environment = "dev"
}
 
unit "service" {
    source = "github.com/gruntwork-io/terragrunt-stacks//stacks/mock/service?ref=${local.version}"
    path = "service" # default would be github.com/gruntwork-io/terragrunt-stacks/stacks/mock/service
}
 
unit "db" {
    source = "github.com/gruntwork-io/terragrunt-stacks//stacks/mock/db?ref=${local.version}"
    path = "db" # default would be github.com/gruntwork-io/terragrunt-stacks/stacks/mock/db
}
 
unit "api" {
    source = "github.com/gruntwork-io/terragrunt-stacks//stacks/mock/api?ref=${local.version}"
    path = "api" # default would be github.com/gruntwork-io/terragrunt-stacks/stacks/mock/api
}

In this example, the terragrunt.stack.hcl file defines three Units: service, db, and api. Each Unit references a directory containing a terragrunt.hcl file, and Terragrunt uses go-getter to load that configuration locally or from a remote source.

Quick Detour on "Units"

The term "Unit" is language that we haven't standardized externally, but is something that we've been using internally at Gruntwork. It's a way to refer to a single instantiation of an OpenTofu/Terraform module, and we believe the best way to do that is with a terragrunt.hcl file. Whenever you see reference to "Unit", you can mentally replace that with a terragrunt.hcl file. It's a unit of infrastructure, with its own state, potentially integrated into a larger system.

We have yet to standardize this term throughout Terragrunt tooling and documentation, but we believe it's a useful concept to introduce in this proposal.

If you have feedback on this terminology, please share it!

terragrunt.stack.hcl Configuration Continued

Those unit configuration blocks are used to instantiate Terragrunt Units. A unit block accepts two attributes:

  1. The source attribute (Required): The way in which Terragrunt is going to fetch the relevant directory containing the terragrunt.hcl file.
  2. The path attribute (Optional): The path to the directory where the unit is going to be generated. If not provided, the default path determined by the source will be used. More on this will be discussed later.

The locals block is one that most Terragrunt users are familiar with. It's a way to define reusable variables throughout a Terragrunt stack.

terragrunt stack Commands

In tandem with introducing a new configuration file, Terragrunt will also have a new set of commands that will allow users to interact with Stacks. These commands will be prefixed with terragrunt stack.

  • terragrunt stack generate: This command will generate the stack of Units, using the configurations in the terragrunt.stack.hcl file.

    What this will do is create a .terragrunt-stack directory next to the terragrunt.stack.hcl file, and populate it with content from the Units defined in the terragrunt.stack.hcl file.

    The paths to the units in the .terragrunt-stack directory will be determined by the path attribute in the unit configuration blocks. If the path attribute is not provided, a default path will be determined based on parsing the source attribute.

    e.g.

    # terragrunt.stack.hcl
    unit "service" {
        source = "github.com/gruntwork-io/terragrunt-stacks//stacks/mock/service"
        path   = "service"
    }
    
    unit "db" {
        source = "github.com/gruntwork-io/terragrunt-stacks//stacks/mock/db"
        path   = "db"
    }

    Would generate the file structure:

      .terragrunt-stack
      ├── service
      │   └── terragrunt.hcl
      └── db
          └── terragrunt.hcl
    
  • terragrunt stack run *: Similar to the run-all command, the stack run command runs a command across all Units that are recursively discovered in a directory. A significant difference from run-all is that stack run runs those commands within the context of the .terragrunt-stack directory. The * stands in for the command and arguments that are forwarded to the underlying binary that Terragrunt is orchestrating (OpenTofu/Terraform), just like with run-all.

    e.g.

    terragrunt stack run plan

    Would run terraform plan in each of the Units in the .terragrunt-stack directory. If the .terragrunt-stack directory does not exist, the stack run command will generate it first.

    To ensure that users have full control over this process, the stack run command will have a --terragrunt-generate-stack=false flag that will prevent the .terragrunt-stack directory from being generated. The verbosity of this flag is not ideal, but it is in-line with the verbosity of other Terragrunt flags. This is something to revisit in the future.

  • terragrunt stack output: To make the outputs of the Units in a Stack available outside of it, the stack output command will be introduced. This command will take the outputs of the Units in the Stack and stitch them together into a single output. This will allow users to interact with the Stack as a single unit, rather than having to interact with each Unit individually.

    e.g.

    $ terragrunt stack output
    service.output1 = "output1"
    service.output2 = "output2"
    db.output1 = "output1"
    db.output2 = "output2"

    This will allow users to access the outputs of the Units in the Stack, without having to navigate to each Unit individually.

How terragrunt.hcl Files Are Impacted

One of the main goals of this proposal is to make it so that users can take the exact same terragrunt.hcl files they are using today, and use them as part of a Stack. To that end, users should not expect to need any special syntax in terragrunt.hcl files that are used in a Stack.

Units are already frequently written with relative paths for their dependency blocks to reference each other.

e.g.

dependency "db" {
  config_path = "../db"
}

A unit with that dependency block would expect to find a folder named db sibling to it in the directory structure. Stacks take advantage of that, allowing them to be generated dynamically using the path attribute in the unit configuration blocks, and the relative paths in the dependency blocks will work within the context of the .terragrunt-stack directory.

In addition, users frequently use the path from a terragrunt.hcl file at the root of the repository or the .git directory to determine where state files are stored for individual units.

In the context of a Stack, the path simply includes the .terragrunt-stack directory with no changes to how the path is currently calculated.

e.g.

Given the following file structure:

/path/to/dir/service/terragrunt.hcl
/path/to/dir/db/terragrunt.hcl

Replacing the contents of dir with:

/path/to/dir/terragrunt.stack.hcl

Will result in the following file structure once a terragrunt stack command is run:

/path/to/dir/terragrunt.stack.hcl
/path/to/dir/.terragrunt-stack/service/terragrunt.hcl
/path/to/dir/.terragrunt-stack/db/terragrunt.hcl

The service and db units will be generated in the .terragrunt-stack directory, and the dependency block in the service unit will be able to reference the db unit using the same relative path ../db.
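As a rough sketch of what this looks like from inside the service unit, the terragrunt.hcl below consumes an output from db through the relative path. The module source, the endpoint output, and the db_endpoint input are assumptions made for illustration, not part of this proposal.

```hcl
# Hypothetical terragrunt.hcl for the service unit.
terraform {
  # Assumed module location, for illustration only.
  source = "github.com/acme/modules//service?ref=v0.0.1"
}

dependency "db" {
  # Resolves to .terragrunt-stack/db once the stack has been generated.
  config_path = "../db"
}

inputs = {
  # Assumes the db unit exposes an output named "endpoint".
  db_endpoint = dependency.db.outputs.endpoint
}
```

The same file works unchanged whether it lives at /path/to/dir/service or at /path/to/dir/.terragrunt-stack/service, because only the sibling relationship matters.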

The implication to existing terragrunt.hcl files is that they cannot necessarily be easily refactored into Stacks with the initial release due to the need to move state, but it should be trivial to generate new instances of the same Units in a new Stack.

In the future, additional tooling can be explored to help users migrate to Stacks from existing Terragrunt configurations.

How Stacks Use Shared Configurations

A common pattern seen with modern Terragrunt configurations is that they frequently rely on shared configurations via the include configuration block. Users may be familiar with canonical _envcommon directories, that are designed for this in the Gruntwork library. This can be a useful pattern, and one that doesn't have to be abandoned when adopting Stacks.

One benefit of this design, however, is that all Units have a natural alternate location to store shared configurations that they rely on: the terragrunt.stack.hcl file. Units can leverage existing functions like read_terragrunt_config to read configurations from the terragrunt.stack.hcl file.

e.g.

locals {
  stack_config = read_terragrunt_config(find_in_parent_folders("terragrunt.stack.hcl"))

  environment = local.stack_config.locals.environment
}

There may be benefits to introducing new functionality that makes it easier to share configurations across Units in a Stack in the future, but the initial release will not need to include anything besides what Terragrunt can do today.
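As a minimal sketch of how a Unit might then use such a value, the local read from terragrunt.stack.hcl can be passed straight into inputs. The environment input name is an assumption about the underlying module.

```hcl
# terragrunt.hcl of a Unit generated under .terragrunt-stack/
locals {
  stack_config = read_terragrunt_config(find_in_parent_folders("terragrunt.stack.hcl"))
}

inputs = {
  # Assumes the module declares a variable named "environment".
  environment = local.stack_config.locals.environment
}
```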

Nesting Stacks

To mitigate the risk of Stacks becoming too large, or repeated, Stacks are designed to be nestable.

e.g.

locals {
    version = "v0.0.1"
    environment = "dev"
}
 
stack "services" {
    source = "github.com/gruntwork-io/terragrunt-stacks//stacks/mock/services?ref=${local.version}"
    path = "services"
}
 
unit "db" {
    source = "github.com/gruntwork-io/terragrunt-stacks//stacks/mock/db?ref=${local.version}"
    path = "db"
}

In this example, the services stack will be generated at .terragrunt-stack/services, and the db unit will be generated at .terragrunt-stack/db. Once the services stack is generated, Terragrunt will recursively generate a stack using the contents of the .terragrunt-stack/services/terragrunt.stack.hcl file until it fully generates the stack.

Any terragrunt stack run * commands will run on the top-level stack, picking up all the nested stacks as part of the process.
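To make the recursion concrete, here is a sketch of what the nested terragrunt.stack.hcl at stacks/mock/services might contain. The two units and their sources are assumptions for illustration. The layout in the trailing comments is inferred from the rule that a .terragrunt-stack directory is created next to each terragrunt.stack.hcl file, so it may differ in the final implementation.

```hcl
# Hypothetical terragrunt.stack.hcl fetched into .terragrunt-stack/services
unit "service" {
  source = "github.com/gruntwork-io/terragrunt-stacks//stacks/mock/service?ref=v0.0.1"
  path   = "service"
}

unit "api" {
  source = "github.com/gruntwork-io/terragrunt-stacks//stacks/mock/api?ref=v0.0.1"
  path   = "api"
}

# Inferred layout after generating the top-level stack:
#   .terragrunt-stack/db/terragrunt.hcl
#   .terragrunt-stack/services/terragrunt.stack.hcl
#   .terragrunt-stack/services/.terragrunt-stack/service/terragrunt.hcl
#   .terragrunt-stack/services/.terragrunt-stack/api/terragrunt.hcl
```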

Technical Details

To support the introduction of Stacks, the following have to be achieved:

  • A new configuration file will be accepted by Terragrunt: terragrunt.stack.hcl

    The terragrunt.stack.hcl will follow the spec outlined in this proposal. It will support locals and unit blocks.

  • A new command will be introduced: terragrunt stack

    The terragrunt stack command will follow the spec outlined in this proposal. It will support the following subcommands:

    • terragrunt stack generate
    • terragrunt stack output
    • terragrunt stack run *

Considerations:

  • Users will have to .gitignore a new directory: .terragrunt-stack (though they could technically take a vendored approach and commit it).
  • Terragrunt will have a new instance where it will use go-getter to fetch Units for a Stack.
  • Users can write terragrunt.hcl configurations that are invalid in potentially non-obvious ways (they may use paths in their terragrunt.stack.hcl file that don't align with the config_path value in dependency blocks of terragrunt.hcl files).
  • Nested Stacks can result in significantly complicated dependency graphs. It may be hard to reason about a Stack with a large number of nested children.
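The path mismatch mentioned above can be sketched as follows. The two files are shown in one block for brevity, and the sources reuse the mock repository from earlier examples.

```hcl
# --- terragrunt.stack.hcl ---
unit "db" {
  source = "github.com/gruntwork-io/terragrunt-stacks//stacks/mock/db"
  path   = "database" # generated at .terragrunt-stack/database
}

unit "service" {
  source = "github.com/gruntwork-io/terragrunt-stacks//stacks/mock/service"
  path   = "service"
}

# --- terragrunt.hcl of the service unit (in the source repository) ---
dependency "db" {
  # Points at .terragrunt-stack/db, which is never generated,
  # so the dependency cannot be resolved.
  config_path = "../db"
}
```

Nothing in either file looks wrong in isolation, which is what makes this class of error non-obvious. Setting path = "db" on the db unit would fix it.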

Press Release

Introducing Terragrunt Stacks!

Stacks are a way to drastically reduce the repetition in Terragrunt codebases by leveraging a new configuration file: terragrunt.stack.hcl.

With the introduction of Stacks, users can now consolidate large numbers of terragrunt.hcl files into a single terragrunt.stack.hcl file.

Stacks are a powerful new feature, and are the largest change to how users write Terragrunt configurations to date.

To get started, try out the new terragrunt stack command, which allows you to create, manage, and destroy Stacks:

mkdir my-stack
cd my-stack
cat > terragrunt.stack.hcl <<'EOF'
locals {
    version = "v0.0.1"
    environment = "dev"
}

unit "service" {
    # Source is an intentionally broken URL for the press release.
    source = "github.com/gruntwork-io/terragrunt-stacks//stacks/mock/service?ref=${local.version}"
    path   = "service" # default would be github.com/gruntwork-io/terragrunt-stacks/stacks/mock/service
}

unit "db" {
    # Source is an intentionally broken URL for the press release.
    source = "github.com/gruntwork-io/terragrunt-stacks//stacks/mock/db?ref=${local.version}"
    path   = "db" # default would be github.com/gruntwork-io/terragrunt-stacks/stacks/mock/db
}

unit "api" {
    # Source is an intentionally broken URL for the press release.
    source = "github.com/gruntwork-io/terragrunt-stacks//stacks/mock/api?ref=${local.version}"
    path   = "api" # default would be github.com/gruntwork-io/terragrunt-stacks/stacks/mock/api
}

labels = {
	environment = local.environment
}
EOF

terragrunt stack run plan
terragrunt stack run apply

Drawbacks

Potentially Too Much Abstraction

The largest potential drawback to introducing Stacks is that it is yet another layer of abstraction to how users manage their infrastructure.

Terragrunt is already a fairly complex tool, and adding Stacks on top of it may make it more difficult for users to understand how their infrastructure is being managed.

The ways in which this design attempts to mitigate this drawback include:

  1. Stacks are Optional: Users can continue to use Terragrunt as they always have, and only introduce Stacks where they need to scale their Terragrunt codebase.

  2. Stacks are Explicit: Stacks are defined in a separate file, and are not a hidden feature in Terragrunt configurations. This makes it clear when a user is working with a Stack, and when they are not.

  3. Stacks are Simple: The design of Stacks is intentionally simple, with only a few added configurations and commands introduced in this initial proposal.

  4. Stacks are Familiar: All of the work Stacks do to interact with infrastructure is mediated by terragrunt.hcl files. Users can run terragrunt stack generate, and see a .terragrunt-stack directory that operates exactly like a current Terragrunt codebase without Stacks.

    This behavior falls in line with how the .terragrunt-cache directory was designed, allowing users to run tofu/terraform commands within the directory to achieve the same end result, dropping down a layer of abstraction.

Performance

Users leveraging remote Units as part of their stacks will deal with the performance penalty of fetching those Units from a remote source before running any infrastructure updates.

It's probably not a huge penalty to deal with, but users can always vendor their .terragrunt-stack directories and remove the performance penalty entirely.

Alternatives

_envcommon

The alternative that most Terragrunt users use today is to leverage a directory of shared configurations located in a directory named something like _envcommon.

This directory usually contains a collection of files that use Terragrunt HCL configurations. These files are then included in multiple other terragrunt.hcl files via include configuration blocks using the path attribute.
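For readers unfamiliar with the pattern, a Unit that relies on _envcommon typically looks something like the sketch below. The file name _envcommon/service.hcl and the instance_count input are hypothetical, and the exact layout varies between codebases.

```hcl
# terragrunt.hcl for one instantiation of the service module
include "root" {
  path = find_in_parent_folders()
}

include "envcommon" {
  # Shared configuration kept alongside the root terragrunt.hcl.
  path = "${dirname(find_in_parent_folders())}/_envcommon/service.hcl"
}

inputs = {
  # Environment-specific override (hypothetical input).
  instance_count = 2
}
```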

This approach is effective at reducing repetition in Terragrunt codebases, and has some advantages over the proposed solution:

  1. The number of committed terragrunt.hcl files directly relates to the number of Units in the codebase. This can make it easier to initiate individual state updates, as there is always a single terragrunt.hcl file that can be run.
  2. Units can be very easily edited within a directory of Units directly.

However, this approach also has some drawbacks:

  1. Synchronizing updates across many terragrunt.hcl files can be difficult, as there is no built-in way to ensure that all terragrunt.hcl files referencing the same _envcommon file are updated.
  2. The _envcommon directory is not independently versioned, and changes to the _envcommon directory can result in updates with large blast radii.

Larger OpenTofu/Terraform Modules

Another alternative is to put more logic into OpenTofu/Terraform modules themselves, and use a single terragrunt.hcl file to manage the larger module.

This approach is also effective at reducing repetition in Terragrunt codebases, and allows users to put more of the logic for managing infrastructure in .tf files if they would prefer that.

The drawbacks to this approach are largely the reason that using Terragrunt is advantageous:

  1. Managing more infrastructure in a single state file increases the blast radius of a single change.
  2. Functionality like run_cmd, before_hook, after_hook, error_hook can't be used to perform additional logic that is not supported by OpenTofu/Terraform.
  3. Separation of concerns is more difficult to achieve, as the logic for configuring disparate reusable infrastructure is all in one terragrunt.hcl file.
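As a brief illustration of the second point, the sketch below shows the kind of Terragrunt-only logic that has no direct equivalent inside a .tf module. The module source, hook name, and git_sha input are assumptions for illustration.

```hcl
terraform {
  # Hypothetical module source.
  source = "github.com/acme/modules//service?ref=v1.0.0"

  # Runs before OpenTofu/Terraform for the listed commands.
  before_hook "announce" {
    commands = ["plan", "apply"]
    execute  = ["echo", "About to run OpenTofu/Terraform"]
  }
}

inputs = {
  # run_cmd executes a command while the configuration is parsed
  # and returns its output.
  git_sha = run_cmd("git", "rev-parse", "--short", "HEAD")
}
```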

Migration Strategy

Users that aren't currently using stacks will have to do some work in order to migrate their existing Terragrunt codebases to use Stacks if they want to take advantage of them.

Creating terragrunt.stack.hcl Files

Taking a collection of terragrunt.hcl files, and consolidating them into a single terragrunt.stack.hcl file is the first step in migrating to Stacks.

Users will want to consider where they want their terragrunt.hcl files to live (either in the same repo, as part of a monorepo, or in a different, dedicated repository).

Then, they'll want to decide which Units they want to consolidate into a Stack, and write terragrunt.stack.hcl files to reference those Units.

Migrating State

Users will need to consider how they want to migrate their state files to work with Stacks.

For a gradual adoption of Stacks, users should prioritize using Stacks for net new infrastructure, then consider migrating existing infrastructure to Stacks.

Considerations to take into account when migrating state files include, but are not limited to:

  1. The frequency with which the infrastructure is updated: Users may prioritize migrating state for infrastructure that is updated less frequently to avoid accidentally encountering errors during the migration process.
  2. The blast radius of the infrastructure: Users may prioritize migrating state for infrastructure with a smaller blast radius to reduce the cost of accidental errors during the migration process.
  3. The value of migrating the Units to Stacks: Users may prioritize migrating state for Units that are more frequently repeated in the codebase to reduce the amount of code that needs to be managed as a consequence.

⚠️ Before migrating state, some basic precautions are advised. Users should always back up their state files before migrating them, and have a tested disaster recovery plan if accidental updates to infrastructure occur.

To migrate state files, users will want to follow these steps:

# 1. Pull down the state file from the remote state store
cd /path/to/terragrunt/unit
terragrunt state pull > /tmp/tf.tfstate
# 2. Ensure the stack is generated
cd /path/to/terragrunt/stack
terragrunt stack generate
# 3. Push the state to the new location as part of the Stack
cd /path/to/terragrunt/stack/.terragrunt-stack/path/to/unit
terragrunt state push /tmp/tf.tfstate

Unresolved Questions

How does the community feel about introducing Stacks as a feature in Terragrunt?

This will be a significant change to what users see in Terragrunt codebases, and will require that they be comfortable with the new abstraction.

Are there alternate abstractions Gruntwork should prefer to this?

What is the minimum required feature set of Stacks to make them useful?

There is a lot more planned for Stacks than what is presented in this proposal. One goal here is to present the minimum feature set that will make Stacks useful to users, and to receive feedback from the community.

Is there anything missing from this proposal that immediately jumps to mind as a requirement to make Stacks useful?

How does the community feel about the design of Stacks?

Does this seem like a natural abstraction that fits well within the existing Terragrunt ecosystem? Are there any changes that should be made to make the design work better?

How does the community feel about the terminology used here?

Do the terms "Stack" and "Unit" make sense in the context of Terragrunt? Are there any other terms that might be more appropriate?

How does this proposal fit into the lifecycle of a Terragrunt Unit?

Careful consideration goes into making sure that Terragrunt has good tooling so that configuration can be introduced into codebases in a sensible and convenient manner, and so that it is easy to create, update, manage, use, and remove.

Stacks are viewed as a natural extension of this lifecycle, where Units can be refactored into Stacks when they need to be reused repeatedly.

Does this proposal fit well into that lifecycle?

References

Proof of Concept Pull Request

No response

Support Level

  • I have Terragrunt Enterprise Support
  • I am a paying Gruntwork customer

Customer Name

No response

@yhakbar yhakbar added rfc Request For Comments pending-decision Pending decision from maintainers labels Aug 1, 2024
@hugorut

hugorut commented Aug 6, 2024

How does the community feel about the terminology used here?

Considering that Terraform is also working on a similar abstraction called "stacks", perhaps another name is better suited as it might get confusing? That being said, you are unlikely to both use Terraform stacks and Terragrunt stacks.

@odgrim

odgrim commented Aug 6, 2024

On the one hand... I agree that a name overlapping with an underlying tf feature would be confusing.

On the other... If we deliver first that becomes their problem. They had a blog post last year in November about it and talked about it at hashiconf. If they haven't pushed this in a year it probably lost traction.

@hugorut

hugorut commented Aug 7, 2024

@odgrim my understanding is that the feature is in private preview, as mentioned in many of their releases. See https://github.com/hashicorp/terraform/releases/tag/v1.10.0-alpha20240807 as an example. So I still think it's on their roadmap, just seems slow to get to public preview.

@yhakbar
Collaborator Author

yhakbar commented Aug 8, 2024

Thanks for the feedback on the RFC, @hugorut!

I've received feedback in multiple forums, so I'll try to give one big consolidated response here.

The Name

It's understandable that multiple tools using the same name in different contexts would cause some confusion. We feel like the name "stack" has salience in the community, and evokes the abstraction we're looking to encapsulate here. We've also been using the term "stack" in this context for a very long time internally. This RFC formalizes it externally, and adds tooling for dynamically generating them.

This is an RFC, however, so if there are other names that folks feel would be more appropriate, make your suggestions heard!

Does Terragrunt Need a Dedicated stack Subcommand?

The proposed implementation here requires that users explicitly utilize the terragrunt stack subcommand when working with Stacks. This is not the only viable way to tackle building out functionality that leverages them!

An alternate approach is to have Terragrunt automatically detect the presence of terragrunt.stack.hcl files, generate the relevant .terragrunt-stack directory, and work on the resulting Stack transparently. This would save users from having to leverage a new subcommand, but it would also introduce more complexity into existing Terragrunt behavior.

How does the community feel about this? Should Terragrunt automatically detect and expand Stacks to their relevant .terragrunt-stack directories, or should users be required to explicitly use the terragrunt stack subcommand?

Some trade-offs to consider:

  • Automatic Detection:

    • Pros:
      • Users don't have to learn a new subcommand.
      • Users don't have to remember to run terragrunt stack run apply instead of terragrunt run-all apply.
    • Cons:
      • More complexity in the behavior of run-all, potentially resulting in performance and usability issues for users that aren't using Stacks.
      • Edge cases have to be handled like deciding whether or not to expand a Stack again if the user has already generated a .terragrunt-stack directory.
  • Explicit Subcommand:

    • Pros:
      • Users can opt-in to using Stacks, rather than having them automatically generate.
      • No change in existing behavior in run-all (at least initially).
    • Cons:
      • Users have to learn and remember to use the stack subcommand when working with Stacks.
      • Existing tooling leverages the run-all subcommand, and this approach has it ignored out of the gate.

Why Aren't There Any Inputs?

This RFC might give some folks the impression that there is no configurability to a Stack. This is not the case! The Stack abstraction is meant to be minimal in design, but using existing Terragrunt tooling can result in flexible implementations.

In the heading How Stacks Use Shared Configurations, you can see how different units within the Stack can reach for shared configuration that's accessible to all units without introducing any new special syntax to Terragrunt configurations for exposing values to units in the Stack.

Not introducing any special tooling for this can make it so that it's easier to reason about how a particular unit is defining its configurations. They will either define them directly in their inputs attribute, or they will explicitly fetch the configuration values from terragrunt.stack.hcl (or any other file), then use them in their configuration.

A further convenience of this approach is that the units generated by the Stack are more portable, and can be used directly without any stack command or the like.

e.g.

A user could navigate to .terragrunt-stack/services in the example under the Nesting Stacks heading, then run terragrunt plan, and it would work as expected.

Clarification on the Attributes source and path

There may be some confusion in what exactly is being set in the attributes that are present in the stack and unit configuration blocks.

Put concisely:

  • The source is a go-getter compliant reference to a directory containing a terragrunt.hcl file or a terragrunt.stack.hcl file for the unit and stack configuration blocks, respectively.
  • The path is the relative path within the .terragrunt-stack directory to a new directory that will be created with the contents of the source attribute.

Less concisely:

The source Attribute

The string used for the source attribute is what Terragrunt is going to use to find a directory containing a terragrunt.hcl file or a terragrunt.stack.hcl file.

This string can be a local path:

e.g.

unit {
  source = "/path/on/disk"
}

$ ls /path/on/disk
terragrunt.hcl

Or it can be a remote path:

e.g.

stack {
    source = "github.com/acme/repo"
}

$ git clone https://github.com/acme/repo
$ ls repo
terragrunt.stack.hcl

The remote reference can be versioned, making it so that the Stack can be pinned to a specific version of the configuration:

e.g.

unit {
    source = "github.com/acme/repo?ref=v1.0.0"
}

and a relative path within the repo can be specified:

e.g.

stack {
    source = "github.com/acme/repo//path/to/stack"
}

I've intentionally alternated examples here between unit and stack to demonstrate that the logic for this attribute is almost identical for both. The only difference is that the unit configuration block will expect to find a terragrunt.hcl file, and the stack configuration block will expect to find a terragrunt.stack.hcl file. There's also the added complexity that the stack configuration block will result in recursively expanding Stacks.

Contents of terragrunt.hcl and terragrunt.stack.hcl Files

The files that are discovered in this manner are normal terragrunt.hcl files and terragrunt.stack.hcl files. They don't need any special syntax in order to be re-used in the context of the Stack.

Users currently using terragrunt.hcl files would be able to expect that they can (almost always) take the exact same terragrunt.hcl file they are currently using, then re-use it as part of a Stack. Once they create a terragrunt.stack.hcl file that does this, they would expect to be able to do the same thing with a different terragrunt.stack.hcl file referencing the one they just created.

The path Attribute

The point of the terragrunt.stack.hcl file is to generate a stack of files that are going to be used by Terragrunt to provision infrastructure with segmented state. The way that Terragrunt users do this, typically, is to leverage the filesystem path to a terragrunt.hcl file to determine where the state is stored remotely in a backend like S3.

For that reason, it's really useful to be able to decide where in the filesystem Stacks are going to place terragrunt.hcl files. This is what the path attribute is for.

By default, Terragrunt will use the safest possible default path in order to avoid collisions with other configuration blocks. Looking at the example under the heading terragrunt.stack.hcl, you can see the following:

locals {
    version = "v0.0.1"
    environment = "dev"
}

unit "service" {
    source = "github.com/gruntwork-io/terragrunt-stacks//stacks/mock/service?ref=${local.version}"
    path   = "service" # default would be github.com/gruntwork-io/terragrunt-stacks/stacks/mock/service
}

Running terragrunt stack generate will result in a directory being created in the .terragrunt-stack directory named service, with a terragrunt.hcl file that is a copy of the terragrunt.hcl file found in the github.com/gruntwork-io/terragrunt-stacks repository with tag v0.0.1 at the path stacks/mock/service.

Most Terragrunt users set their backend configurations to something like this:

remote_state {
  backend = "s3"
  config = {
    bucket = local.state_bucket
    key    = "${path_relative_to_include()}/tf.tfstate"
  }
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }
}

As a consequence of using configuration like this, when running terragrunt plan or terragrunt apply in the .terragrunt-stack/service directory, the path to the state file in S3 will look something like this:

s3://${local.state_bucket}/.../.terragrunt-stack/service/tf.tfstate

It would be perfectly valid for most users to leave off the path attribute in this circumstance and use the default, but the resulting path may be slightly more verbose than they like:

s3://${local.state_bucket}/.../.terragrunt-stack/github.com/gruntwork-io/terragrunt-stacks/stacks/mock/service/tf.tfstate
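To make the relationship between the path attribute and the resulting state key concrete, here is a minimal Python sketch. It is not how Terragrunt computes anything internally: the function names default_unit_path and state_key and the live/dev directory are hypothetical, and the default-path rule is inferred only from the single example above (drop the ?ref= query and collapse the // separator).

```python
def default_unit_path(source: str) -> str:
    """Guess a unit's default path from its source string.

    Assumption: only handles the "host/repo//subdir?ref=tag" form shown above.
    """
    without_query = source.split("?", 1)[0]   # drop "?ref=..."
    return without_query.replace("//", "/")   # collapse the subdirectory separator


def state_key(stack_dir: str, unit_path: str) -> str:
    """Mimic a key of the form "${path_relative_to_include()}/tf.tfstate"."""
    return f"{stack_dir}/.terragrunt-stack/{unit_path}/tf.tfstate"


source = "github.com/gruntwork-io/terragrunt-stacks//stacks/mock/service?ref=v0.0.1"

# Explicit path = "service" gives the short key.
print(state_key("live/dev", "service"))

# Omitting path falls back to the longer, source-derived key.
print(state_key("live/dev", default_unit_path(source)))
```

Either key is valid; the explicit path only makes the key shorter and independent of where the unit's source lives.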

Users that want to instantiate the same Unit multiple times would also want to explicitly specify the path attribute to avoid collisions:

unit "service" {
    source = "github.com/gruntwork-io/terragrunt-stacks//stacks/mock/service?ref=${local.version}"
    path   = "service"
}

unit "service2" {
    source = "github.com/gruntwork-io/terragrunt-stacks//stacks/mock/service?ref=${local.version}"
    path   = "service2"
}

This would result in two directories being created in the .terragrunt-stack directory, service and service2, each with an identical terragrunt.hcl file, but with different remote state files.

It's also useful to control the path for Units because dependencies are often referenced by relative path.

A terragrunt.hcl file with the following content:

dependency "a_dependency" {
  config_path = "../my-dependency"
}

Would require that a file exist in a sibling directory named my-dependency in order for it to resolve correctly. By leveraging the path attribute, users can ensure that the dependency is named my-dependency regardless of what it's called in the source repository.
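As a rough illustration of why the generated directory name matters, the following Python sketch resolves a relative config_path against the directory of the unit that declares it. The helper resolve_dependency and the two-unit layout are hypothetical; Terragrunt's own resolution logic is not shown here.

```python
import posixpath


def resolve_dependency(unit_dir: str, config_path: str) -> str:
    """Resolve a relative config_path against the directory of the declaring unit."""
    return posixpath.normpath(posixpath.join(unit_dir, config_path))


# Hypothetical layout: two units generated side by side by one stack file.
generated_units = {
    ".terragrunt-stack/service",
    ".terragrunt-stack/my-dependency",
}

target = resolve_dependency(".terragrunt-stack/service", "../my-dependency")
print(target, target in generated_units)
```

If the second unit had been generated under any other path, the lookup would miss, which is the situation the path attribute lets users avoid.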

Conclusion

Hopefully this clarifies some aspects of the RFC that aren't clear, and potentially encourages some dialogue regarding alternate paths. Feedback welcome!

@ustuehler

It seems the proposed Stack concept is designed to help with breaking up monolithic configuration repositories. Could Stacks also be another chance to shrink the number of terragrunt.hcl files and reduce boilerplate, as proposed in #303/#1566 with nested includes, #814 with inherited variables, or #691 with multiple paths for a single include?

@chkp-sergeyl

chkp-sergeyl commented Aug 10, 2024

Great RFC, thank you! I understand this will take a lot to make it happen, but I hope this gets real traction soon!

Nested Stacks can result in significantly complicated dependency graphs - I think this will be quite a factor in deciding whether to use the feature. BFS or DFS? Explicit ordering?

@yhakbar
Collaborator Author

yhakbar commented Aug 12, 2024

@ustuehler

Similar to those proposals, it does aim to reduce boilerplate in Terragrunt usage by reducing the number of files that have to exist in a repository, yes. It does not aim to address those issues, though. There are many potential ways to address Terragrunt code duplication. Hopefully, this is a very convenient way to address those problems without introducing surprising or difficult to maintain behavior.

@chkp-sergeyl
Thanks! The current plan is to have the ordering be the same as that used by run-all. Units are grouped together to run in parallel, with dependencies being parsed to determine if some units have to wait before they can go. Does that make sense?
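For readers who want a picture of that grouping, here is a minimal Python sketch of dependency-aware batching using the standard library's graphlib. It only illustrates the general idea of running independent units together and making dependents wait; it is not Terragrunt's implementation, and the unit names are made up.

```python
from graphlib import TopologicalSorter


def run_groups(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group units into batches; each batch only depends on earlier batches.

    deps maps a unit name to the set of units it depends on.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    groups = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        groups.append(ready)
        sorter.done(*ready)                 # unblock units waiting on this batch
    return groups


# Hypothetical stack: db needs vpc, service needs both, dns needs nothing.
example = {
    "vpc": set(),
    "dns": set(),
    "db": {"vpc"},
    "service": {"vpc", "db"},
}
print(run_groups(example))
```

Units within one batch could run in parallel, while each later batch waits for the batches before it.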

@odgrim odgrim assigned odgrim and unassigned odgrim Aug 14, 2024
@yhakbar
Collaborator Author

yhakbar commented Aug 19, 2024

Per feedback in multiple forums, a practical example has been requested, so I've created the following:
https://github.com/yhakbar/terragrunt-3313-stacks-walkthrough

If you would like to have a chance to try out using Terragrunt in a context where Stacks are useful, and see what terragrunt.stack.hcl files would do using a simple prototype written in bash, check it out!

Some initial feedback has already been collected in the form of some issues created on the repository. If you have any more feedback on the walkthrough, feel free to create an issue or submit a pull request there.

As always, general feedback on the RFC is encouraged here.

@diegoaguilar

👍🏼 👌🏼 I root for this, it boosts many DRY use cases for IaC

@kief

kief commented Sep 4, 2024

How does the community feel about the terminology used here?

So, aside from Terraform, both CloudFormation/CDK and Pulumi use "stacks" to refer to separately deployable projects. It's always bugged me that Terraform doesn't have a term for this. It's a project when it comes to code, corresponds to a statefile, but there's nothing to call it, and no concept of this level as a reusable, distributable component. It bugs me even more that Terraform are using the term to mean a collection of multiple deployable things, but still not something that is reusable and distributable.

I tend to use the term "stack" closer to CF and Pulumi (in books and talks and things), and would vote for using it consistently! I've started using the term "Infrastructure Product" to refer to a higher level component, e.g. a collection of stacks.

Is the Terragrunt stack a collection of multiple deployable things, or a single deployable thing? How does it relate to services (as in this article)? Is it reusable and distributable?

@lorengordon
Contributor

lorengordon commented Sep 5, 2024

I largely think the term "stacks" is confusing also. I've long equated terraform state to a "stack" to follow the terminology from CloudFormation. When you deploy a CloudFormation template, you create a CloudFormation stack. You can deploy the template many times to create many separate stacks. When you deploy a Terraform module, you create a Terraform state. You can deploy the module many times to create many separate states. A CloudFormation template is like a Terraform module, and a CloudFormation stack is like a Terraform state.

Other than the terminology, the idea is very intriguing.

@yhakbar
Collaborator Author

yhakbar commented Sep 5, 2024

To address both points:

@kief
A Terragrunt stack is both a collection of deployable things and a deployable thing in its own right. The design is for them to be recursive, like OpenTofu/Terraform modules. You can create a single instance, or you can reference the definition elsewhere, and generate it one or more times.

You can read a more detailed breakdown of what this might look like in practice here:
https://github.com/yhakbar/terragrunt-3313-stacks-walkthrough/tree/main/walkthrough/04-stacks/03-recursive-stacks

@lorengordon
What we've been calling that thing you're referring to internally at Gruntwork is a "Unit". You can read more about it in the RFC under the heading Quick Detour on "Units". The idea is that it's an atomic unit of infrastructure that you can work with independent of all your other infrastructure with its own state.

A Stack in Terragrunt terminology differs from a Unit in that it's a set of related pieces of infrastructure, each with their own state that pass messages between each other via dependency and inputs.

@lorengordon
Contributor

Maybe... I know, to continue the CloudFormation analogy, this would be a StackSet... :D

@yhakbar
Collaborator Author

yhakbar commented Sep 6, 2024

For the folks that have been following along:

I've introduced a new chapter to the walkthrough I shared earlier detailing how a user would go about adopting Stacks for a use-case involving real AWS infrastructure:
https://github.com/yhakbar/terragrunt-3313-stacks-walkthrough/tree/main/walkthrough/05-aws

Please share any feedback you have here, especially if you think elements of the design in this RFC might make it easier/harder to adopt Stacks for infrastructure you are provisioning today.

@yhakbar
Collaborator Author

yhakbar commented Sep 19, 2024

Per feedback I've received externally, I'm including a note that, at a future date, something like the following will be made available for Stacks to be used more conveniently:

# terragrunt.stack.hcl
unit "foo" {
  source = "../units/foo"
  path   = "services/foo"

  inputs = {
    name = "your-name"
  }
}

# .terragrunt-stack/services/foo/terragrunt.hcl
inputs = {
  name = unit.inputs.name
}

And similarly, a mechanism like the following will be available for recursive stacks:

# terragrunt.stack.hcl
stack "foo" {
  source = "../stacks/foo"
  path   = "services/foo"

  inputs = {
    name = "your-name"
  }
}

# .terragrunt-stack/services/foo/terragrunt.stack.hcl
unit "bar" {
  source = ".../../units/bar"
  path   = "sub-service/bar"

  inputs = {
    name = stack.inputs.name
  }
}

The behavior we land on will be documented as something to be delivered in a future release, after Stacks are made available.

@yhakbar
Collaborator Author

yhakbar commented Sep 19, 2024

With that, we've received enough feedback to officially accept the RFC for Stacks! 🎉

I'll mark the RFC as accepted, and we'll start scheduling the work to make Stacks available.

@yhakbar yhakbar added accepted Accepted RFC and removed pending-decision Pending decision from maintainers labels Sep 19, 2024
@diegoaguilar

👏🏼 I will happily be a test user for this.

Any public way to know about ongoing development for Stacks feature?

@yhakbar
Collaborator Author

yhakbar commented Sep 24, 2024

@diegoaguilar , I'm happy to hear the enthusiasm!

Of course, all ongoing work for Stacks will be public, as this is an open source project. We'll close this issue as completed once the official blog post announcing the release of Stacks is published (for folks that aren't following GitHub), and we'll link to that blog post for more information.

In the meantime, I've added a new issue label, stack, which can be used to tag any PRs that add functionality related to Stacks (on a best effort basis) or bug reports related to them.

As we get closer to the announcement, we'll also have an issue template like this created for Stacks so that folks can report issues related to stacks with the label automatically applied.

Following this issue will be the best way to get those updates as time goes on, as anything that references #3313 will link itself here.

@EdanBrooke

I wonder what Terragrunt's value-adds will be now that Terraform and OpenTofu are both trying to address the DRY problem natively.

We're using Terragrunt right now and the changes in 1.0 look promising, but I feel a lot of uncertainty due to the velocity at which all these projects are moving.

It probably goes without saying, but for us the ideal is an intuitive workflow that can grow with us without having to juggle.

@yhakbar
Collaborator Author

yhakbar commented Nov 11, 2024

Hey @EdanBrooke !

We've been talking about the best way to communicate the changes that are happening to Terragrunt, and the overall IaC landscape. Your feedback is really appreciated, and welcome!

If I could give a quick, personal answer to this feedback, the main value-add that Terragrunt provides today is that of an IaC orchestrator, not just as a tool to make OpenTofu/Terraform more DRY.

You might have noticed that all of the features described in the road to 1.0 blog post centered around this value proposition of IaC orchestration.

I personally look at it like the difference between containers and Kubernetes. Sure, Docker might make it more convenient to spin up containers in a reliable way, with dynamic configurations so that each container can do more, but Kubernetes is solving the problem of orchestrating those containers at a higher level, so that you can do more than you could with just Docker.

Similarly, OpenTofu is really good at getting infrastructure represented in one state to different infrastructure represented in another state, but if you want to segment that infrastructure into multiple state files to limit blast radius, have updates to those units of infrastructure coordinated correctly, and dynamically, Terragrunt is a really good tool for making that feasible.

What we're trying to deliver with Stacks is very explicitly an experience that won't force you to change everything you are currently doing with Terragrunt, but rather one that lets you opt in to the parts you like, for the infrastructure you're ready to change, and that requires no new terragrunt.hcl configurations on release.

If you have the time, and you haven't done it yet, I recommend going through this walkthrough and seeing how and why Stacks was designed as it was in this proposal.

@yhakbar
Collaborator Author

yhakbar commented Dec 10, 2024

Something I've asked @denis256 to evaluate is a modification to the Stacks design to introduce a new optional attribute for unit and stack configuration blocks named hidden. It would look like this:

unit "foo" {
  source = "../units/foo"
  path   = "foo"
  hidden = false
}

stack "bar" {
  source = "../stacks/bar"
  path   = "bar"
  hidden = false
}

The thought behind this is to make it so that users can optionally decide not to generate units and stacks within the hidden .terragrunt-stack directory, and instead generate them immediately next to the terragrunt.stack.hcl file.

This might make it easier for folks to have a "soft" adoption of Stacks, as existing implicit stacks (a directory with a bunch of units) won't be placed into .terragrunt-stack directories, and this would make it so that no state migration would have to take place.
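A small Python sketch of the proposed behavior may help here. It assumes only what is described above (hidden decides whether the .terragrunt-stack directory is part of the generated location); the function name generated_dir and the live/dev directory are hypothetical, and no default for hidden is implied.

```python
from pathlib import PurePosixPath


def generated_dir(stack_dir: str, path: str, hidden: bool) -> str:
    """Return where a unit or stack would be generated under the proposal."""
    base = PurePosixPath(stack_dir)
    if hidden:
        # Generated inside the hidden directory next to terragrunt.stack.hcl.
        base = base / ".terragrunt-stack"
    # With hidden = false, generated immediately next to terragrunt.stack.hcl.
    return str(base / path)


print(generated_dir("live/dev", "foo", hidden=True))
print(generated_dir("live/dev", "foo", hidden=False))
```

Because state keys are commonly derived from the unit's filesystem path, keeping a unit at live/dev/foo would leave its key unchanged, which is why no state migration would be needed.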

It didn't seem like a heavy lift, so we're asking for feedback on that as an adjustment to the RFC, and we'll look to add that capability after the first pass of implementation is complete.

I've also added it as part of the agenda for tomorrow's Office Hours, so please attend if you would like to share your feedback live!

@nuryupin-kr

Hey guys, are there any rough time estimates for when stacks will be available?

@yhakbar
Collaborator Author

yhakbar commented Dec 18, 2024

Hey @nuryupin-kr ,

Yup! We're shooting for Q1 2025.

We'll be cutting publicly available pre-releases as we go, so you'll be able to give feedback along the way.

This is one of our two top priorities for Q1, the other being the CLI redesign.

Of course, if feedback makes us feel like we need more time to flesh things out or stabilize things, it might take longer.

We're more interested in incremental progress and getting it right than getting it done quickly.

@yhakbar
Collaborator Author

yhakbar commented Dec 20, 2024

We just published a blog post providing a deep dive on stacks!

https://blog.gruntwork.io/the-road-to-terragrunt-1-0-stacks-cd97f11ef565

Share your feedback if it gives you any new insights as to how it will work. There's also a special feature Dynamic Stacks that we're committing to above and beyond the contents of this RFC to give users even more tools in their tool belt for managing stacks.

@denis256
Member

Published alpha release with support of terragrunt stack generate

stack-example

https://github.com/gruntwork-io/terragrunt/releases/tag/v0.71.2-alpha2024122001

@FuNK3Y

FuNK3Y commented Dec 21, 2024

Pardon me if I am redundant or off topic. I tried to read what I could find but found no explicit answer about the deletion (removing a unit in a stack file).

If it behaves exactly like a run-all in the .terragrunt-stack folder, the unit resources will remain unless explicitly removed (terragrunt destroy). While this is backward compatible, it is a deviation from the other levels (removing a resource in a .tf file implicitly triggers removal of the resource).

IMHO the more consistent approach would be to remove resources deployed in a unit that is no longer present in a stack file, but in order to achieve that you might need to introduce a stack state. Is that something planned down the line?

@yhakbar
Collaborator Author

yhakbar commented Dec 21, 2024

Hey @FuNK3Y ,

Great question! The short answer is yes!

The state that we're going to be using for automatically tracking changes is going to be git, and the mechanism is something I'll be publishing an RFC for after this RFC and #3445 are done.

As you pointed out, it's a larger feature than just supporting stacks, so it's easier to discuss it once stacks are available and the CLI redesign is complete (so that we don't have to adjust the names of flags we introduce right after introducing them).

@LeszekBlazewski

I just wanted to comment that I have spent a good 2 hours diving into all of the discussion, links, examples and walkthroughs and I must say ... I am psyched for the stacks! As a long-time Terragrunt user who manages a multi-account and multi-env setup on top of AWS, the number of terragrunt.hcl files I currently have is enormous. Stacks look like a great orchestration abstraction that, given some thought on the initial design, will make bootstrapping new infrastructure way easier and more pleasant (especially when provisioning a new environment that consists of multiple small Terraform modules glued together via inputs / dependency blocks). One no longer has to build Terraform modules that wrap other Terraform modules to limit the number of terragrunt.hcl files.

Big shoutout to @yhakbar for providing this walkthrough https://github.com/yhakbar/terragrunt-3313-stacks-walkthrough/tree/main as for me it cleared up any misconceptions and showed the real power of stacks.

@denis256
Member

denis256 commented Jan 9, 2025

Hi,
published release https://github.com/gruntwork-io/terragrunt/releases/tag/v0.71.3
with terragrunt stack generate experiment

Simplified example:

tg-stack-experiment-demo

@rreilly-edr

I would second @LeszekBlazewski I can't wait for this functionality, this is awesome.

@mwos-sl

mwos-sl commented Jan 10, 2025

I haven't checked whether it works like this or not, but I think it is quite important that path for source in unit can be interpolated from locals.

So basically we can do something like this in units/vpc/terragrunt.hcl:

...
terraform {
  source = "git::[email protected]:my-org/infrastructure-modules.git//vpc?ref=${local.vpc_module_version}"
}
...

And then local.vpc_module_version could be assigned in environment.hcl in "live" part, separately for each environment.
Why? To allow gradual rollout of a new module implementation.

Currently the walkthrough gives examples only for local modules and using relative paths, without any versioning between dev/prod environments.

I bet this is already supported, but pasting just in case, as it would be quite critical feature for us.

@yhakbar
Collaborator Author

yhakbar commented Jan 10, 2025

@mwos-sl , you can start trying it out today!

https://terragrunt.gruntwork.io/docs/reference/experiments/#stacks

You can see an example of how versions can be pinned in our test fixtures:
https://github.com/gruntwork-io/terragrunt/blob/main/test/fixtures/stacks/remote/terragrunt.stack.hcl

@j2udev

j2udev commented Jan 13, 2025

I'm trying to take this for a quick spin using some of our existing setup and following the video here for referencing local units:
https://github.com/gruntwork-io/terragrunt/releases/tag/v0.71.2-alpha2024122001

I'm hitting errors when attempting to use the stack generate command terragrunt --experiment stacks stack generate

here is my stack file:

unit "vpc" {
  source = "units/vpc"
  path   = "vpc"
}

unit "rke2" {
  source = "units/rke2"
  path   = "rke2"
}

I know these units are fine as we use them all the time in run-all commands, but I'm getting this error:

16:09:25.561 ERROR  error downloading 'file:///workspaces/terragrunt/my-env/dev/units/vpc': write ./.terragrunt-stack/vpc/.terragrunt-cache/e0zqLHE7XgSq3YnuKJcMXvkjTX0/rLDNduzmcT3lplh503Th9HcsUyk/.terraform/providers/registry.terraform.io/hashicorp/aws/5.61.0/linux_arm64: copy_file_range: is a directory
16:09:25.561 ERROR  Unable to determine underlying exit code, so Terragrunt will exit with error code 1

My units have lots of include statements (including for providers) so I'm wondering if there is an issue with that.

At the risk of introducing extra (potentially irrelevant) noise... here is an example of the include statements in the vpc unit:

include "backend" { path = find_in_parent_folders("backend.hcl") }
include "aws_provider" { path = "${get_path_to_repo_root()}/common/aws_provider.hcl" }

@yhakbar
Collaborator Author

yhakbar commented Jan 13, 2025

Thanks for trying it out, @j2udev !

Would it be possible to create a fixture that reproduces the issue and share it with us? Something that allows us to reproduce the issue would go a long way towards troubleshooting the root cause.

@j2udev

j2udev commented Jan 13, 2025

> Thanks for trying it out, @j2udev !
>
> Would it be possible to try to create a fixture to reproduce the issue that can be shared with us? Something that allows us to reproduce the issue would go a great way towards troubleshooting the root cause.

No problem, appreciate all the hard work! We love Terragrunt :) I'll see if I can put something together after work, but it shouldn't be hard to replicate. Have a separate file for generating something... a backend, some provider, w/e and just include it in a unit that you reference in a stackfile. All of this was local in the same repo fwiw.

Anything I should know about the specifics of what a "fixture" is? Would an example repo satisfy that request or are you talking about forking Terragrunt and adding something here:
https://github.com/gruntwork-io/terragrunt/tree/main/test/fixtures

@yhakbar
Collaborator Author

yhakbar commented Jan 13, 2025

Either works! The end objective is that maintainers are able to generate the exact same error you encountered so that we can test our fix against it. If it's something we want to continuously test against, we might end up adding to that test/fixtures directory on main anyways, and add an integration test to confirm we don't introduce any regression on that.

@mycodeself

Hello! this looks like a nice feature! I'm testing this out but I have some doubts.

Disclaimer: I'm quite new to Terragrunt, so sorry in advance if I say something incorrect...

I'm wondering if this feature isn't the same as moving to a parent folder containing the stack and performing a run-all operation?

At least, that's how our folder structure works... Another thing that I think will be a game changer is support for overwriting inputs directly in the stack definition. Is there an ETA for this? Is it on the roadmap?

thanks in advance for all the great work!

@yhakbar
Collaborator Author

yhakbar commented Jan 13, 2025

Hey @mycodeself ,

Welcome to the community! I hope you've read the Getting Started Guide and the documentation we have on getting integrated into our community.

From the perspective of performing Terragrunt runs, it very much is the same, by design. We've worked very hard to design stacks so that users with existing CI/CD systems, scripts, etc can continue to use Terragrunt as they were before regardless of whether they build any tooling to use the new features being introduced with the terragrunt.stack.hcl file.

Using a terragrunt.stack.hcl file only does the following:

  • Reduces the number of files you have to track in your repository.
  • Introduces a new mechanism for versioning unit and stack configurations fetched from remote locations.
  • Introduces a new way to specify values for units within the terragrunt.stack.hcl file.

Even when fully delivered, and there's a dedicated terragrunt stack run command for running updates in a stack without using run-all, you'll still be able to work with the units generated by a terragrunt.stack.hcl file as you were before the introduction of the terragrunt.stack.hcl file (including terragrunt plan/apply directly in them as if the stack didn't exist).

Yup! Stack values are on the roadmap, and we're aiming to have them delivered with the rest of the Stacks RFC in Q1 2025.

@j2udev

j2udev commented Jan 13, 2025

> Either works! The end objective is that maintainers are able to generate the exact same error you encountered so that we can test our fix against it. If it's something we want to continuously test against, we might end up adding to that test/fixtures directory on main anyways, and add an integration test to confirm we don't introduce any regression on that.

I think I figured out what is causing this... I had just thrown my existing units (that have previously been initialized / applied) into a units folder and referenced them... which seems to have something in the .terragrunt-cache folder that was causing this error for the terragrunt stack generate command. When I clear out the .terragrunt-cache folder from the unit it starts working. If you do some lighter weight operation like terragrunt validate-inputs on the unit, which still creates the .terragrunt-cache folder, the terragrunt stack generate command has no issue. When you do some operation that initializes the unit, that seems to add something to the cache that causes the stack command to throw this error. I've made a sample repo that explains it here:

https://github.com/j2udev/terragrunt-stack-example

TLDR: The functionality here is fine, although I could see others hit this same issue. The error message could possibly be made a little better, but no real gripes.
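For anyone who hits the same error, the workaround described above (clearing .terragrunt-cache out of the unit directories before running terragrunt stack generate) could be scripted along these lines. This is only a sketch: the helper remove_terragrunt_caches is hypothetical and not part of Terragrunt, and it permanently deletes directories, so point it at the right folder.

```python
import shutil
from pathlib import Path


def remove_terragrunt_caches(root: str) -> list[str]:
    """Delete every .terragrunt-cache directory under root; return the removed paths."""
    removed = []
    # sorted() materializes the matches first, so deleting while looping is safe.
    for cache in sorted(Path(root).rglob(".terragrunt-cache")):
        if cache.is_dir():  # skips caches already removed along with a parent cache
            shutil.rmtree(cache)
            removed.append(str(cache))
    return removed


if __name__ == "__main__":
    # "units" is an assumed directory name holding the unit sources.
    for path in remove_terragrunt_caches("units"):
        print("removed", path)
```

The unit's terragrunt.hcl and any other source files are left untouched; only the cache directories go away.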

I'm excited to try the stack run command when it's made available :)

@denis256 denis256 mentioned this issue Jan 13, 2025
4 tasks