
tf-controller for IaC #1140

Closed
ntwkninja opened this issue Dec 20, 2022 · 4 comments

@ntwkninja (Contributor) commented Dec 20, 2022

Problem:
Terraform only provides a mechanism to provision / update IaC; it inherently relies on third-party tooling and additional implementation details for GitOps, state management, and drift detection. Zarf has the potential to provide an opinionated way to use Terraform that addresses these concerns.

Potential Solution:
Weaveworks has a tf-controller which (1) maintains state within k8s by default (read: discoverability for BB), (2) provides a GitOps workflow for Terraform, and (3) leverages the powers of k8s for drift detection / correction. I propose we look at this or comparable tooling that maintains state in k8s by default; we can then back the state up to a more robust storage mechanism.

If we go with the tf-controller, I expect it would behave similarly to the BB Helm chart issue, which leverages the flux=true flag.
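
For context, the tf-controller's GitOps workflow is driven by a `Terraform` custom resource reconciled against a Flux source. A minimal sketch (v1alpha1 API of the time; the resource and repo names are hypothetical):

```yaml
apiVersion: infra.contrib.fluxcd.io/v1alpha1
kind: Terraform
metadata:
  name: my-stack                # hypothetical
  namespace: flux-system
spec:
  interval: 1m                  # reconcile interval; this is also what drives drift detection
  approvePlan: auto             # plan and apply without a manual approval gate
  path: ./terraform
  sourceRef:
    kind: GitRepository         # Flux source watched for changes (the GitOps workflow)
    name: iac-repo              # hypothetical Flux GitRepository
    namespace: flux-system
  # state is stored in a Kubernetes Secret by default -- no external backend required
```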

Alternatives:
External workflows / tooling (pipelines, processes, or tools like Atlantis, Crossplane, etc.) that compensate for native Terraform deficiencies.

Here is an example of using Zarf in its current state to GitOps IaC, with limitations: no k8s discovery of Terraform-provisioned resources and no drift detection.

Parent issue: #939

@mikevanhemert commented Dec 21, 2022

I am new to the experimental Weaveworks tf-controller and recognize most of my questions come from ignorance. What are the chicken-and-egg considerations if this became the recommended path forward for managing IaC with Zarf?

How would Zarf provision an EKS cluster with Terraform and maintain its state if the tf-controller is used? Is the assumption that a k3s or similar 'jumpbox' cluster needs to be used? If this is the case, could the Terraform state for EKS be migrated into EKS so that the external cluster doesn't need to hang around?

I am interested in an alternative where Terraform state is managed locally and then persisted in / retrieved from a Zarf-managed cluster. Assuming Zarf afforded a generic key:value store, I think we could load Terraform state and even provide for locking / unlocking on subsequent IaC component deploys.

Assuming a Zarf package with these components:

```yaml
components:
  - name: download-dependencies
  - name: terraform
    scripts:
      before:
        # terraform init
        # read terraform state (if it exists in Zarf's `key:value` store)
        # terraform apply
      after:
        # save terraform state in Zarf's `key:value` store
  - name: init
  - name: flux
  - name: big-bang
  - name: mission-app
```

The workflow with my suggested key:value store, assuming no infrastructure exists, following a zarf package deploy:

  1. provision resources in AWS, including EKS, and save the Terraform state in the cluster (sketched below)
  2. zarf init!
  3. Install flux
  4. Install Big Bang
  5. Install that mission app
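
As a rough sketch of the "save state" step in (1), assuming the key:value store is backed by a plain Kubernetes Secret (the Secret name and namespace are hypothetical):

```yaml
scripts:
  before:
    # pull prior state out of the Secret, if one exists
    - >
      kubectl get secret tfstate -n zarf -o jsonpath='{.data.terraform\.tfstate}'
      > /tmp/state.b64 && base64 -d /tmp/state.b64 > terraform.tfstate ||
      echo 'no prior state'
  after:
    # upsert the state file into the Secret so later deploys can read it back
    - >
      kubectl create secret generic tfstate --from-file=terraform.tfstate
      -n zarf --dry-run=client -o yaml | kubectl apply -f -
```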

The same workflow, assuming an innocuous change to the underlying Terraform (a new security group entry, for instance):

  1. read terraform state, apply the change, save the updated state in the cluster
  2. skip zarf init because there's magic logic this example isn't going into
  3. helm install flux (already there, no changes)
  4. flux install big-bang components (no changes)
  5. flux install the mission app (no changes)

The highlight here is that by opening Zarf up to manage a key:value store, I believe we can successfully store and manage Terraform state and support other component-to-component data sharing. I added related but distinct thoughts in #1137 as well.

@ntwkninja (Author) commented

I'm thinking there would be two separate Zarf packages for IaC: bootstrap and day-2 ops. Day-2 ops would be continually updated / applied; the only real difference is that bootstrap uses a local state file until the cluster is up and running.

  • A bootstrap package that provisions everything using a local Terraform state file, then migrates that state to the just-provisioned cluster using the Kubernetes backend (sketched below).
  • A day-2 ops package that leverages the Terraform Kubernetes backend as a KV store.
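
A sketch of the bootstrap package's state migration, assuming Zarf's scripts syntax and a backend.tf that gets switched to the kubernetes backend once the cluster exists (the component name and flags are illustrative):

```yaml
components:
  - name: bootstrap-iac               # hypothetical
    scripts:
      before:
        - terraform init              # local state file for the very first run
        - terraform apply -auto-approve
      after:
        # after backend.tf is pointed at the kubernetes backend,
        # copy the local state into the freshly provisioned cluster
        - terraform init -force-copy  # implies -migrate-state without prompting
```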

The other consideration is state backup to reliable storage and recovery. We could do one of the following for backups:

  • a k8s job that periodically backs up the tf state to something like S3
  • or, each time a day-2 ops IaC package is run, an OnSuccess job that runs afterwards to copy the state from k8s to S3 (aws s3 cp; sketched below)
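
The second option might look like this in the day-2 ops package; `terraform state pull` is used so the backup is a plain state file, and the bucket name is hypothetical:

```yaml
scripts:
  after:
    # export the current state (from whatever backend) to a local file, then copy it up;
    # bucket versioning provides point-in-time recovery
    - terraform state pull > backup.tfstate
    - aws s3 cp backup.tfstate s3://iac-state-backups/terraform.tfstate
```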

Recovery would simply copy the state backup down as local state for the bootstrap package to use. Then, following a successful apply, back the state up to the same versioned S3 bucket.
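
That recovery step could be a one-liner at the front of the bootstrap package (same hypothetical bucket as above):

```yaml
scripts:
  before:
    # pull the last good state down as the local state file for bootstrap
    - aws s3 cp s3://iac-state-backups/terraform.tfstate ./terraform.tfstate
```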

Terraform state considerations will exist regardless of whether we utilize the tf-controller. Also, I think the tf state could act as a KV store for IaC things, but we may still need a Zarf KV store for other things.


@ntwkninja (Author) commented

> I'm thinking there would be two separate Zarf packages for IaC: bootstrap and day-2 ops. Day-2 ops would be continually updated / applied; the only real difference is that bootstrap uses a local state file until the cluster is up and running.

We don't need multiple Zarf packages (with or without the tf-controller). It could be a single package with optional components that we only run during init. Alternatively, we could discuss what a more Zarf-native approach might be.
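
A minimal sketch of that single-package layout, assuming Zarf's optional-component behavior and hypothetical component names:

```yaml
components:
  - name: bootstrap-iac    # optional; only selected on the initial deploy
    required: false
  - name: day2-iac         # runs on every deploy
    required: true
```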

@jeff-mccoy added the iac label on Jan 7, 2023
@ntwkninja (Author) commented

closing as out of scope

can reopen if something changes
