Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Evaluate GitHub Actions for container image publishing and Terraform exeuction #76

Closed
hectcastro opened this issue Sep 3, 2020 · 7 comments

Comments

@hectcastro
Copy link
Contributor

Can we safely use GitHub Actions to build container images, publish them to ECR, and run Terraform from within GitHub Actions?

@colekettler
Copy link
Contributor

colekettler commented Sep 14, 2020

@hectcastro I took a pretty deep dive into this. Setting up container builds and publishing is simple enough. Terraform is a lot more complicated.

Unfortunately there's no easy, fire-and-forget solution. It's possible, but all of the potential approaches involve some changes to how we handle deployment, either at the scripting or infrastructural level. Here are the options I've considered, grouped by rough categories:


Repository / Actions Settings

Make build logs private

We evaluated this approach in azavea/operations#425 in the context of CircleCI.

This is not an option for Actions at this time. We can't make only the build logs private while keeping the repository public. As such, this isn't useful as a general approach and I recommend against it.

Terraform Settings

Indicate all sensitive values

We evlauated this approach in azavea/operations#425. There have not been significant developments in this area since.

At this time, this has to happen at the provider level and is not always consistent. There is an open issue to give consumers a way to indicate that variables are sensitive, but that feature is not going to be available within a definite timeframe. We should revisit this later on.

Masking Values

Mask values in output with tfmask

Source: https://github.com/cloudposse/tfmask

Pros:

  • Would only require a small change to our Terraform commands.

Cons:

  • Would require installing an additional dependency into our Terraform image.
  • Relying on a regex for masking introduces high risk for error and unintentionally leaked secrets, especially when introducing new secrets.

Recommendation: No. The potential for human error is too high and we wouldn't have a good way to detect if we've leaked something.

Mask values in output with terrahelp

Source: https://github.com/opencredo/terrahelp/tree/master/examples/mask

Pros:

  • Would only require a small change to our Terraform commands.

Cons:

  • Would require installing an additional dependency into our Terraform image.
  • Shotgun approach. Would remove a large amount of potentially useful debugging information from our log output.

Recommendation: Probably no. This is better than the regex option, but losing that much non-sensitive information from our build logs could make it much more time-consuming and awkward to investigate failed deploys.

Mask values in output with GitHub Actions masking

Source: https://docs.github.com/en/actions/reference/workflow-commands-for-github-actions#masking-a-value-in-log
Pros:

  • All config is attached to the GitHub repository. No general changes needed in STRTA or Terraform.

Cons:

  • There are several gotchas around specifying masks in workflows like this.
  • Would also require us to specify all masked values ahead of time, likely as GitHub Secrets to avoid storing them in cleartext. This is potentially a lot of duplication with respect to the terraform.tfvars file, and we'd need to remember to update both values if we ever roll credentials or else we'll leak the new ones.

Recommendation: No. Creates too much duplication and potential for either human or workflow errors.

Secrets Management

Manage sensitive values with AWS Secrets Manager instead of Terraform

Pros:

  • This is one of the canonical "right" ways to handle secret management. If implemented correctly, it would have an overall positive effect on the secure handling of credentials throughout our infrastructure.
  • Doesn't rely on Terraform for secrets management, because Terraform wasn't built to be a secrets manager.
  • Preserves our existing methods of providing Terraform variables and the structure of our infra script.

Cons:

  • Very large architectural change. Would require full buy-in to using AWS Secrets Manager for loading all secrets in at runtime. Would require application-level and infrastructure-level integration. May require an application-level caching layer to preserve performance.
  • Incurs additional cost of $0.40 per secret stored and $0.05 per 10,000 API calls. Secrets can be grouped together as JSON, but we'd need to consider these groupings carefully.
  • Probably security overkill for many of our projects and likely too much implementation overhead to be relied on as a general approach for all projects.

Recommendation: A begrudging no. I love this idea conceptually, it's a very correct and complete approach, but I don't think it's practical for us in the slightest.

Deployment Architecture

Split deployment and build jobs

Pros:

  • We could use Actions for the test / build / publish process, then invoke a Jenkins or a private CI platform job for deployment.

Cons:

  • Now we maintain two deployment systems for this project: Actions and likely Jenkins.
  • Troubleshooting failed builds may now require a context switch to another CI platform.
  • Split deployment infrastructure, another network hop, another point of failure.

Recommendation: No. I don't think GitHub Actions offers enough distinguishing value to justify using it for the easy parts but still leaning on private CI for the more sensitive parts.

Redirect Terraform log output to S3

Pros:

  • Most changes will be confined to the infra script. Our deployment workflow and architecture does not need to change significantly.
  • AWS credentials are the only credentials we will need to store with GitHub Secrets.
  • Preserves a deploy log in S3.

Cons:

  • Makes workflow output logs considerably less useful when troubleshooting a failed deploy.
  • Requires setting up a new subdirectory within the Terraform config bucket, or a new bucket outright.

Recommendation: Maybe. It's a slight inconvenience, but I think we can make it less annoying by outputting S3 links like we do with ECS tasks in django-ecsmanage. There's room to smooth over the disconnect between workflow logs and Terraform logs.


All told, if I had to suggest one of these, I'd suggest redirecting plan and apply output to S3. This gives us a layer of access control that we're used to working with and allows us to script around some of its inconvenience if we want.

Barring that approach, I'd recommend we stick with private CI until there are significant developments in either Terraform or GitHub Actions to cover this use case.

@colekettler
Copy link
Contributor

I also briefly looked into Terraform Cloud in case it offered any features that would help, but since it largely functions as a remote backend I think we're going to run into the same problems.

@hectcastro
Copy link
Contributor Author

Nice write-up. The in-context pros/cons and follow-up recommendation made absorbing all of the context relatively lightweight.

Some quick comments on the various solutions:

  • I like terrahelp a lot more than tfmask, but agree that it doesn't feel like a justifiable solution.
  • I also dislike GitHub Actions masking; it seems like TF output is out of scope for the use cases it was designed to satisfy.
  • I think Secrets Manager is a good example of a thing that we'd incorporate into the next revision of Breakable Toy, or attempt to encapsulate in a module that's specific for running ECS services "our way". I agree that it is a bit of a heavy lift--especially in the context of this project.
  • I don't like the idea of splitting across CI platforms either.
  • Redirecting Terraform output checks a lot of the boxes, but feels like a better solution for when you're in a pinch, or in a scenario where we've already invested a bunch of effort and don't want to undo it.

In this case, it seems like leaning on Jenkins (probably the RF instance) would be the easiest approach with the least compromise. While CodeBuild would provide private CI too, I don't think it is worth engaging with its distinct quirks (relative to Jenkins).

Are you OK with that outcome?

@colekettler
Copy link
Contributor

Thank you, glad to hear it!

Yeah, using Jenkins sounds like a good outcome to me. I'd rather hold out for upstream changes to make this possible without such significant compromises.

Good to close this one out?

@hectcastro
Copy link
Contributor Author

👍

@hectcastro
Copy link
Contributor Author

Please also open a new issue so that we can actually wire up Jenkins next sprint.

@colekettler
Copy link
Contributor

I created #80 and pulled it into this sprint to keep it ahead of #64. I doubt we'll get to it but that ordering makes the most sense to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants