Feature request: allow updating non-authoritative Google IAM project policies (restore_policy) #13038

Closed
pdecat opened this issue Mar 24, 2017 · 17 comments

Comments

@pdecat
Contributor

pdecat commented Mar 24, 2017

Taking up @paddyforan's suggestion #11737 (comment) to track specific use cases for managing Google IAM project policies in separate issues, I'll take a stab at it.

Use case

Managing some service accounts' project policies using terraform, while letting out-of-band modifications happen on identities not managed by terraform (e.g. real users and other service accounts).

Currently, after such drift happens, terraform plan reports it and terraform apply would wipe it.

I believe providing a means to update non-authoritative Google IAM project policies that are stored in restore_policy could help.

This could be a new update_non_authoritative argument on google_project_iam_policy, defaulting to false, kind of like the existing disable_project argument.

Enabling it would update the restore_policy at refresh or apply time (not sure which is best suited as of now).
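
For illustration only, a minimal sketch of what such a configuration could look like; the update_non_authoritative argument is hypothetical (it is what this issue proposes, not an existing provider feature) and the other names are placeholders:

# Hypothetical configuration sketching this proposal; the
# update_non_authoritative argument does not exist today.
resource "google_project_iam_policy" "myproject" {
    project     = "${var.project_id}"
    policy_data = "${data.google_iam_policy.policy-bindings.policy_data}"

    # Proposed: fold out-of-band bindings on unmanaged identities into
    # restore_policy instead of reporting them as drift and wiping them.
    update_non_authoritative = true
}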

Implementation

Some of the work seems to have been done in #11737, but it made this the nominal behavior, which is not desired.

evandbrown@0923629#diff-03ba869bf101626231cef2bb6acd73e6R222

Work-around

Run terraform plan, capture the drift, then patch the terraform state by moving non-authoritative policies that should be ignored from policy_data to restore_policy.

@paddycarver
Contributor

Hey @pdecat, I'd love to hear more about this:

Managing some service accounts' project policies using terraform, while letting out-of-band modifications happen on identities not managed by terraform (e.g. real users and other service accounts).

Why would such a thing be necessary? What real-world value/use cases can be delivered by that?

As an example of a compelling use case that goes against the "Terraform manages everything" ideology, Kubernetes allows users to attach disks to instances that pods are scheduled onto, so disks are dynamically allocated at runtime and can't be allocated as a "provision" step. What's an analogous thing that currently isn't possible?

@pdecat
Contributor Author

pdecat commented Apr 12, 2017

Hi @paddyforan, thanks for your attention.

My use case here is that I manage the technical part of a Google project for a customer mostly using terraform (that is, except for things not yet supported by terraform), but they still want to be able to grant access to some of their Google resources to other people directly from the Google web console, without soliciting me or using terraform.

An alternative I thought of to the change I am proposing here would be to use terraform only to manage IAM policies on Google resources other than the project itself, and let them manage their out-of-band IAM policies at the project level only.
That would also have the benefit of finer-grained policies on my part.
But for now, terraform only supports IAM policies on projects, not on other resources. (Note that I tentatively opened #13003 to move that part forward, but will have to rework my copy ;) )

In the meantime, I use a combination of terraform plan and terraform state commands along with jq manipulation to patch the state's restore_policy. Not the most fun thing to do on a daily basis.

PS: having spent more than a year managing AWS IAM policies for some customers, where policies are first-class resources in their own right, it is tough to adapt to Google IAM policies, which are set on the resources themselves. Agreed, this difference is none of terraform's concern.

@pdecat
Contributor Author

pdecat commented Apr 14, 2017

Just got another use case, still with out-of-band modifications, but not under our control this time:

A Google service account named [email protected] was recently granted roles on one of our projects, probably due to a recent update on Google's side (I opened a case to understand its purpose).

Right now, I can think of three options:

  1. ignore it, and wipe it during the next terraform apply, with the risk of breaking something,
  2. integrate it to our terraform configuration,
  3. patch the state's restore_policy as described before.

Option 3 is what this issue is all about.

FWIW, here is the bash command I use to display policy diffs in a readable manner, which is what let me notice those out-of-band changes:

# export TFID=module.global.google_project_iam_policy.mypolicy
# readarray -t array <<< $(jq -c '.bindings |= sort_by(.role) | .bindings[].members |= sort' <(terraform plan -target $TFID | grep '    policy_data: ' | cut -d: -f2- | sed -e 's/^ "//' -e 's/"$//' -e 's/\\"/"/g' -e 's/" => "//')) ; diff -U 3 <(echo ${array[0]} | jq .) <(echo ${array[1]} | jq .)

Edit: got this answer from Google support:

The account you mention is created for internal purposes as part of the white-listing process for container analysis. This is required by either Google Cloud Container Builder or App Engine Flexible Environment, so if you have not signed up for any alpha features nor are you in direct contact with any product teams, then it is safe to remove.

@paddycarver
Contributor

My use case here is that I manage the technical part of a Google project for a customer mostly using terraform (that is, except for things not yet supported by terraform), but they still want to be able to grant access to some of their Google resources to other people directly from the Google web console, without soliciting me or using terraform.

How would you imagine Terraform should distinguish between IAM policies set by users in the control panel and IAM policies set in error?

A Google service account named [email protected] was recently granted roles on one of our projects, probably due to a recent update on Google's side (I opened a case to understand its purpose).

This is actually what the issue is about, in my mind. Something changed, you don't know why. Your infrastructure is in an undefined state right now. Terraform wants to define your infrastructure's state. It wants you to make a call about what should and should not be there whenever possible. Both 1 and 2 are valid options, because if it's not necessary to your infrastructure, it should be wiped out. If it is necessary to your infrastructure, it should be codified, versioned, and persisted as part of your config, or future versions of your infrastructure aren't guaranteed to have it. 3 isn't a desirable choice, as far as I'm concerned, because it's largely saying "my configs aren't authoritative anymore", which defeats the entire purpose of the configs.

@danawillow and I will be discussing this next Monday at length, and the idea of managing each rule individually and ignoring rules that aren't managed is something I'd like to investigate, but I can't make promises about its feasibility or if it's the right answer just yet. It's just something we're looking into.

That being said, gathering as many use cases as we can in the next week definitely helps.

@pdecat
Contributor Author

pdecat commented Apr 25, 2017

How would you imagine Terraform should distinguish between IAM policies set by users in the control panel and IAM policies set in error?

In my mind, this feature request only applies when executing the terraform plan and terraform apply commands interactively, passing an additional argument to trigger the requested update of restore_policy. Automated scenarios are out of scope.

So it wouldn't be terraform's job to guess if those changes are normal or not but mine as the person executing the commands.

@paddycarver
Contributor

If you're manually reviewing the changes anyways, what does just manually updating your configuration file with the changes leave to be desired?

@pdecat
Contributor Author

pdecat commented Apr 27, 2017

That's an option I've discussed with the customer in that particular case.
For one, we may not want to manage those unrelated permissions in configuration files, as they add clutter.
But more importantly, if I destroy the terraform-managed stack, it would remove all those permissions, probably leaving the customer unhappy.

@paddycarver
Contributor

It sounds, and please do correct me if I'm wrong, like you want to whitelist policies to control, not list policies to ignore. Is that an accurate summary?

If so, I may have an idea. I'll have to chat with @danawillow on Monday, and it would likely need to wait for a major release, but it may meet your use cases without working against Terraform.

@pdecat
Contributor Author

pdecat commented Apr 27, 2017

Yeah, that's indeed a more accurate description of this use case.

I was just trying to fit it in the current implementation based on restore_policy.

@paddycarver
Contributor

Ah, thanks! Good to know. That explains why it felt like we were at such an impasse: I'm trying to get away from restore_policy-esque behaviour, because it works against Terraform a bit and is going to make things harder and more prone to bugs in the future. So my whole approach here has been "how do we get this use case supported without pushing things further in that direction?"

Glad we've got that sorted. I have ideas on this, and hopefully sometime next week we'll have a proposal for this. We just want to avoid churn and instability, so we're setting some time aside to research and think through all the implications, so that any breaking changes we need to make are at least well considered.

Thanks for the patience and all the help in understanding what the issue is really about.

@leighmhart

Hi all,

I'm not sure if this is the right place to add a related use case; if not, please let me know and I'll move it to a new issue.

We have recently started upgrading our projects to a new GCP terraform template which makes use of google_project_iam_policy along with google_project in the new format. For new projects, we are using google_project to create them; for existing projects, we are importing the google_project object and copying in our project.tf for google_project so it matches.

In both cases, we are doing this in modules - e.g.:

# Creates and maintains Google Project

module "project" {
  source    = "git::ssh://[email protected]/GCP-MODULES.git//project//"

  # Set module options
  name            = "${var.gcp-project-name}"
  id              = "${var.gcp-project}"
  organization    = "${var.gcp-organization}"
  billing_account = "${var.gcp-billing-account}"
}

output "project-number" {
  value = "${module.project.number}"
}

and

module "iam-account" {
  source    = "git::ssh://[email protected]/GCP-MODULES.git//iam//"

  # Set module options
  project_id        = "${module.project.id}"

The modules do what you'd expect:

variable "project_id" {}

resource "google_project_iam_policy" "project-policy" {
    project = "${var.project_id}"
    policy_data = "${data.google_iam_policy.policy-bindings.policy_data}"
}

data "google_iam_policy" "policy-bindings" {
    binding {
        role = "roles/compute.networkAdmin"
        members = [
          "group:${var.network_group}"
          ]
    }
    # Security Administrators
    binding {
        role = "roles/compute.securityAdmin"
        members = [
          "group:${var.security_group}"
          ]
    }
}

etc - however, some projects need local service accounts with unique policy bindings - so I add this to the local project's tf:

resource "google_service_account" "cloudendure" {
    account_id = "cloudendure"
    display_name = "Service Account for CloudEndure VM migrations"
}

data "google_iam_policy" "local-policy-bindings" {
  binding {
    role = "roles/editor"
    members = [
      "serviceAccount:${google_service_account.cloudendure.email}",
    ]
  }
}

resource "google_project_iam_policy" "local-policy" {
    project = "${module.project.id}"
    policy_data = "${data.google_iam_policy.local-policy-bindings.policy_data}"
}

Now this works: because neither the module's nor the local google_project_iam_policy resource sets the "authoritative" option, apply succeeds. However, terraform plan keeps wanting to re-apply each of these resources in turn (flip-flopping between the module version and the local version). It doesn't change the project IAM settings each time it runs, and both IAM rules stay in the project console, but terraform isn't happy with the state.

@paddycarver
Contributor

At the moment, managing IAM for the same project using modules is a bit of a mess, and not recommended. We have a design that we think will resolve this issue in a more elegant and better-behaved way, and we're working on making that happen. :) Sorry for the pain in the meantime.

@leighmhart

leighmhart commented May 30, 2017

update: actually, after a bit of further testing, the above issue IS impacting the project IAM settings - the module-based policy keeps removing the local policy bindings and then, once that happens, the local one starts removing the module ones, and the project gets into a horrible state.

terraform destroying both and re-applying the module-based one and then the local one fixes it again, temporarily, until the next apply needs to happen on the project. Boo.

I understand it is a bit of a mess, but I have no choice: we have 60+ projects that have to share the same global IAM policy bindings for their local admin accounts. If I don't use modules for this, I have to update 60+ projects' worth of terraform code every time I add one new feature to the list that everyone should have access to.

I also need to do local per account service account bindings.

How can I reconcile this?

Terraform needs to do what it says it does, which is to not "overwrite any existing IAM policy on the project" if authoritative is set to false.

The whole "restore_policy" thing simply makes a mess of things.

I don't want to / can't go back to 0.8.2 where none of this was broken (although it was a bit ugly) ;-)

@paddycarver
Contributor

Hi @allandrick! Sorry for the pain. I'm working on a solution to this that should allow you to manage policies at a much finer grain, rather than managing all of them or none of them. I unfortunately don't have a timeline for it yet, but it's a top-of-mind concern for me.

Terraform needs to do what it says it does, which is to not "overwrite any existing IAM policy on the project" if authoritative is set to false.

Agreed, or we at least need to call out more clearly that using modules breaks this.

The whole "restore_policy" thing simply makes a mess of things.

Ideally, we will not have a restore_policy in the future.

Though I can't guarantee any of these details, the design I have in mind looks something like this:

  • google_project_iam_policy has a binding field that manages just a single binding. When this is used, Terraform manages only that specific binding, and leaves all other bindings alone. The policy_data field is deprecated/goes away.
  • We add a google_project_iam_policies resource with a policy_data field that functions much like google_project_iam_policy. The difference is it makes no attempt at guard rails, removing the restore_policy behaviour and authoritative. It overwrites all IAM policies. You should not have more than one of these defined for a single GCP project.
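
As a sketch only, with illustrative names, and with the caveat above that details may change, the per-binding idea could look like this:

# Illustrative sketch of the proposed per-binding design; this exact
# shape is not a shipped provider feature.
resource "google_project_iam_policy" "network-admins" {
    project = "${module.project.id}"

    # Manage only this binding; leave every other binding on the
    # project untouched.
    binding {
        role    = "roles/compute.networkAdmin"
        members = [
            "group:${var.network_group}",
        ]
    }
}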

I'm still a bit of work away from shipping that, but that's the general idea in my head, at the moment. As I implement it and get feedback, details may change.

Sorry again for the issues, and know we definitely want to get this fixed. Thank you for your patience.

@leighmhart

@paddycarver thanks for the update. Something else worth considering, unrelated to the fact that I am using modules: the google_project_iam_policy resource also gets confused when services are enabled on the project that create Google service API accounts (e.g. Container Registry, etc.). Terraform plan wants to delete them! I have had to add a lifecycle block with ignore_changes = ["*"] to all my google_project_iam_policy resources to prevent terraform from blowing away the IAM accounts auto-generated by those services.
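
For reference, the stopgap looks roughly like this, reusing the local-policy resource from the earlier snippet; ignore_changes = ["*"] makes terraform ignore drift on all of the resource's attributes:

resource "google_project_iam_policy" "local-policy" {
    project     = "${module.project.id}"
    policy_data = "${data.google_iam_policy.local-policy-bindings.policy_data}"

    # Stopgap: ignore all attribute drift so terraform stops trying to
    # delete the service accounts that enabling Google APIs auto-creates.
    lifecycle {
        ignore_changes = ["*"]
    }
}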

@pdecat
Contributor Author

pdecat commented Jun 8, 2017

Hi, until this is implemented, here's a much easier work-around than what I initially proposed (i.e. updating the restore_policy by hand):

# terraform state rm google_project_iam_policy.myproject
# terraform apply

@ghost

ghost commented Apr 9, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators Apr 9, 2020