Allow using kubernetes_manifest when the kubernetes cluster is not yet available #1625

Closed
mbyio opened this issue Mar 3, 2022 · 5 comments

mbyio commented Mar 3, 2022

Description

We need to use kubernetes_manifest in order to define CRDs and other things not yet supported by the Terraform Kubernetes provider. Without that functionality, Terraform is essentially useless to us for managing Kubernetes resources.

Currently, because kubernetes_manifest uses server-side apply, the Kubernetes cluster has to be reachable at plan time. We would like to be able to define the Kubernetes cluster and the CRDs needed to bootstrap it in the same Terraform state, so that when we need to experiment with something, we can launch all related resources in one go.

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
mbyio (Author) commented Mar 3, 2022

Never mind, this is really just another problem caused by Terraform's inability to handle a provider that depends on attributes of resources managed by another provider. The latest update on that issue is at hashicorp/terraform#2430 (comment). For now, these resources have to live in a separate state from the cluster.

mbyio closed this as completed Mar 3, 2022
apparentlymart (Contributor) commented

FWIW, I think there is a hypothetical design here where kubernetes_manifest could fall back to a very minimal planning mode when the cluster is unavailable, such as just checking that manifest has been set to any object type (a requirement regardless of remote configuration), or perhaps checking one level deeper for kind, metadata, and data all having suitable data types even though the content isn't checkable.
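
As a rough illustration of what such an offline check might look like, here's a minimal sketch using the cty package (the provider itself works in terms of the analogous tftypes package; the function name and the exact attribute list here are hypothetical):

```go
package main

import (
	"fmt"

	"github.com/zclconf/go-cty/cty"
)

// validateManifestShape is a hypothetical minimal offline check: it only
// verifies that the manifest value is an object carrying a few well-known
// attributes, without consulting the cluster's schema at all.
func validateManifestShape(manifest cty.Value) error {
	ty := manifest.Type()
	if !ty.IsObjectType() {
		return fmt.Errorf("manifest must be an object, got %s", ty.FriendlyName())
	}
	// One level deeper: require the attributes every manifest needs.
	for _, attr := range []string{"kind", "metadata"} {
		if !ty.HasAttribute(attr) {
			return fmt.Errorf("manifest is missing required attribute %q", attr)
		}
	}
	return nil
}

func main() {
	m := cty.ObjectVal(map[string]cty.Value{
		"kind": cty.StringVal("ConfigMap"),
		"metadata": cty.ObjectVal(map[string]cty.Value{
			"name": cty.StringVal("example"),
		}),
	})
	fmt.Println(validateManifestShape(m)) // <nil>
}
```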

I don't know enough about this provider or Kubernetes to speak to how feasible that would be, but if it is technically possible then I think it's interesting to think about whether that behavior would be useful.

It would in principle allow creating a cluster and pushing manifests to it all in one step, but if any of the manifests are invalid then the terraform apply step would fail partway through. On the plus side, after correcting the error and running terraform apply again, Terraform ought to be able to pick up where it left off and retry just the operation that failed; by then the cluster will be present, so it should be able to give better feedback about the validity of all of the downstream manifests too. So at best terraform apply gets everything done in one action, and at worst it takes two iterations: the first run detects a manifest error during apply, but any subsequent plans can check everything else.

Since the current design requires two terraform apply steps in all cases, this seems like an improvement in principle. I suppose this compromise could be problematic if the first terraform apply gets far enough to break something but then an error blocks whatever action would've made it work again downstream, though there are always dynamic errors that can cause the apply step to fail at arbitrary points. I assume this "problem" would also apply to any other technology used to systematically write multiple interrelated manifests into a Kubernetes cluster all at once, so not something unique to Terraform.

Again I don't really know enough to make this tradeoff, but I just wanted to note the possibility; I suspect the maintainers of this provider already considered such an approach and encountered downsides I'm not considering.

alexsomesan (Member) commented

@apparentlymart, as we've discussed on a few occasions in the past, the main reason why the kubernetes_manifest provider needs to retrieve schema information from the cluster at runtime is that it needs to generate the COMPLETE structure of the cty / tftypes Object that holds the resource attributes AHEAD of the first apply operation. As I'm sure is already obvious, Terraform checks for and does not tolerate changes to the types of a resource's attributes once state has been persisted.

Per your suggestion, should the provider fall back to accepting the Object value of the manifest as supplied by the user and returning that as the plan output, the follow-up apply might actually add new Object attributes (defaults returned by the API for unset values), thus effectively changing the type of an object attribute between plan and apply (Objects with different sets of attributes are considered different types).

Terraform does not tolerate this, and satisfying this requirement is what drove most of the complexity in the manifest provider and led us to build the OpenAPI-to-tftypes mechanism.
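
To make the mismatch concrete, here's a small sketch in cty terms (the attribute names and the server-defaulted field are invented for illustration, not taken from any real schema):

```go
package main

import (
	"fmt"

	"github.com/zclconf/go-cty/cty"
)

func main() {
	// The object type as planned from the user's manifest alone.
	planned := cty.Object(map[string]cty.Type{
		"kind":     cty.String,
		"metadata": cty.Object(map[string]cty.Type{"name": cty.String}),
	})

	// The object type after apply, once the API server has defaulted an
	// unset field (hypothetical example: metadata.namespace).
	applied := cty.Object(map[string]cty.Type{
		"kind": cty.String,
		"metadata": cty.Object(map[string]cty.Type{
			"name":      cty.String,
			"namespace": cty.String,
		}),
	})

	// Objects with different attribute sets are different types, which is
	// exactly the plan/apply type change Terraform rejects.
	fmt.Println(planned.Equals(applied)) // false
}
```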

Falling back like you suggested would result in an even more brittle user experience, littered with "Provider produced an inconsistent state" errors. We already tried it early in the development of the manifest provider.

apparentlymart (Contributor) commented Mar 4, 2022

I would expect it to work if the provider responded to PlanResourceChange by returning manifest as an unknown value of an unknown type -- in cty's terms, cty.DynamicVal -- but since this is the first provider that's tried to be this dynamic, I don't know if there are gotchas there in practice. If so, it would be good to capture them in the issue over in the Terraform repository (the one linked above), and then we could consider fixing that as a possible path forward here, if you think that making this resource type effectively disable all of its checks in the unknown-config situation would be useful. (That was the question I was intending to ask here: would it be desirable, not whether it's currently technically possible.)
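
For illustration, here's a tiny sketch of what that unknown-value-of-unknown-type looks like in cty (illustrative only; a real provider would return such a value inside its PlanResourceChange response):

```go
package main

import (
	"fmt"

	"github.com/zclconf/go-cty/cty"
)

func main() {
	// cty.DynamicVal is the unknown value of the unknown (dynamic) type --
	// what a provider could return for manifest at plan time when the
	// cluster is unreachable.
	v := cty.DynamicVal

	fmt.Println(v.IsKnown())                            // false
	fmt.Println(v.Type().Equals(cty.DynamicPseudoType)) // true

	// Because the type itself is left undecided at plan time, the apply
	// result can later settle on any concrete object type without a
	// plan/apply type mismatch.
}
```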

github-actions bot commented Apr 4, 2022

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators Apr 4, 2022