google_container_cluster.node_config subfields can't be updated in-place #19225

wyardley · 2024-08-22T18:04:10Z

When defined as part of the default nodepool (via node_config.kubelet_config within a google_container_cluster resource, cpu_manager_policy (and, potentially, other kubelet_config settings) will not get updated properly if changed (note: they should function correctly when a cluster / nodepool are created initially.

This seems likely to be because the code here and here is not in the path at all when the resource is invoked this way.

Note: this may affect other settings within node_config; ~~it's just more urgent with this specific field because it's Required, (though maybe it shouldn't need to be)?~~

Edit: Also, I think there are contexts where cpu_manager_policy can / should be unset? I'm not sure exactly why the provider currently lists it as required to be set when kubelet_config is. It seems like there are cases where it's valid for it to be unset, and in fact, the provider doesn't seem to handle setting it properly in any of the cases / places I tested it. This is now fixed as part of #11572

Community Note

Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request.
Please do not leave +1 or me too comments, they generate extra noise for issue followers and do not help prioritize the request.
If you are interested in working on this issue or have submitted a pull request, please leave a comment.
If an issue is assigned to a user, that user is claiming responsibility for the issue.
Customers working with a Google Technical Account Manager or Customer Engineer can ask them to reach out internally to expedite investigation and resolution of this issue.

Terraform Version & Provider Version(s)

OpenTofu v1.7.3
on darwin_arm64
+ provider registry.opentofu.org/hashicorp/google v5.41.0

Note: should be reproducible with regular Terraform and latest provider version as well.

Affected Resource(s)

google_container_cluster

Terraform Configuration

resource "google_container_cluster" "test_insecure_kubelet_readonly_port" {
  name               = "test-insecure-kubelet-readonly-port"
  location           = "us-central1-f"
  initial_node_count = 1

  node_config {
    kubelet_config {
      # Note: create the cluster first, and then try to update the value to this
      cpu_manager_policy = "static"
    }
  }
  deletion_protection = false
  network             = "default"
  subnetwork          = "default"
}

Debug Output

2024-08-22T11:02:24.603-0700 [WARN]  Provider "provider[\"registry.opentofu.org/hashicorp/google\"]" produced an unexpected new value for google_container_cluster.test_insecure_kubelet_readonly_port, but we are tolerating it because it is using the legacy plugin SDK.
    The following problems may be the cause of any confusing errors from downstream operations:
      - .node_config[0].kubelet_config[0].cpu_manager_policy: was cty.StringVal("static"), but now cty.StringVal("")

Expected Behavior

When the value is set to "static", Terraform should update the settings on the default node-pool in place to the expected value.
When the value is set to "none" or unset, a permadiff should not be observed when the value is not set at the API level

Actual Behavior

A permadiff is observed.
As best I can tell, Terraform does not actually try to update the node pool settings when the value is changed from default for the default pool.

Note: the undocumented, but allowed, value "" resolves the permadiff, presumably because the API is not returning it at all vs. returning the value "none" with default settings, and because the provider doesn't seem to actually be trying to update the default nodepool settings when the setting is changed after the resource is initially created.

Steps to reproduce

terraform apply

Important Factoids

See references below for more details

References

GoogleCloudPlatform/magic-modules#11272
#15767

b/361634104

The text was updated successfully, but these errors were encountered:

wyardley · 2024-08-27T22:21:54Z

So I think what needs to happen is roughly something like

"when anything in this block changes, call nodePoolUpdate() (https://github.com/GoogleCloudPlatform/magic-modules/blob/61f619ede9c1e30516810a5d6799fbfe29cd91da/mmv1/third_party/terraform/services/container/resource_container_node_pool.go.erb#L1349C6-L1349C20) for default-pool. After the operation is complete, it may be sufficient (at least for some of the attributes) to infer that it was successful via the deprecated values directly in nodeConfig.nodeKubeletConfig, but I'm guessing it might be best to actually look at default-pool's settings directly for some things?

It may be necessary to update the cluster for some settings and the nodepool for others.

wyardley · 2024-08-28T04:40:11Z

GoogleCloudPlatform/magic-modules@db731f7

POC fix for insecure_kubelet_readonly_port_enabled only, but probably a broader fix can be made for all the items within node_config.kubelet_config.

wyardley · 2024-08-29T04:09:02Z

GoogleCloudPlatform/magic-modules@efb71a9
Seems like it was sort of arbitrary that this field was marked Required at all

Partial fix for hashicorp/terraform-provider-google#19225 This should resolve some confusing behavior with `cpu_manager_policy` * It frequently will show a permadrift when it can't be set * It also doesn't seem to accept the documented value of "none" as an empty value, though the previously undocumented empty string (`""`) seems to work. This doesn't resolve all of the issues, but resolves other issues where it must be set where it's not actually needed (for example, if `insecure_kubelet_readonly_port_enabled` is set). It appears that it was marked as `Required` somewhat arbitrarily, and it's also possible that some of what's in place is tied to an API level bug that may have since been resolved.

melinath · 2024-08-29T15:29:24Z

Updated the title based on the discussion above and at GoogleCloudPlatform/magic-modules#11272 (comment) - the main thing to fix here is going to be that update-in-place is broken for most subfields of node_config. (Fixing the cpu_manager_policy issue is sort of separate but happy to track it here since there's already a PR open for it.)

wyardley · 2024-08-29T15:32:12Z

Thanks, had the same thought about updating the title to be more clear. When I created this, it wasn't quite as clear how related these issues all were, but it's becoming marginally clearer now.

wyardley · 2024-09-04T16:25:06Z

GoogleCloudPlatform/magic-modules#11553 is another one within node_config that probably needs to be fixed in here now that force replacement was removed.

Partial fix for hashicorp/terraform-provider-google#19225 This should resolve some confusing behavior with `cpu_manager_policy` * It frequently will show a permadrift when it can't be set * It also doesn't seem to accept the documented value of "none" as an empty value, though the previously undocumented empty string (`""`) seems to work. This doesn't resolve all of the issues, but resolves other issues where it must be set where it's not actually needed (for example, if `insecure_kubelet_readonly_port_enabled` is set). It appears that it was marked as `Required` somewhat arbitrarily, and it's also possible that some of what's in place is tied to an API level bug that may have since been resolved.

wyardley · 2024-09-12T21:45:15Z

@melinath @rileykarson may be best for someone from Google to take this on, but if there's clear guidance about whether (and how) the node pool update stuff can be DRYed up / shared between the node_pool and container_cluster resources, I might be able to take a stab at it.

melinath · 2024-09-12T21:57:01Z

@wyardley I don't think we have specific guidance about it, but you're welcome to take a stab. As long as it works and is easy to understand, there's a good chance we'll accept the change.

Heads up that we're planning to switch the core engine from Ruby to Go in a couple weeks; I think that wouldn't impact this ticket too badly since it's working with handwritten files. The .erb formatting will change to go template formatting, but it might not be horrible to port over? Or you could wait until after the port lands.

wyardley · 2024-09-12T22:01:59Z

I guess the big thing I was looking for ideas on is whether the code can be shared between the two resources (and if so, what files should it be in). And yeah, I can work around the cutover one way or another.

rileykarson · 2024-10-14T17:41:15Z

I also have GoogleCloudPlatform/magic-modules#11978 out which fixes the specific cause of that error (but still doesn't accurately model the API- we send the message we expect, and still have a permadiff).

This sets the stage for consolidating the logic for updating nodepools between the `container_cluster` and `container_node_pool` resources, by creating a new `nodePoolNodeConfigUpdate()` function. Part of hashicorp/terraform-provider-google#19225

This resolves an issue where many subfields of `node_config` in a cluster (which affects the default node-pool "default-pool" when `remove_default_node_pool` is not set to `false`) don't support updates properly. Fixes hashicorp/terraform-provider-google#19225 Followup to GoogleCloudPlatform#11826 where the code used by the regular node pool update code was broken out.

wyardley · 2024-10-16T15:08:47Z

I've added a change in the linked PR that I think will fix this for all remaining cases, and at that point, any additional DRYing up within that shared function could be totally optional and could be handled by anyone who felt like taking it on in the future.

wyardley · 2024-10-18T00:30:26Z

Also just did a quick verification that (even without these changes) when node_pool {} (within google_container_cluster) is used to define the first node pool vs the implicit settings (e.g., node_config), which I just was reminded is also a way this can work 🙃, updates do already seem to work as expected.

This resolves an issue where many subfields of `node_config` in a cluster (which affects the default node-pool "default-pool" when `remove_default_node_pool` is not set to `false`) don't support updates properly. Fixes hashicorp/terraform-provider-google#19225 Followup to GoogleCloudPlatform#11826 where the code used by the regular node pool update code was broken out.

wyardley · 2024-10-19T04:06:59Z

While working on the fixes, I had a thought: at some point (not as part of this set of changes), I imagine it might be possible to build a single nodeConfig object for some or all of the changed fields and do a single update, just logging the top level keys of all the changed fields? If that works, it would help speed updates up significantly, since all the updates would be accomplished in one pass vs. serially, but maybe there's a reason it doesn't already work that way?

wyardley · 2024-10-19T18:18:35Z

For anyone following, one thing I discovered is that labels, resource_manager_tags and workload_metadata_config would be updateable, but only because they were set to force recreation. Presumably the intent was for other non-updateable fields to be added to this list as well, and people just didn't realize they needed to do so when those fields were added to node_config.

Since I'm pretty sure these should be allowed to be updated in-place in the default node pool when defined in node_config, just as they could be when defined externally or in node_pool {}, I simplified the code to remove that bit, and updated the changelog notes to reflect both a fix (of the fields that were missing entirely) and an enhancement (allowing those fields to be updated in place).

This resolves an issue where many subfields of `node_config` in a cluster (which affects the default node-pool "default-pool" when `remove_default_node_pool` is not set to `false`) don't support updates properly. Fixes hashicorp/terraform-provider-google#19225 Followup to GoogleCloudPlatform#11826 where the code used by the regular node pool update code was broken out.

It appears that in-place updates were never implemented for `node_config.containerd_config`, when contained within either `google_container_cluster` or `google_container_node_pool` Fixes hashicorp/terraform-provider-google#19056 Related to hashicorp/terraform-provider-google#19225 Signed-off-by: William Yardley <[email protected]>

github-actions · 2024-11-26T02:17:31Z

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

wyardley added the bug label Aug 22, 2024

github-actions bot added forward/review In review; remove label to forward service/container labels Aug 22, 2024

This was referenced Aug 22, 2024

container: add support for kubelet read only port GoogleCloudPlatform/magic-modules#11272

Merged

insecureKubeletReadonlyPortEnabled support terraform-google-modules/terraform-google-kubernetes-engine#2013

Closed

trodge removed the forward/review In review; remove label to forward label Aug 22, 2024

modular-magician added the forward/linked label Aug 22, 2024

wyardley changed the title ~~Inconsistent behavior (and inability to set) cpu_manager_policy~~ Inconsistent behavior (and inability to set) cpu_manager_policy on default nodepool Aug 22, 2024

This was referenced Aug 29, 2024

container: make cpu_manager_policy optional in kubelet_config GoogleCloudPlatform/magic-modules#11572

Merged

Support node_kublet_config in node_pool_auto_config so users can disable the kubelet read-only port #19236

Closed

melinath changed the title ~~Inconsistent behavior (and inability to set) cpu_manager_policy on default nodepool~~ google_container_cluster.node_config subfields can't be updated in-place Aug 29, 2024

This was referenced Aug 29, 2024

container_cluster.node_config.kubelet_config.cpu_cfs_quota value is set in cluster request even when no value is set in config #15767

Open

feat: Remove force replacement from gcfs_config GoogleCloudPlatform/magic-modules#11553

Merged

wyardley mentioned this issue Sep 4, 2024

Unnecessary ForceNew On Container Node Pool / Cluster gcfs_config #18417

Open

This comment was marked as outdated.

Sign in to view

wyardley mentioned this issue Oct 16, 2024

container: fix missing updates for google_container_cluster.node_config subfields GoogleCloudPlatform/magic-modules#12014

Merged

wyardley mentioned this issue Oct 21, 2024

sysctl change does not actually get applied #18208

Closed

roaks3 closed this as completed in GoogleCloudPlatform/magic-modules#12014 Oct 25, 2024

This was referenced Oct 25, 2024

container: fix missing updates for google_container_cluster.node_config subfields #20038

Merged

container: fix missing updates for google_container_cluster.node_config subfields hashicorp/terraform-provider-google-beta#8522

Merged

wyardley mentioned this issue Oct 25, 2024

container: fix in-place updates for node_config.containerd_config GoogleCloudPlatform/magic-modules#12135

Merged

wyardley mentioned this issue Oct 28, 2024

node pool node_config update doesn't notice or fail on 400 API response #19935

Open

github-actions bot locked as resolved and limited conversation to collaborators Nov 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

google_container_cluster.node_config subfields can't be updated in-place #19225

google_container_cluster.node_config subfields can't be updated in-place #19225

wyardley commented Aug 22, 2024 •

edited

Loading

wyardley commented Aug 27, 2024 •

edited

Loading

wyardley commented Aug 28, 2024

wyardley commented Aug 29, 2024

melinath commented Aug 29, 2024

wyardley commented Aug 29, 2024

wyardley commented Sep 4, 2024 •

edited

Loading

This comment was marked as outdated.

This comment was marked as outdated.

wyardley commented Sep 12, 2024

melinath commented Sep 12, 2024

wyardley commented Sep 12, 2024

rileykarson commented Oct 14, 2024

wyardley commented Oct 16, 2024

wyardley commented Oct 18, 2024

wyardley commented Oct 19, 2024 •

edited

Loading

wyardley commented Oct 19, 2024

github-actions bot commented Nov 26, 2024

google_container_cluster.node_config subfields can't be updated in-place #19225

google_container_cluster.node_config subfields can't be updated in-place #19225

Comments

wyardley commented Aug 22, 2024 • edited Loading

Community Note

Terraform Version & Provider Version(s)

Affected Resource(s)

Terraform Configuration

Debug Output

Expected Behavior

Actual Behavior

Steps to reproduce

Important Factoids

References

wyardley commented Aug 27, 2024 • edited Loading

wyardley commented Aug 28, 2024

wyardley commented Aug 29, 2024

melinath commented Aug 29, 2024

wyardley commented Aug 29, 2024

wyardley commented Sep 4, 2024 • edited Loading

This comment was marked as outdated.

This comment was marked as outdated.

wyardley commented Sep 12, 2024

melinath commented Sep 12, 2024

wyardley commented Sep 12, 2024

rileykarson commented Oct 14, 2024

wyardley commented Oct 16, 2024

wyardley commented Oct 18, 2024

wyardley commented Oct 19, 2024 • edited Loading

wyardley commented Oct 19, 2024

github-actions bot commented Nov 26, 2024

wyardley commented Aug 22, 2024 •

edited

Loading

wyardley commented Aug 27, 2024 •

edited

Loading

wyardley commented Sep 4, 2024 •

edited

Loading

wyardley commented Oct 19, 2024 •

edited

Loading