"boot_disk.0.kms_key_self_link": conflicts with boot_disk.0.disk_encryption_key_raw #7934
Comments
@ronjarrell can you provide all data for the variables and steps how to repro the issue? |
I had this same issue. I'm not able to provide a complete code sample due to the block being way too large to filter down to a useful subset. However, I rolled Terraform back to 0.13.5 and the issue went away. My disk was defined like this:
That's it - no encryption specified at all. |
@ronjarrell Is your case the same as Mark's? If yours is still an issue, can you provide data so I can repro it? |
@ronjarrell has your issue been resolved? |
No, I've tried it under, as I said, 0.12.29, 0.13.5 and 0.14, and the provider fails each time. It segfaults in 0.12 and 0.13, but I got an actual error message from 0.14.0 about the variables. Note it's complaining that I'm using KMS keys I'm not using, and that I'm writing to the sha256 output value, which I'm not. Some of those values are generated across a few thousand lines of distributed code in different modules and state files.
resource "google_compute_instance" "this" {
allow_stopping_for_update = true
can_ip_forward = false
cpu_platform = "Intel Skylake"
current_status = "RUNNING"
deletion_protection = false
enable_display = false
guest_accelerator = []
id = "projects/XX/zones/europe-west4-a/instances/app-demo-1-a-ams-g-XX"
instance_id = "8177554915564758281"
label_fingerprint = "GlylWb2FaSg="
labels = {
"billing-use" = "rnd"
"cost_group" = "b1fs"
"cost_lifecycle" = "production"
"cost_product" = "infra"
"cost_purpose" = "demo"
"environment" = "XX"
"hostname" = "app-demo-1-a-ams-g-XX_node_a-ams-g-XX_int_b1fs"
"managed" = "true"
"role" = "app-demo"
}
machine_type = "g1-small"
metadata = {
"block-project-ssh-keys" = "true"
"enable-oslogin" = "true"
"user-data" = <<-EOT
#!/bin/bash
echo "@@@@@@@@@@@@@@@@ System Startup Script Finished @@@@@@@@@@@@@@@@@"
EOT
}
metadata_fingerprint = "byXGZQud16U="
name = "app-demo-1-a-ams-g-XX"
project = "XX"
self_link = "https://www.googleapis.com/compute/v1/projects/XX/zones/europe-west4-a/instances/app-demo-1-a-ams-g-XX"
tags_fingerprint = "42WmSpB8rSM="
zone = "europe-west4-a"
boot_disk {
auto_delete = true
device_name = "persistent-disk-0"
disk_encryption_key_raw = (sensitive value)
disk_encryption_key_sha256 = "4Vr7NMN8VgvBIeD69vbivDt1kXyUjZYbeoDaKYNoeTA="
mode = "READ_WRITE"
source = "https://www.googleapis.com/compute/v1/projects/XX/zones/europe-west4-a/disks/app-demo-1-a-ams-g-XX"
initialize_params {
image = "https://www.googleapis.com/compute/v1/projects/XX/global/images/b1-centos7-cis-1603764417"
labels = {}
size = 20
type = "pd-standard"
}
}
network_interface {
name = "nic0"
network = "https://www.googleapis.com/compute/v1/projects/XX/global/networks/XX-private"
network_ip = "10.x.x.x"
subnetwork = "https://www.googleapis.com/compute/v1/projects/XX/regions/europe-west4/subnetworks/XX-europe-west4-atoma-private-subnet"
subnetwork_project = "XX"
}
scheduling {
automatic_restart = true
on_host_maintenance = "MIGRATE"
preemptible = false
}
shielded_instance_config {
enable_integrity_monitoring = true
enable_secure_boot = false
enable_vtpm = true
}
} |
The good news is I just tried it with 0.14.2, and while we're holding off temporarily on going to 0.14, it seems to work OK in 0.14.2 with google 3.50.0_x5. That's the test of doing two applies in a row. Then I went in and changed the boot disk size from 20 to 21, ran it, and it segfaulted.
|
@ronjarrell can you post the full debug log? |
@ronjarrell closing this issue as we need to be able to repro the issue. Please feel free to reopen it once you are able to help repro |
OK, so this is affecting most of our jobs and is completely replicable on our end, but other than a guess that it involves the KMS key logic (because those are the only runs it blows up on), I'm at a loss to isolate it. I can't upload the hundreds of lines of modules and submodules that our environment uses. Can I get some suggestions for narrowing it down? |
Verified it still fails with 14.3 btw. |
https://gist.github.com/ronjarrell/c755419d32b13b6b3c0a7ddf0e6d9b67 |
Also, I tried doing a state list, then doing an apply -target= on each item in that list. They all work great except the compute instances, which consistently crash. But if I do a state show on one of the instances that crash, copy that to a tf file, and make some slight changes to it to get rid of immutable variables, I can run apply to my heart's content and it never fails. |
@edwardmedia Can we reopen this before the bot locks it and we have to start over... I'm having serious production issues every time I touch an instance. We crash terraform 10-15 times a day now. |
@slevenick: Mind taking a look at this? |
@ronjarrell so this is a tricky issue. It's pretty difficult to reproduce (I have not been able to do it) and it changes between versions of Terraform that you are running. That makes me think that there is something deeper than a provider issue going on here. The debug output you reported in the initial issue points to a config being provided with conflicting The concerning part for me is that enforcement on conflicting fields and specifying the sha256 is not done within the The crash you started seeing in 0.14 again points to a potential issue in the SDK as all of the lines in the stack trace are from the SDK library rather than any code in this repository. I have a couple of ideas:
|
Hey y'all, my ears were burning. :) I don't have a lot to add (yet--I'm still poking at this) but It appears the crash is eerily similar to hashicorp/terraform-plugin-sdk#548. I can't say whether that's the root of the problem or just a confounding symptom and a red herring, but it seems related. I haven't fully gotten to the bottom of that issue yet, but I'm still digging. It would be really helpful if I could get a reproduction against hashicorp/terraform-plugin-sdk#686 and let me know any and all of the new So far all I've got is this seems related to the use of CustomizeDiff, and the crash happens because an attribute is unexpectedly |
I'd be happy to run a custom provider to test this. I can replicate it by running twice in a row, with no one else changing anything, so it's not stuff changing between runs. |
Previously, we'd assign the result of finalizeDiff to the resource diff without checking its return. This caused problems because a "finalized" diff for any given attribute could, in fact, be no diff at all. Which we represent as `nil`. But some consumers of the resource diff expect every attribute in the map to be non-`nil`, and so crash on these attributes that have diff entries but no diffs. See for example hashicorp/terraform-provider-google#7934, which would crash when a config had an explicit empty string as the value for a field that a CustomizeDiff function set to ForceNew. Technically, there was a diff, but finalizeDiff decided it wasn't a "real" diff, because the SDK still interprets empty strings as "unset" for computed fields to align with legacy behavior. But that meant a nil in the resource's map of attribute diffs, which then was dereferenced when populating the response to PlanResourceChange. This caused a crash. This commit fixes that issue by updating all our usages of finalizeDiff to check for a nil diff _before_ writing it to the resource's map of attribute diffs. This was easier than tracking down all the usages of a ResourceAttributeDiff and trying to ensure they were ignoring nil values.
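The nil-check described in that commit message can be sketched in a few lines of Go. This is a simplified model, not the SDK's code: `attrDiff`, `finalize`, `buildUnchecked`, `buildChecked` and `countForceNew` are hypothetical names invented for illustration, and the attribute paths are borrowed from this issue only to make the example concrete.

```go
package main

import "fmt"

// attrDiff is a hypothetical, simplified stand-in for a per-attribute
// diff. It is NOT the SDK's real type.
type attrDiff struct {
	Old, New string
	ForceNew bool
}

// finalize mimics the behaviour described above: an empty string on a
// computed field is treated as "unset", so the result is no diff (nil).
func finalize(d *attrDiff, computed bool) *attrDiff {
	if computed && d.New == "" {
		return nil
	}
	return d
}

// buildUnchecked models the old pattern: the finalized result is stored
// in the map even when it is nil.
func buildUnchecked(in map[string]*attrDiff, computed map[string]bool) map[string]*attrDiff {
	out := make(map[string]*attrDiff)
	for k, d := range in {
		out[k] = finalize(d, computed[k])
	}
	return out
}

// buildChecked models the fix: a nil result never enters the map.
func buildChecked(in map[string]*attrDiff, computed map[string]bool) map[string]*attrDiff {
	out := make(map[string]*attrDiff)
	for k, d := range in {
		if fd := finalize(d, computed[k]); fd != nil {
			out[k] = fd
		}
	}
	return out
}

// countForceNew dereferences every entry, like a consumer that assumes
// all entries are non-nil. It panics if the map holds a nil entry.
func countForceNew(diffs map[string]*attrDiff) int {
	n := 0
	for _, d := range diffs {
		if d.ForceNew {
			n++
		}
	}
	return n
}

func main() {
	in := map[string]*attrDiff{
		// Explicit empty string on a computed field marked ForceNew.
		"boot_disk.0.kms_key_self_link": {New: "", ForceNew: true},
		// An ordinary change that survives finalization.
		"boot_disk.0.initialize_params.0.size": {Old: "20", New: "21"},
	}
	computed := map[string]bool{"boot_disk.0.kms_key_self_link": true}

	checked := buildChecked(in, computed)
	fmt.Println(len(checked), countForceNew(checked))

	// Calling countForceNew(buildUnchecked(in, computed)) instead would
	// panic with a nil pointer dereference, which is the crash modelled here.
}
```

Filtering at write time means every later reader of the map can keep assuming non-nil entries, which matches the reasoning in the commit message that this was easier than auditing every consumer.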
Looks like this will be solved by hashicorp/terraform-plugin-sdk#686 (comment) Will require a SDK upgrade |
Sorry for the delayed response here, was waiting for confirmation. Yes, we got to the bottom of this, and it turns out a very specific confluence of events was needed to provoke this, which is why we didn't see it until now. I haven't confirmed this myself yet, but any hashicorp/terraform-plugin-sdk#686 (comment) has a more in-depth write-up of what's going on and why, and how we're fixing it. Thanks for the patience on this, it was a bit of a dig to isolate and trace the problem! |
Version 2.4.2 of the SDK is out and should resolve this issue. |
Google provider 3.55, which is using the new sdk, resolves our issue. |
Community Note
If an issue is assigned to the modular-magician user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to hashibot, a community member has claimed the issue already.
Terraform Version
0.14.0
(also failing under 0.13.5 and 0.12.29)
Provider 3.49.0
(also tried 2-3 random versions clear back to 3.19)
Affected Resource(s)
Terraform Configuration Files
Debug Output
Note that in Terraform 0.12 or 0.13 the provider segfaults instead of giving an error.
Panic Output
Expected Behavior
Trying to do a terraform apply -var-file=b1-scratch.tfvars in a greenfield situation works fine. Doing a destroy works fine.
Doing another terraform apply to check for changes (there are none) causes that error to happen every time.
It should have told me there were no changes (or, if there were, shown them to me).
Actual Behavior
The error above.
Steps to Reproduce
terraform apply -var-file=b1-scratch.tfvars
terraform apply -var-file=b1-scratch.tfvars
Note that in the code, the reference to the CSEK key yields a string that's the base64-encoded raw key, since the provider has no way of accepting the wrapped key. At no point do we provide a KMS key, as you can see, or try to change the sha256 value.
The raw string was produced in a different Terraform file entirely: a KMS key was used to decrypt the value in the tf file, and the plaintext version was written into state, where it is referenced here.
Important Factoids
References