-
Notifications
You must be signed in to change notification settings - Fork 9.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
helper/resource: require explicit usage of NonRetryableError and RetryableError #17220
Conversation
Found another reference to this in the Terraform AWS Provider; this will never retry: func retryFindKmsAliasByName(conn *kms.KMS, name string) (*kms.AliasListEntry, error) {
var resp *kms.AliasListEntry
err := resource.Retry(1*time.Minute, func() *resource.RetryError {
var err error
resp, err = findKmsAliasByName(conn, name, nil)
if err != nil {
return resource.NonRetryableError(err)
}
if resp == nil {
// err is nil here, so returns nil
return resource.RetryableError(err)
}
return nil
})
return resp, err
} |
… after creation due to eventual consistency References: * #7891 * #6560 * #7873 * hashicorp/terraform#17220 The KMS service has eventual consistency considerations and the `aws_kms_alias` resource immediately tries to read the KMS alias after creation, which may not find the KMS alias. When not able to find the KMS alias, the resource logic returns an empty API object instead of an error. Since a `nil` check was already performed on the error, the error will always be `nil`. Invoking `return resource.RetryableError(nil)` is equivalent to `return nil`. The resource during its Read performs an error check first which will skip because its `nil`, then assumes the resource has been deleted outside Terraform and triggers recreation. Here when we cannot find a KMS alias after allowing some time for eventual consistency, we return a resource not found error and ensure we handle any timeouts due to automatic AWS Go SDK retries. Output from acceptance testing: ``` --- PASS: TestAccAWSKmsAlias_no_name (37.63s) --- PASS: TestAccAWSKmsAlias_name_prefix (37.80s) --- PASS: TestAccAWSKmsAlias_multiple (38.38s) --- PASS: TestAccAWSKmsAlias_importBasic (40.13s) --- PASS: TestAccAWSKmsAlias_ArnDiffSuppress (43.61s) --- PASS: TestAccAWSKmsAlias_basic (46.76s) ```
… after creation due to eventual consistency (#7907) References: * #7891 * #6560 * #7873 * hashicorp/terraform#17220 The KMS service has eventual consistency considerations and the `aws_kms_alias` resource immediately tries to read the KMS alias after creation, which may not find the KMS alias. When not able to find the KMS alias, the resource logic returns an empty API object instead of an error. Since a `nil` check was already performed on the error, the error will always be `nil`. Invoking `return resource.RetryableError(nil)` is equivalent to `return nil`. The resource during its Read performs an error check first which will skip because its `nil`, then assumes the resource has been deleted outside Terraform and triggers recreation. Here when we cannot find a KMS alias after allowing some time for eventual consistency, we return a resource not found error and ensure we handle any timeouts due to automatic AWS Go SDK retries. Output from acceptance testing: ``` --- PASS: TestAccAWSKmsAlias_no_name (37.63s) --- PASS: TestAccAWSKmsAlias_name_prefix (37.80s) --- PASS: TestAccAWSKmsAlias_multiple (38.38s) --- PASS: TestAccAWSKmsAlias_importBasic (40.13s) --- PASS: TestAccAWSKmsAlias_ArnDiffSuppress (43.61s) --- PASS: TestAccAWSKmsAlias_basic (46.76s) ```
Another instance of this issue here: hashicorp/terraform-provider-aws#9779 (comment) err = resource.Retry(5*time.Minute, func() *resource.RetryError {
igRaw, _, err := IGStateRefreshFunc(conn, d.Id())()
if igRaw != nil {
return nil
}
if err == nil {
// err is always nil here, so it will never retry
return resource.RetryableError(err)
} else {
return resource.NonRetryableError(err)
}
}) |
…yableError Reference: hashicorp/terraform#17220 Reference: hashicorp/terraform-provider-aws#9779 (comment) Reference: hashicorp/terraform-provider-aws#9812 (comment) It is currently possible to introduce subtle bugs or crashes when using `resource.RetryableError` and `resource.NonRetryableError` and allowing a `nil` error input. This PR proposes behavior that requires providers to be explicit with their usage of these errors, otherwise returns a bug reporting message to the operator. For example (https://github.com/terraform-providers/terraform-provider-aws/blob/4c0387645f982ddcd51b3ffe2cc8992c06fb9c2c/aws/resource_aws_elastic_beanstalk_application.go#L105): ```go var app *elasticbeanstalk.ApplicationDescription err := resource.Retry(30*time.Second, func() *resource.RetryError { var err error app, err = getBeanstalkApplication(d.Id(), conn) if err != nil { return resource.NonRetryableError(err) } if app == nil { if d.IsNewResource() { return resource.RetryableError(fmt.Errorf("Elastic Beanstalk Application %q not found", d.Id())) } // err is nil here return resource.NonRetryableError(err) } return nil }) if err != nil { return err } // app can be nil here, so this can crash Terraform d.Set("name", app.ApplicationName) ``` Another example (https://github.com/terraform-providers/terraform-provider-alicloud/blob/927a8e82386e0b32718aa5eef254255fdc36b070/alicloud/resource_alicloud_disk_attachment.go#L112): ```go return resource.Retry(5*time.Minute, func() *resource.RetryError { err := conn.DetachDisk(instanceID, diskID) if err != nil { if IsExceptedError(err, DiskIncorrectStatus) || IsExceptedError(err, InstanceLockedForSecurity) || IsExceptedError(err, DiskInvalidOperation) { return resource.RetryableError(fmt.Errorf("Detach Disk timeout and got an error: %#v", err)) } } disks, _, descErr := conn.DescribeDisks(&ecs.DescribeDisksArgs{ RegionId: getRegion(d, meta), DiskIds: []string{diskID}, }) if descErr != nil { log.Printf("[ERROR] Disk %s is not detached.", diskID) // err can be nil here return resource.NonRetryableError(err) } for _, disk := range disks { if disk.Status != ecs.DiskStatusAvailable { return resource.RetryableError(fmt.Errorf("Detach Disk timeout and got an error: %#v", err)) } } return nil }) ``` Another example (https://github.com/terraform-providers/terraform-provider-aws/blob/75c32b375140813b7994f8031e74b8588a08035a/aws/resource_aws_kms_alias.go#L176-L190): ```go func retryFindKmsAliasByName(conn *kms.KMS, name string) (*kms.AliasListEntry, error) { var resp *kms.AliasListEntry err := resource.Retry(1*time.Minute, func() *resource.RetryError { var err error resp, err = findKmsAliasByName(conn, name, nil) if err != nil { return resource.NonRetryableError(err) } if resp == nil { // err is nil here, so returns nil return resource.RetryableError(err) } return nil }) return resp, err } ``` Another example (https://github.com/terraform-providers/terraform-provider-aws/blob/00909998d919faf5494ab8f6b38241deb1957d99/aws/resource_aws_internet_gateway.go#L55-L65): ```go err = resource.Retry(5*time.Minute, func() *resource.RetryError { igRaw, _, err := IGStateRefreshFunc(conn, d.Id())() if igRaw != nil { return nil } if err == nil { // err is always nil here, so it will never retry return resource.RetryableError(err) } else { return resource.NonRetryableError(err) } }) ```
Closing in preference of hashicorp/terraform-plugin-sdk#199 |
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further. |
It is currently possible to introduce subtle bugs or crashes when using
resource.RetryableError
andresource.NonRetryableError
and allowing anil
error input. This PR proposes behavior that requires providers to be explicit with their usage of these errors, otherwise returns a bug reporting message to the operator.For example (https://github.com/terraform-providers/terraform-provider-aws/blob/4c0387645f982ddcd51b3ffe2cc8992c06fb9c2c/aws/resource_aws_elastic_beanstalk_application.go#L105):
Another example (https://github.com/terraform-providers/terraform-provider-alicloud/blob/927a8e82386e0b32718aa5eef254255fdc36b070/alicloud/resource_alicloud_disk_attachment.go#L112):