Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add timeout blocks for Terraform resources #97

Merged
merged 2 commits into from
Jun 4, 2024

Conversation

m-ildefons
Copy link
Contributor

@m-ildefons m-ildefons commented May 24, 2024

By convention (https://developer.hashicorp.com/terraform/language/resources/syntax#operation-timeouts) terraform resources can have timeouts associated with their operations through timeout blocks.
This change sets the default timeout to 2 minutes, except where other defaults were hard-coded before and allows users to specify timeout blocks without causing parsing errors.

Use the contexts to setup and propagate deadlines for each operation according to what is set in the terraform file. Since each API call happens within the same context, the same deadline applies, giving each operation (create, update, read, delete) as a whole the configured timeout rather than setting this timeout on individual API calls of the operation.

Where applicable, the timeout of resources is measured against the observed changes of the API resources or events rather than completion of API calls. This is particularly important for resources where operations can take a lot longer than the API call. E.g. for the Delete operation on a Volume, the time until a matching event is observed is counted. For the Update operation of a VirtualMachine, the time (if necessary) until the VM resource has been restarted and is observed in running state is counted. For a contrast, the Delete operation on a storage class happens almost immediately and the timeout is only counted against the API call.

As an example how timeout blocks can be used, here is an example terraform script for loading a VM image into Harvester from a slow server. Normally this operation should succeed without explicitly setting a timeout, but due to the slow downloadspeed from the server onto the Harvester node, the timeout for creating or updating the resource needs to be significantly increased.

terraform {
  required_providers {
    harvester = {
[...]
    }
  }
}

provider "harvester" {
  kubeconfig = "/home/user/.kube.harvester/config.yaml"
}

resource "harvester_image" "opensuse154" {
  name      = "opensuse154"
  namespace = "harvester-public"

  display_name = "openSUSE-Leap-15.4.x86_64-NoCloud.qcow2"
  source_type  = "download"
  url          = "https://mirrors.superslow.org/images/openSUSE-Leap-15.4.x86_64-NoCloud.qcow2"

  timeouts {
    create = "20m"
    update = "20m"
  }
}

fixes: harvester/harvester#5632
related issues:

@m-ildefons m-ildefons requested a review from FrankYang0529 May 24, 2024 14:29
@m-ildefons m-ildefons self-assigned this May 24, 2024
@m-ildefons m-ildefons requested a review from brandboat May 31, 2024 14:41
By convention (https://developer.hashicorp.com/terraform/language/resources/syntax#operation-timeouts)
terraform resources can have timeouts associated with their operations
through `timeout` blocks.
This change sets the default timeout to 2 minutes, except where other
defaults were hard-coded before and allows users to specify timeout
blocks without causing parsing errors.

Use the contexts to setup and propagate deadlines for each operation
according to what is set in the terraform file. Since each API call
happens within the same context, the same deadline applies, giving each
operation (create, update, read, delete) as a whole the configured
timeout rather than setting this timeout on individual API calls of the
operation.

Where applicable, the timeout of resources is measured against the
observed changes of the API resources or events rather than completion
of API calls. This is particularly important for resources where
operations can take a lot longer than the API call.
E.g. for the Delete operation on a Volume, the time until a matching
event is observed is counted. For the Update operation of a
VirtualMachine, the time (if necessary) until the VM resource has been
restarted and is observed in running state is counted.
For a contrast, the Delete operation on a storage class happens almost
immediately and the timeout is only counted against the API call.

fixes: harvester/harvester#5632

Signed-off-by: Moritz Röhrich <[email protected]>
@@ -100,6 +110,19 @@ func resourceCloudInitSecretDelete(ctx context.Context, d *schema.ResourceData,
if err != nil && !apierrors.IsNotFound(err) {
return diag.FromErr(err)
}

stateConf := &resource.StateChangeConf{
Pending: []string{constants.StateImageTerminating, constants.StateCommonActive},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems weird, the resource is cloudinit, but we use the variable StateImageTerminating which contains the term image. Does this exist for purpose ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. This whole block of code doesn't make a lot of sense since the cloudinit secret doesn't have a state. It either exists or it doesn't but there is not any intermediate state as there would be with a Pod or similar. Therefore we don't need to wait here, the DELETE request should be enough.

return obj, constants.StateCommonError, err
}

state := d.Get(constants.FieldCommonState).(string)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually in resourceCloudInitSecretImport we didn't assign value to state. And this could lead to error Error: unexpected state '', wanted target 'Removed'. last error: %!s(<nil>). If terraform execute this line.

I think this could be take care in separate PR if you don't have free cycle since the delete resource cmd already fired and it should be deleted (If no accident happens), no need to fix them in once.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above. This whole block of code is not sensible.

@brandboat
Copy link
Member

Could you also modify the pr description and commit message, the behavior is mismatch to those descriptions now. Many thanks 😃 .

Don't wait for a state change when deleting a cloud-init secret. The
cloud-init secret is just a normal core/v1 Secret API object. These
don't have a state, they either exist or they don't but there is not any
intermediate state like the is with a VM or a Pod. As a consequence,
it's not necessary to wait for a state change when deleting a secret.

Signed-off-by: Moritz Röhrich <[email protected]>
@m-ildefons
Copy link
Contributor Author

I've changed the PR description to match the behavior of the PR, the commit message and fixed the cloud-init secret resource.

Copy link
Member

@brandboat brandboat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM !

Copy link
Member

@FrankYang0529 FrankYang0529 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the PR.

@FrankYang0529 FrankYang0529 requested a review from bk201 June 4, 2024 10:09
@m-ildefons m-ildefons merged commit 3469cb2 into harvester:master Jun 4, 2024
2 checks passed
@rvdh
Copy link

rvdh commented Sep 17, 2024

@FrankYang0529 Can you make a new release containing these improvements? They would help us a lot :)

Thanks 🙏

@FrankYang0529
Copy link
Member

Yes, we just released v0.6.5. Thanks.

https://github.com/harvester/terraform-provider-harvester/releases/tag/v0.6.5

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] harvester_virtualmachine Terraform resource doesn't handle timeout blocks
4 participants