Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update to sdk v2 #261

Merged
merged 8 commits into from
May 14, 2024
Merged

Update to sdk v2 #261

merged 8 commits into from
May 14, 2024

Conversation

tete17
Copy link
Contributor

@tete17 tete17 commented Mar 2, 2024

Most of the heavy lifting is done by the tool and I just had to change the signature of some lambdas to include the newly added context.

Som deprecations have been added as a result of this but we can be dealt with it other MR.

This drops support for terraform <= 0.11.

My testing capabilities are very limited as my company owns an cloudamqp account but I personally don't. I have checked through the update documentation and nothing pops up that we should communicate to users other than the old terraform version support.

Closes #140

@dentarg
Copy link
Member

dentarg commented Mar 2, 2024

Nice! What is the tool you used?

This will be an excellent test for #257 once finished

@tete17
Copy link
Contributor Author

tete17 commented Mar 2, 2024

@tete17
Copy link
Contributor Author

tete17 commented Mar 9, 2024

Hey @dentarg is the plan of #257 to make use of the terraform provided testing harness?

This update includes a testing ground we may want to look at to avoid creating a testing system only to migrate to the "official" one

@tbroden84
Copy link
Contributor

@tete17 we are using Terraforms Acceptance test together with go-vcr. Note that each test require some tweaking. With Go VCR we can record requests/responses to the API backend into fixtures, then do playbacks of previous tests. This significantly speed up the test times to ms instead of minutes for each test.

@tete17 tete17 force-pushed the Update-to-sdk-v2 branch 2 times, most recently from df6b929 to a351503 Compare April 24, 2024 20:26
@tbroden84
Copy link
Contributor

Note about merged #257 and our changes to how testing is done.

From the conflict list, the biggest impact should be the test files.

  • Removed all data source tests (some are tested within the resource tests)
  • Restructured of affected resource tests

Minor impact

  • New dependencies (go.mod/go.sum)
  • Updated provider_test.go - missing the v2 changes
  • provider.go - missing the v2 changes

@tbroden84
Copy link
Contributor

Have a rebased version of this branch locally. However some issues with the new tests. Now with SDK v2, terraform-json dependency require a newer version. Found some strange mismatches in the recorded fixtures. Each test also have a huge increase in running time. Will continue to investigate this.

@tete17
Copy link
Contributor Author

tete17 commented May 2, 2024

Hey @tbroden84 give me a few hours I am on it :)

tete17 added 4 commits May 2, 2024 17:45
Most of the heavy lifting is done by the tool and I just had to change the
signature of some lambdas to include the newly added context.

Som deprecations have been added as a result of this but we can be dealt with
it other MR.

This drops support for terraform <= 0.11.
These new version should provide better diagnostics.
@tete17 tete17 force-pushed the Update-to-sdk-v2 branch from a351503 to 4e99627 Compare May 2, 2024 16:02
@tete17
Copy link
Contributor Author

tete17 commented May 2, 2024

The branch should be updated

@tbroden84 I am unable to run the test on my own but happy to step in and help with the fixtures if you share them somehow

@dentarg
Copy link
Member

dentarg commented May 2, 2024

What's the problem with running the tests? The fixtures should be in the repo already

@tete17
Copy link
Contributor Author

tete17 commented May 2, 2024

Dont i need a cloudamqp token?

The documentation says something about loading a .enc file.

What is the colmand?

@dentarg
Copy link
Member

dentarg commented May 2, 2024

Should not be need. The README has instructions.

@tbroden84
Copy link
Contributor

Just to mention, still need some tweaking for the test to work for SDK v2. Noticed all tests claimed there was missing interactions, which are there. There is also a deprecated function on how to chose the provider for each test case and some overall changes to be overlooked before they can be used. So it was not as straight forward that I had hoped.

@tete17
Copy link
Contributor Author

tete17 commented May 2, 2024

Silly me now I am running the acceptance test locally.

I am facing the missing interactions you guys mention. It seems the provider is being configured multiple times but I have no idea why.

@tbroden84 what deprecated function to select the provider does it still use I can't find it.

@tete17
Copy link
Contributor Author

tete17 commented May 2, 2024

It seems to be calling the ValidateChange("plan") function for an instance more times than before.

Not sure why but it would be interesting to record the fixtures again @tbroden84 @dentarg

@tete17
Copy link
Contributor Author

tete17 commented May 2, 2024

In fact if you remove the diff functions that execute the api calls the test work.

I am also now seeing the long delay you mention. It seems to be happening right at the end of the test after all the api calls have been completed.

@tete17
Copy link
Contributor Author

tete17 commented May 2, 2024

Further news about the delay. Running the test withTF_ACC_LOG=TRACE & TF_LOG=TRACE shows the reason for most of the delays.

There is a lot of artificial waits from the https://github.com/84codes/go-api package like these ones:

This is causing all test to last at least 10 seconds if not more .

I think they only last that long because terraform has slightly changed the order on which the requests are launched and therefore the fixtures are not correct anymore.

I am not able to recreate the fixtures as that requires a proper amqp api. My suggestion is to recreate the fixtures and we can double check whether they make sense or not.

@dentarg
Copy link
Member

dentarg commented May 2, 2024

Would be great to make those all those sleeps in go-api configurable.

We have a plan to import go-api into this repository. Maybe we should to do that before continuing with this PR?

@tbroden84
Copy link
Contributor

My findings for today.

  1. terraform-json still get error with v0.12.0, just used latest version. Only did some quick searches and bumped it yesterday. The error I get with v0.12.0.
Got error running Terraform: unsupported state format version: expected ["0.1" "0.2"], got "1.0"
  1. It seems to be calling the ValidateChange("plan") function for an instance more times than before.

Yes, doing a re-recording both ValidateChanges are called three times now, before two. Even with two, it's one time more than needed. Unless it has to do with the customdiff used and both are triggered. The extra call that is missing, is the reason for the error about missing interactions.

  1. There is a lot of artificial waits

Yes, more and more of them we have made configurable. For the test where wait been an issue, there is a save hook removing these requests from the fixtures. https://github.com/cloudamqp/terraform-provider-cloudamqp/blob/main/cloudamqp/provider_test.go#L79-L154.

In case of the basic alarm test, that wait is not being waited for. Finished in 11.8 ms according to the fixture. We also use the flag SkipRequestLatency for the recorder. https://github.com/cloudamqp/terraform-provider-cloudamqp/blob/main/cloudamqp/provider_test.go#L59. Should not simulate the durations.

Compared the time for TestAccAlarm_Basic between the SDK versions, before about 0.2 s now 6.4 s.

  1. what deprecated function to select the provider does it still use I can't find it.

Saw you found it here: https://github.com/tete17/terraform-provider-cloudamqp/blob/Update-to-sdk-v2/cloudamqp/provider_test.go#L158-L161 :)

@tete17
Copy link
Contributor Author

tete17 commented May 3, 2024

Hey @tbroden84 have you tried my branch?

I think it is using 0.12.0 and I am not seeing any of the "0.1", "0.2" errors. I wonder why.

I think the waits are triggered because of the wrong fixtures. The order in which terraform does requests has clearlly changed and that probablly leads to the go-api package to get results that trigger the artificial wait.

Can you regenerate the fixtures using my code please? I think that will fix all the issues we have.

@dentarg
Copy link
Member

dentarg commented May 3, 2024

The order in which terraform does requests has clearlly changed

It would be good to fully understand this (what changed in Terraform?)

@tbroden84
Copy link
Contributor

Yes running your branch.

I think it is using 0.12.0 and I am not seeing any of the "0.1", "0.2" errors. I wonder why.

Which Terraform version are you using? Running latest myself (1.8.2). Tried switching version to 1.4.+, same error. Downgraded even further pre 1.0 (0.15.5) now I can run it without state version error. IIRC Terraform version went from 0.13 and got released as 1.0.0 a couple of years back.

The order in which terraform does requests has clearly changed

It would be good to fully understand this (what changed in Terraform?)

Yeah agree. From the previous link tete17 posted about migration. There is one mention about acceptance test driver being changed and more information about removed and added functions. Will dig deeper to get more knowledge about the changes.

Can you regenerate the fixtures using my code please? I think that will fix all the issues we have.

So far just re-recorded 2 fixtures and it helps them run and no longer have to wait on removing the instance. Was from there I noticed the increased replay time. Switch computer to a faster one and got Alarms down to 1 s, see what the rest will give.

@tbroden84
Copy link
Contributor

tbroden84 commented May 6, 2024

Bumped Terraform plugin SDK v2 to latest (2.4.3 -> 2.33.0), rearranged go.mod and deleted go.sum then go mod tidy command.

go.mod before tidy:

module github.com/cloudamqp/terraform-provider-cloudamqp

go 1.21

toolchain go1.21.3

require github.com/84codes/go-api v1.16.2

require (
	github.com/hashicorp/terraform-plugin-sdk/v2 v2.33.0
	github.com/tidwall/gjson v1.17.0
	github.com/tidwall/sjson v1.2.5
	gopkg.in/dnaeon/go-vcr.v3 v3.1.2
)

// replace github.com/84codes/go-api => ../../84codes/go-api

This did

  • Updated go to 1.21
  • Updated all other necessary dependencies, terraform json to 0.21.0 etc.
  • Cleaned up go.sum a lot.

This also increase the DEBUG logging when running the tests. Couldn't see any improvements in the test, hopefully more insight what actually happening.

Lots of version to keep in mind.

@tete17
Copy link
Contributor Author

tete17 commented May 12, 2024

Hey @tbroden84

Sorry for the delay.

Indeed I was running an ancient version of terraform. Updating revealed the problem. I updated the dependencies and the go.sum version file indeed shrinked quite a bit.

Setting TF_LOG=TRACE also revealed now more info. It seems the acceptance test run an initial plan before attempting to apply the changes which results in 2 runs of plans. That explains the 2 calls to the /api/plans api.

I am able to run the failed test of alarms in 1,8s with trace logging enabled. I think this is plenty fast as we also don't have any way to tweak them to make them faster in my opinion.

You mentioned you were recreating the fixtures. Can you re-create them and add them as commits to this branch?

I think that is the way forward.

@tbroden84
Copy link
Contributor

@tete17

No problem. It is even a third ´/api/plans´ call compared to only two before. They are all triggered when CustomDiff detecting changed for plan/region. Plan then apply sounds logic and should trigger this. However didn't find any good explanation last week why the third one being called. The state value for both are empty (similar to the other two), which causes the third request.

Also enough to run with TF_LOG=DEBUG to get the extra output. One big change is multiple providers can be used, how they are loaded and the behaviour. Multiple times during a replay, Terraform start and stop all providers (think this is the root cause that each test take longer time) resulting in the provider will be re-configured each time and the test configuration loaded again. The warning printed looks like this.

[WARN]  sdk.helper_schema: Previously configured provider being re-configured. This can cause issues in concurrent testing if the configurations are not equal.: tf_req_id=811d8029-a811-9f87-9ccd-d1abb025edfc tf_provider_addr=registry.terraform.io/hashicorp/cloudamqp tf_rpc=Configure

Re-recorded about 1/3 of the test last week while doing my investigation and testing. Didn't find any good solution but will record the rest during the day.

@tbroden84
Copy link
Contributor

Had to remove the test TestAccInstance_Upgrade from instance test due to internal backend changes. Otherwise they are now re-recorded. All tests takes about 40-80s depending on the computer I use to run them all.

@tbroden84
Copy link
Contributor

Thanks for the contribution, much appreciate. Will release this in the next release, together with some more changes we have planned.

Next iteration for SDK v2 will be to look into updating the ProviderFactory to ProtoV5ProviderFactory version. From what I have seen quite the re-write is needed. This will unlock prio, proposed and config states. Gives the provider full control over argument and attribute changes. From my understanding will also allow null values in configuration (not support with v1.+ version).

@tbroden84 tbroden84 enabled auto-merge (squash) May 14, 2024 09:44
@tbroden84 tbroden84 disabled auto-merge May 14, 2024 09:45
@tbroden84 tbroden84 merged commit 1b6c986 into cloudamqp:main May 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update Terraform Plugin SDK v2.+
3 participants