-
Notifications
You must be signed in to change notification settings - Fork 233
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bring test runs back into a single process. #471
Conversation
I feel like I should add tests for this, but am drawing a blank on how to test it. Also, this feels like it needs more documentation than it has, but beyond the |
@@ -611,9 +612,13 @@ func (s *State) ensureHasLineage() { | |||
panic(fmt.Errorf("Failed to generate lineage: %v", err)) | |||
} | |||
s.Lineage = lineage | |||
log.Printf("[DEBUG] New state was assigned lineage %q\n", s.Lineage) | |||
if os.Getenv("TF_ACC") == "" || os.Getenv("TF_ACC_STATE_LINEAGE") == "1" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is TF_ACC_STATE_LINEAGE
for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Normally, the output of these log statements get ferried by go-plugin to the Terraform process, which logs them to a file. Because tests are running as a standalone process, not a go-plugin managed process, we don't get the benefit of this behavior anymore--all the log statements go right to the terminal, mingled with test output.
A lot of these debug statements muddle things so much that it's hard to read; these statements in particular are egregious. But, presumably, they offer debug information that can be useful in tracking down bugs. I wanted to hide them in testing under normal circumstances, so if TF_ACC is set the log doesn't happen. But if someone's debugging an issue that this would help with, I want them to be able to override that silencing, so I added a special environment variable to re-enable them even if TF_ACC is set. Most people won't ever need to know about TF_ACC_STATE_LINEAGE, but if someone needs it, it's there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had some general inquiries mostly for my own knowledge, otherwise LGTM, amazing work this looked hard.
@@ -149,7 +161,10 @@ func testIDRefresh(c TestCase, t testing.T, wd *tftest.WorkingDir, step TestStep | |||
defer wd.RequireSetConfig(t, step.Config) | |||
|
|||
// Refresh! | |||
wd.RequireRefresh(t) | |||
runProviderCommand(func() error { | |||
wd.RequireRefresh(t) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the Require
prefixed functions I believe panic on error, maybe now we just call the non Require
prefixed methods and bubble up any error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory we could, but couldn't we before this? I wanted to match the existing binary testing approach as closely as possible, to limit the scope of changes.
Would we be OK with shipping this as an improvement down the road if we feel after investigation and consideration an error is better? Going panic -> error is a non-breaking change, imho, but error -> panic is a breaking change, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing is a blocker for me I approved, just offering whatever insight I have on the subject matter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM, will defer to Katy/Alex.
As mentioned internally I think we should deprecate the TestCase.Providers
in something after 2.0.0 and suggest people use the factories instead. I also think internally you should probably not pass a map of providers, but just the single instance, but nothing major there.
Historically, tests ran by starting up the Go test framework, which would spin up a shim of Terraform (importing half of Terraform in the process), which would drive the provider code, then the test framework would inspect the state. This had some drawbacks, though; the shim of Terraform wasn't quite Terraform, and could behave in different ways sometimes. As the plugin SDK got separated from the Terraform codebase, this drift potential became worse. So we came up with the Binary Acceptance Testing driver, which would start up the Go test framework, which would shell out to a Terraform binary, which would start up a provider process to drive, and then the test framework would inspect the state back in the test process. This solved the issue it was addressing--we were now testing against production versions of Terraform, guaranteeing the tests would match what users would see--but it created some problems of its own. Our original model ran in a single process, and as a happy accident of that, it meant that go test could report coverage information and delve could be used to debug tests. Our binary acceptance testing model split everything across multiple processes, which broke coverage information (go test couldn't see the provider code being run anymore) and delve (delve also couldn't see the provider code). We didn't test, but suspect, that other tooling like race detection would also break. This led to coordinating with the Terraform core team to come up with a new process. We would still start up the Go test framework, it would still shell out to the Terraform binary, but instead of Terraform spinning up a new process for the provider, a provider server would be started in the same process as the Go test framework, and Terraform would be told to reconnect to it. This kept us using a production Terraform binary, so we don't need to worry about our shim drifting, but also kept our provider and test code in the same process, where it can be reached by delve or any of the other standard testing tooling available. This PR is adding support for that third approach. The pieces needed in go-plugin and terraform-plugin-test have been merged and released. The pieces in Terraform that are required were included in 0.12.26 and 0.13.0-beta.1. This is the final piece of the puzzle, the code that drives the tests. As a result of this reworking, the acctest package that was called from TestMain in providers is no longer needed, and is being removed as part of v2's breaking changes. Also, as a side effect from this work, providers can be debugged even outside of testing. To support that, the plugin.DebugServe helper was added, though a more friendly API may be needed. Providers can include that in their main package, with the recommendation being to switch between plugin.DebugServe and plugin.Serve based on a flag, with the default being plugin.Serve. This would allow production versions of binaries to have debuggers attached to them, by starting the binary in debug mode with the debugger attached, then having Terraform reconnect and drive it. There are a few differences under debug mode. Terraform performs several graph walks per command, and usually starts a provider process at the beginning of each graph walk and kills the process at the end of the graph walk. Because these graph walks have no hooks, it's impossible to support this when the server is started outside of Terraform's control. So when in debug mode, Terraform will not care about the provider lifecycle at all; it will not start the provider, nor will it kill the provider process after graph walks. This means global mutable state will be longer lived than it would be in non-debug mode. Terraform init also will not do anything with providers that are in debug mode; it will not attempt to fetch binaries, match versions, or find binaries on the file system. The information provided out of band when the provider is in debug mode is all Terraform needs to operate, so it will not bother trying to find any other information.
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further. |
This is a long saga of code that stretches back over months to make tests debuggable.
Historically, tests ran by starting up the Go test framework, which would spin up a shim of Terraform (importing half of Terraform in the process), which would drive the provider code, then the test framework would inspect the state. This had some drawbacks, though; the shim of Terraform wasn't quite Terraform, and could behave in different ways sometimes. As the plugin SDK got separated from the Terraform codebase, this drift potential became worse.
So we came up with the Binary Acceptance Testing driver, which would start up the Go test framework, which would shell out to a Terraform binary, which would start up a provider process to drive, and then the test framework would inspect the state back in the test process. This solved the issue it was addressing--we were now testing against production versions of Terraform, guaranteeing the tests would match what users would see--but it created some problems of its own. Our original model ran in a single process, and as a happy accident of that, it meant that
go test
could report coverage information and delve could be used to debug tests. Our binary acceptance testing model split everything across multiple processes, which broke coverage information (go test
couldn't see the provider code being run anymore) and delve (delve also couldn't see the provider code). We didn't test, but suspect, that other tooling like race detection would also break.This led to coordinating with the Terraform core team to come up with a new process. We would still start up the Go test framework, it would still shell out to the Terraform binary, but instead of Terraform spinning up a new process for the provider, a provider server would be started in the same process as the Go test framework, and Terraform would be told to reconnect to it. This kept us using a production Terraform binary, so we don't need to worry about our shim drifting, but also kept our provider and test code in the same process, where it can be reached by delve or any of the other standard testing tooling available.
This PR is adding support for that third approach. The pieces needed in go-plugin and terraform-plugin-test have been merged and released. The pieces in Terraform that are required were included in 0.12.26 and 0.13.0-beta.1. This is the final piece of the puzzle, the code that drives the tests.
As a result of this reworking, the
acctest
package that was called fromTestMain
in providers is no longer needed, and is being removed as part of v2's breaking changes.Also, as a side effect from this work, providers can be debugged even outside of testing. To support that, the
plugin.DebugServe
helper was added, though a more friendly API may be needed. Providers can include that in theirmain
package, with the recommendation being to switch betweenplugin.DebugServe
andplugin.Serve
based on a flag, with the default beingplugin.Serve
. This would allow production versions of binaries to have debuggers attached to them, by starting the binary in debug mode with the debugger attached, then having Terraform reconnect and drive it.There are a few differences under debug mode. Terraform performs several graph walks per command, and usually starts a provider process at the beginning of each graph walk and kills the process at the end of the graph walk. Because these graph walks have no hooks, it's impossible to support this when the server is started outside of Terraform's control. So when in debug mode, Terraform will not care about the provider lifecycle at all; it will not start the provider, nor will it kill the provider process after graph walks. This means global mutable state will be longer lived than it would be in non-debug mode. Terraform init also will not do anything with providers that are in debug mode; it will not attempt to fetch binaries, match versions, or find binaries on the file system. The information provided out of band when the provider is in debug mode is all Terraform needs to operate, so it will not bother trying to find any other information.
The commits on this PR will be squashed, rewritten, and force-pushed.