
Handle panics in plugins #26694

Merged: 3 commits, Oct 26, 2020
Conversation

@jbardin (Member) commented Oct 23, 2020

In prior versions of terraform, the combined stderr log stream would cause provider crashes to trigger panicwrap, which printed a formidable message about terraform crashing and how to report it. While the updated log handling now prevents the plugin stack traces from reaching panicwrap, those traces were also lost completely unless logging was enabled. This leaves the user with only a rather unhelpful grpc error from a crashing provider:

Error: rpc error: code = Unavailable desc = transport is closing

Since there is no direct way for core to handle a crash in a provider, we catch the output by wrapping the plugin loggers to look for panic: or fatal error: and record the output that follows. This relies on the behavior of go-plugin, which sends any unstructured stderr output to a Debug log by default. If in the future the plugin SDK were to install a recovery middleware, which could capture the panic traceback and return a gRPC error, it would not affect this codepath. This also still serves to catch panics that happen outside the handler goroutine, as well as fatal errors, which cannot be recovered from.
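As a rough illustration of that recording step (not the PR's literal code), a callback bound to each plugin could watch every forwarded stderr line for those markers and buffer whatever follows, keyed by plugin name. The names panicRecorder, newPanicRecorder, and maxLines below are illustrative assumptions:

// Hypothetical sketch of the recording side; names are illustrative,
// not necessarily those used in this PR.
package logging

import (
	"strings"
	"sync"
)

// panicRecorder accumulates apparent crash output per plugin.
type panicRecorder struct {
	mu sync.Mutex

	// panics maps a plugin name to the lines of its captured traceback.
	panics map[string][]string

	// maxLines caps how much of each traceback is kept, so the useful
	// leading lines survive without flooding the scrollback later on.
	maxLines int
}

func newPanicRecorder() *panicRecorder {
	return &panicRecorder{
		panics:   map[string][]string{},
		maxLines: 100,
	}
}

// record returns a callback bound to a single plugin. The wrapped plugin
// logger calls it with each unstructured stderr line that go-plugin
// forwards as a Debug entry.
func (p *panicRecorder) record(name string) func(string) {
	capturing := false
	return func(line string) {
		p.mu.Lock()
		defer p.mu.Unlock()

		// A Go crash starts with "panic: " or "fatal error: "; once we
		// see either marker we keep everything that follows.
		if !capturing {
			if !strings.HasPrefix(line, "panic: ") && !strings.HasPrefix(line, "fatal error: ") {
				return
			}
			capturing = true
		}

		if len(p.panics[name]) < p.maxLines {
			p.panics[name] = append(p.panics[name], line)
		}
	}
}

Keying the buffer by plugin name lets a single run collect crashes from more than one provider, and the line cap keeps the eventual report within a usable scrollback.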

Once we have recorded any possible tracebacks, the main package can look up any panics that may have happened via logging.PluginPanics() when there was an error in execution (a rough sketch of that lookup follows the example output below). These records should already be complete, since the plugin process will have exited by this point. We limit the length of the traceback to preserve the terminal scrollback (the important information is often within the first few lines), and format these in a similar way to the original terraform panic output:

Stack trace from the terraform-provider-null plugin:

panic: crasher!

goroutine 27 [running]:
github.com/terraform-providers/terraform-provider-null/null.resourceRead(0xc000078e80, 0x0, 0x0, 0x1cdcd10, 0x1804778)
    [... continued stack trace ...]

Error: The terraform-provider-null plugin crashed!

This is always indicative of a bug within the plugin. It would
be immensely helpful if you could please report the crash with
that plugin's maintainers so that it can be fixed. The output
above should help diagnose the issue.
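The lookup on the main side could be as small as the sketch below. logging.PluginPanics() is the function named above; its []string return type, the reportPluginPanics helper, and the exitStatus plumbing are assumptions made for illustration, not Terraform's actual main flow:

// Hypothetical helper for Terraform's main package: print any recorded
// plugin crashes once the command has failed.
package main

import (
	"fmt"
	"os"

	"github.com/hashicorp/terraform/internal/logging"
)

// reportPluginPanics writes each captured traceback to stderr when the
// command exited with an error, so the crash appears alongside the
// normal diagnostics.
func reportPluginPanics(exitStatus int) {
	if exitStatus == 0 {
		return
	}
	for _, traceback := range logging.PluginPanics() {
		fmt.Fprintln(os.Stderr, traceback)
	}
}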

On top of the panic handler, this PR adds a function to annotate some gRPC error codes that we may encounter. The primary one is code = Unavailable, which happens when the provider process crashes. Rather than a raw rpc error code, the new diagnostics contain a textual description and the name of the calling method:

Error: Provider did not respond

The provider encountered an error, and failed to respond to the ReadResource
call.

Unfortunately we don't yet have the configured provider name available at the point where the error is created, but an error log entry has been added to help correlate the error with an actual provider binary and version.
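For a rough idea of the annotation (the PR's version lives in plugin/grpc_error.go per the Codecov report, and produces diagnostics rather than a plain error), a helper could inspect the gRPC status code and name the calling method via the runtime package. The grpcErr name and its exact shape here are assumptions:

// Hypothetical sketch of translating opaque gRPC transport errors into
// something more presentable.
package plugin

import (
	"fmt"
	"path"
	"runtime"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// grpcErr inspects err for a gRPC status code and, for codes that
// indicate the plugin process itself failed, returns a message naming
// the calling method instead of the raw rpc error string.
func grpcErr(err error) error {
	if err == nil {
		return nil
	}

	// Find the calling method name to include in the message.
	method := "unknown method"
	if pc, _, _, ok := runtime.Caller(1); ok {
		method = path.Base(runtime.FuncForPC(pc).Name())
	}

	switch status.Code(err) {
	case codes.Unavailable:
		// The provider process most likely crashed before responding.
		return fmt.Errorf("the plugin encountered an error, and failed to respond to the %s call", method)
	default:
		return fmt.Errorf("plugin error in %s: %s", method, err)
	}
}

Keying on codes.Unavailable matches the "transport is closing" error quoted above, since that is the status grpc reports when the provider process has gone away mid-call.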

Terraform does not use rpc errors for any error communication, so these
are always something that went wrong outside of the plugin protocol.
The most common example of which is a provider crash, which would return
"rpc error: code = Unavailable desc = transport is closing". Replace
these error codes with something a little more presentable for the user,
and insert the calling method name to help correlate it to the
operation that failed.
@codecov bot commented Oct 23, 2020

Codecov Report

Merging #26694 into master will decrease coverage by 0.03%.
The diff coverage is 31.63%.

Impacted Files Coverage Δ
internal/logging/logging.go 30.15% <0.00%> (-0.99%) ⬇️
plugin/grpc_error.go 0.00% <0.00%> (ø)
plugin/grpc_provisioner.go 36.70% <0.00%> (ø)
main.go 37.86% <50.00%> (+0.11%) ⬆️
plugin/grpc_provider.go 55.18% <50.00%> (ø)
internal/logging/panic.go 39.53% <65.38%> (+39.53%) ⬆️
internal/providercache/dir.go 65.30% <0.00%> (-6.13%) ⬇️
terraform/eval_apply.go 72.86% <0.00%> (-0.59%) ⬇️
states/statefile/version4.go 58.19% <0.00%> (+0.23%) ⬆️
... and 2 more

@jbardin requested a review from a team on October 23, 2020
@jbardin force-pushed the jbardin/plugin-panics branch 4 times, most recently from f4972bd to 17fc1de on October 25, 2020
@mildwonkey (Contributor) left a comment:

This looks really great! I left a couple of nitpicky comments and suggestions but they are all very minor; ignore as you see fit.

internal/logging/panic_test.go (outdated review thread, resolved)
@@ -146,7 +148,7 @@ func (p *GRPCProvider) GetSchema() (resp providers.GetSchemaResponse) {
 	resp.Provider = convert.ProtoToProviderSchema(protoResp.Provider)
 	if protoResp.ProviderMeta == nil {
-		log.Printf("[TRACE] No provider meta schema returned")
+		logger.Debug("No provider meta schema returned")

Contributor comment: did you mean to change this from TRACE to DEBUG? (I have no argument if you did this on purpose)

plugin/grpc_error.go (two outdated review threads, resolved)
Create a logger that will record any apparent crash output for later
processing.

If the cli command returns with a non-zero exit status, check for any
recorded crashes and add those to the output.
Extract a better function name and make the errors generic for different
plugin types.
@ghost commented Nov 26, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost locked as resolved and limited conversation to collaborators on Nov 26, 2020