Handle panics in plugins #26694
Terraform does not use RPC errors for any error communication, so these errors always indicate that something went wrong outside of the plugin protocol. The most common example is a provider crash, which returns "rpc error: code = Unavailable desc = transport is closing". Replace these error codes with something more presentable for the user, and insert the calling method name to help correlate the error with the operation that failed.
This looks really great! I left a couple of nitpicky comments and suggestions but they are all very minor; ignore as you see fit.
@@ -146,7 +148,7 @@ func (p *GRPCProvider) GetSchema() (resp providers.GetSchemaResponse) {

 	resp.Provider = convert.ProtoToProviderSchema(protoResp.Provider)
 	if protoResp.ProviderMeta == nil {
-		log.Printf("[TRACE] No provider meta schema returned")
+		logger.Debug("No provider meta schema returned")
Did you mean to change this from TRACE to DEBUG? (I have no argument if you did this on purpose.)
Create a logger that records any apparent crash output for later processing. If the CLI command returns a non-zero exit status, check for any recorded crashes and add them to the output.
Extract a better function name and make the errors generic for different plugin types.
In prior versions of Terraform, the combined stderr log stream would cause provider crashes to trigger `panicwrap`, which printed a formidable message about Terraform crashing and how to report it. While the updated log handling now prevents plugin stack traces from reaching `panicwrap`, those traces were also lost completely unless logging was enabled. This leaves the user with only a rather unhelpful gRPC error from a crashing provider, such as `rpc error: code = Unavailable desc = transport is closing`.

Since there is no direct way for core to handle a crash in a provider, we catch the output by wrapping the plugin loggers to look for `panic:` or `fatal error:` and recording the output that follows. This relies on the behavior of go-plugin, which sends any unstructured stderr output to a `Debug` log by default. If the plugin SDK were to install a recovery middleware in the future, which could capture the panic traceback and return a gRPC error, it would not affect this code path. This approach also still catches panics that happen outside the handler goroutine, as well as fatal errors, which cannot be recovered from.

Once we have recorded any possible tracebacks, the `main` package can look up any panics that may have occurred via `logging.PluginPanics()` when there was an error in execution. These records should already be complete, since the plugin process would have exited at this point. We limit the length of the traceback to preserve the terminal scrollback (the important information is often within the first few lines), and format the output similarly to the original Terraform panic output.

On top of the panic handler, this PR adds a function to annotate some gRPC error codes that we may encounter, the primary one being `code = Unavailable`, which occurs when the provider process crashes. Rather than an RPC error code, the new diagnostics contain a textual description and the name of the calling method.

Unfortunately, we don't yet have the configured provider name available at the point where the error is created, but an error log entry has been added to help correlate the error with an actual provider binary and version.
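A sketch of wrapping a plugin logger so that crash output is recorded as it passes through, followed by truncation for display. The `debugLogger` interface is an assumed stand-in that mirrors the shape of `hclog.Logger`'s `Debug(msg string, args ...interface{})` method; `panicCatcher`, `truncatedTrace`, and the line limit used below are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// debugLogger is a hypothetical minimal logger interface; go-plugin routes
// unstructured plugin stderr lines to a Debug call of this shape.
type debugLogger interface {
	Debug(msg string, args ...interface{})
}

// stdoutLogger is a trivial inner logger used only for this sketch.
type stdoutLogger struct{}

func (stdoutLogger) Debug(msg string, args ...interface{}) {
	fmt.Println("[DEBUG]", msg)
}

// panicCatcher wraps a logger. Every Debug message is forwarded unchanged, and
// once a message starts with "panic:" or "fatal error:", that message and all
// later ones are also kept as the crash traceback.
type panicCatcher struct {
	debugLogger
	recording bool
	trace     []string
}

// Compile-time check that the wrapper is still a debugLogger.
var _ debugLogger = (*panicCatcher)(nil)

func (c *panicCatcher) Debug(msg string, args ...interface{}) {
	if strings.HasPrefix(msg, "panic:") || strings.HasPrefix(msg, "fatal error:") {
		c.recording = true
	}
	if c.recording {
		c.trace = append(c.trace, msg)
	}
	c.debugLogger.Debug(msg, args...)
}

// truncatedTrace returns at most maxLines recorded lines plus a note about how
// many were dropped, leaving the recorded trace itself untouched.
func (c *panicCatcher) truncatedTrace(maxLines int) string {
	lines := c.trace
	if len(lines) > maxLines {
		omitted := len(lines) - maxLines
		// The full slice expression forces append to copy instead of
		// overwriting c.trace's backing array.
		lines = append(lines[:maxLines:maxLines], fmt.Sprintf("... [%d line(s) omitted]", omitted))
	}
	return strings.Join(lines, "\n")
}

func main() {
	c := &panicCatcher{debugLogger: stdoutLogger{}}
	for _, line := range []string{
		"starting plugin",
		"panic: assignment to entry in nil map",
		"goroutine 12 [running]:",
		"main.handler(...)",
		"main.main()",
	} {
		c.Debug(line)
	}
	fmt.Println(c.truncatedTrace(3))
}
```

Because the wrapper embeds the inner logger, a larger interface such as `hclog.Logger` would have its remaining methods delegated unchanged, so only `Debug` needs overriding.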