fix crash when executor parent nomad process dies #5598
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Fixes #5593
Executor seems to die unexpectedly after nomad agent dies or is
restarted. The crash seems to occur at the first log message after
the nomad agent dies.
To ease debugging we forward executor log messages to executor.log as
well as to Stderr.
go-plugin
sets up plugins with Stderr pointing toa pipe being read by plugin client, the nomad agent in our case[1].
When the nomad agent dies, the pipe is closed, and any subsequent
executor logs fail with ErrClosedPipe and SIGPIPE signal. SIGPIPE
results into executor process dying.
I considered adding a handler to ignore SIGPIPE, but hc-log library
currently panics when logging write operation fails[2]
This we opt to revert to v0.8 behavior of exclusively writing logs to
executor.log, while we investigate alternative options.
Logmon and dockerlogger have the same issue, but I punted on addressing them for another PR - the severity is less there, considering they are very light on logging:
[1] https://github.com/hashicorp/nomad/blob/v0.9.0/vendor/github.com/hashicorp/go-plugin/client.go#L528-L535
[2] https://github.com/hashicorp/nomad/blob/v0.9.0/vendor/github.com/hashicorp/go-hclog/int.go#L320-L323