Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Collect process (beats) stdout and stderr #455

Merged
merged 10 commits into from
May 23, 2022

Conversation

AndersonQ
Copy link
Member

@AndersonQ AndersonQ commented May 20, 2022

What does this PR do?

The Elastic Agent now collects the strout and stderr of the (sub)process it runs.

Why is it important?

When application crashes (panics), the "crash/panic" information is lost as they log directly to files and by default the stdout and stderr of a sub process is connected to os.DevNull.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

Panic a beat and check the logs are there :)

There is the agent-88-collest-beats-stdout-stderr beats branch has a modified filebeat that exposes a HTTP server which will panic and log to stdout and stderr upon request.

Just compile the agent and the beats together, run the agent with any policy that will install and run a filebeat (filebeat_monitoring also works).
Look for logs like starting HTTP panic server on port :424X, where X is a random number chosen at startup. The agent should already log it, but filebeat will log it as well.
Once you have it, you can:

curl localhost:424X # will make filebeat log to stdout and stderr
curl -X PANIC localhost:424X # will make filebeat panic

Related issues

Logs

{"log.level":"info","@timestamp":"2022-05-13T16:19:28.760Z","log.origin":{"file.name":"process/stdlogger.go","file.line":50},"message":"filebeat stdout: \"starting HTTP panic server on port :4248\\n\"","agent.console.name":"filebeat","agent.console.type":"stdout","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-05-13T16:19:47.210Z","log.origin":{"file.name":"process/stdlogger.go","file.line":50},"message":"filebeat stdout: \"logging to stdout before PANIC panic!\\n\"","agent.console.name":"filebeat","agent.console.type":"stdout","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2022-05-13T16:20:01.802Z","log.origin":{"file.name":"process/stdlogger.go","file.line":50},"message":"filebeat stderr: \"panic: HTTP Panic server said to panic!\\n\\ngoroutine 65 [running]:\\ngithub.com/elastic/beats/v7/libbeat/monitoring/report/log.(*reporter).snapshotLoop(0xc000724640)\\n\\t/go/src/github.com/elastic/beats/libbeat/monitoring/report/log/log.go:169 +0x67d\\ngithub.com/elastic/beats/v7/libbeat/monitoring/report/log.MakeReporter.func1()\\n\\t/go/src/github.com/elastic/beats/libbeat/monitoring/report/log/log.go:134 +0x5a\\ncreated by github.com/elastic/beats/v7/libbeat/monitoring/report/log.MakeReporter\\n\\t/go/src/github.com/elastic/beats/libbeat/monitoring/report/log/log.go:132 +0x23e\\n\"","agent.console.name":"filebeat","agent.console.type":"stderr","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-05-20T10:31:02.518Z","log.origin":{"file.name":"process/stdlogger.go","file.line":50},"message":"filebeat stdout: \"starting HTTP panic server on port :4247\\n\"","agent.console.name":"filebeat","agent.console.type":"stdout","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-05-20T10:31:02.663Z","log.origin":{"file.name":"process/stdlogger.go","file.line":50},"message":"filebeat_monitoring stdout: \"starting HTTP panic server on port :4249\\n\"","agent.console.name":"filebeat_monitoring","agent.console.type":"stdout","ecs.version":"1.6.0"}

Screenshot from 2022-05-20 12-22-41

@AndersonQ AndersonQ self-assigned this May 20, 2022
@elasticmachine
Copy link
Contributor

elasticmachine commented May 20, 2022

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2022-05-20T15:49:06.970+0000

  • Duration: 17 min 50 sec

Test stats 🧪

Test Results
Failed 0
Passed 4819
Skipped 23
Total 4842

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages.

  • run integration tests : Run the Elastic Agent Integration tests.

  • run end-to-end tests : Generate the packages and run the E2E Tests.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@elasticmachine
Copy link
Contributor

elasticmachine commented May 20, 2022

🌐 Coverage report

Name Metrics % (covered/total) Diff
Packages 97.101% (67/69) 👍 1.449
Files 69.159% (148/214) 👎 -1.176
Classes 69.087% (295/427) 👎 -0.77
Methods 52.497% (799/1522) 👎 -0.535
Lines 38.482% (8628/22421) 👎 -0.532
Conditionals 100.0% (0/0) 💚

@AndersonQ AndersonQ marked this pull request as ready for review May 20, 2022 11:53
@AndersonQ AndersonQ requested a review from a team as a code owner May 20, 2022 11:53
@AndersonQ AndersonQ requested review from narph and michel-laterman and removed request for a team May 20, 2022 11:53
Copy link
Member

@aleksmaus aleksmaus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

logMsg: "stdout log",
logLevel: zapcore.InfoLevel,
},
{name: "capture stderr",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe alight the name with other properties in the structure initialization, here and above

if proc != nil && proc.Process != nil {
_ = proc.Process.Kill()
}
_ = proc.Process.Kill()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so, there is no way that the proc or the proc.Process can be nil here?
seeing that the nil check was here before and now is removed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems indeed odd to me, I would keep that and separate investigation. See elastic/beats#26126

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my bad, I was removing some debug logs and removed more than I should

Copy link
Contributor

@ph ph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AndersonQ Only a few things:

  • Add a changelog entry.
  • Revert the proc.kill
  • I would fix the alignment.

Other than the above it looks good to me.

if proc != nil && proc.Process != nil {
_ = proc.Process.Kill()
}
_ = proc.Process.Kill()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems indeed odd to me, I would keep that and separate investigation. See elastic/beats#26126

@AndersonQ AndersonQ merged commit ce95d6b into main May 23, 2022
@AndersonQ AndersonQ deleted the 88-collest-beats-stdout-stderr branch May 23, 2022 08:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Elastic-agent should not swallows stderr and panics
4 participants