Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Auditbeat] Report process errors #9693

Merged
merged 4 commits into from
Jan 2, 2019
Merged

Conversation

cwurm
Copy link
Contributor

@cwurm cwurm commented Dec 19, 2018

So far, the process metricset has been rather strict. If an unexpected error occurred while collecting process information, the whole collection would stop and return an error.

This changes it to keep iterating through processes even when that happens. The unexpected error will be stored in the Process object and sent to Elasticsearch as well as logged as a warning. This only happens the first time the error is encountered for a process, not on subsequent collection cycles (with a typical collection frequency of 1s, that would flood the log and ES).

For error documents, it sets event.kind: error and event.action: process_error.

Fyi, I have renamed ProcessInfo to Process not just because it now contains more than just types.ProcessInfo, but also to bring it in line with Socket in socket.go. Socket already contains an Error field (and that was the inspiration for this change).

Beware: The diff Github shows is misleading in places, it shows replacements/deletions where a few lines have just moved down a bit.

Some additional background on why this change can be found in this comment thread on a PR that introduced some error catching during process collection.

If anybody wants to test what happens with errors, run it as non-root and comment the continue statement in line 375 - it will report errors for processes of other users. At some point, we might want to have a test that simulates an error.

@cwurm cwurm added review needs_backport PR is waiting to be backported to other branches. Auditbeat SecOps labels Dec 19, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/secops

@cwurm
Copy link
Contributor Author

cwurm commented Dec 20, 2018

jenkins, test this

1 similar comment
@cwurm
Copy link
Contributor Author

cwurm commented Dec 20, 2018

jenkins, test this

Copy link
Contributor

@webmat webmat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Travis failure is Filebeat / ML. Unrelated.

@andrewkroh
Copy link
Member

You should probably rebase this to pull in various CI fixes. Mainly want to see that the test_metricset_process system tests run on Windows.

@cwurm cwurm force-pushed the process_collect_errors branch from 775945d to 45ef445 Compare December 22, 2018 12:51
@cwurm cwurm force-pushed the process_collect_errors branch from 45ef445 to fde6ab2 Compare January 2, 2019 13:22
@cwurm cwurm merged commit 2cd7c42 into elastic:master Jan 2, 2019
@cwurm cwurm added v6.7.0 and removed needs_backport PR is waiting to be backported to other branches. labels Jan 2, 2019
@cwurm cwurm added v6.6.0 and removed v6.7.0 labels Jan 2, 2019
cwurm pushed a commit to cwurm/beats that referenced this pull request Jan 3, 2019
Changes the process metricset to keep iterating through processes even when an unexpected error occurs. The error will be stored in the Process object and sent to Elasticsearch as well as logged as a warning. This only happens the first time the error is encountered for a process, not on subsequent collection cycles.

(cherry picked from commit 2cd7c42)
cwurm pushed a commit to cwurm/beats that referenced this pull request Jan 3, 2019
Changes the process metricset to keep iterating through processes even when an unexpected error occurs. The error will be stored in the Process object and sent to Elasticsearch as well as logged as a warning. This only happens the first time the error is encountered for a process, not on subsequent collection cycles.

(cherry picked from commit 2cd7c42)
cwurm pushed a commit that referenced this pull request Jan 4, 2019
Changes the process metricset to keep iterating through processes even when an unexpected error occurs. The error will be stored in the Process object and sent to Elasticsearch as well as logged as a warning. This only happens the first time the error is encountered for a process, not on subsequent collection cycles.

(cherry picked from commit 2cd7c42)
cwurm pushed a commit that referenced this pull request Jan 4, 2019
Changes the process metricset to keep iterating through processes even when an unexpected error occurs. The error will be stored in the Process object and sent to Elasticsearch as well as logged as a warning. This only happens the first time the error is encountered for a process, not on subsequent collection cycles.

(cherry picked from commit 2cd7c42)
@cwurm cwurm deleted the process_collect_errors branch January 7, 2019 12:28
leweafan pushed a commit to leweafan/beats that referenced this pull request Apr 28, 2023
Changes the process metricset to keep iterating through processes even when an unexpected error occurs. The error will be stored in the Process object and sent to Elasticsearch as well as logged as a warning. This only happens the first time the error is encountered for a process, not on subsequent collection cycles.

(cherry picked from commit f7ce3b1)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants