[Auditbeat] Report process errors #9693
Merged
cwurm added the review, needs_backport (PR is waiting to be backported to other branches), Auditbeat, and SecOps labels on Dec 19, 2018
Pinging @elastic/secops
houndci-bot reviewed on Dec 19, 2018
cwurm force-pushed the process_collect_errors branch from 2a1d902 to 775945d on December 20, 2018 12:47
jenkins, test this
webmat approved these changes on Dec 21, 2018
LGTM
Travis failure is Filebeat / ML. Unrelated.
This was referenced Dec 21, 2018
andrewkroh approved these changes on Dec 21, 2018
You should probably rebase this to pull in the various CI fixes. Mainly, I want to see that the test_metricset_process system tests run on Windows.
cwurm force-pushed the process_collect_errors branch from 775945d to 45ef445 on December 22, 2018 12:51
cwurm force-pushed the process_collect_errors branch from 45ef445 to fde6ab2 on January 2, 2019 13:22
cwurm added the v6.7.0 label and removed the needs_backport label on Jan 2, 2019
cwurm pushed a commit to cwurm/beats that referenced this pull request on Jan 3, 2019
Changes the process metricset to keep iterating through processes even when an unexpected error occurs. The error will be stored in the Process object and sent to Elasticsearch as well as logged as a warning. This only happens the first time the error is encountered for a process, not on subsequent collection cycles. (cherry picked from commit 2cd7c42)
cwurm pushed a commit that referenced this pull request on Jan 4, 2019
Changes the process metricset to keep iterating through processes even when an unexpected error occurs. The error will be stored in the Process object and sent to Elasticsearch as well as logged as a warning. This only happens the first time the error is encountered for a process, not on subsequent collection cycles. (cherry picked from commit 2cd7c42)
leweafan pushed a commit to leweafan/beats that referenced this pull request on Apr 28, 2023
Changes the process metricset to keep iterating through processes even when an unexpected error occurs. The error will be stored in the Process object and sent to Elasticsearch as well as logged as a warning. This only happens the first time the error is encountered for a process, not on subsequent collection cycles. (cherry picked from commit f7ce3b1)
So far, the `process` metricset has been rather strict. If an unexpected error occurred while collecting process information, the whole collection would stop and return an error.
This changes it to keep iterating through processes even when that happens. The unexpected error will be stored in the `Process` object and sent to Elasticsearch as well as logged as a warning. This only happens the first time the error is encountered for a process, not on subsequent collection cycles (with a typical collection frequency of 1s, that would flood the log and ES).
For error documents, it sets `event.kind: error` and `event.action: process_error`.
FYI, I have renamed `ProcessInfo` to `Process` not just because it now contains more than just `types.ProcessInfo`, but also to bring it in line with `Socket` in `socket.go`. `Socket` already contains an `Error` field (and that was the inspiration for this change).
Beware: The diff GitHub shows is misleading in places. It shows replacements/deletions where a few lines have just moved down a bit.
Some additional background on why this change was made can be found in this comment thread on a PR that introduced some error catching during process collection.
If anybody wants to test what happens with errors, run it as non-root and comment out the `continue` statement in line 375. It will then report errors for processes of other users. At some point, we might want to have a test that simulates an error.
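To make the manual test concrete, here is a hedged sketch of the kind of check such a `continue` statement might guard. The actual code at line 375 is not reproduced in this conversation, so `shouldSkip`, its conditions (skipping vanished processes, and permission errors when not running as root), and the sample errors are all assumptions for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// Sample errors for illustration only; real ones would come from
// reading process details on the host.
var (
	errDenied = fmt.Errorf("open /proc/1/exe: %w", os.ErrPermission)
	errGone   = fmt.Errorf("open /proc/4242/stat: %w", os.ErrNotExist)
	errOther  = errors.New("unexpected failure")
)

// shouldSkip is a hypothetical helper: it reports whether an error is
// expected, in which case the process is skipped without reporting.
func shouldSkip(err error, isRoot bool) bool {
	// Assumed: a process that exited mid-collection is not worth reporting.
	if errors.Is(err, os.ErrNotExist) {
		return true
	}
	// Assumed: a non-root user cannot read other users' processes,
	// so permission errors are expected in that case.
	if !isRoot && errors.Is(err, os.ErrPermission) {
		return true
	}
	return false
}

func main() {
	isRoot := os.Geteuid() == 0
	for _, err := range []error{errDenied, errGone, errOther} {
		if shouldSkip(err, isRoot) {
			continue // commenting this out would surface the error instead
		}
		fmt.Println("would report:", err)
	}
}
```

A test that simulates an error could pass values like `errDenied` through the collection path and assert that an error document is produced, without needing a non-root run.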