Cgroups #29621
CI test is hitting a panic that looks like it's caused by this change:
Yes, I need to work on this some more. The interface used doesn't work well to access fields on the struct.
This PR was marked stale due to lack of activity. It will be closed in 14 days.
7c2f712 to ba19244
Sorry, this took a long time. This is now ready for review.
receiver/hostmetricsreceiver/internal/scraper/processscraper/metadata.yaml
Co-authored-by: Andrzej Stencel <[email protected]>
@@ -31,6 +31,10 @@ type Config struct {
 	// the collector does not have permission for.
 	MuteProcessIOError bool `mapstructure:"mute_process_io_error,omitempty"`
+
+	// MuteProcessCgroupError is a flag that will mute the error encountered when trying to read the cgroup of a process
+	// the collector does not have permission for.
+	MuteProcessCgroupError bool `mapstructure:"mute_process_cgroup_error,omitempty"`
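For context, here is a minimal sketch of how a mute flag like this could gate error reporting. Only the `MuteProcessCgroupError` field name comes from the diff; `readCgroup`, `collectCgroups`, and the fake permission failure are hypothetical stand-ins, not the scraper's actual code.

```go
package main

import (
	"errors"
	"fmt"
)

// Config mirrors only the field shown in the diff; the rest of the real
// Config struct is omitted.
type Config struct {
	MuteProcessCgroupError bool
}

// readCgroup is a hypothetical stand-in for reading a process's cgroup.
// It fails for even PIDs to simulate a permission problem.
func readCgroup(pid int) (string, error) {
	if pid%2 == 0 {
		return "", fmt.Errorf("pid %d: permission denied", pid)
	}
	return fmt.Sprintf("/user.slice/proc-%d", pid), nil
}

// collectCgroups returns the cgroups that could be read, plus the joined
// read errors unless the mute flag is set.
func collectCgroups(cfg Config, pids []int) (map[int]string, error) {
	cgroups := make(map[int]string)
	var errs []error
	for _, pid := range pids {
		cg, err := readCgroup(pid)
		if err != nil {
			if !cfg.MuteProcessCgroupError {
				errs = append(errs, err)
			}
			continue // the attribute is simply left out for this process
		}
		cgroups[pid] = cg
	}
	// errors.Join returns nil when there is nothing to report.
	return cgroups, errors.Join(errs...)
}

func main() {
	pids := []int{1, 2, 3, 4}
	_, err := collectCgroups(Config{MuteProcessCgroupError: false}, pids)
	fmt.Println("unmuted:", err)
	_, err = collectCgroups(Config{MuteProcessCgroupError: true}, pids)
	fmt.Println("muted:", err)
}
```

Either way the readable processes still get their attribute; the flag only decides whether the failures are surfaced.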
Do we need this option from the start? It's an optional metric and can be disabled if it results in errors
I think we still need an option to mute errors even for optional metrics.
The problem is (I think) that the errors in the process scraper are usually huge - reporting an error for every process on the system, and that they're repeatable - reporting the same thing on every scrape. Perhaps going forward we could do something more clever than adding another "mute" option. Here are some thoughts:
1. Make the error messages shorter by aggregating the duplicate error messages,
2. Only display a specific type of error when it happens for the first time,
3. Add metrics counting the occurrences of each type of error - especially important if we do 2. as the default behavior.
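A rough Go sketch of how these three ideas could fit together, in case it helps the discussion. Everything here (`errorTracker`, `kindedError`, `permissionDenied`) is a hypothetical name made up for illustration; none of it exists in the collector.

```go
package main

import (
	"fmt"
	"sort"
)

// kindedError pairs an error with a coarse kind used for grouping.
type kindedError struct {
	kind string
	err  error
}

// permissionDenied builds a sample error for the given kind and PID.
func permissionDenied(kind string, pid int) kindedError {
	return kindedError{kind: kind, err: fmt.Errorf("pid %d: permission denied", pid)}
}

// errorTracker keeps running totals per error kind across scrapes.
type errorTracker struct {
	counts map[string]int
}

func newErrorTracker() *errorTracker {
	return &errorTracker{counts: make(map[string]int)}
}

// record groups one scrape's errors by kind, returns a summary line for each
// kind that has never been seen before, and adds every occurrence to counts.
func (t *errorTracker) record(errs []kindedError) []string {
	grouped := make(map[string][]error)
	for _, e := range errs {
		grouped[e.kind] = append(grouped[e.kind], e.err)
	}
	kinds := make([]string, 0, len(grouped))
	for kind := range grouped {
		kinds = append(kinds, kind)
	}
	sort.Strings(kinds) // deterministic output order

	var toLog []string
	for _, kind := range kinds {
		group := grouped[kind]
		if t.counts[kind] == 0 { // first time this kind shows up
			toLog = append(toLog, fmt.Sprintf("%s: %d occurrences, first: %v", kind, len(group), group[0]))
		}
		t.counts[kind] += len(group)
	}
	return toLog
}

func main() {
	tracker := newErrorTracker()
	scrape := []kindedError{
		permissionDenied("cgroup", 1),
		permissionDenied("cgroup", 2),
		permissionDenied("io", 1),
	}
	for i := 1; i <= 2; i++ {
		fmt.Printf("scrape %d logs: %q\n", i, tracker.record(scrape))
	}
	fmt.Println("totals:", tracker.counts)
}
```

In a real scraper the `counts` map would presumably be exposed as a metric rather than printed, which is what keeps the first-occurrence-only logging from hiding ongoing failures.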
A solution I'd come up with for this is a new type of error that a scraper can report to log errors at a different level. This combined with option 3 would be a pretty good outcome I think; there would be metrics reported for the different types of errors, and these errors could be logged at debug level by the scraper controller.
This is the issue I have open on the collector repo with accompanying PR: open-telemetry/opentelemetry-collector#8293
**Description:** Adds `process.cgroup` resource attribute to process metrics
**Link to tracking Issue:** Fixes open-telemetry#29282
---------
Co-authored-by: Andrzej Stencel <[email protected]>
Where do I turn on / set the resource attribute process.cgroup: true?
@cforce You're wondering how to enable the process.cgroup resource attribute? The documentation here shares a little more information; you'd want to add something like this to your