[receiver/hostmetrics] Process CPU Utilization values seem wrong #19119

Closed
antonblock opened this issue Feb 27, 2023 · 2 comments
Labels: bug, receiver/hostmetrics

@antonblock

Component(s)

receiver/hostmetrics

What happened?

Description

The calculated per-process utilization values are very different from what the documentation describes. Instead of values in the range [0.0, 1.0], I'm seeing values in the range [-1500.0, 2000.0].

Steps to Reproduce

Enable collection of process.cpu.utilization using the hostmetrics receiver. After two collections, the value for every process will be available in whichever exporter is configured.

Expected Result

From the doc:

Percentage of total CPU time used by the process since last scrape, expressed as a value between 0 and 1.

Actual Result

Values way outside that range, including negative numbers. Below is a screenshot of process.cpu.utilization exported to Prometheus over several minutes:

[image: screenshot of process.cpu.utilization in Prometheus]

Collector version

848486f

Environment information

Environment

OS: Ubuntu 20.04
Compiler(if manually compiled): go 1.20.1

OpenTelemetry Collector configuration

receivers:
    hostmetrics/source1:
        collection_interval: 10s
        scrapers:
            process:
                metrics:
                    process.cpu.utilization:
                        enabled: true
exporters:
    prometheus/local:
        endpoint: 127.0.0.1:9000
        namespace: null
        resource_to_telemetry_conversion:
            enabled: true
service:
    pipelines:
        metrics/source1__local:
            receivers:
                - hostmetrics/source1
            exporters:
                - prometheus/local

Log output

No response

Additional context

Looking at how this value is calculated, I think the issue is that a single CPUUtilizationCalculator is shared by all processes instead of each process having its own. Because the calculator resets previousReadTime each time it records a metric, the elapsedTime value is the time elapsed since the last call to the calculator (usually for a different process), rather than the collection interval. I think this could be addressed by maintaining a map of PIDs to CPUUtilizationCalculators.

https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/hostmetricsreceiver/internal/scraper/processscraper/ucal/cpu_utilization_calculator.go#L46

antonblock added the bug and needs triage labels on Feb 27, 2023
@dmitryax

dmitryax commented Mar 2, 2023

Fixed by #19166
