[receiver/hostmetrics] Process CPU Utilization values seem wrong #19119

antonblock · 2023-02-27T20:46:25Z

Component(s)

receiver/hostmetrics

What happened?

Description

The calculated per-process utilization values are very different from what the documentation describes. Instead of values in the range [0.0, 1.0], I'm seeing values in the range [-1500.0, 2000.0].

Steps to Reproduce

Enable collection of process.cpu.utilization using the hostmetrics receiver. After two collections, the values for all processes are available in whichever exporter is configured.

Expected Result

From the doc:

Percentage of total CPU time used by the process since last scrape, expressed as a value between 0 and 1.

Actual Result

Values far outside that range, including negative numbers.

[Screenshot: process.cpu.utilization exported to Prometheus over several minutes]

Collector version

848486f

Environment information

Environment

OS: Ubuntu 20.04
Compiler (if manually compiled): go 1.20.1

OpenTelemetry Collector configuration

receivers:
    hostmetrics/source1:
        collection_interval: 10s
        scrapers:
            process:
                metrics:
                    process.cpu.utilization:
                        enabled: true
exporters:
    prometheus/local:
        endpoint: 127.0.0.1:9000
        namespace: null
        resource_to_telemetry_conversion:
            enabled: true
service:
    pipelines:
        metrics/source1__local:
            receivers:
                - hostmetrics/source1
            exporters:
                - prometheus/local

Log output

No response

Additional context

Looking at how this value is calculated, I think the issue is that a single CPUUtilizationCalculator is shared across all processes instead of one being kept per process. Because it resets previousReadTime each time it records a metric, the elapsedTime value is actually the time elapsed since the last call to the calculator (usually for a different process), rather than the collection interval. I think this could be addressed by maintaining a map of PIDs to CPUUtilizationCalculators.

https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/hostmetricsreceiver/internal/scraper/processscraper/ucal/cpu_utilization_calculator.go#L46


github-actions · 2023-02-27T20:47:50Z

Pinging code owners:

receiver/hostmetrics: @dmitryax

See Adding Labels via Comments if you do not have permissions to add labels yourself.

dmitryax · 2023-03-02T00:30:37Z

Fixed by #19166

antonblock added the bug and needs triage labels Feb 27, 2023

github-actions bot added the receiver/hostmetrics label Feb 27, 2023

antonblock mentioned this issue Mar 1, 2023

[receiver/hostmetrics] Fix calculation of process.cpu.utilization #19166

Merged

frzifus removed the needs triage label Mar 1, 2023

dmitryax closed this as completed Mar 2, 2023

urytururur mentioned this issue May 6, 2024

system.cpu.time and system.cpu.utilization metrics seem incorrect when running collector on a Windows operating system #32867

Open
