cpu: add workaround for counter resets related to % Processor Utility metric #1637

Merged
merged 5 commits into from
Oct 1, 2024


jkroepke
Member

@jkroepke jkroepke commented Sep 23, 2024

Fixes #1299

Problem

The % Processor Utility performance counter holds two values: the amount of busy time (V) and the number of CPU ticks (D).

To calculate the CPU percentage, the following formula is needed: (V1 - V0) / (D1 - D0). In Prometheus, it can be calculated via rate(V[1m]) / rate(D[1m]).
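As a rough illustration of the formula, the ratio for two consecutive raw samples could be computed as below. This is only a sketch: the function name `utilization`, the sample values, and the uint64 type for V are assumptions made for the example and are not taken from the collector code.

```go
package main

import "fmt"

// utilization returns (V1 - V0) / (D1 - D0) for two consecutive samples.
// Assumption: V is treated as uint64 here; D is uint32 as described in this PR.
func utilization(v0, v1 uint64, d0, d1 uint32) float64 {
	dv := v1 - v0
	dd := d1 - d0
	if dd == 0 {
		return 0 // no ticks elapsed between the samples; avoid dividing by zero
	}
	return float64(dv) / float64(dd)
}

func main() {
	// Made-up sample values: 500 units of busy time over 1000 ticks.
	fmt.Println(utilization(1000, 1500, 2000, 3000)) // 0.5
}
```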

The CPU tick counter is an unsigned 32-bit integer (uint32). It overflows approximately every 2 hours: the maximum value is 4294967295 (2^32 - 1), after which it wraps around to 0.

A program that fetches the raw values can handle the counter reset, because integer subtraction also works across overflows. (Go ref: go.dev/play/p/57GXzfT-m5G)

For example:

C0 = ((2^32)-10)
C1 = 10
C1 - C0 = 20
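The example above can be reproduced with a few lines of Go. This is a minimal sketch, not the code behind the playground link. The helper name `wrapDelta` is hypothetical. It relies only on the fact that unsigned subtraction in Go wraps modulo 2^32, and it contrasts that with the same subtraction done in float64.

```go
package main

import "fmt"

// wrapDelta returns the number of ticks between two uint32 readings.
// Unsigned subtraction wraps modulo 2^32, so the result stays correct
// across a single rollover of the counter.
func wrapDelta(prev, curr uint32) uint32 {
	return curr - prev
}

func main() {
	var c0 uint32 = 4294967286 // 2^32 - 10, shortly before the rollover
	var c1 uint32 = 10         // reading after the rollover

	fmt.Println(wrapDelta(c0, c1)) // 20

	// The same subtraction in float64 does not wrap and goes negative.
	fmt.Printf("%.0f\n", float64(c1)-float64(c0)) // -4294967276
}
```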

The problem here is that all Prometheus values are float64. Once the values are converted, the bit-based wraparound behavior is lost.

This image shows how Prometheus handles the overflow, which results in gaps when using the rate function:

image

Solution

The solution introduces a dedicated counter that performs the integer calculation on each scrape and adds the result to a float64 counter.
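The idea can be sketched roughly as follows. The type `tickAccumulator` and its `Observe` method are hypothetical names chosen for this illustration. The actual implementation in this PR may be structured differently, for example in how it treats the first scrape.

```go
package main

import "fmt"

// tickAccumulator is a hypothetical helper, not the type added by this PR.
// It remembers the last raw uint32 reading and keeps a float64 total that
// only ever grows, so Prometheus never sees a counter reset.
type tickAccumulator struct {
	prev  uint32
	total float64
	seen  bool
}

// Observe adds the wrap-safe uint32 delta since the previous scrape to the
// float64 total and returns the new total. The first reading only sets the
// baseline (an assumption made for this sketch).
func (a *tickAccumulator) Observe(raw uint32) float64 {
	if a.seen {
		a.total += float64(raw - a.prev) // uint32 subtraction wraps modulo 2^32
	}
	a.prev = raw
	a.seen = true
	return a.total
}

func main() {
	acc := &tickAccumulator{}
	// Three scrapes; the raw counter rolls over between the first and second.
	for _, raw := range []uint32{4294967286, 10, 30} {
		fmt.Println(acc.Observe(raw))
	}
	// Prints 0, 20 and 40 on separate lines.
}
```

Because the delta is computed in uint32 before the conversion to float64, the exported value keeps increasing across the rollover, and rate() can be applied to it as to any other counter.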

Signed-off-by: Jan-Otto Kröpke <[email protected]>
@jkroepke jkroepke merged commit 48e0e11 into prometheus-community:master Oct 1, 2024
8 checks passed
@jkroepke jkroepke deleted the cpu-utility branch October 1, 2024 08:53
Successfully merging this pull request may close these issues.

Possible aberation with the new CPU metrics on rtc counter reset (usage % above 100%)